SQL with PythonΒΆ

There are several tools available for interacting with relational databases from within a Python script. Below we show how to do this using SQLAlchemy and pandas libraries.

SQLAlchemy can be installed by running the following code in Jupyter Notebook:

[ ]:
%pip install sqlalchemy

We will use SQLAlchemy to connect to an SQLite database movie_data.sqlite. In our example it is a new database which will be created by the code below. However, the same code would work to connect to an existing database:

[1]:
import sqlalchemy
engine = sqlalchemy.create_engine(f'sqlite:///movie_data.sqlite')

At the moment the database is empty. Instead of using SQL to create a table in it, we will use pandas. First, we define a pandas dataframe with table data:

[2]:
import pandas as pd

df = pd.DataFrame({'title': ['Casablanca', 'Forrest Gump', 'Goodfellas',
                              'Chinatown', 'Jaws', 'Wall-E'],
                   'director': ['M. Curtiz', 'R. Zemeckis', 'M. Scorsese',
                                 'R. Polanski', 'S. Spielberg', 'A. Stanton'],
                   'year': [1942, 1994, 1990, 1974, 1975, 2008]})
df
[2]:
title director year
0 Casablanca M. Curtiz 1942
1 Forrest Gump R. Zemeckis 1994
2 Goodfellas M. Scorsese 1990
3 Chinatown R. Polanski 1974
4 Jaws S. Spielberg 1975
5 Wall-E A. Stanton 2008

Next, we use a dataframe method to_sql to write the content of the dataframe to a database table:

[3]:
df.to_sql('movies',     # name of the table
          con=engine,   # database connection
          index=False,  # do not include the index of the dataframe in the table
          dtype={       # type of data in each column
                 'title': sqlalchemy.types.INT,
                 'director': sqlalchemy.types.TEXT,
                 'year': sqlalchemy.types.INT})

A table movies is now a part of the database.

To execute SQL code in the database we can use pandas read_sql_query function. The return value of this function is a dataframe with results of the SQL code:

[4]:
query = '''
        SELECT *
        FROM movies
        WHERE year > 1980
        ORDER BY year DESC
        '''
result = pd.read_sql_query(query,con=engine)
result
[4]:
title director year
0 Wall-E A. Stanton 2008
1 Forrest Gump R. Zemeckis 1994
2 Goodfellas M. Scorsese 1990