SQL basics¶
SQL is a programming language for working with data stored in relational databases. There are many flavors of SQL used by various database systems (MySQL, Postgres, Oracle, SQL Server etc.), but they share the same basic syntax and functionality. Here we show how to use SQL while working with databases created using SQLite software.
SQL in Jupyter Notebook¶
In order to execute SQL code in Jupyter Notebook, we will use the ipython-sql
extension of the notebook. The following command, executed in a notebook, will install this extension:
[1]:
%pip install ipython-sql
Next, we need to load the extension:
[2]:
%load_ext sql
From now on, when the first line of a code cell is %%sql
, the rest of the cell will be evaluated as an SQL (and not Python) code.
Creating tables¶
To access data stored in a database, we need to connect to the database first. Typically this requires connecting to a remote database server, but SQLite is special in this respect - its databases are saved in files stored locally on a computer. The command below connects to a database stored in a file tiny.sqlite
. If such file does not exist, it will be automatically created:
[3]:
%%sql
sqlite:///tiny.sqlite
[3]:
'Connected: @tiny.sqlite'
The newly created database is empty. The code below creates in it a table named courses
with three columns named course_id
, course_name
, and credits
:
[ ]:
%%sql
CREATE TABLE courses (
course_id TEXT,
course_name TEXT,
credits INT);
The code
course_id TEXT,
course_name TEXT,
credits INT
specifies that the columns course_id
and course_name
will be used to store text, and the credits
column will contain integer values. While many database systems enforce such data type specifications, SQLite treats them merely as suggestions - e.g. it will let us store text values in the credits
column without any problems.
SQL does not provide a standard way of listing tables existing in a database. However, every SQLite database contains a special table sqlite_master
which records all database components. The code below retrieves the whole content of sqlite_master
. It shows that the courses
table now exists in the database:
[5]:
%%sql
SELECT *
FROM sqlite_master;
* sqlite:///tiny.sqlite
Done.
[5]:
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | courses | courses | 2 | CREATE TABLE courses ( course_id TEXT, course_name TEXT, credits INT) |
Deleting tables¶
In order to delete a table my_table
from a database we use the statement
DROP TABLE my_table;
This will generate an error if there is no table with the specified name:
[6]:
%%sql
DROP TABLE my_table;
* sqlite:///tiny.sqlite
(sqlite3.OperationalError) no such table: my_table
[SQL: DROP TABLE my_table;]
(Background on this error at: http://sqlalche.me/e/e3q8)
This error can be avoided by adding a check if the table exists prior to deleting it:
[ ]:
%%sql
DROP TABLE IF EXISTS my_table;
Creating table records¶
The statement
SELECT *
FROM my_table
retrieves all content of my_table
. Applied to the courses
table, it will give just column headers, since at the moment this table is empty:
[8]:
%%sql
SELECT *
FROM courses;
* sqlite:///tiny.sqlite
Done.
[8]:
course_id | course_name | credits |
---|
Records can be added to a table using the syntax
INSERT INTO my_table (column1, column2 ,..)
VALUES (value1, value2 ,...);
This adds a new record to my_table
with value1
placed in column1
etc.:
[9]:
%%sql
INSERT INTO courses (course_id, course_name, credits)
VALUES ('MTH 141','Calculus I', 4);
* sqlite:///tiny.sqlite
1 rows affected.
[9]:
[]
We can verify that a record has been added to the courses
table:
[10]:
%%sql
SELECT *
FROM courses;
* sqlite:///tiny.sqlite
Done.
[10]:
course_id | course_name | credits |
---|---|---|
MTH 141 | Calculus I | 4 |
Several records can be added to a table at once:
[11]:
%%sql
INSERT INTO courses (course_id, course_name, credits)
VALUES
('MTH 142', 'Calculus II', 4),
('MTH 448', 'Data Oriented Computing', 3),
('MTH 462', 'Topics in Analysis', 2);
SELECT *
FROM courses;
* sqlite:///tiny.sqlite
3 rows affected.
Done.
[11]:
course_id | course_name | credits |
---|---|---|
MTH 141 | Calculus I | 4 |
MTH 142 | Calculus II | 4 |
MTH 448 | Data Oriented Computing | 3 |
MTH 462 | Topics in Analysis | 2 |
Modifying records¶
Existing records can be modified using the statement
UPDATE my_table
SET
column_1 = new_value_1,
column_2 = new_value_2
WHERE
condition;
The condition given after WHERE
specifies that only the records for which this condition holds true are to be changed. For example, the code below changes the number of credits of the MTH 141 course to 6:
[12]:
%%sql
UPDATE courses
SET
credits = 6
WHERE
course_id = 'MTH 141';
SELECT *
FROM courses;
* sqlite:///tiny.sqlite
1 rows affected.
Done.
[12]:
course_id | course_name | credits |
---|---|---|
MTH 141 | Calculus I | 6 |
MTH 142 | Calculus II | 4 |
MTH 448 | Data Oriented Computing | 3 |
MTH 462 | Topics in Analysis | 2 |
Deleting records¶
The statement
DELETE FROM my_table
WHERE
condition;
deletes from my_table
all records for which the specified condition is satisfied. For example, we can use it to delete all courses with less than 4 credits from the courses
table:
[13]:
%%sql
DELETE FROM courses
WHERE
credits < 4;
SELECT *
FROM courses;
* sqlite:///tiny.sqlite
2 rows affected.
Done.
[13]:
course_id | course_name | credits |
---|---|---|
MTH 141 | Calculus I | 6 |
MTH 142 | Calculus II | 4 |
Note
Here are a few comments on the general SQL syntax:
Indentations and newlines in SQL code are ignored, but they can be used to increase code readability.
While it is customary to capitalize keywords in SQL code, this is not necessary.
select * from courses
is equivalent toSELECT * FROM courses
.The semicolon
;
marks the end of an SQL statement and separates each statement from the next one. If we are executing only one statement, its ending semicolon can be omitted.In general, names of tables, columns, and other database objects can be used without quotes. The exception from this rule are names which contain special characters (e.g. spaces) or coincide with SQL keywords. Such names must be enclosed in “double quotes”. Good practice is to avoid names which require this treatment.
All other strings in SQL code (data in table records etc.) must be enclosed in ‘single quotes’.