SQL joins, union, intersection¶
We frequently need to combine data from several database tables and queries to extract information we want. The operators UNION
, INTERSECT
, EXCEPT
and various types of joins are tools that help us accomplish such tasks.
In the examples below we will be using the SQLite database courses_data.sqlite
. It can be downloaded here:
To execute SQL code in Jupyter Notebook we will use the ipython-sql
notebook extension. Assuming that this extension is already installed (see SQL Basics for installation instructions), the code below will load it:
[1]:
%load_ext sql
Next, we connect to the courses_data.sqlite
database:
[2]:
%%sql
sqlite:///courses_data.sqlite
This database consists of a three tables: students
, mth141
, and eng101
. The students
table contains information (names, id numbers, majors) of some college students:
[3]:
%%sql
SELECT *
FROM students
LIMIT 5;
* sqlite:///courses_data.sqlite
Done.
[3]:
student_id | first_name | last_name | major |
---|---|---|---|
196 | Rosina | Hovey | biology |
165 | Stanton | Mcgaugh | biology |
123 | Jonnie | Hillock | mathematics |
136 | Chu | Flax | mathematics |
122 | Rosalinda | Trippe | biology |
The tables mth141
and eng101
contain student grades from two courses: MTH 141 and ENG 101, respectively. Each grade is associated to an id number of a student:
[4]:
%%sql
SELECT *
FROM mth141
ORDER BY student_id;
* sqlite:///courses_data.sqlite
Done.
[4]:
student_id | grade |
---|---|
109 | C |
122 | A |
127 | A |
140 | F |
145 | C |
157 | F |
188 | C |
[5]:
%%sql
SELECT *
FROM eng101
ORDER BY student_id;
* sqlite:///courses_data.sqlite
Done.
[5]:
student_id | grade |
---|---|
122 | A |
123 | C |
140 | D |
165 | F |
188 | A |
189 | A |
INTERSECT¶
Lets say that we want to know id numbers of students who took both courses, MTH 141 and ENG 101. We can accomplish this using the INTERSECT
operator. Given two queries, it returns records which are selected by both of them:
[6]:
%%sql
SELECT student_id
FROM mth141
INTERSECT
SELECT student_id
FROM eng101;
* sqlite:///courses_data.sqlite
Done.
[6]:
student_id |
---|
122 |
140 |
188 |
UNION¶
The UNION
operator combines two queries, returning records selected by either one of them. Identical records selected by both queries are returned only once.
As an example, we can use UNION
to get a list of id numbers of students who took either MTH 141 or ENG 101, or both:
[7]:
%%sql
SELECT student_id
FROM mth141
UNION
SELECT student_id
FROM eng101;
* sqlite:///courses_data.sqlite
Done.
[7]:
student_id |
---|
109 |
122 |
123 |
127 |
140 |
145 |
157 |
165 |
188 |
189 |
EXCEPT¶
Given two queries, the EXCEPT
operator returns these records selected by the first query, which are not selected by the second query.
Below we use it to get a list of id numbers of students who took MTH 141 but not ENG 101:
[8]:
%%sql
SELECT student_id
FROM mth141
EXCEPT
SELECT student_id
FROM eng101;
* sqlite:///courses_data.sqlite
Done.
[8]:
student_id |
---|
109 |
127 |
145 |
157 |
Note
The operators INTERSECT
, UNION
and EXCEPT
can be used to combine only queries which return the same number of columns.
Joins¶
Let say that we want to combine records from the mth141
and eng101
tables so that we get student id numbers in one column, MTH 141 grades in the second column, and ENG 101 grades in the last column. This can be done in a few different ways depending on what precisely we want to accomplish:
Case 1. We can get records of only these students who took both courses and have a grade for both MTH 141 and ENG 101.
Case 2. We can get records of all students who took MTH 141. The column with ENG 101 grades will contain either a grade, for students who took ENG 101, or NULL otherwise.
Case 3. We can get records of all students who took ENG 101, and populate the MTH 141 column with either grades or NULL values depending on whether a student took MTH 141 or not.
Case 4. We can get records of students who took either course. The MTH 141 and ENG 101 columns will be populated with grades wherever they exist, and with NULL values elsewhere.
SQL lets us implement each of these scenarios using different types of table joins: the inner join in case 1, the left and right joins in cases 2 and 3, and the full outer join in case 4.
Inner join¶
The INNER JOIN
clause is used as follows:
SELECT column1, column2, ...
FROM
left_table INNER JOIN right_table
ON condition;
This clause returns combinations of every record of left_table
with every record of right_table
for which the specified condition is satisfied. Records of either table which do not have a match are omitted.
For example, the query below returns combined records from tables mth141
and eng101
which have the same student id number:
[9]:
%%sql
SELECT *
FROM
mth141 INNER JOIN eng101
ON mth141.student_id = eng101.student_id;
* sqlite:///courses_data.sqlite
Done.
[9]:
student_id | grade | student_id_1 | grade_1 |
---|---|---|---|
122 | A | 122 | A |
140 | F | 140 | D |
188 | C | 188 | A |
Notice that to identify which column comes from which table we need to prefix each column name with the name of a table. E.g. mth141.student_id
denotes column student_id
in table mth141
.
The above query returns student_id
column twice since this column appears in both tables. Also, names of columns with grades are not informative. We can fix this by specifying which columns we want to retrieve and using aliases:
[10]:
%%sql
SELECT
mth141.student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
mth141 INNER JOIN eng101
ON mth141.student_id = eng101.student_id;
* sqlite:///courses_data.sqlite
Done.
[10]:
student_id | mth141_grade | eng101_grade |
---|---|---|
122 | A | A |
140 | F | D |
188 | C | A |
The condition in INNER JOIN
(and other types of joins) does not need to test just equality of column values. For example, below we return records of students who took both courses, and received a different grade in each course:
[12]:
%%sql
SELECT
mth141.student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
mth141 INNER JOIN eng101
ON mth141.student_id = eng101.student_id
AND mth141.grade != eng101.grade;
* sqlite:///courses_data.sqlite
Done.
[12]:
student_id | mth141_grade | eng101_grade |
---|---|---|
140 | F | D |
188 | C | A |
Left join¶
The LEFT JOIN
clause is used as follows:
SELECT column1, column2, ...
FROM
left_table LEFT JOIN right_table
ON condition;
It combines every record of left_table
with every record of right_table
for which the given condition is satisfied. Records in the left_table
that not have a match in right_table
are combined with NULL values.
For example, we can use LEFT JOIN
to list MTH 141 students, and include their grades in ENG 101 if they exist:
[13]:
%%sql
SELECT
mth141.student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
mth141 LEFT JOIN eng101
ON mth141.student_id = eng101.student_id;
* sqlite:///courses_data.sqlite
Done.
[13]:
student_id | mth141_grade | eng101_grade |
---|---|---|
122 | A | A |
109 | C | None |
127 | A | None |
140 | F | D |
145 | C | None |
157 | F | None |
188 | C | A |
Right join¶
RIGHT JOIN
is similar to LEFT JOIN
, but with roles of the right_table
and left_table
reversed: if a record of right_table
does not have a matching record in the left_table
, then it gets matched with NULL values.
While RIGHT JOIN
is a standard part of SQL, it is not implemented by SQLite. This is a minor inconvenience since we can get the same results by switching tables and using LEFT JOIN
. Below we do this to get grades of ENG 101 students, supplements by MTH 141 grades if they exist:
[14]:
%%sql
SELECT
eng101.student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
eng101 LEFT JOIN mth141
ON mth141.student_id = eng101.student_id;
* sqlite:///courses_data.sqlite
Done.
[14]:
student_id | mth141_grade | eng101_grade |
---|---|---|
165 | None | F |
123 | None | C |
122 | A | A |
189 | None | A |
140 | F | D |
188 | C | A |
Full outer join¶
FULL OUTER JOIN
is a combination of LEFT JOIN
and RIGHT JOIN
. It matches all records of left_table
with all records of right_table
based on a specified condition. If a record of either of the two tables does not have a match in the other table, it is matched with NULL values.
Again, while FULL OUTER JOIN
is a standard SQL feature, SQLite does not implement it. However, we can get the same results using LEFT JOIN
and UNION
. For example, below we compute the full outer join of the mth141
and eng101
tables for records with the same student id number:
[18]:
%%sql
SELECT
mth141.student_id AS student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
mth141 LEFT JOIN eng101
ON mth141.student_id = eng101.student_id
UNION
SELECT
eng101.student_id AS student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
eng101
LEFT JOIN mth141
ON mth141.student_id = eng101.student_id;
* sqlite:///courses_data.sqlite
Done.
[18]:
student_id | mth141_grade | eng101_grade |
---|---|---|
109 | C | None |
122 | A | A |
123 | None | C |
127 | A | None |
140 | F | D |
145 | C | None |
157 | F | None |
165 | None | F |
188 | C | A |
189 | None | A |
Multiple joins¶
We can combine more that two tables by iterating the join operation. For example, below we use INNER JOIN
twice to get grades of students who took both MTH 141 and ENG 101 combined with student names and majors:
[16]:
%%sql
SELECT
students.*,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
students INNER JOIN mth141 INNER JOIN eng101
ON students.student_id = mth141.student_id
AND students.student_id = eng101.student_id;
* sqlite:///courses_data.sqlite
Done.
[16]:
student_id | first_name | last_name | major | mth141_grade | eng101_grade |
---|---|---|---|---|---|
122 | Rosalinda | Trippe | biology | A | A |
140 | Ludivina | Hoskin | computer science | F | D |
188 | Randall | Lindbloom | mathematics | C | A |
Here is a more complex example. We retrieve grades of students who took either of the two courses and combine them with student names and majors. Notice that the first FROM
is followed by a join of a query getting all grades and the students
table. To identify columns of the query getting grades, we give it a name combined_grades
:
[19]:
%%sql
SELECT
students.*,
combined_grades.mth141_grade,
combined_grades.eng101_grade
FROM
(SELECT
mth141.student_id AS student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
mth141 LEFT JOIN eng101
ON mth141.student_id = eng101.student_id
UNION
SELECT
eng101.student_id AS student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
eng101 LEFT JOIN mth141
ON mth141.student_id = eng101.student_id
) AS combined_grades
INNER JOIN students
ON combined_grades.student_id = students.student_id;
* sqlite:///courses_data.sqlite
Done.
[19]:
student_id | first_name | last_name | major | mth141_grade | eng101_grade |
---|---|---|---|---|---|
165 | Stanton | Mcgaugh | biology | None | F |
123 | Jonnie | Hillock | mathematics | None | C |
122 | Rosalinda | Trippe | biology | A | A |
109 | Shawnna | Coate | mathematics | C | None |
189 | Marcia | Hinkel | mathematics | None | A |
127 | Adelina | Brummitt | biology | A | None |
140 | Ludivina | Hoskin | computer science | F | D |
145 | Ilona | Rabe | computer science | C | None |
157 | Wilbert | Haskell | computer science | F | None |
188 | Randall | Lindbloom | mathematics | C | A |
Views¶
As the last example shows queries can get long and complicated if they involve multiple joins, unions etc. Views provide a tool for splitting such queries into simpler pieces. A view is a virtual table with results of a given query. This virtual table can be then used to construct other queries. For example, here we construct a view combined_grades
listing all MTH 141 and ENG 101 grades:
[15]:
%%sql
CREATE VIEW combined_grades
AS
SELECT
mth141.student_id AS student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
mth141 LEFT JOIN eng101
ON mth141.student_id = eng101.student_id
UNION
SELECT
eng101.student_id AS student_id,
mth141.grade AS mth141_grade,
eng101.grade AS eng101_grade
FROM
eng101
LEFT JOIN mth141
ON mth141.student_id = eng101.student_id;
SELECT *
FROM combined_grades
* sqlite:///courses_data.sqlite
Done.
Done.
[15]:
student_id | mth141_grade | eng101_grade |
---|---|---|
109 | C | None |
122 | A | A |
123 | None | C |
127 | A | None |
140 | F | D |
145 | C | None |
157 | F | None |
165 | None | F |
188 | C | A |
189 | None | A |
We can now obtain records of students who took either of these two courses, listing their names and majors as follows:
[16]:
%%sql
SELECT
students.*,
combined_grades.mth141_grade,
combined_grades.eng101_grade
FROM
combined_grades INNER JOIN students
ON combined_grades.student_id = students.student_id;
* sqlite:///courses_data.sqlite
Done.
[16]:
student_id | first_name | last_name | major | mth141_grade | eng101_grade |
---|---|---|---|---|---|
165 | Stanton | Mcgaugh | biology | None | F |
123 | Jonnie | Hillock | mathematics | None | C |
122 | Rosalinda | Trippe | biology | A | A |
109 | Shawnna | Coate | mathematics | C | None |
189 | Marcia | Hinkel | mathematics | None | A |
127 | Adelina | Brummitt | biology | A | None |
140 | Ludivina | Hoskin | computer science | F | D |
145 | Ilona | Rabe | computer science | C | None |
157 | Wilbert | Haskell | computer science | F | None |
188 | Randall | Lindbloom | mathematics | C | A |
A view is automatically saved in the database so it can be used during subsequent database connections. In order to delete a view we can use the DROP VIEW
statement:
[17]:
%%sql
DROP VIEW combined_grades
* sqlite:///courses_data.sqlite
Done.
[17]:
[]
We can also create temporary views, which exist only during the current database connection and will not be saved:
[28]:
%%sql
CREATE TEMP VIEW mth141_students
AS
SELECT students.*, mth141.grade AS mth141_grade
FROM
mth141 INNER JOIN students
ON
mth141.student_id = students.student_id;
SELECT *
FROM mth141_students;
* sqlite:///courses_data.sqlite
Done.
Done.
[28]:
student_id | first_name | last_name | major | mth141_grade |
---|---|---|---|---|
122 | Rosalinda | Trippe | biology | A |
109 | Shawnna | Coate | mathematics | C |
127 | Adelina | Brummitt | biology | A |
140 | Ludivina | Hoskin | computer science | F |
145 | Ilona | Rabe | computer science | C |
157 | Wilbert | Haskell | computer science | F |
188 | Randall | Lindbloom | mathematics | C |