Week 4 (2/21-2/27)¶
Notebook¶
Download the notebook file: week_4_class.ipynb.
Weekly digest¶
k-means¶
The elbow method.
Dimensionality reduction.
Project¶
Pandas¶
Series and DataFrames
Selecting data
Sorting
Aggregations
Resources¶
1. MNIST data download¶
[ ]:
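from pathlib import Path
import gzip

import numpy as np
import requests

mnist_url = "http://yann.lecun.com/exdb/mnist/"
img_file = "train-images-idx3-ubyte.gz"
labels_file = "train-labels-idx1-ubyte.gz"

# Download each file unless it is already in the working directory.
for fname in [img_file, labels_file]:
    if Path(fname).is_file():
        print(f"Found: {fname}")
        continue
    print(f"Downloading: {fname}")
    r = requests.get(mnist_url + fname)
    r.raise_for_status()  # fail here rather than save an error page as data
    with open(fname, "wb") as f:
        f.write(r.content)

# Images: skip the 16-byte header; each row is one flattened 28x28 image.
with gzip.open(img_file, "rb") as f:
    raw = f.read()
images = np.frombuffer(raw, dtype=np.uint8, offset=16).astype(np.int64).reshape(-1, 28 * 28)

# Labels: skip the 8-byte header; the rest is one byte (digit 0-9) per image.
with gzip.open(labels_file, "rb") as f:
    raw = f.read()
labels = np.frombuffer(raw, dtype=np.uint8, offset=8).astype(np.int64)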
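The offsets 16 and 8 used above come from the IDX file format: each file starts with big-endian 32-bit integers (a magic number, the item count and, for images, the number of rows and columns). The sketch below reads those header fields instead of hard-coding the sizes. The function names `parse_idx_images` and `parse_idx_labels` are hypothetical helpers introduced here for illustration, and the input is a tiny synthetic byte string so the example runs without downloading anything.

```python
import struct

import numpy as np


def parse_idx_images(data: bytes) -> np.ndarray:
    """Hypothetical helper: parse decompressed IDX3 image bytes via the header."""
    # Header: four big-endian 32-bit unsigned integers (16 bytes in total).
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError(f"unexpected magic number for an image file: {magic}")
    pixels = np.frombuffer(data, dtype=np.uint8, offset=16)
    # One row per image, one column per pixel.
    return pixels.reshape(count, rows * cols)


def parse_idx_labels(data: bytes) -> np.ndarray:
    """Hypothetical helper: parse decompressed IDX1 label bytes via the header."""
    # Header: two big-endian 32-bit unsigned integers (8 bytes in total).
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError(f"unexpected magic number for a label file: {magic}")
    labels = np.frombuffer(data, dtype=np.uint8, offset=8)
    if labels.size != count:
        raise ValueError("label count does not match the header")
    return labels


# Synthetic stand-in for the real files: two 2x2 "images" and two labels.
fake_images = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
fake_labels = struct.pack(">II", 2049, 2) + bytes([7, 3])

demo_images = parse_idx_images(fake_images)
demo_labels = parse_idx_labels(fake_labels)
print(demo_images.shape, demo_labels.tolist())  # (2, 4) [7, 3]
```

With the real files, the same functions could be given the output of `gzip.open(...).read()`; checking the magic number is a cheap way to notice a corrupted or mislabeled download.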
2. Planets data¶
[12]:
planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]
diameters = [4879, 12104, 12756, 6792, 142984, 120536, 51118, 49528]  # km
temperatures = [167, 464, 15, -65, -110, -140, -195, -200]  # mean temperature, °C
gravity = [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0]  # m/s^2
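As a minimal sketch of how these lists connect to the Pandas topics in the weekly digest, the example below builds a DataFrame from them and then selects, sorts, and aggregates. The variable name `df_planets` and the column names are assumptions chosen for this illustration; they do not come from the class notebook.

```python
import pandas as pd

planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]
diameters = [4879, 12104, 12756, 6792, 142984, 120536, 51118, 49528]
temperatures = [167, 464, 15, -65, -110, -140, -195, -200]
gravity = [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0]

# DataFrame: one row per planet, one column per measured quantity.
df_planets = pd.DataFrame(
    {"diameter": diameters, "temperature": temperatures, "gravity": gravity},
    index=planets,
)

# Series: a single column of the DataFrame, indexed by planet name.
diameter_series = df_planets["diameter"]

# Selecting: keep only the rows where a boolean condition holds.
cold = df_planets[df_planets["temperature"] < 0]

# Sorting: order rows by a column, largest diameter first.
by_size = df_planets.sort_values("diameter", ascending=False)

# Aggregation: reduce a column to a single number.
mean_gravity = df_planets["gravity"].mean()

print(cold.index.tolist())         # ['Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
print(by_size.index[:3].tolist())  # ['Jupiter', 'Saturn', 'Uranus']
print(mean_gravity)
```

The same three operations (boolean masks, `sort_values`, and aggregations such as `mean` or `max`) are the building blocks for the exercises that follow.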
Exercises¶
All exercises below use data on passengers of the Titanic. A DataFrame with this data can be created as follows:
[1]:
import seaborn as sns
df = sns.load_dataset("titanic")
df.head(5)
[1]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
Note. This is partial data on Titanic passengers. It consists of 891 records, while more than 2,000 people (passengers and crew) were aboard the Titanic.
Exercise 1¶
Create a DataFrame with records of all males.
Check: The DataFrame should consist of 577 rows.
Exercise 2¶
Create a DataFrame with records of all males who survived the sinking of the Titanic.
Check: The DataFrame should consist of 109 rows.
Exercise 3¶
Find the age of the oldest Titanic passenger.
Check: You should get 80.
Exercise 4¶
Get the record of the oldest passenger.
Check: The passenger was male, embarked in Southampton, traveled in first class, and survived.
Exercise 5¶
Create a DataFrame with records of the 5 oldest females.
Check: The ages of the women should be: 63, 63, 62, 60, and 58.
Exercise 6¶
Find the record of the oldest female who did not survive.
Check: She was 57, embarked in Southampton, and traveled in second class.
Exercise 7¶
Find the number of people who survived the sinking and the number of people who died.
Check: 342 survived, 549 died.
Exercise 8¶
What was the average age of people who survived?
Check: 28.34 years.
Exercise 9¶
There were three classes of passengers aboard the Titanic: “First”, “Second” and “Third”. Compute the fraction of passengers in each class who survived.
Check: First: 0.629, Second: 0.473, Third: 0.242.