Week 4 (2/21-2/27)

Notebook

Weekly digest

k-means

  • The elbow method.

  • Dimensionality reduction.
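The two k-means topics above can be tried together on synthetic data. The sketch below is a minimal NumPy illustration, not the course's reference code. `pca` reduces dimensionality by projecting mean-centered data onto its top principal directions, which it gets from an SVD. `kmeans_inertia` runs Lloyd's k-means, seeded with k-means++ initialization, and returns the inertia (the within-cluster sum of squared distances). For the elbow method, inertia is computed for a range of k. It generally decreases as k grows, and the "elbow" is the k after which it stops falling sharply. The helper names, the three-blob data set, and the iteration limit are hypothetical choices made for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def sq_dists(X, C):
    """Squared Euclidean distances from each row of X to each row of C, shape (n, k)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: favor points far from the centroids chosen so far."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, np.array(centroids)).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans_inertia(X, k, n_iter=100, seed=0):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        assign = sq_dists(X, centroids).argmin(axis=1)  # nearest centroid
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return sq_dists(X, centroids).min(axis=1).sum()


# Hypothetical data: three well-separated Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(42)
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(100, 10)) for c in centers])

X2 = pca(X, n_components=2)  # dimensionality reduction: 10-D -> 2-D
inertias = [kmeans_inertia(X2, k) for k in range(1, 7)]
for k, inertia in enumerate(inertias, start=1):
    print(f"k={k}: inertia={inertia:.1f}")
```

On this data the printed inertia should drop steeply up to k=3 and only slowly afterwards, which is the elbow. scikit-learn provides tuned versions of both steps (`sklearn.decomposition.PCA` and `sklearn.cluster.KMeans`, whose `inertia_` attribute holds the same quantity).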

Project

Pandas

  • Series and DataFrames

  • Selecting data

  • Sorting

  • Aggregations

Resources

1. MNIST data download

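import gzip
from pathlib import Path

import numpy as np
import requests

mnist_url = "http://yann.lecun.com/exdb/mnist/"
img_file = "train-images-idx3-ubyte.gz"
labels_file = "train-labels-idx1-ubyte.gz"

# Download each file only if it is not already present locally.
for fname in [img_file, labels_file]:
    if Path(fname).is_file():
        print(f"Found: {fname}")
        continue
    print(f"Downloading: {fname}")
    r = requests.get(mnist_url + fname, timeout=60)
    r.raise_for_status()  # fail loudly instead of saving an error page as data
    with open(fname, "wb") as fh:
        fh.write(r.content)

# Image file: skip the 16-byte header; then one unsigned byte per pixel.
# Each 28x28 image becomes one row of 784 values. The copy to int64 keeps
# later arithmetic (e.g. squared distances) from wrapping around as uint8 would.
with gzip.open(img_file, "rb") as fh:
    data = fh.read()
images = np.frombuffer(data, dtype=np.uint8, offset=16).astype(np.int64)
images = images.reshape(-1, 28 * 28)

# Label file: skip the 8-byte header; then one unsigned byte (0-9) per image.
with gzip.open(labels_file, "rb") as fh:
    data = fh.read()
labels = np.frombuffer(data, dtype=np.uint8, offset=8).astype(np.int64)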
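The offsets 16 and 8 in the cell above skip the file headers. MNIST uses the IDX format: the first four bytes are two zero bytes, a data-type code (0x08 for unsigned bytes), and the number of dimensions, followed by one big-endian 32-bit size per dimension. The image file has three dimensions (count, rows, columns), so its header is 4 + 3 × 4 = 16 bytes; the label file has one dimension, giving 4 + 4 = 8 bytes. The minimal sketch below reads the shape from the header instead of hard-coding it. The function name `parse_idx_ubyte` is hypothetical, only the unsigned-byte data type is handled, and the demo uses a tiny in-memory file rather than the real download.

```python
import struct

import numpy as np


def parse_idx_ubyte(data):
    """Parse an uncompressed IDX file whose data type is unsigned byte (0x08)."""
    # Magic number: two zero bytes, a data-type code, and the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # One big-endian 32-bit size per dimension follows the magic number.
    shape = struct.unpack(f">{ndim}I", data[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim  # 16 for 3-D image files, 8 for 1-D label files
    values = np.frombuffer(data, dtype=np.uint8, offset=offset)
    return values.reshape(shape)


# Hypothetical demo data: two 2x3 "images" built in memory, not the real MNIST file.
header = struct.pack(">HBB3I", 0, 0x08, 3, 2, 2, 3)
arr = parse_idx_ubyte(header + bytes(range(12)))
print(arr.shape)  # (2, 2, 3)
```

Applied to the decompressed training-image file, this should return an array of shape (60000, 28, 28); `.reshape(-1, 28 * 28)` then gives the one-row-per-image layout used above.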

2. Planets data

planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]
diameters = [4879, 12104, 12756, 6792, 142984, 120536, 51118, 49528]  # km
temperatures = [167, 464, 15, -65, -110, -140, -195, -200]  # mean temperature, °C
gravity = [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0]  # surface gravity, m/s²
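The planets lists above make a small playground for this week's pandas topics. A minimal sketch, assuming pandas is installed: it assembles the lists into a DataFrame indexed by planet name, then shows a column as a Series, label-based, position-based and boolean selection, sorting, and simple aggregations. The column names and the 50,000 km threshold are choices made for illustration, and the lists are repeated so the block runs on its own.

```python
import pandas as pd

planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"]
diameters = [4879, 12104, 12756, 6792, 142984, 120536, 51118, 49528]
temperatures = [167, 464, 15, -65, -110, -140, -195, -200]
gravity = [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0]

# Series and DataFrames: one column per list, planet names as the row index.
df = pd.DataFrame(
    {"diameter": diameters, "temperature": temperatures, "gravity": gravity},
    index=planets,
)
diam = df["diameter"]  # a single column is a Series sharing the same index

# Selecting data: by label, by position, and with a boolean mask.
earth = df.loc["Earth"]  # one row, returned as a Series
first_three = df.iloc[:3]  # first three rows by position
big = df[df["diameter"] > 50000]  # rows where the condition holds

# Sorting: largest planets first.
by_size = df.sort_values("diameter", ascending=False)

# Aggregations: statistics computed over each column.
mean_gravity = df["gravity"].mean()
stats = df.agg(["min", "max", "mean"])

print(big.index.tolist())  # ['Jupiter', 'Saturn', 'Uranus']
print(by_size.index[:3].tolist())  # ['Jupiter', 'Saturn', 'Uranus']
print(round(mean_gravity, 2))  # 9.74
```

The same building blocks (boolean masks, `sort_values`, and aggregations such as `mean`) carry over directly to the Titanic DataFrame used in the exercises.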

Exercises

All exercises below use data on the passengers of the Titanic. A DataFrame with this data can be created as follows:

import seaborn as sns
df = sns.load_dataset("titanic")
df.head(5)

   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True

Note. This is partial data on Titanic passengers: it consists of 891 records, while the total number of people aboard the Titanic, passengers and crew, was over 2,000.

Exercise 1

Create a DataFrame with records of all males.

Check: The DataFrame should consist of 577 rows.

Exercise 2

Create a DataFrame with records of all males who survived the sinking of the Titanic.

Check: The DataFrame should consist of 109 rows.

Exercise 3

Find the age of the oldest Titanic passenger.

Check: You should get 80.

Exercise 4

Get the record of the oldest passenger.

Check: The passenger was male, embarked at Southampton, traveled in first class, and survived.

Exercise 5

Create a DataFrame with records of the 5 oldest females.

Check: The ages of the women should be: 63, 63, 62, 60, and 58.

Exercise 6

Find the record of the oldest female who did not survive.

Check: She was 57, embarked at Southampton, and traveled in second class.

Exercise 7

Find the number of people who survived the sinking and the number of people who died.

Check: 342 survived, 549 died.

Exercise 8

What was the average age of people who survived?

Check: 28.34 years (passengers whose age is missing are not counted).

Exercise 9

There were three classes of passengers aboard the Titanic: “First”, “Second” and “Third”. For each class, compute the fraction of its passengers who survived.

Check: First: 0.629, Second: 0.473, Third: 0.242.