Numpy Basics

Numpy is the standard Python module for numerical computations. Code written with numpy is typically easier to develop and, because array operations run in compiled code rather than in the Python interpreter, usually much faster than the equivalent code written without numpy.

[2]:
import numpy as np

The main objects provided by numpy are numpy arrays, which in their simplest form are similar to lists. A list can be converted into a numpy array using the np.array() function:

[2]:
a = np.array([1, 2, 3])
a
[2]:
array([1, 2, 3])

Vectorized computations

One advantage of using numpy arrays is that many array computations can be vectorized, i.e. performed on all array elements at once:

[3]:
# multiplication by a number
2*a
[3]:
array([2, 4, 6])
[4]:
# exponentiation
a**3
[4]:
array([ 1,  8, 27])
[5]:
# incrementing elements of an array
a += 10
a
[5]:
array([11, 12, 13])

Similarly, if we add, multiply etc. two arrays of the same size, the computations will be performed on all corresponding pairs of array elements:

[6]:
a = np.array([1, 2, 3])
b = np.array([400, 500, 600])

print(f"a = {a}\nb = {b}")
a = [1 2 3]
b = [400 500 600]
[7]:
# addition of arrays
a+b
[7]:
array([401, 502, 603])
[8]:
# multiplication of arrays
a*b
[8]:
array([ 400, 1000, 1800])
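To make the contrast with plain lists concrete, here is a minimal sketch that compares the vectorized expressions with explicit loops over the same values. The variable names are chosen only for this illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([400, 500, 600])

# Vectorized: one expression operates on every pair of corresponding elements.
vectorized_sum = a + b
vectorized_product = a * b

# The same computations written as explicit loops over plain Python lists.
loop_sum = [x + y for x, y in zip(a.tolist(), b.tolist())]
loop_product = [x * y for x, y in zip(a.tolist(), b.tolist())]

print(vectorized_sum.tolist() == loop_sum)          # True
print(vectorized_product.tolist() == loop_product)  # True

# For lists, + means concatenation, not element-wise addition.
list_concat = [1, 2, 3] + [400, 500, 600]
print(len(list_concat))  # 6
```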

The numpy module contains implementations of many mathematical functions (trigonometric, logarithmic etc.) that can be applied to a whole array:

[9]:
# compute sine of every element of an array
np.sin(a)
[9]:
array([0.84147098, 0.90929743, 0.14112001])
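Other numpy functions behave the same way. The short sketch below applies np.sqrt(), np.exp() and np.log() element by element; the input values are chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(x)        # element-wise square root
logs = np.log(np.exp(x))  # exp and log undo each other, element by element

print(roots)                 # [1. 2. 3.]
print(np.allclose(logs, x))  # True
```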

Numpy arrays vs lists

In some respects numpy arrays behave the same way as lists. For example, indexing and slicing work as usual:

[10]:
x = np.array([1,2,3,4,5,6])
x[1:4]
[10]:
array([2, 3, 4])
[11]:
x[5]
[11]:
6

On the other hand, there are several differences between lists and numpy arrays:

1. Lists can contain objects of different types, but all elements of a numpy array must have the same, fixed type (integer, float, string, boolean etc.).

[12]:
a = np.array([100, 200, 300])  # a is an array of integers
a[0] = 'hello'                 # assigning a string as an array element results in an error
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-1829de9be60c> in <module>
      1 a = np.array([100, 200, 300])  # a is an array of integers
----> 2 a[0] = 'hello'                 # assigning a string as an array element results in an error

ValueError: invalid literal for int() with base 10: 'hello'

2. Lists can be shortened and extended (e.g. using append). The size of a numpy array is fixed when the array is created and cannot be changed afterwards.

3. Slicing a list produces a new list, independent of the original list. For numpy arrays, slicing produces a view of the array. Changing a slice changes the original array:

[13]:
a = np.array([1, 2, 3, 4])

b = a[:3]    # b is a view of a
b[0] = 999   # by changing b we change a as well

print(f"b = {b}")
print(f"a = {a}")
b = [999   2   3]
a = [999   2   3   4]

We can use the np.copy() function to get an independent copy of an array or its slice:

[14]:
a = np.array([1, 2, 3, 4])

b = np.copy(a[:3]) # create a copy of a slice of a
b[0] = 999         # changing b does not affect a

print(f"b = {b}")
print(f"a = {a}")
b = [999   2   3]
a = [1 2 3 4]
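The three differences listed above can be checked directly. The following is a small sketch with illustrative variable names. It relies on np.append(), which returns a new array, and on np.shares_memory(), which reports whether two arrays use the same underlying data.

```python
import numpy as np

# 1. One fixed element type per array: mixing integers and floats in the input
#    gives an array of floats, and a float stored into an integer array is
#    truncated to an integer.
mixed = np.array([1, 2.5, 3])
ints = np.array([100, 200, 300])
ints[0] = 7.9

print(mixed.dtype)  # float64
print(ints)         # [  7 200 300]

# 2. Fixed size: np.append does not grow the array in place, it builds a new one.
original = np.array([1, 2, 3])
extended = np.append(original, 4)

print(original.size, extended.size)  # 3 4

# 3. Slices are views that share memory with the original array; copies do not.
view = original[:2]
independent = np.copy(original[:2])

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```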

Creating numpy arrays

Numpy arrays can be created in several ways:

1. The np.array() function converts a list into a numpy array:

[15]:
a = np.array([1,2,3])
a
[15]:
array([1, 2, 3])

2. The function np.full(n, fill_value=x) produces an array of length n with all entries equal to x:

[16]:
np.full(5, fill_value=3.14)
[16]:
array([3.14, 3.14, 3.14, 3.14, 3.14])

3. The functions np.zeros() and np.ones() are special cases of np.full(). They create arrays of a given length filled with zeros and ones, respectively:

[17]:
a0 = np.zeros(5)
a0
[17]:
array([0., 0., 0., 0., 0.])
[18]:
a1 = np.ones(7)
a1
[18]:
array([1., 1., 1., 1., 1., 1., 1.])

4. np.empty() creates an array of a given length with uninitialized entries (more precisely, the values of the array will be equal to whatever is in the computer memory in the region allocated to the array). This is useful if we want to set the values of the array at some later point. For large arrays np.empty() works faster than np.full(), np.zeros() and np.ones(), because it skips the step of writing a value into each entry:

[19]:
c = np.empty(4)
print(c)
[4.9e-324 9.9e-324 1.5e-323 2.0e-323]

Note. By default np.zeros(), np.ones(), and np.empty() create arrays of floats, but we can use the dtype argument to specify a different data type:

[20]:
a = np.zeros(5, dtype=int)   # create an array of integers
a
[20]:
array([0, 0, 0, 0, 0])

5. np.arange() is similar to the range() function, but it produces a numpy array:

[21]:
numbers = np.arange(10)
numbers
[21]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
[22]:
evens = np.arange(10, 20, 2) # start at 10, stop at 20, increment by 2
evens
[22]:
array([10, 12, 14, 16, 18])

6. np.linspace(a, b, n) creates an array of n evenly spaced values from a to b, with both endpoints included:

[23]:
x = np.linspace(1, 2, 5) # array of 5 numbers, starting at 1 and ending at 2
x
[23]:
array([1.  , 1.25, 1.5 , 1.75, 2.  ])
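One detail that is easy to overlook is how np.arange() and np.linspace() treat the end of the interval. The sketch below illustrates it, using the endpoint keyword argument of np.linspace(); the variable names are only for this example.

```python
import numpy as np

# np.arange excludes the stop value, just like range().
steps = np.arange(0, 10, 2)

# np.linspace includes both endpoints by default ...
closed = np.linspace(0, 1, 5)

# ... unless endpoint=False is passed, in which case the end value is left out.
half_open = np.linspace(0, 1, 5, endpoint=False)

print(steps)      # [0 2 4 6 8]
print(closed)     # [0.   0.25 0.5  0.75 1.  ]
print(half_open)  # [0.  0.2 0.4 0.6 0.8]
```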

Multidimensional numpy arrays

Numpy arrays can have more than one dimension. One way to create such an array is to start with a 1-dimensional array and use its reshape() method, which rearranges the elements of that array into a new shape.

[24]:
a = np.arange(12)   # the array to be reshaped
b = a.reshape(3,4)  # reshape a into 3 rows and 4 columns
b
[24]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

We can access elements of a multidimensional array by specifying an index for each dimension:

[25]:
# get the element in row 0 and column 2
b[0,2]
[25]:
2

The functions np.full(), np.zeros(), np.ones() and np.empty() can be used to create arrays with more than one dimension:

[26]:
# create an array with 4 rows and 5 columns
c = np.ones((4,5))
c
[26]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
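The dimensions of an array can be inspected through its shape, ndim and size attributes. The sketch below also shows that reshape() accepts -1 for one dimension, in which case numpy works out that dimension from the number of elements.

```python
import numpy as np

a = np.arange(12)

# Passing -1 lets numpy infer that dimension from the total number of elements.
b = a.reshape(3, -1)

print(b.shape)  # (3, 4)
print(b.ndim)   # 2
print(b.size)   # 12

# The same attributes work for arrays with more than two dimensions.
c = np.zeros((2, 3, 4))
print(c.shape)  # (2, 3, 4)
print(c.ndim)   # 3
```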

Mathematical operations on multidimensional arrays

Mathematical operations on multidimensional arrays work the same way as for 1-dimensional arrays.

[27]:
a = np.arange(1, 5).reshape(2,2)
b = np.full((2,2), fill_value = 10)

print(f"a = \n{a}\n")
print(f"b = \n{b}")
a =
[[1 2]
 [3 4]]

b =
[[10 10]
 [10 10]]
[28]:
# multiplication by a number
5*a
[28]:
array([[ 5, 10],
       [15, 20]])
[29]:
# addition of two arrays of the same dimensions
a+b
[29]:
array([[11, 12],
       [13, 14]])
[30]:
# multiplication of two arrays of the same dimensions
a*b
[30]:
array([[10, 20],
       [30, 40]])

Notice that array multiplication multiplies corresponding elements of arrays. In order to perform matrix multiplication of 2-dimensional arrays we can use the @ operator:

[31]:
# matrix product of 2-dimensional arrays
a@b
[31]:
array([[30, 30],
       [70, 70]])
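To see what the @ operator computes, the sketch below rebuilds the matrix product entry by entry from its definition and compares it with a @ b and with np.matmul(). The explicit loop is for illustration only and is not how one would compute the product in practice.

```python
import numpy as np

a = np.arange(1, 5).reshape(2, 2)    # [[1, 2], [3, 4]]
b = np.full((2, 2), fill_value=10)

# Entry (i, j) of the matrix product is the sum over k of a[i, k] * b[k, j].
manual = np.array([[sum(a[i, k] * b[k, j] for k in range(2)) for j in range(2)]
                   for i in range(2)])

print(np.array_equal(a @ b, manual))           # True
print(np.array_equal(a @ b, np.matmul(a, b)))  # True

# Unlike element-wise multiplication, the matrix product depends on the order.
print(np.array_equal(a * b, b * a))  # True
print(np.array_equal(a @ b, b @ a))  # False
```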

Mathematical functions defined by numpy can be applied to multidimensional arrays:

[33]:
# compute cosine of all elements of the array a
np.cos(a)
[33]:
array([[ 0.54030231, -0.41614684],
       [-0.9899925 , -0.65364362]])

Slicing multidimensional arrays

In order to create a slice of a multidimensional array we need to specify which part of each dimension we want to select:

[34]:
# create a 5x6 array
a = np.arange(30).reshape(5,6)
a
[34]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])
[35]:
# select elements in rows 1-3 and columns 0-1
b = a[1:4, 0:2]
b
[35]:
array([[ 6,  7],
       [12, 13],
       [18, 19]])
[36]:
# select elements in rows 0-2 and columns 2-3
c = a[:3, 2:4]
c
[36]:
array([[ 2,  3],
       [ 8,  9],
       [14, 15]])
[37]:
# select all elements in column 0
d = a[:, 0]
d
[37]:
array([ 0,  6, 12, 18, 24])

Note. a[i] is equivalent to a[i,:], i.e. it selects row i of the array:

[38]:
# select all elements in row 1
a[1]
[38]:
array([ 6,  7,  8,  9, 10, 11])

As with 1-dimensional arrays, slicing produces a view of the original array, and changing a slice changes the original array:

[39]:
b = a[:3, :3]   # create a slice
b[0,0] = 1000   # changing a slice changes the original array as well

print(f"b = \n{b}\n")
print(f"a = \n{a}")
b =
[[1000    1    2]
 [   6    7    8]
 [  12   13   14]]

a =
[[1000    1    2    3    4    5]
 [   6    7    8    9   10   11]
 [  12   13   14   15   16   17]
 [  18   19   20   21   22   23]
 [  24   25   26   27   28   29]]

We can use this to change several entries of an array at once:

[40]:
a[:4, :4] = 0  # set all entries of the slice to 0
a
[40]:
array([[ 0,  0,  0,  0,  4,  5],
       [ 0,  0,  0,  0, 10, 11],
       [ 0,  0,  0,  0, 16, 17],
       [ 0,  0,  0,  0, 22, 23],
       [24, 25, 26, 27, 28, 29]])
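Slices can also take a step, and a slice can be assigned a whole array of matching shape instead of a single number. The sketch below shows both on a fresh 5x6 array; the name corners is only illustrative.

```python
import numpy as np

a = np.arange(30).reshape(5, 6)

# A slice can take a step: every second row and every third column.
corners = a[::2, ::3]
print(corners.tolist())  # [[0, 3], [12, 15], [24, 27]]

# Assign an array to a slice: the first three entries of row 1 are replaced.
a[1, :3] = np.array([-1, -2, -3])
print(a[1])  # [-1 -2 -3  9 10 11]
```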

Sorting

Numpy has several functions for creating arrays with randomly selected entries, which are convenient for producing sample data to sort. For example, a random number generator created with np.random.default_rng(seed) has a method integers(low, high, size) that returns an array of integers greater than or equal to low and strictly smaller than high. The size argument specifies the shape of the array:

[6]:
# 1-dimensional array of 5 elements with entries 0 <= n < 10
rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=5)
a
[6]:
array([8, 6, 5, 2, 3])
[7]:
# 2-dimensional array of size (4, 5) with entries -10 <= n < 11
b = rng.integers(-10, 11, size=(4, 5))
b
[7]:
array([[-10,  -9, -10,  -7,   7],
       [  3,   9,   0,   2,  10],
       [  5,   3,   1,   1,   9],
       [ -5,   7,   4, -10,  -2]])
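The argument passed to np.random.default_rng() is a seed. As a minimal sketch, two generators created with the same seed produce the same sequence of values, which makes results reproducible. The seed 42 and the variable names are arbitrary choices for this example.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.integers(0, 10, size=5)
second = rng_b.integers(0, 10, size=5)

# Same seed, same sequence of calls: the arrays are identical.
print(np.array_equal(first, second))  # True

# A generator can also draw floats uniformly from the interval [0, 1).
floats = rng_a.random(3)
print(floats.shape)  # (3,)
```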

For a 1-dimensional array a the function np.sort(a) sorts elements of the array from the smallest to the largest:

[8]:
np.sort(a)
[8]:
array([2, 3, 5, 6, 8])

For multidimensional arrays, np.sort() takes an additional axis argument which specifies the coordinate axis along which the sort is to be performed:

[9]:
# sort b along axis 0, i.e. sort values of each column
np.sort(b, axis=0)
[9]:
array([[-10,  -9, -10, -10,  -2],
       [ -5,   3,   0,  -7,   7],
       [  3,   7,   1,   1,   9],
       [  5,   9,   4,   2,  10]])
[10]:
# sort b along axis 1, i.e. sort values of each row
np.sort(b, axis=1)
[10]:
array([[-10, -10,  -9,  -7,   7],
       [  0,   2,   3,   9,  10],
       [  1,   1,   3,   5,   9],
       [-10,  -5,  -2,   4,   7]])

Using np.sort() without the axis argument sorts the array along its last axis:

[11]:
# the same as sorting along axis 1
np.sort(b)
[11]:
array([[-10, -10,  -9,  -7,   7],
       [  0,   2,   3,   9,  10],
       [  1,   1,   3,   5,   9],
       [-10,  -5,  -2,   4,   7]])
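Note that np.sort() returns a sorted copy and leaves its argument unchanged, while the sort() method of an array sorts the array itself. The sketch below shows the difference, together with one simple way to get descending order by reversing the sorted result.

```python
import numpy as np

a = np.array([8, 6, 5, 2, 3])

ascending = np.sort(a)         # new array; a is unchanged
descending = np.sort(a)[::-1]  # reverse the sorted result

print(a)           # [8 6 5 2 3]
print(ascending)   # [2 3 5 6 8]
print(descending)  # [8 6 5 3 2]

result = a.sort()  # the method sorts a in place and returns None
print(a)           # [2 3 5 6 8]
print(result)      # None
```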

The function np.argsort(a) is similar to np.sort, but instead of returning sorted values of a it produces an array of indices showing how the entries of the array should be rearranged to be sorted:

[12]:
print("a:")
print(a)
print("\nnp.argsort(a):")
print(np.argsort(a))
a:
[8 6 5 2 3]

np.argsort(a):
[3 4 2 1 0]
[13]:
print("b:")
print(b)
print("\nnp.argsort(b, axis=1):")
print(np.argsort(b, axis=1))
b:
[[-10  -9 -10  -7   7]
 [  3   9   0   2  10]
 [  5   3   1   1   9]
 [ -5   7   4 -10  -2]]

np.argsort(b, axis=1):
[[0 2 1 3 4]
 [2 3 0 1 4]
 [2 3 1 0 4]
 [3 0 4 2 1]]
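A typical use of np.argsort() is to reorder one array according to the values of another. The sketch below uses a hypothetical array of names that accompanies the values; indexing an array with an array of indices selects the entries in that order. For multidimensional arrays, np.take_along_axis() applies the indices along the chosen axis.

```python
import numpy as np

scores = np.array([8, 6, 5, 2, 3])
names = np.array(["ann", "bob", "cat", "dan", "eve"])  # hypothetical labels

order = np.argsort(scores)

# Indexing with the argsort result puts both arrays in the same sorted order.
print(scores[order])  # [2 3 5 6 8]
print(names[order])   # ['dan' 'eve' 'cat' 'bob' 'ann']

# For a 2-dimensional array, apply the indices along the sorted axis.
b = np.array([[3, 9, 0],
              [5, 3, 1]])
row_order = np.argsort(b, axis=1)
sorted_rows = np.take_along_axis(b, row_order, axis=1)
print(np.array_equal(sorted_rows, np.sort(b, axis=1)))  # True
```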

Aggregation functions

Numpy includes several aggregation functions that summarize, in various ways, data contained in an array.

[14]:
# create a 3x4 array of randomly selected integers from the range 0 <= x < 20
a = rng.integers(0, 20, size=(3, 4))
a
[14]:
array([[17, 11,  0, 15],
       [14, 16,  3,  1],
       [17,  0, 10,  1]])

Compute the sum of all array entries:

[15]:
a.sum()
[15]:
105

Compute the average value of array entries:

[16]:
a.mean()
[16]:
8.75

Compute the minimum and maximum values of the array:

[17]:
a.min(), a.max()
[17]:
(0, 17)

Aggregation functions can be used with an additional axis argument, which indicates that the function should be applied along one coordinate axis of the array. For example, a.max(axis=0) computes the maximum of each column of the array:

[18]:
a.max(axis=0)
[18]:
array([17, 16, 10, 15])

Similarly, a.max(axis=1) computes the maximum of each row:

[19]:
a.max(axis=1)
[19]:
array([17, 16, 17])
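The axis argument works the same way for the other aggregation functions. The sketch below applies sum(), mean() and argmax() to the array from the example above, entered here by hand so that the code does not depend on the random number generator.

```python
import numpy as np

a = np.array([[17, 11,  0, 15],
              [14, 16,  3,  1],
              [17,  0, 10,  1]])

column_sums = a.sum(axis=0)    # one value per column
row_sums = a.sum(axis=1)       # one value per row
row_means = a.mean(axis=1)     # average of each row
row_argmax = a.argmax(axis=1)  # column index of the maximum in each row

print(column_sums)         # [48 27 13 17]
print(row_sums)            # [43 34 28]
print(row_means.tolist())  # [10.75, 8.5, 7.0]
print(row_argmax)          # [0 1 0]
```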