Boolean and fancy indexing

Boolean arrays

A Boolean array is a numpy array whose elements are Boolean (True/False) values. Such an array can be obtained by applying a comparison operator (such as > or ==) to another numpy array:

[81]:
import numpy as np

# create a 4x4 array of integers
a = np.arange(16).reshape(4, 4)
a
[81]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
[2]:
# test which elements of a are greater than 5
large_values = (a > 5)
large_values
[2]:
array([[False, False, False, False],
       [False, False,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])
[3]:
# test which elements of a are even
even_values = (a%2 == 0)
even_values
[3]:
array([[ True, False,  True, False],
       [ True, False,  True, False],
       [ True, False,  True, False],
       [ True, False,  True, False]])
[4]:
# another 4x4 array
b = np.reshape(np.arange(21, 5, -1), (4, 4))
b
[4]:
array([[21, 20, 19, 18],
       [17, 16, 15, 14],
       [13, 12, 11, 10],
       [ 9,  8,  7,  6]])
[5]:
# test which elements of a are greater than the corresponding elements of b
greater = (a > b)
greater
[5]:
array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False,  True],
       [ True,  True,  True,  True]])
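
The result of a comparison is an ordinary numpy array whose dtype is bool, so the usual array tools apply to it. The short sketch below rebuilds the array a from above, checks the dtype, and counts the True entries with np.count_nonzero. The name n_large is illustrative only and does not appear elsewhere in this text.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
large_values = (a > 5)

# the comparison produces an array of the same shape with dtype bool
print(large_values.dtype)   # bool
print(large_values.shape)   # (4, 4)

# count how many elements satisfy the condition (the values 6 through 15)
n_large = np.count_nonzero(large_values)
print(n_large)              # 10
```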

Logical operations on Boolean arrays

Boolean arrays can be combined using logical operators:

operator    meaning
~           negation (logical “not”)
&           logical “and”
|           logical “or”

[6]:
# test which elements of a are not divisible by 3
b = ~(a%3 == 0)

print(f"a=\n{a}\n")
print(f"b=\n{b}")
a=
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

b=
[[False  True  True False]
 [ True  True False  True]
 [ True False  True  True]
 [False  True  True False]]
[7]:
# test which elements of a are divisible by either 2 or 3
c = (a%2 == 0) | (a%3 == 0)

print(f"a=\n{a}\n")
print(f"c=\n{c}")
a=
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

c=
[[ True False  True  True]
 [ True False  True False]
 [ True  True  True False]
 [ True False  True  True]]
[8]:
# test which elements of a are divisible by both 2 and 3
d = (a%2 == 0) & (a%3 == 0)

print(f"a=\n{a}\n")
print(f"d=\n{d}")
a=
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

d=
[[ True False False False]
 [False False  True False]
 [False False False False]
 [ True False False False]]
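
The examples above wrap each comparison in parentheses. This matters because & and | bind more tightly than comparison operators such as ==, and because the Python keywords and/or do not work element-wise on arrays. The following sketch illustrates both points, using np.logical_and as an equivalent function form. The names both, both_func, and raised are illustrative only.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# element-wise "and": each comparison is wrapped in parentheses
both = (a % 2 == 0) & (a % 3 == 0)

# np.logical_and is an equivalent function form of the & operator here
both_func = np.logical_and(a % 2 == 0, a % 3 == 0)
print(np.array_equal(both, both_func))   # True

# the keyword `and` asks for a single truth value of a whole array,
# which numpy refuses to compute for arrays with more than one element
try:
    (a % 2 == 0) and (a % 3 == 0)
    raised = False
except ValueError:
    raised = True
print(raised)                            # True
```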

Indexing with Boolean arrays

Boolean arrays can be used to select elements of other numpy arrays. If a is any numpy array and b is a Boolean array of the same shape, then a[b] selects all elements of a for which the corresponding value of b is True. The selected elements are returned as a 1-dimensional array.

[9]:
# create a 4x4 array of integers
a = np.reshape(np.arange(16), (4, 4))
a
[9]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
[10]:
# test which elements of a are even
b = (a%2 == 0)
b
[10]:
array([[ True, False,  True, False],
       [ True, False,  True, False],
       [ True, False,  True, False],
       [ True, False,  True, False]])
[11]:
# select all even elements of the array a
a[b]
[11]:
array([ 0,  2,  4,  6,  8, 10, 12, 14])

We can use Boolean indexing to modify elements of an array based on a logical condition:

[12]:
# set values of all even elements of the array to 100
a[a%2 == 0] = 100
a
[12]:
array([[100,   1, 100,   3],
       [100,   5, 100,   7],
       [100,   9, 100,  11],
       [100,  13, 100,  15]])
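
One detail worth keeping in mind: selecting with a Boolean array returns a new array (a copy), whereas assigning through a Boolean index writes into the original array. A minimal sketch of the difference, with the illustrative names mask and evens:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
mask = (a % 2 == 0)

# a[mask] is a new 1-dimensional array holding copies of the selected values
evens = a[mask]
evens[:] = -1            # modifies the copy only
print(a[0, 0])           # 0, the original array is unchanged

# assigning through the mask writes into a itself
a[mask] = 100
print(a[0, 0])           # 100
```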

In the next example we create two numpy arrays, x and y, and then use Boolean indexing to set all entries of x that are smaller than the corresponding entries of y to -1:

[3]:
# create two 3x3 arrays of random numbers
rng = np.random.default_rng(0)
x = rng.random((3, 3))
y = rng.random((3, 3))

print(f"x=\n{x}\n")
print(f"y=\n{y}")
x=
[[0.63696169 0.26978671 0.04097352]
 [0.01652764 0.81327024 0.91275558]
 [0.60663578 0.72949656 0.54362499]]

y=
[[0.93507242 0.81585355 0.0027385 ]
 [0.85740428 0.03358558 0.72965545]
 [0.17565562 0.86317892 0.54146122]]
[5]:
x[x < y] = -1
x
[5]:
array([[-1.        , -1.        ,  0.04097352],
       [-1.        ,  0.81327024,  0.91275558],
       [ 0.60663578, -1.        ,  0.54362499]])
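
If the original x should stay untouched, one possible alternative (a different technique from Boolean index assignment) is np.where, which builds a new array from a condition. The sketch below assumes the same x and y as above; the name z is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((3, 3))
y = rng.random((3, 3))

# build a new array: -1 where x < y, the original value of x elsewhere
z = np.where(x < y, -1, x)

# x itself has not been modified (its values lie in the interval [0, 1))
print(np.any(x == -1))   # False
print(z)
```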

Fancy indexing

Fancy indexing is a feature of numpy arrays that lets us index an array with a list (or array) of indices instead of a single index. This selects all array elements whose indices are on the list, in the order in which the indices are listed.

[6]:
# create an array of 10 integers randomly selected from the range 0 <= x < 100
rng = np.random.default_rng(0)
a = rng.integers(0, 100, 10)
a
[6]:
array([85, 63, 51, 26, 30,  4,  7,  1, 17, 81])
[7]:
# select array elements with index 1, 5, and 6
a[[1, 5, 6]]
[7]:
array([63,  4,  7])
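
Two further properties of fancy indexing can be sketched briefly: an index may be repeated, and the result takes the shape of the index array. The array below copies the values shown in the output above rather than regenerating them, and the name idx is illustrative only.

```python
import numpy as np

# values copied from the output above
a = np.array([85, 63, 51, 26, 30, 4, 7, 1, 17, 81])

# an index may appear more than once, and the order of the list is respected
print(a[[6, 1, 1]])
# [ 7 63 63]

# the result takes the shape of the index array
idx = np.array([[0, 1],
                [8, 9]])
print(a[idx])
# [[85 63]
#  [17 81]]
```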

Fancy indexing works with multidimensional arrays as well:

[8]:
b = np.arange(20).reshape(4, 5)
b
[8]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
[9]:
# select rows 1 and 3
b[[1, 3]]
[9]:
array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19]])
[10]:
# select columns 0, 1, and 4
b[:, [0, 1, 4]]
[10]:
array([[ 0,  1,  4],
       [ 5,  6,  9],
       [10, 11, 14],
       [15, 16, 19]])

Given a 2-dimensional array b, the code b[[1, 2, 3], [1, 4, 4]] pairs up the two lists element by element and selects b[1,1], b[2,4], and b[3,4]:

[11]:
b[[1, 2, 3], [1, 4, 4]]
[11]:
array([ 6, 14, 19])
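
Because the two lists are paired element by element, this form does not select a block of rows and columns. If every combination of the listed rows and columns is wanted, one option is the numpy function np.ix_, as in the sketch below. The names pairs and block are illustrative only.

```python
import numpy as np

b = np.arange(20).reshape(4, 5)

# paired indices: picks b[1, 1], b[2, 4], b[3, 4]
pairs = b[[1, 2, 3], [1, 4, 4]]
print(pairs)
# [ 6 14 19]

# np.ix_ builds indices for every combination of rows 1, 3 and columns 0, 1, 4
block = b[np.ix_([1, 3], [0, 1, 4])]
print(block)
# [[ 5  6  9]
#  [15 16 19]]
```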

Just as with other indexing schemes, we can use fancy indexing to modify several entries of an array at once:

[12]:
b[[1, 2, 3], [1, 4, 4]] = 1000
b
[12]:
array([[   0,    1,    2,    3,    4],
       [   5, 1000,    7,    8,    9],
       [  10,   11,   12,   13, 1000],
       [  15,   16,   17,   18, 1000]])

Fancy indexing can be mixed with other forms of indexing. For example, we can apply fancy indexing to one axis of an array, and slicing to the second axis:

[22]:
b = np.arange(20).reshape(4, 5)
b
[22]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
[23]:
# select elements which are in rows 1, 3 and columns 0-2
b[[1, 3], :3]
[23]:
array([[ 5,  6,  7],
       [15, 16, 17]])

Fancy indexing example

Here is an example of how fancy indexing can be used in practice. Let's say that we have a numpy array whose first column holds student ID numbers and whose second column holds the corresponding exam scores:

[13]:
rng = np.random.default_rng(0)
exam_scores = np.array([np.arange(5000, 5010), rng.integers(0, 100, 10)]).T
exam_scores
[13]:
array([[5000,   85],
       [5001,   63],
       [5002,   51],
       [5003,   26],
       [5004,   30],
       [5005,    4],
       [5006,    7],
       [5007,    1],
       [5008,   17],
       [5009,   81]])

Assume that we want to sort the rows of this array in increasing order of exam scores. Applying the numpy argsort() function to the score column, we obtain an array of row indices arranged in the order that produces this sorting:

[14]:
indices = np.argsort(exam_scores[:, 1])
indices
[14]:
array([7, 5, 6, 8, 3, 4, 2, 1, 9, 0])

Fancy indexing with this index array then reorders the rows of the array with exam scores:

[15]:
exam_scores[indices]
[15]:
array([[5007,    1],
       [5005,    4],
       [5006,    7],
       [5008,   17],
       [5003,   26],
       [5004,   30],
       [5002,   51],
       [5001,   63],
       [5009,   81],
       [5000,   85]])
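
The same index array can be reused for related tasks. As a sketch, reversing it with a slice orders the rows from the highest score to the lowest, and slicing the result keeps only the top rows. The scores below are copied from the output above, and the names descending and top_three are illustrative only.

```python
import numpy as np

# student IDs and scores copied from the output above
exam_scores = np.array([[5000, 85], [5001, 63], [5002, 51], [5003, 26], [5004, 30],
                        [5005, 4], [5006, 7], [5007, 1], [5008, 17], [5009, 81]])

indices = np.argsort(exam_scores[:, 1])

# reverse the index array to order rows from highest to lowest score
descending = exam_scores[indices[::-1]]

# keep the three rows with the highest scores
top_three = descending[:3]
print(top_three)
# [[5000   85]
#  [5009   81]
#  [5001   63]]
```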