NumPy where tutorial (With Examples)

Looking up entries that satisfy a specific condition is a painful process, especially when you are searching a large dataset with hundreds or thousands of entries.
If you know basic SQL queries, you will be aware of the ‘WHERE’ clause, which is used with the SELECT statement to fetch the entries from a relational database that satisfy certain conditions.

NumPy offers similar functionality through its ‘where()’ function, which finds the items in a NumPy array that satisfy a given Boolean condition. It is, however, used in a slightly different way than the SQL SELECT statement with the WHERE clause.

In this tutorial, we’ll look at the various ways the NumPy where function can be used for a variety of use cases. Let’s get going.



A very simple usage of NumPy where

Let’s begin with a simple application of ‘np.where()‘ on a 1-dimensional NumPy array of integers.
We will use the ‘np.where’ function to find the positions of values that are less than 5.

We’ll first create a 1-dimensional array of 10 integer values randomly chosen between 0 and 9.

import numpy as np
a = np.random.randint(0, 10, size=10)  # 10 random integers from 0 to 9
print("a = {}".format(a))


array a for simple usage of np.where

Now we will call ‘np.where’ with the condition ‘a < 5’, i.e., we’re asking ‘np.where’ to tell us where in the array a are the values less than 5.
It will return us an array of indices where the specified condition is satisfied.

result = np.where(a < 5)
print(result)


output of simple usage of np.where on array

We get the indices 1,3,6,9 as output, and it can be verified from the array that the values at these positions are indeed less than 5.
Note that the returned value is a 1-element tuple. This tuple has an array of indices.
We’ll understand the reason for the result being returned as a tuple when we discuss np.where on 2D arrays.
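
Because the array above is random, the indices change on every run. Here is a minimal sketch with a fixed array, so that the result is reproducible. The values are chosen purely for illustration and are not taken from the run shown above.

```python
import numpy as np

# Fixed example array (illustrative values, not the tutorial's random run)
a = np.array([7, 2, 9, 4, 8, 6, 1, 5, 9, 3])

result = np.where(a < 5)
print(type(result), len(result))  # a tuple holding one index array
indices = result[0]               # unwrap the single array of indices
print(indices)                    # positions of the values less than 5
```

Indexing the tuple with [0] is the usual way to get at the index array itself when the input is 1-dimensional.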


How does NumPy where work?

To understand what goes on inside the complex expression involving the ‘np.where’ function, it is important to understand the first parameter of ‘np.where’, that is the condition.

When we evaluate a Boolean expression involving a NumPy array, such as ‘a > 2’ or ‘a % 2 == 0’, it actually returns a NumPy array of Boolean values.

This array has the value True at positions where the condition evaluates to True and has the value False elsewhere. This serves as a ‘mask’ for the NumPy where function.

Here is a code example.

a = np.array([1, 10, 13, 8, 7, 9, 6, 3, 0])
print("a > 5:")
print(a > 5)


Boolean mask array using condition on numpy array

So what we effectively do is that we pass an array of Boolean values to the ‘np.where’ function, which then returns the indices where the array had the value True.

This can be verified by passing a constant array of Boolean values instead of specifying the condition on the array that we usually do.

bool_array = np.array([True, True, True, False, False, False, False, False, False])
print(np.where(bool_array))


passing Boolean mask to np.where

Notice how, instead of passing a condition on an array of actual values, we passed a Boolean array, and the ‘np.where’ function returned us the indices where the values were True.
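
As a quick check of this idea, the sketch below reuses the array from the mask example and reaches the same indices by three routes: the condition itself, a precomputed mask, and np.nonzero. When only a condition is given, np.where behaves like calling nonzero on the mask. The variable names are ours, chosen for illustration.

```python
import numpy as np

a = np.array([1, 10, 13, 8, 7, 9, 6, 3, 0])
mask = a > 5                          # Boolean array, one entry per element of a

from_condition = np.where(a > 5)[0]   # condition written inline
from_mask = np.where(mask)[0]         # same mask, computed beforehand
from_nonzero = np.nonzero(mask)[0]    # positions of the True entries

print(mask)
print(from_condition)
```

All three index arrays are identical, which confirms that ‘np.where’ only ever sees the Boolean mask, not the original values.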


2D matrices

We have seen how it works on 1-dimensional NumPy arrays. Now let us understand how ‘np.where’ behaves on 2D matrices.

The idea remains the same. We call the ‘np.where’ function and pass a condition on a 2D matrix. The difference is in the way it returns the result indices.
Earlier, np.where returned a 1-dimensional array of indices (stored inside a tuple) for a 1-D array, specifying the positions where the values satisfy a given condition.

But in the case of a 2D matrix, a single position is specified using two values — the row index and the column index.
So in this case, np.where will return two arrays, the first one carrying the row indices and the second one carrying the corresponding column indices.

Both the row and column index arrays are stored inside a tuple (now you know why we got a tuple as an answer even in case of a 1-D array).

Let’s see this in action to better understand it.
We’ll write code to find where in a 3×3 matrix the entries are divisible by 2.

a = np.random.randint(0,10, size=(3,3))
print("a =\n{}\n".format(a))
result = np.where(a % 2 == 0)
print("result: {}".format(result))


np.where on a 2d matrix

The returned tuple has two arrays, holding the row indices and the column indices, respectively, of the positions in the matrix where the values are divisible by 2.

Ordered pairwise selection of values from the two arrays gives us a position each.
The length of each of the two arrays is 5, indicating there are five such positions satisfying the given condition.

If we look at the 3rd pair — (1,1), the value at (1,1) in the matrix is six, which is divisible by 2.
Likewise, you can check and verify with other pairs of indices as well.
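
To make the pairing of row and column indices concrete, here is a small sketch on a fixed 3×3 matrix. The matrix values are an assumption made for illustration, and Python's built-in zip is used only to read the two arrays side by side.

```python
import numpy as np

# Fixed 3x3 matrix (illustrative values)
a = np.array([[1, 4, 3],
              [2, 6, 5],
              [7, 9, 8]])

rows, cols = np.where(a % 2 == 0)
print(rows)   # row index of each even entry
print(cols)   # matching column index of each even entry

# Pair the two arrays up to read off (row, column) positions
positions = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(positions)
```

The positions come out in row-major order, i.e., the matrix is scanned row by row from left to right.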


Multidimensional array

Just as we saw the working of ‘np.where’ on a 2-D matrix, we will get similar results when we apply np.where on a multidimensional NumPy array.

The length of the returned tuple will be equal to the number of dimensions of the input array.
Each array at position k in the returned tuple will represent the indices in the kth dimension of the elements satisfying the specified condition.

Let’s quickly look at an example.

a = np.random.randint(0,10, size=(3,3,3,3)) #4-dimensional array
print("a =\n{}\n".format(a))
result = np.where(a == 5) #checking which values are equal to 5
print("len(result)= {}".format(len(result)))
print("len(result[0]) = {}".format(len(result[0])))


np.where on multidimensional array

len(result) = 4 indicates that the input array has 4 dimensions.

The length of one of the arrays in the result tuple is 6, which means there are six positions in the given 3x3x3x3 array where the given condition (i.e., containing value 5) is satisfied.
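
A random 4-dimensional array is hard to verify by eye, so here is a smaller sketch on a 3-dimensional array with known contents. The shape and the condition are arbitrary choices made for illustration.

```python
import numpy as np

# A 2x3x4 array holding the numbers 0..23 (illustrative shape)
a = np.arange(24).reshape(2, 3, 4)

result = np.where(a % 7 == 0)     # matches the values 0, 7, 14 and 21
print(len(result) == a.ndim)      # one index array per dimension
for k, idx in enumerate(result):
    print("axis", k, "indices:", idx)
```

Reading the k-th entries of all three arrays together gives one position. For example, the second match is at (0, 1, 3), which is where the value 7 sits.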


Using the result as an index

So far we have looked at how we get the tuple of indices, in each dimension, of the values satisfying the given condition.

Most of the time we’d be interested in fetching the actual values satisfying the given condition instead of their indices.

To achieve this, we can use the returned tuple as an index on the given array. This will return only those values whose indices are stored in the tuple.

Let’s check this for the 2-D matrix example.

a = np.random.randint(0,10, size=(3,3))
print("a =\n{}\n".format(a))
result_indices = np.where(a % 2 == 0)
result = a[result_indices]
print("result: {}".format(result))


using the result of np.where as index of numpy array

As discussed above, we get all those values (not their indices) that satisfy the given condition which, in our case was divisibility by 2, i.e., even numbers.
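
It is worth knowing that, when we only want the values, indexing with the Boolean mask directly gives the same result as indexing with the tuple returned by ‘np.where’. The sketch below compares the two on a fixed matrix whose values are chosen for illustration.

```python
import numpy as np

a = np.array([[1, 4, 3],
              [2, 6, 5],
              [7, 9, 8]])

via_where = a[np.where(a % 2 == 0)]   # index with the tuple of index arrays
via_mask = a[a % 2 == 0]              # index with the Boolean mask directly

print(via_where)
print(via_mask)
```

Both forms return a flat 1-dimensional array of the matching values. ‘np.where’ is the better choice when the positions themselves are needed later.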


Parameters ‘x’ and ‘y’

Instead of getting indices back from ‘np.where’, we can also pass two optional arrays, x and y, of the same shape as the input array (or a shape broadcastable to it). The result then takes its value from x wherever the condition on the corresponding input value is True, and from y wherever it is False.

For instance, suppose we call the function on a 1-dimensional array of length 10 and supply two more arrays, x and y, of the same length.
Whenever a value in the input array satisfies the given condition, the corresponding value from array x is returned; where the condition is false, the corresponding value from array y is returned.

These values from x and y at their respective positions will be returned as an array of the same shape as the input array.

Let’s get a better understanding of this through code.

a = np.random.randint(0,10, size=(10))
x = a
y = a*10
print("a = {}".format(a))
print("x = {}".format(x))
print("y = {}".format(y))
result = np.where(a%2 == 1, x, y) #if number is odd return the same number else return its multiple of 10.
print("\nresult = {}".format(result))


demonstration of using parameters x and y in numpy where

This method is useful if you want to replace the values satisfying a particular condition with another set of values while leaving those that do not satisfy the condition unchanged.
In that case, we will pass the replacement value(s) to the parameter x and the original array to the parameter y.

Note that we can pass either both x and y together or none of them. We can’t pass one of them and skip the other.
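
The replacement use case described above can be sketched as follows. Here the replacement x is a single scalar, which is broadcast to the shape of the input, and y is the original array. The array values are illustrative.

```python
import numpy as np

a = np.array([3, -1, 4, -1, 5, -9, 2, 6])

# Replace negatives with 0 and keep everything else:
# x is the replacement (a scalar, broadcast to a's shape), y is the original
clipped = np.where(a < 0, 0, a)
print(clipped)

# np.where builds a new array; a itself is left untouched
print(a)
```

Because the result is a new array, assign it back to the original name if you want the replacement to stick.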

Multiple conditions

So far we have been evaluating a single Boolean condition in the ‘np.where’ function. We may sometimes need to combine multiple Boolean conditions using Boolean operators like ‘AND‘ or ‘OR’.

It is easy to specify multiple conditions and combine them using a Boolean operator.
The only caveat is that for the NumPy array of Boolean values, we cannot use the normal keywords ‘and’ or ‘or’ that we typically use for single values.
We need to use the ‘&’ operator for ‘AND’ and ‘|’ operator for ‘OR’ operation for element-wise Boolean combination operations.

Let us understand this through an example.

a = np.random.randint(0,15, (5,5)) #5x5 matrix with values from 0 to 14
print(a)


A 5x5 matrix to be used for demonstrating multiple conditions in np.where

We will look for values that are smaller than 8 and are odd. We can combine these two conditions using the AND (&) operator.

# get indices of odd values less than 8 in a
indices = np.where((a < 8) & (a % 2 == 1))
# print the actual values
print(a[indices])


combining multiple conditions with AND operation in np.where

We can also use the OR (|) operator to combine the same conditions.
This will give us the values that are ‘less than 8’ OR ‘odd’, i.e., all values less than 8 and all odd values greater than 8 will be returned.

# get indices of values less than 8 OR odd values in a
indices = np.where((a < 8) | (a % 2 == 1))
# print the actual values
print(a[indices])


combining multiple conditions with OR operation in np.where
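
Here is a reproducible sketch of both combinations on a small fixed array (values chosen for illustration). It also shows what happens if the keyword ‘and’ is used by mistake. Note that each condition is wrapped in parentheses, since ‘&’ and ‘|’ bind more tightly than comparison operators.

```python
import numpy as np

a = np.array([1, 4, 7, 8, 11, 12, 13])

both = a[np.where((a < 8) & (a % 2 == 1))]     # odd AND less than 8
either = a[np.where((a < 8) | (a % 2 == 1))]   # less than 8 OR odd
print(both)
print(either)

# The keyword `and` needs a single truth value, which an array with
# more than one element cannot provide, so NumPy raises ValueError.
try:
    (a < 8) and (a % 2 == 1)
except ValueError as err:
    print("ValueError:", err)
```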


Finding rows of zeros

Sometimes, in a 2D matrix, some or all of the rows have all values equal to zero. For instance, check out the following NumPy array.

a = np.array([[1, 2, 0],
              [0, 9, 20],
              [0, 0, 0],
              [3, 3, 12],
              [0, 0, 0],
              [1, 0, 0]])


a 2d numpy array having two rows with all values equal to zero

As we can see, rows 2 and 4 (counting from 0) have all values equal to zero. But how do we find this using the ‘np.where’ function?

If we want to find such rows using NumPy where function, we will need to come up with a Boolean array indicating which rows have all values equal to zero.

We can use the ‘np.any()‘ function with ‘axis = 1’, which returns True if at least one of the values in a row is non-zero.

The result of np.any() will be a Boolean array of length equal to the number of rows in our NumPy matrix, in which the positions with the value True indicate the corresponding row has at least one non-zero value.

But we need a Boolean array that is exactly the opposite of this!

Well, we can get this through a simple inversion step. The NOT or tilde (~) operator inverts each of the Boolean values in a NumPy array.

The inverted Boolean array can then be passed to the ‘np.where’ function.

Ok, that was a long, tiring explanation.
Let’s see this thing in action.

zero_rows = np.where(~np.any(a, axis=1))[0]
print(zero_rows)


output of np.where indicating indices of zero rows

Let’s look at what’s happening step-by-step:

  1. np.any() returns True if at least one element is True (non-zero). ‘axis = 1’ tells it to perform this check row-wise.
  2. It would return a Boolean array of length equal to the number of rows in a, with the value True for rows having non-zero values, and False for rows having all values = 0.
    np.any(a, axis=1)
    output of np.any on a 2d matrix with parameter axis=1
  3. The tilde (~) operator inverts the above Boolean array:
    ~np.any(a, axis=1)
    output of inverting the Boolean array using tilde operator
  4. ‘np.where()’ accepts this Boolean array and returns indices having the value True.

The indexing [0] is used because, as discussed earlier, ‘np.where’ returns a tuple.
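
The steps above can be written out one at a time, as in the sketch below. It also includes an alternative formulation using np.all, which states the condition directly instead of inverting np.any. The intermediate variable names are ours, chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 0],
              [0, 9, 20],
              [0, 0, 0],
              [3, 3, 12],
              [0, 0, 0],
              [1, 0, 0]])

has_nonzero = np.any(a, axis=1)      # True for rows with a non-zero value
all_zero = ~has_nonzero              # inverted mask: True for all-zero rows
zero_rows = np.where(all_zero)[0]
print(zero_rows)

# An equivalent formulation that states the condition directly
zero_rows_alt = np.where(np.all(a == 0, axis=1))[0]
print(zero_rows_alt)
```

Which of the two to use is mostly a matter of taste. The np.all version reads closer to the sentence "every value in the row equals zero".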


Finding the last occurrence of a true condition

We know that NumPy’s ‘where’ function returns multiple indices or pairs of indices (in case of a 2D matrix) for which the specified condition is true.

But sometimes we are interested in only the first occurrence or the last occurrence of the value for which the specified condition is met.

Let’s take the simple example of a one-dimensional array where we will find the last occurrence of a value divisible by 3.

a = np.random.randint(0,10, size=(10))
print("Array a:", a)
indices = np.where(a%3==0)[0]
last_occurrence_position = indices[-1]
print("last occurrence at", last_occurrence_position)


an array a and output index of last occurrence of a condition in np.where

Here we could directly use the index ‘-1’ on the returned indices to get the last value in the array.
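
One thing to watch out for is the case where no value satisfies the condition: the index array is then empty, and ‘indices[-1]’ raises an IndexError. Here is a minimal sketch, on a fixed illustrative array, that picks out both the first and the last occurrence and guards against the empty case.

```python
import numpy as np

a = np.array([4, 9, 2, 6, 7, 3, 8, 5])

indices = np.where(a % 3 == 0)[0]
if indices.size > 0:
    print("first occurrence at", indices[0])
    print("last occurrence at", indices[-1])
else:
    # indices[-1] would raise IndexError on an empty array
    print("no element satisfies the condition")

# A condition that nothing satisfies yields an empty index array
empty = np.where(a > 100)[0]
print(empty.size)
```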

But how would we extract the position of the last occurrence in a multidimensional array, where the returned result is a tuple of arrays and each array stores the indices in one of the dimensions?

We can use the zip function, which takes multiple iterables and returns a pairwise combination of values from each iterable in the given order.

It returns an iterator object, so we need to convert the returned object into a list or a tuple before we can index it.

Let’s first see how zip works:

a = (1, 2, 3, 4)
b = (5, 6, 7, 8)
c = list(zip(a,b))
print(c)


demonstration of zip function

So the first element of a and the first element of b form a tuple, then the second element of a and the second element of b form the second tuple in c, and so on.

We’ll use the same technique to find the position of the last occurrence of a condition being satisfied in a multidimensional array.

Let’s use it for a 2D matrix with the same condition as we saw in the earlier example.

a = np.random.randint(0,10, size=(3,3))
print("Matrix a:\n", a)
indices = np.where(a % 3 == 0)
last_occurrence_position = list(zip(*indices))[-1]
print("last occurrence at",last_occurrence_position)


a matrix a and output index of last occurrence of a condition in np.where

We can see in the matrix the last occurrence of a multiple of 3 is at the position (2,1), which is the value 6.

Note: The * operator is an unpacking operator that we can use to unpack a sequence of values into separate positional arguments.
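
As an alternative to zip, NumPy also provides np.argwhere, which returns the matching positions as rows of a 2-dimensional array, so the last row is the last occurrence. The sketch below compares the two approaches on a fixed matrix whose values are chosen for illustration.

```python
import numpy as np

a = np.array([[1, 3, 5],
              [9, 2, 4],
              [7, 6, 8]])

indices = np.where(a % 3 == 0)
via_zip = list(zip(*indices))[-1]            # last (row, col) pair via zip
via_argwhere = np.argwhere(a % 3 == 0)[-1]   # each row is one (row, col) position

print(tuple(int(i) for i in via_zip))
print(tuple(int(i) for i in via_argwhere))
```

Both approaches scan the matrix in row-major order, so "last" here means the match furthest along in the last row that contains one.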


We began the tutorial with simple usage of ‘np.where’ function on a 1-dimensional array with conditions specified on numeric data.

Then we looked at the application of ‘np.where’ on a 2D matrix and then on a general multidimensional NumPy array.
Also, we understood how to interpret the tuple of arrays returned by ‘np.where’ in such cases.

Then we understood the functionality of ‘np.where’ in detail, using Boolean masks.
We also saw how we could use the result of this method as an index to extract the actual original values that satisfy the given condition.

We looked at the behavior of the ‘np.where’ function with the optional arguments ‘x’ and ‘y’.

We also looked at combining multiple conditions in ‘np.where’, its usage in finding the zero rows in a 2D matrix, and then finding the last occurrence of the value satisfying the condition specified by ‘np.where’.
