NumPy where tutorial (With Examples)

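The numpy.where function has two uses. Called with only a condition, it returns the indices of the elements of an ndarray where that condition is true. Called with a condition and two more arguments, x and y, it builds a new array by taking each element from x where the condition is true and from y where it is false.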

By the end of this tutorial, you will have a solid understanding of how to use numpy.where to query NumPy arrays.


Syntax and Parameters

The numpy.where function allows you to perform complex queries on NumPy arrays.
Here’s the basic syntax:

numpy.where(condition[, x, y])
  • condition: An array (or array-like) of boolean values that defines the condition to test.
    You usually build it with a comparison operator on an existing array, such as array > 25.
  • x, y: Optional values to choose from. If provided, numpy.where returns an array whose elements come from x where the condition is true and from y where it is false. x and y must be passed together (both or neither), and condition, x, and y must be broadcastable to a common shape.
    If these parameters are not provided, the function returns the indices where the condition is true, which is equivalent to np.asarray(condition).nonzero().
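To make the two calling modes concrete, here is a short sketch (the array values are illustrative). It checks that the one-argument form matches np.nonzero, uses the three-argument form to choose values, and shows that passing x without y is rejected with a ValueError.

```python
import numpy as np

array = np.array([10, 20, 30, 40])
condition = array > 25

# One-argument form: a tuple of index arrays, same as np.nonzero
indices = np.where(condition)
print(np.array_equal(indices[0], np.nonzero(condition)[0]))  # True

# Three-argument form: take from x where True, from y where False
chosen = np.where(condition, array, 0)
print(chosen)  # [ 0  0 30 40]

# x and y must be given together; passing only x raises ValueError
try:
    np.where(condition, array)
    rejected = False
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)
```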

Let’s explore the syntax with an example:

import numpy as np
array = np.array([10, 20, 30, 40])
condition = array > 25
result = np.where(condition)
print(result)

Output:

(array([2, 3], dtype=int64),)

In this example, we have defined a condition array > 25.

The numpy.where function checks this condition for each element in the array and returns a tuple with one array of indices per dimension. Because this array is one-dimensional, the tuple holds a single index array.

The elements 30 and 40 satisfy the condition, and their indices (2 and 3) are returned.
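The tuple contains one index array per dimension, which is easier to see with a two-dimensional array. In this sketch (the grid values are made up for illustration), the two returned arrays hold the row and column positions of the matches, and indexing with the whole tuple returns the matching values.

```python
import numpy as np

grid = np.array([[1, 8],
                 [9, 2]])

# One index array per dimension: rows first, then columns
rows, cols = np.where(grid > 5)
print(rows)  # [0 1]
print(cols)  # [1 0]

# Indexing with the tuple pulls out the matching values
print(grid[np.where(grid > 5)])  # [8 9]

# np.argwhere gives the same positions as (row, column) pairs
print(np.argwhere(grid > 5))
```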

The optional parameters x and y provide further control over the output.

 

Replacing Values using x and y Parameters

The x and y parameters in numpy.where provide additional flexibility in the function’s behavior. When these parameters are provided, the function returns values from x and y based on the condition, instead of returning the indices.
Here’s an example to demonstrate the use of x and y:

import numpy as np
array = np.array([5, 15, 25, 35])
result = np.where(array > 20, 'High', 'Low')
print(result)

Output:

['Low' 'Low' 'High' 'High']

In this example, the x and y parameters are set to High and Low, respectively. The condition is array > 20.

The value High is returned when the condition is satisfied (for the elements 25 and 35).

Where the condition is not satisfied (for the elements 5 and 15), the value Low is returned.

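Using numpy.where, we replace all matched numbers with the string ‘High’ and all non-matched numbers with the string ‘Low’. The original array is not modified: numpy.where returns a new array, which here is an array of strings because x and y are strings.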
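x and y do not have to be strings. A common pattern is to keep an array's own values where the condition fails and substitute a constant where it holds. The sketch below, using made-up sensor readings, replaces negative values with 0 and confirms that the input array is left unchanged.

```python
import numpy as np

readings = np.array([5, -3, 12, -8, 7])  # illustrative data

# Where a reading is negative take 0, otherwise keep the reading
cleaned = np.where(readings < 0, 0, readings)
print(cleaned)   # [ 5  0 12  0  7]
print(readings)  # [ 5 -3 12 -8  7]  (unchanged)
```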

 

Returning Values Instead of Indices

You can retrieve the values that satisfy a numpy.where query, rather than their indices, like this:

import numpy as np
array = np.array([25, 15, 35, 10, 40])
filtered_indices = np.where(array > 20)
filtered_values = array[filtered_indices]
print("Filtered indices:", filtered_indices)
print("Filtered values:", filtered_values)

Output:

Filtered indices: (array([0, 2, 4]),)
Filtered values: [25 35 40]

In this example, we first use numpy.where to find the indices where the condition array > 20 is true. Then, we use those indices to extract the corresponding values from the original array.

The result is a new array containing only the values that satisfy the condition.
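For one-dimensional filtering like this, indexing with the boolean condition directly gives the same values without the intermediate index tuple. A quick check, reusing the array from the example:

```python
import numpy as np

array = np.array([25, 15, 35, 10, 40])

via_where = array[np.where(array > 20)]  # index with the tuple of indices
via_mask = array[array > 20]             # index with the boolean mask

print(via_mask)  # [25 35 40]
print(np.array_equal(via_where, via_mask))  # True
```

The numpy.where route is still the one to use when you need the positions themselves, not just the values.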

 

Using where with Multiple Conditions

Here’s an example that demonstrates how to use numpy.where with multiple conditions:

import numpy as np
array = np.array([5, 15, 25, 35, 45])
condition = (array > 20) & (array < 40)
result = np.where(condition, 'Match', 'No Match')
print(result)

Output:

['No Match' 'No Match' 'Match' 'Match' 'No Match']

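In this example, we used the element-wise AND operator & to combine two conditions: array > 20 and array < 40. The parentheses around each comparison are required, because & has higher precedence than comparison operators such as > and <.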

The numpy.where function returns Match for elements that satisfy both conditions (25 and 35) and No Match for elements that do not.
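Python's own and keyword does not work here, because it tries to reduce each whole array to a single True or False. The sketch below shows the ValueError it raises and that np.logical_and is an equivalent spelling of &.

```python
import numpy as np

array = np.array([5, 15, 25, 35, 45])

# Element-wise AND with & (parentheses required) ...
with_operator = (array > 20) & (array < 40)
# ... or with the equivalent ufunc
with_function = np.logical_and(array > 20, array < 40)
print(np.array_equal(with_operator, with_function))  # True

# The Python keyword `and` needs a single truth value, so it fails on arrays
try:
    (array > 20) and (array < 40)
    keyword_failed = False
except ValueError as exc:
    keyword_failed = True
    print("ValueError:", exc)
```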

 

Combining where with Logical Operations

numpy.where can be combined with logical operations to create complex queries on arrays.

By using logical operators like & (and), | (or), and ~ (not), you can combine multiple conditions.
Here’s an example to demonstrate the combination of numpy.where with logical operations:

import numpy as np
array = np.array([10, 20, 30, 40, 50])
result = np.where((array > 15) & (array < 45) | (array == 10), 'Selected', 'Not Selected')
print(result)

Output:

['Selected' 'Selected' 'Selected' 'Selected' 'Not Selected']

In this example, we combined three conditions:

1. (array > 15): Selects elements greater than 15.
2. (array < 45): Selects elements less than 45.
3. (array == 10): Selects elements equal to 10.

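We used the & operator to combine the first two conditions and the | operator to include the third condition. Because & has higher precedence than |, Python evaluates the expression as ((array > 15) & (array < 45)) | (array == 10).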

The result is an array that marks all elements except the last one (50) as 'Selected'.
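The ~ operator mentioned above does not appear in that example, so here is a small sketch that writes the same selection with explicit parentheses, confirms it matches the unparenthesized version, and then uses ~ to flip the mask. The 'Kept' and 'Excluded' labels are just illustrative. Note that ~ acts as logical NOT only on boolean arrays; on integer arrays it is bitwise NOT.

```python
import numpy as np

array = np.array([10, 20, 30, 40, 50])

implicit = (array > 15) & (array < 45) | (array == 10)
explicit = ((array > 15) & (array < 45)) | (array == 10)
print(np.array_equal(implicit, explicit))  # True: & binds tighter than |

# ~ inverts a boolean mask element by element
labels = np.where(~explicit, 'Excluded', 'Kept')
print(labels)  # ['Kept' 'Kept' 'Kept' 'Kept' 'Excluded']
```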

 

Using where with Mathematical Functions

The numpy.where function can be combined with mathematical functions to perform computations based on conditions.

This allows you to apply different mathematical transformations to elements depending on whether a condition is met.
Here’s an example:

import numpy as np
array = np.array([1, 2, 3, 4, 5])
result = np.where(array > 3, np.square(array), np.sqrt(array))
print(result)

Output:

[ 1.          1.41421356  1.73205081 16.         25.        ]

In this example, numpy.where chooses between the results of two mathematical functions based on the condition array > 3:

Where the condition is true, the value comes from np.square(array), the square of the element.

Where the condition is false, the value comes from np.sqrt(array), the square root of the element.

For the elements 1, 2, and 3 (where the condition is false), the square root is returned.

For the elements 4 and 5 (where the condition is true), the square is returned. Note that both np.square(array) and np.sqrt(array) are computed for every element before numpy.where selects between them, and because np.sqrt returns floats, the whole result is a float array.
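Because both branches are computed for every element, numpy.where does not protect a function from inputs it cannot handle. In this sketch (the sample values are chosen to trigger the problem), np.log still runs on 0 and -1, producing -inf and nan, even though those results are discarded. Wrapping the call in np.errstate silences the resulting warnings; alternatively, the out= and where= arguments that NumPy ufuncs accept compute the logarithm only where the condition holds.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 4.0])

# np.log(x) is evaluated for every element before np.where selects,
# so suppress the divide-by-zero / invalid-value warnings it would emit
with np.errstate(divide='ignore', invalid='ignore'):
    safe_log = np.where(x > 0, np.log(x), 0.0)

# Alternative: the ufunc only computes where x > 0 and leaves the
# rest of the preallocated output (zeros) untouched
masked_log = np.log(x, out=np.zeros_like(x), where=x > 0)

print(np.allclose(safe_log, masked_log))  # True
```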

 

Nested where Functions

A numpy.where call can be nested inside another, by passing it as the x or y argument, to create a chain of conditions that gives more granular control over the output.

This is useful when you want to apply multiple levels of conditions.
Here’s an example of nested numpy.where functions:

import numpy as np
array = np.array([5, 15, 25, 35, 45])
result = np.where(array < 20, 'Low', np.where(array < 40, 'Medium', 'High'))
print(result)

Output:

['Low' 'Low' 'Medium' 'Medium' 'High']

In this example, we used two nested numpy.where functions to categorize the elements into three groups.

The inner numpy.where call is evaluated first, over the whole array, and labels each element Medium if it is less than 40 and High otherwise.

The outer numpy.where call then checks whether each element is less than 20. If it is, the result is Low; if not, the result is the label produced by the inner call.
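When the chain grows beyond two or three levels, nested calls become hard to read. NumPy's np.select takes a list of conditions and a matching list of choices and uses the first condition that is true for each element, which gives the same result for this example:

```python
import numpy as np

array = np.array([5, 15, 25, 35, 45])

conditions = [array < 20, array < 40]  # checked in order
choices = ['Low', 'Medium']            # value for each condition
result = np.select(conditions, choices, default='High')
print(result)  # ['Low' 'Low' 'Medium' 'Medium' 'High']

nested = np.where(array < 20, 'Low', np.where(array < 40, 'Medium', 'High'))
print(np.array_equal(result, nested))  # True
```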

 

Performance Comparison with Native Python

Here’s a benchmark test using both numpy.where and a native Python approach:

import numpy as np
import time
array = np.random.randint(0, 100, size=100000000)

# Using numpy.where
start_time = time.time()
result_np = np.where(array > 50, 'Greater', 'Smaller')
end_time = time.time()
print("Using numpy.where:", end_time - start_time)

# Using native Python
start_time = time.time()
result_python = ['Greater' if x > 50 else 'Smaller' for x in array]
end_time = time.time()
print("Using native Python:", end_time - start_time)

Output:

Using numpy.where: 1.0875394344329834
Using native Python: 10.121704816818237

In this comparison, we measured the time taken to perform the same operation on 100 million random integers using numpy.where and a native Python list comprehension. The exact timings depend on your hardware.

In this run, numpy.where was roughly nine times faster, because it performs the loop in compiled C code and avoids Python’s per-element loop overhead.
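A single time.time() measurement on 100 million elements is slow to reproduce and can be noisy. If you want to repeat the comparison yourself, a sketch like the following uses the standard-library timeit module on a smaller array (the size and repeat count are arbitrary choices) and checks that both approaches produce the same labels. Your numbers will differ from the ones above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(seed=0)
array = rng.integers(0, 100, size=500_000)

def with_where():
    return np.where(array > 50, 'Greater', 'Smaller')

def with_list_comprehension():
    return ['Greater' if x > 50 else 'Smaller' for x in array]

# Both approaches must agree before their timings mean anything
assert with_where().tolist() == with_list_comprehension()

# Best of 3 runs, one call per run, for each approach
where_seconds = min(timeit.repeat(with_where, number=1, repeat=3))
list_seconds = min(timeit.repeat(with_list_comprehension, number=1, repeat=3))

print(f"numpy.where:        {where_seconds:.4f} s")
print(f"list comprehension: {list_seconds:.4f} s")
```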

 

Vectorized Operations with where

Vectorized operations refer to applying a function or operation to an entire array at once, rather than iterating through it element by element.

numpy.where is itself a vectorized operation, and its x and y arguments can be the results of other vectorized expressions, which makes it efficient for large-scale data manipulation.
Here’s an example that demonstrates vectorized operations with numpy.where:

import numpy as np
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([5, 4, 3, 2, 1])
condition = array1 > array2
result = np.where(condition, array1 + array2, array1 - array2)
print(result)

Output:

[-4 -2  0  6  6]

In this example, we created two NumPy arrays and a condition that compares their corresponding elements.

Using numpy.where, we applied two different vectorized operations based on the condition:

If the condition is true, the corresponding elements of array1 and array2 are added.

If the condition is false, the corresponding elements of array1 and array2 are subtracted.

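The condition is true only for the fourth and fifth elements (4 > 2 and 5 > 1), so those pairs are added to give 6 and 6. For the first three elements it is false (note that 3 > 3 is false), so those pairs are subtracted to give -4, -2, and 0.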
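Because x and y are just arrays, np.where(a > b, a, b) picks the larger element of each pair, which NumPy also provides directly as np.maximum (for data without NaN values). The check below reuses the two arrays from the example and also recomputes the example's result.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([5, 4, 3, 2, 1])

# The example above: add where array1 > array2, subtract otherwise
result = np.where(array1 > array2, array1 + array2, array1 - array2)
print(result)  # [-4 -2  0  6  6]

# Picking the larger element of each pair two ways
larger = np.where(array1 > array2, array1, array2)
print(larger)  # [5 4 3 4 5]
print(np.array_equal(larger, np.maximum(array1, array2)))  # True
```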

 

Broadcasting with where (Handling Different Shapes)

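Broadcasting in NumPy refers to the rules that let operations work on arrays of different shapes: NumPy compares shapes starting from the last dimension, and a dimension of size 1 (or a missing dimension) is stretched to match the other array. numpy.where broadcasts condition, x, and y against each other in the same way.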

Here’s an example:

import numpy as np
array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
condition = np.array([True, False, True])
result = np.where(condition, array, -array)
print(result)

Output:

[[ 1 -2  3]
 [ 4 -5  6]
 [ 7 -8  9]]

In this example, the condition array has a shape of (3,), while the array has a shape of (3, 3).

The numpy.where function broadcasts the condition to match the shape of the array.

For the first and third columns (where the condition is true), the original values are retained.

For the second column (where the condition is false), the values are negated.
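Since broadcasting aligns shapes from the last dimension, a (3,) condition always lines up with the columns. To apply one flag per row instead, the condition needs shape (3, 1). A shape that cannot be broadcast, such as (2,), raises a ValueError. A short sketch of both cases:

```python
import numpy as np

array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Shape (3, 1): each flag now controls a whole row
row_condition = np.array([True, False, True])[:, np.newaxis]
by_row = np.where(row_condition, array, -array)
print(by_row)

# Shape (2,) is incompatible with (3, 3)
try:
    np.where(np.array([True, False]), array, -array)
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("ValueError:", exc)
```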

 

Resources

https://numpy.org/doc/stable/reference/generated/numpy.where.html
