NumPy where tutorial (With Examples)
The numpy.where function returns the indices of the elements in an ndarray where a given condition is true. When it is given two extra arguments, it instead selects values from one of two sources, element by element.
By the end of this tutorial, you will have a solid understanding of how to use numpy.where to query NumPy arrays.
- 1 Syntax and Parameters
- 2 Replacing Values using x and y Parameters
- 3 Return values
- 4 Using where with Multiple Conditions
- 5 Combining where with Logical Operations
- 6 Using where with Mathematical Functions
- 7 Nested where Functions
- 8 Performance Comparison with Native Python
- 9 Vectorized Operations with where
- 10 Broadcasting with where (Handling Different Shapes)
- 11 Resources
Syntax and Parameters
The numpy.where function allows you to perform conditional queries on NumPy arrays.
Here’s the basic syntax:
numpy.where(condition[, x, y])
- condition: An array of boolean values. It defines the condition that must be satisfied. You can use comparison operators on an array to build it.
- x, y: Optional parameters that must be given together. If provided, numpy.where returns elements selected from x where the condition is true and from y where it is false.
If these parameters are not provided, the function returns the indices where the condition is true.
Let’s explore the syntax with an example:
    import numpy as np

    array = np.array([10, 20, 30, 40])
    condition = array > 25  # boolean array: [False False  True  True]
    result = np.where(condition)
    print(result)
Output:
(array([2, 3], dtype=int64),)
In this example, we defined the condition array > 25.
The numpy.where function checks this condition for each element in the array and returns a tuple containing the indices of the elements that meet the condition.
The tuple holds one index array per dimension, so a 1-D input yields a one-element tuple.
The elements 30 and 40 satisfy the condition, so their indices (2 and 3) are returned.
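To illustrate how the returned tuple looks for more than one dimension, here is a minimal sketch. The 2-D array grid and the variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical 2-D data, not taken from the example above.
grid = np.array([[10, 40],
                 [50, 20]])

# With a 2-D input, np.where returns one index array per axis.
rows, cols = np.where(grid > 25)
print(rows)  # row index of each matching element
print(cols)  # column index of each matching element

# Pair the two index arrays up as (row, column) coordinates.
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)

# The index arrays can be used together to pull out the matching values.
print(grid[rows, cols])
```

Here 40 sits at row 0, column 1 and 50 sits at row 1, column 0, so the two index arrays line up position by position.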
The optional parameters x and y provide further control over the output.
Replacing Values using x and y Parameters
The x and y parameters in numpy.where provide additional flexibility in the function’s behavior. When these parameters are provided, the function returns values from x and y based on the condition, instead of returning the indices.
Here’s an example to demonstrate the use of x and y:
    import numpy as np

    array = np.array([5, 15, 25, 35])
    # x='High' is used where the condition is true, y='Low' where it is false
    result = np.where(array > 20, 'High', 'Low')
    print(result)
Output:
['Low' 'Low' 'High' 'High']
In this example, the x and y parameters are set to High and Low, respectively. The condition is array > 20.
The value High is returned when the condition is satisfied (for the elements 25 and 35).
Where the condition is not satisfied (for the elements 5 and 15), the value Low is returned.
In effect, numpy.where builds a new array in which every matching number is represented by the string ‘High’ and every non-matching number by the string ‘Low’. The original array is left unchanged.
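A common variation is to replace only the matching elements and keep the rest as they are. The sketch below assumes the goal is to cap values above 20 (the name capped is hypothetical) and passes the array itself as y.

```python
import numpy as np

array = np.array([5, 15, 25, 35])

# Passing the array itself as y keeps the original value
# wherever the condition is false.
capped = np.where(array > 20, 20, array)
print(capped)

# np.where returns a new array; the input is not modified.
print(array)
```

Because x and y are both integers here, the result keeps an integer dtype instead of becoming an array of strings.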
Return values
You can retrieve the values that satisfy a numpy.where query, rather than their indices, by indexing the array with the result:
    import numpy as np

    array = np.array([25, 15, 35, 10, 40])
    filtered_indices = np.where(array > 20)    # tuple of index arrays
    filtered_values = array[filtered_indices]  # values at those indices
    print("Filtered indices:", filtered_indices)
    print("Filtered values:", filtered_values)
Output:
    Filtered indices: (array([0, 2, 4]),)
    Filtered values: [25 35 40]
In this example, we first use numpy.where to find the indices where the condition array > 20 is true. Then, we use those indices to extract the corresponding values from the original array.
The result is a new array containing only the values that satisfy the condition.
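When only the values are needed, indexing with the boolean condition itself gives the same result without the intermediate index tuple. A minimal sketch follows; the variable names are chosen for illustration.

```python
import numpy as np

array = np.array([25, 15, 35, 10, 40])
mask = array > 20  # boolean array marking the matching positions

via_where = array[np.where(mask)]  # index lookup through np.where
via_mask = array[mask]             # direct boolean-mask indexing

print(via_mask)
print(np.array_equal(via_where, via_mask))
```

Going through np.where is mainly useful when the positions themselves matter, for example to report where the matches occurred.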
Using where with Multiple Conditions
Here’s an example that demonstrates how to use numpy.where with multiple conditions:
    import numpy as np

    array = np.array([5, 15, 25, 35, 45])
    # Each comparison needs its own parentheses because & binds tighter than > and <
    condition = (array > 20) & (array < 40)
    result = np.where(condition, 'Match', 'No Match')
    print(result)
Output:
['No Match' 'No Match' 'Match' 'Match' 'No Match']
In this example, we used the logical AND operator & to combine two conditions: array > 20 and array < 40.
The numpy.where function returns Match for elements that satisfy both conditions (25 and 35) and No Match for elements that do not.
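The element-wise & operator is needed here because the Python keyword and does not work on whole arrays. The sketch below shows np.logical_and as an equivalent function-call spelling, then shows the error raised by and. The flag keyword_and_failed is a hypothetical name used only to record the outcome.

```python
import numpy as np

array = np.array([5, 15, 25, 35, 45])

# np.logical_and is the function form of the element-wise & combination.
condition = np.logical_and(array > 20, array < 40)
labels = np.where(condition, 'Match', 'No Match')
print(labels)

# The keyword `and` asks for a single truth value for the whole array,
# which NumPy refuses for arrays with more than one element.
try:
    (array > 20) and (array < 40)
    keyword_and_failed = False
except ValueError as exc:
    keyword_and_failed = True
    print("ValueError:", exc)
```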
Combining where with Logical Operations
numpy.where can be combined with logical operations to create complex queries on arrays.
By using logical operators like & (and), | (or), and ~ (not), you can combine multiple conditions.
Here’s an example to demonstrate the combination of numpy.where with logical operations:
    import numpy as np

    array = np.array([10, 20, 30, 40, 50])
    # Explicit parentheses show the grouping: (A & B) | C
    result = np.where(((array > 15) & (array < 45)) | (array == 10),
                      'Selected', 'Not Selected')
    print(result)
Output:
['Selected' 'Selected' 'Selected' 'Selected' 'Not Selected']
In this example, we combined three conditions:
1. (array > 15): Selects elements greater than 15.
2. (array < 45): Selects elements less than 45.
3. (array == 10): Selects elements equal to 10.
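We used the & operator to combine the first two conditions and the | operator to include the third condition. Because & binds more tightly than |, the first two conditions are combined before the third is applied.
The result is an array that marks all elements except the last one (50) as 'Selected'.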
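The ~ operator mentioned above is not used in that example, so here is a small sketch of it. The labels 'Inside' and 'Outside' and the variable names are assumptions made for illustration.

```python
import numpy as np

array = np.array([10, 20, 30, 40, 50])

in_range = (array > 15) & (array < 45)

# ~ flips every boolean, selecting the elements outside the range.
result = np.where(~in_range, 'Outside', 'Inside')
print(result)

# The same selection written without ~, using the opposite comparisons.
outside_explicit = (array <= 15) | (array >= 45)
print(np.array_equal(~in_range, outside_explicit))
```

Naming intermediate conditions such as in_range tends to make longer combinations easier to read than a single long expression.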
Using where with Mathematical Functions
The numpy.where function can be combined with mathematical functions to perform computations based on conditions.
This allows you to apply different mathematical transformations to elements depending on whether a condition is met.
Here’s an example:
    import numpy as np

    array = np.array([1, 2, 3, 4, 5])
    # Squares where array > 3, square roots elsewhere
    result = np.where(array > 3, np.square(array), np.sqrt(array))
    print(result)
Output:
[1. 1.41421356 1.73205081 16. 25. ]
In this example, the numpy.where function chooses between the results of two mathematical functions based on the condition array > 3:
- Where the condition is true, the value comes from np.square(array), the squared value.
- Where the condition is false, the value comes from np.sqrt(array), the square root.
For the elements 1, 2, and 3 (where the condition is false), the square root is used.
For the elements 4 and 5 (where the condition is true), the square is used.
Note that both np.square(array) and np.sqrt(array) are computed for the entire array before numpy.where selects between them. The result is a float array because np.sqrt returns floats.
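Because both branches are computed for every element, a function such as np.sqrt still sees values that the condition is meant to exclude. For instance, negative inputs would trigger a runtime warning. The sketch below shows two possible ways around this, using hypothetical data with one negative entry. The second uses the where= and out= arguments that NumPy ufuncs accept.

```python
import numpy as np

values = np.array([-4.0, 1.0, 9.0])  # hypothetical data with a negative entry

# Option 1: clean the input first, so sqrt never sees a negative number.
safe_input = np.where(values >= 0, values, 0.0)
roots = np.sqrt(safe_input)
print(roots)

# Option 2: let the ufunc skip masked-out positions. Positions where the
# mask is False keep whatever was already stored in `out` (zeros here).
roots_masked = np.sqrt(values, out=np.zeros_like(values), where=values >= 0)
print(roots_masked)
```

Both variants put 0.0 in place of the square root of the negative entry. Which placeholder value makes sense depends on the application.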
Nested where Functions
The numpy.where function can be nested within itself to create a chain of conditions, allowing for more granular control over the output.
This is useful when you want to apply multiple levels of conditions.
Here’s an example of nested numpy.where functions:
    import numpy as np

    array = np.array([5, 15, 25, 35, 45])
    # The inner np.where supplies the value used when array < 20 is false
    result = np.where(array < 20, 'Low',
                      np.where(array < 40, 'Medium', 'High'))
    print(result)
Output:
['Low' 'Low' 'Medium' 'Medium' 'High']
In this example, we used two nested numpy.where functions to categorize the elements into three groups.
The outer numpy.where function checks whether each element is less than 20. Where that is true, it returns Low.
Where it is false, it takes the value from the inner numpy.where function, which categorizes the element as Medium (less than 40) or High.
The inner call is evaluated for the whole array first; the outer call then selects from its result.
Performance Comparison with Native Python
Here’s a benchmark test using both numpy.where and a native Python approach:
    import numpy as np
    import time

    array = np.random.randint(0, 100, size=100000000)

    # Using numpy.where
    start_time = time.time()
    result_np = np.where(array > 50, 'Greater', 'Smaller')
    end_time = time.time()
    print("Using numpy.where:", end_time - start_time)

    # Using native Python
    start_time = time.time()
    result_python = ['Greater' if x > 50 else 'Smaller' for x in array]
    end_time = time.time()
    print("Using native Python:", end_time - start_time)
Output:
    Using numpy.where: 1.0875394344329834
    Using native Python: 10.121704816818237
In this comparison, we measured the time taken to perform the same operation using numpy.where and a native Python list comprehension.
In this run, numpy.where was roughly nine times faster, as it leverages the underlying C implementation and avoids Python’s loop overhead. Exact timings will vary with hardware and NumPy version.
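The benchmark above allocates an array of 100 million elements, which may be too heavy for some machines. A lighter sketch of the same comparison is shown below. It assumes a much smaller array and uses the standard-library timeit module, and the function names are hypothetical. It prints the timings without claiming any particular ratio.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)              # seeded for repeatable data
array = rng.integers(0, 100, size=100_000)  # much smaller than the benchmark above


def with_where():
    return np.where(array > 50, 'Greater', 'Smaller')


def with_list_comprehension():
    return ['Greater' if x > 50 else 'Smaller' for x in array]


# Take the best of several repeats to reduce noise from other processes.
t_where = min(timeit.repeat(with_where, number=5, repeat=3))
t_loop = min(timeit.repeat(with_list_comprehension, number=5, repeat=3))
print("numpy.where:", t_where)
print("list comprehension:", t_loop)

# Both approaches should produce the same labels.
same_labels = with_where().tolist() == with_list_comprehension()
print(same_labels)
```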
Vectorized Operations with where
Vectorized operations apply a function or operation to an entire array at once, rather than iterating through it element by element.
numpy.where supports vectorized operations, making it efficient for large-scale data manipulation.
Here’s an example that demonstrates vectorized operations with numpy.where:
    import numpy as np

    array1 = np.array([1, 2, 3, 4, 5])
    array2 = np.array([5, 4, 3, 2, 1])
    condition = array1 > array2  # [False False False  True  True]
    result = np.where(condition, array1 + array2, array1 - array2)
    print(result)
Output:
    [-4 -2  0  6  6]
In this example, we created two NumPy arrays and a condition that compares their corresponding elements.
Using numpy.where, we applied two different vectorized operations based on the condition:
- If the condition is true, the corresponding elements of array1 and array2 are added.
- If the condition is false, the element of array2 is subtracted from the element of array1.
The condition is true only for the fourth and fifth elements (4 > 2 and 5 > 1), so those pairs are added, giving 6 and 6. The first three pairs are subtracted, giving -4, -2, and 0. The third pair (3 and 3) fails the strict > comparison, so it is subtracted as well.
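To make the element-by-element meaning concrete, the sketch below compares the vectorized call with an explicit Python loop over the same pairs. The loop is for illustration only; the names vectorized and looped are hypothetical.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([5, 4, 3, 2, 1])

# Vectorized form: one call handles every pair of elements.
vectorized = np.where(array1 > array2, array1 + array2, array1 - array2)

# Explicit loop with the same per-element rule.
looped = [int(a + b) if a > b else int(a - b) for a, b in zip(array1, array2)]

print(vectorized.tolist())
print(looped)
```

Both forms apply the same rule to each pair, but the vectorized call does the work inside NumPy rather than in a Python-level loop.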
Broadcasting with where (Handling Different Shapes)
Broadcasting in NumPy is the mechanism that lets operations combine arrays of different shapes by automatically stretching them to a common shape.
Here’s an example:
    import numpy as np

    array = np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])
    condition = np.array([True, False, True])  # shape (3,), one flag per column
    result = np.where(condition, array, -array)
    print(result)
Output:
    [[ 1 -2  3]
     [ 4 -5  6]
     [ 7 -8  9]]
In this example, the condition array has a shape of (3,), while the array has a shape of (3, 3).
The numpy.where function broadcasts the condition to match the shape of the array. A 1-D condition is aligned with the last axis, so each of its entries applies to one column.
For the first and third columns (where the condition is true), the original values are retained.
For the second column (where the condition is false), the values are negated.
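To apply the same flags per row instead of per column, the condition can be reshaped to a column of shape (3, 1). The sketch below assumes that goal. It also shows that scalar x or y values are broadcast in the same way. The names keep, by_row and zeroed are hypothetical.

```python
import numpy as np

array = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
keep = np.array([True, False, True])

# Shape (3, 1): each flag now applies to a whole row.
by_row = np.where(keep[:, np.newaxis], array, -array)
print(by_row)

# A scalar is broadcast against the full array as well:
# even numbers are replaced with 0, odd numbers are kept.
zeroed = np.where(array % 2 == 0, 0, array)
print(zeroed)
```

With the reshaped condition, the second row is negated instead of the second column.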
Resources
https://numpy.org/doc/stable/reference/generated/numpy.where.html