Filter NumPy Array by Mask Array in Python
In this tutorial, you’ll learn how to filter a NumPy array with a mask array. We’ll apply masks to one-dimensional (1D) and two-dimensional (2D) arrays, and to higher-dimensional data.
We’ll also explore the concept of broadcasting masks to apply a single condition across multiple dimensions of an array.
1D Mask for 1D Array
You can use a boolean mask array to filter a 1D array.
First, import NumPy and initialize your data array:
import numpy as np
data_array = np.array([3.2, 4.5, 5.7, 2.2, 6.1, 3.3, 7.4, 1.8, 5.9])
Now, define a boolean mask array with the same length as the data array:
mask_array = np.array([True, False, True, False, True, False, True, False, True])
With the mask array defined, apply it to the data array to filter the values:
filtered_data = data_array[mask_array]
print(filtered_data)
Output:
[3.2 5.7 6.1 7.4 5.9]
In this output, filtered_data contains the elements from data_array corresponding to the True positions in mask_array.
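In practice, a mask is usually derived from a condition rather than typed by hand. The following is a minimal sketch of that idea using the same data_array; the threshold of 4 is an assumption chosen only for illustration.

```python
import numpy as np

data_array = np.array([3.2, 4.5, 5.7, 2.2, 6.1, 3.3, 7.4, 1.8, 5.9])

# Comparing an array with a scalar yields a boolean array of the same shape
mask_array = data_array > 4  # threshold of 4 is an illustrative assumption

# Boolean indexing keeps only the elements where the mask is True
filtered_data = data_array[mask_array]
print(mask_array)
print(filtered_data)
```

Because the mask comes from the data itself, it always has the same length as data_array, which boolean indexing requires.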
1D Mask for 2D Array (Row or Column Filtering)
Applying a 1D mask for a row or column allows you to selectively extract entire rows or columns based on a condition applied across one dimension.
First, create a sample 2D array:
import numpy as np
user_metrics = np.array([
    [2.5, 180, 4.1],
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80, 4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])
Suppose you want to filter the rows where data usage is more than 4 GB. First, create a mask from the first column (data usage):
mask = user_metrics[:, 0] > 4  # Mask for data usage greater than 4 GB
print(mask)
Output:
[False True False True False True True]
The mask array contains True for rows where the condition (data usage > 4 GB) is met.
Now, apply this mask to filter the rows:
filtered_rows = user_metrics[mask]
print(filtered_rows)
Output:
[[  5.1 220.    3.8]
 [  7.8 200.    3.2]
 [  4.5 160.    3.9]
 [  6.9 240.    3.5]]
The filtered_rows array now includes only those rows from user_metrics where the data usage was more than 4 GB.
Similarly, you can modify this approach to filter columns, although that’s less common since columns usually represent different types of data.
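As a rough sketch of the column case, a 1D mask with one entry per column can be placed in the second index position. The column_mask values below are a hypothetical choice that keeps the first and third columns.

```python
import numpy as np

user_metrics = np.array([
    [2.5, 180, 4.1],
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80, 4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])

# Hypothetical mask with one entry per column: keep columns 0 and 2
column_mask = np.array([True, False, True])

# The leading ":" keeps every row; the mask selects columns
filtered_columns = user_metrics[:, column_mask]
print(filtered_columns.shape)
print(filtered_columns)
```

The mask length must equal the number of columns, just as a row mask must have one entry per row.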
Combining Multiple Masks
Let’s build on the previous examples and see how to combine multiple masks for more sophisticated data filtering.
First, recall our 2D array of user metrics:
import numpy as np
# Columns: data in GB, call duration in minutes, customer rating
user_metrics = np.array([
    [2.5, 180, 4.1],
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80, 4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])
Suppose you want to filter users who have used more than 4 GB of data and have a customer rating above 3.5. Create two masks and then combine them:
mask1 = user_metrics[:, 0] > 4    # Data usage > 4 GB
mask2 = user_metrics[:, 2] > 3.5  # Customer rating > 3.5
combined_mask = mask1 & mask2
print(combined_mask)
Output:
[False True False False False True False]
The combined_mask is the result of combining mask1 and mask2 using the element-wise logical AND operation (&). It is True only where both conditions are met.
Now, apply this combined mask to filter the array:
filtered_data = user_metrics[combined_mask]
print(filtered_data)
Output:
[[  5.1 220.    3.8]
 [  4.5 160.    3.9]]
You can also use other logical operations like OR (|) and NOT (~) to create more diverse combinations of masks.
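Here is a small sketch of OR and NOT on the same user_metrics array. The thresholds of 6 GB and 4.4 in the OR example are assumptions picked only so that the result is not trivially all True.

```python
import numpy as np

# Columns: data in GB, call duration in minutes, customer rating
user_metrics = np.array([
    [2.5, 180, 4.1],
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80, 4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])

mask1 = user_metrics[:, 0] > 4    # Data usage > 4 GB
mask2 = user_metrics[:, 2] > 3.5  # Customer rating > 3.5

# OR: very heavy data users or very high ratings (illustrative thresholds).
# Parentheses are required because | and & bind tighter than comparisons.
either = (user_metrics[:, 0] > 6) | (user_metrics[:, 2] > 4.4)
print(either)

# NOT combined with AND: heavy data users whose rating is NOT above 3.5
heavy_low_rating = user_metrics[mask1 & ~mask2]
print(heavy_low_rating)
```

Note that Python's `and`, `or`, and `not` keywords do not work element-wise on arrays, so the `&`, `|`, and `~` operators are the ones to use with masks.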
Masking in Higher-Dimensional Arrays
Imagine a scenario where a 3D array represents data usage, call duration, and customer ratings across different cities and time periods.
First, let’s create a sample 3D array:
import numpy as np
# Sample 3D array: dimensions might represent [City, Time Period, Metric]
telecom_data = np.array([
    [[2.5, 180, 4.1], [5.1, 220, 3.8], [3.3, 140, 4.5]],
    [[3.2, 150, 4.0], [7.8, 200, 3.2], [1.2, 80, 4.8]],
    [[4.5, 160, 3.9], [6.9, 240, 3.5], [2.1, 110, 4.2]]
])
In this array, let’s assume the first dimension is different cities, the second dimension is time periods, and the third dimension is various metrics.
To apply masking to this array, you first need to define a condition. Let’s say you want to identify where data usage exceeds 4 GB:
mask = telecom_data[:, :, 0] > 4  # Mask for data usage > 4 GB
print(mask)
Output:
[[False  True False]
 [False  True False]
 [ True  True False]]
This mask is a 2D array representing the condition applied across cities and time periods for the data usage metric.
Apply this mask to the 3D array:
filtered_data = telecom_data[mask]
print(filtered_data)
Output:
[[  5.1 220.    3.8]
 [  7.8 200.    3.2]
 [  4.5 160.    3.9]
 [  6.9 240.    3.5]]
In filtered_data, you get a 2D array with one row of metrics for each city and time period where the data usage exceeds 4 GB.
Note that the result is no longer a 3D structure: the 2D mask selects along the first two dimensions, so those dimensions collapse into a single axis of matching entries.
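Since the city and time-period axes are collapsed, it can help to recover which positions matched. The sketch below uses np.argwhere for that; the variable name positions is chosen here for illustration.

```python
import numpy as np

# Sample 3D array: dimensions might represent [City, Time Period, Metric]
telecom_data = np.array([
    [[2.5, 180, 4.1], [5.1, 220, 3.8], [3.3, 140, 4.5]],
    [[3.2, 150, 4.0], [7.8, 200, 3.2], [1.2, 80, 4.8]],
    [[4.5, 160, 3.9], [6.9, 240, 3.5], [2.1, 110, 4.2]]
])

mask = telecom_data[:, :, 0] > 4  # Mask for data usage > 4 GB
filtered_data = telecom_data[mask]

# Each row of positions is a (city index, time-period index) pair,
# listed in the same order as the rows of filtered_data
positions = np.argwhere(mask)

print(filtered_data.shape)
print(positions)
```

Row i of positions tells you which city and time period produced row i of filtered_data.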
Broadcasting Masks
Broadcasting in NumPy allows you to apply operations across arrays of different shapes.
This concept extends to the use of masks as well, enabling you to apply a mask across an entire array, even if their dimensions don’t exactly match.
Consider a case where you have a 2D array representing various user metrics over time, and you want to apply a condition across all these metrics uniformly.
First, let’s create a sample 2D array:
import numpy as np
user_metrics = np.array([
    [2.5, 5.1, 3.3, 7.8, 1.2],
    [180, 220, 140, 200, 80],
    [4.1, 3.8, 4.5, 3.2, 4.8]
])
Suppose you have a 1D mask with one entry per column that you want to apply to every row. For instance, a mask that marks the columns where the value in the first row is greater than 4:
mask = np.array([False, True, False, True, False]) # Mask to be broadcasted
Broadcast this mask across the entire 2D array:
broadcasted_mask = mask[np.newaxis, :]  # Shape (1, 5): the mask as a single-row 2D array
filtered_metrics = user_metrics[:, broadcasted_mask[0]]  # Select the masked columns in every row
print(filtered_metrics)
Output:
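[[  5.1   7.8]
 [220.  200. ]
 [  3.8   3.2]]
In this example, broadcasted_mask is a 2D version of the original mask with shape (1, 5). Its single row is used to index the columns of user_metrics, which is equivalent to writing user_metrics[:, mask].
The same column selection is applied to all rows, so the result keeps every row and only the columns where the mask is True.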
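To see the mask actually stretched to the full shape of the array, one option is np.broadcast_to, and np.where can broadcast the 1D mask on its own. This is a sketch under the assumption that replacing unselected values with NaN is acceptable for your data; the names full_mask, masked_metrics, and selected_values are illustrative.

```python
import numpy as np

user_metrics = np.array([
    [2.5, 5.1, 3.3, 7.8, 1.2],
    [180, 220, 140, 200, 80],
    [4.1, 3.8, 4.5, 3.2, 4.8]
])
mask = np.array([False, True, False, True, False])

# Read-only view of the mask repeated along the rows: shape (3, 5)
full_mask = np.broadcast_to(mask, user_metrics.shape)
print(full_mask.shape)

# np.where broadcasts the 1D mask against the 2D array;
# the shape is preserved and unselected entries become NaN
masked_metrics = np.where(mask, user_metrics, np.nan)
print(masked_metrics)

# Indexing with a full-shape boolean mask returns a 1D array
# of the selected values in row-major order
selected_values = user_metrics[full_mask]
print(selected_values)
```

The np.where form is useful when the original layout matters, while indexing with full_mask is closer to the filtering shown in earlier sections.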
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.