# Filter NumPy Array by Mask Array in Python

In this tutorial, you’ll learn how to filter a NumPy array with a mask array. We’ll apply masks to one-dimensional (1D) and two-dimensional (2D) arrays, and even to higher-dimensional data.

We’ll also explore the concept of broadcasting masks to apply a single condition across multiple dimensions of an array.

## 1D Mask for 1D Array

You can use a Boolean mask array to filter a 1D array.

First, import NumPy and initialize your data array:

```python
import numpy as np
data_array = np.array([3.2, 4.5, 5.7, 2.2, 6.1, 3.3, 7.4, 1.8, 5.9])
```

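Next, define a Boolean mask array with the same length as `data_array`, where `True` marks each element you want to keep:

```python
mask_array = np.array([True, False, True, False, True, False, True, False, True])
```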

With the mask array defined, apply it to the data array to filter the values:

```python
filtered_data = data_array[mask_array]
print(filtered_data)
```

Output:

```
[3.2 5.7 6.1 7.4 5.9]
```

In this output, `filtered_data` contains elements from `data_array` corresponding to the `True` positions in `mask_array`.
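In practice, you rarely type a mask by hand. A comparison on the array itself is evaluated element-wise and returns a Boolean array of the same shape, which you can use directly as the mask. The sketch below reuses `data_array`; the threshold of 5 is an arbitrary example value:

```python
import numpy as np

data_array = np.array([3.2, 4.5, 5.7, 2.2, 6.1, 3.3, 7.4, 1.8, 5.9])

# The comparison runs element-wise and returns a Boolean array
mask_array = data_array > 5
print(mask_array)

# Indexing with the Boolean array keeps only the elements where it is True
filtered_data = data_array[mask_array]
print(filtered_data)

# The same filter written inline, without a named mask
print(data_array[data_array > 5])
```

The mask must have the same length as the array it indexes; a Boolean index of the wrong length raises an `IndexError`.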

## 1D Mask for 2D Array (Row or Column Filtering)

Applying a 1D mask along one axis of a 2D array lets you extract entire rows or columns based on a condition evaluated along that axis.

First, create a sample 2D array:

```python
import numpy as np
# Columns: data usage (GB), call duration (minutes), customer rating
user_metrics = np.array([
    [2.5, 180, 4.1],
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80,  4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])
```

Suppose you want to filter the rows where data usage is more than 4 GB. First, create a mask from the first column (data usage):

```python
mask = user_metrics[:, 0] > 4  # Mask for data usage greater than 4 GB
print(mask)
```

Output:

```
[False  True False  True False  True  True]
```

The mask array contains `True` for rows where the condition (data usage > 4 GB) is met.

Now, apply this mask to filter the rows:

```python
filtered_rows = user_metrics[mask]
print(filtered_rows)
```

Output:

```
[[  5.1 220.    3.8]
 [  7.8 200.    3.2]
 [  4.5 160.    3.9]
 [  6.9 240.    3.5]]
```

The `filtered_rows` array now includes only those rows from `user_metrics` where the data usage was more than 4 GB.

Similarly, you can modify this approach to filter columns, although that’s less common since columns usually represent different types of data.
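To filter columns instead, put the 1D mask in the second index position so it selects along axis 1. The sketch below assumes you want to keep only the data-usage and rating columns; `column_mask` is a hypothetical name chosen for this example. To filter rows and columns at the same time, `np.ix_` turns the two Boolean masks into an open mesh, because passing two Boolean arrays directly (`user_metrics[row_mask, column_mask]`) pairs their `True` positions element by element instead of selecting a sub-grid.

```python
import numpy as np

user_metrics = np.array([
    [2.5, 180, 4.1],
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80,  4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])

# One Boolean per column: keep data usage and rating, drop call duration
column_mask = np.array([True, False, True])

# ':' keeps every row; the mask selects columns along axis 1
selected_columns = user_metrics[:, column_mask]
print(selected_columns.shape)  # (7, 2)

# Rows with data usage > 4 GB, restricted to the selected columns
row_mask = user_metrics[:, 0] > 4
subset = user_metrics[np.ix_(row_mask, column_mask)]
print(subset)
```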

## Combining Multiple Masks

Let’s build on the previous examples and see how to combine multiple masks for more sophisticated data filtering.

First, recall our 2D array of user metrics:

```python
import numpy as np
user_metrics = np.array([
    [2.5, 180, 4.1],  # Data in GB, Call duration in minutes, Customer rating
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80,  4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])
```

Suppose you want to filter users who have used more than 4 GB of data and have a customer rating above 3.5. Create two masks and then combine them:

```python
mask1 = user_metrics[:, 0] > 4  # Data usage > 4 GB
mask2 = user_metrics[:, 2] > 3.5  # Customer rating > 3.5
combined_mask = mask1 & mask2  # True only where both conditions hold
print(combined_mask)
```

Output:

```
[False  True False False False  True False]
```

`combined_mask` is the result of combining `mask1` and `mask2` with the element-wise AND operator (`&`). It is `True` only where both conditions are met.

Now, apply this combined mask to filter the array:

```python
filtered_data = user_metrics[combined_mask]
print(filtered_data)
```

Output:

```
[[  5.1 220.    3.8]
 [  4.5 160.    3.9]]
```

You can also use other logical operations like OR (`|`) and NOT (`~`) to create more diverse combinations of masks.
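A rough sketch of `|` and `~` on the same `user_metrics` data follows. The call-duration condition (`mask3`) is an arbitrary example added for illustration. Because `&`, `|`, and `~` bind more tightly than comparison operators such as `>`, each comparison needs its own parentheses when written inline. Python's `and`, `or`, and `not` keywords don't work element-wise here; NumPy raises a `ValueError` about the ambiguous truth value of an array.

```python
import numpy as np

user_metrics = np.array([
    [2.5, 180, 4.1],
    [5.1, 220, 3.8],
    [3.3, 140, 4.5],
    [7.8, 200, 3.2],
    [1.2, 80,  4.8],
    [4.5, 160, 3.9],
    [6.9, 240, 3.5]
])

mask1 = user_metrics[:, 0] > 4    # Data usage > 4 GB
mask3 = user_metrics[:, 1] < 150  # Example condition: call duration under 150 minutes

# OR: rows where at least one condition holds
either = mask1 | mask3
print(either)

# NOT: rows where data usage is 4 GB or less
light_users = ~mask1
print(user_metrics[light_users])

# Inline form: parentheses around each comparison are required
heavy_and_happy = user_metrics[(user_metrics[:, 0] > 4) & (user_metrics[:, 2] > 3.5)]
print(heavy_and_happy)
```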

## 2D Mask for 3D Array

Imagine a scenario where a 3D array represents data usage, call duration, and customer ratings across different cities and time periods.

First, let’s create a sample 3D array:

```python
import numpy as np

# Sample 3D array: dimensions might represent [City, Time Period, Metric]
telecom_data = np.array([
    [[2.5, 180, 4.1], [5.1, 220, 3.8], [3.3, 140, 4.5]],
    [[3.2, 150, 4.0], [7.8, 200, 3.2], [1.2, 80, 4.8]],
    [[4.5, 160, 3.9], [6.9, 240, 3.5], [2.1, 110, 4.2]]
])
```

In this array, let’s assume the first dimension is different cities, the second dimension is time periods, and the third dimension holds the three metrics (data usage, call duration, and customer rating).

To apply masking to this array, you first need to define a condition. Let’s say you want to identify where data usage exceeds 4 GB:

```python
mask = telecom_data[:, :, 0] > 4  # Mask for data usage > 4 GB
print(mask)
```

Output:

```
[[False  True False]
 [False  True False]
 [ True  True False]]
```

This mask is a 2D array representing the condition applied across cities and time periods for the data usage metric.

Apply this mask to the 3D array:

```python
filtered_data = telecom_data[mask]
print(filtered_data)
```

Output:

```
[[  5.1 220.    3.8]
 [  7.8 200.    3.2]
 [  4.5 160.    3.9]
 [  6.9 240.    3.5]]
```

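`filtered_data` is a 2D array with one row of metrics for each (city, time period) pair where data usage exceeds 4 GB.

Note that the result is no longer a 3D structure: the 2D mask selects along the first two axes at once, so those axes collapse into a single axis whose length equals the number of `True` values in the mask. Here the result has shape `(4, 3)`.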
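Because the city and time-period axes are collapsed, `filtered_data` alone doesn’t tell you where each row came from. A minimal sketch, assuming you want those positions back: `np.argwhere` returns the index pairs of the `True` entries in the 2D mask, in the same row-major order that Boolean indexing uses, so the two line up row by row.

```python
import numpy as np

telecom_data = np.array([
    [[2.5, 180, 4.1], [5.1, 220, 3.8], [3.3, 140, 4.5]],
    [[3.2, 150, 4.0], [7.8, 200, 3.2], [1.2, 80, 4.8]],
    [[4.5, 160, 3.9], [6.9, 240, 3.5], [2.1, 110, 4.2]]
])

mask = telecom_data[:, :, 0] > 4
filtered_data = telecom_data[mask]

# Each row of 'positions' is a (city, time_period) index pair
positions = np.argwhere(mask)

print(filtered_data.shape)  # (4, 3)
for (city, period), metrics in zip(positions, filtered_data):
    print(f"city {city}, period {period}: {metrics}")
```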

## Broadcasting Masks

Broadcasting in NumPy lets you apply operations to arrays of different shapes.

This concept extends to masks as well: you can expand a lower-dimensional mask so that it covers an entire array, as long as the shapes are compatible under NumPy’s broadcasting rules.

Consider a case where you have a 2D array representing various user metrics over time, and you want to apply a condition across all these metrics uniformly.

First, let’s create a sample 2D array:

```python
import numpy as np
user_metrics = np.array([
    [2.5, 5.1, 3.3, 7.8, 1.2],
    [180, 220, 140, 200, 80],
    [4.1, 3.8, 4.5, 3.2, 4.8]
])
```

Here each row holds one metric and each column one observation. Suppose you have a mask based on a single condition that you want to apply across all rows, for instance one that selects the columns where data usage (the first row) is greater than 4 GB:

```python
mask = np.array([False, True, False, True, False])  # Same as user_metrics[0] > 4
```

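```python
broadcasted_mask = np.broadcast_to(mask[np.newaxis, :], user_metrics.shape)
print(user_metrics[broadcasted_mask].reshape(user_metrics.shape[0], -1))
```

Output:

```
[[  5.1   7.8]
 [220.  200. ]
 [  3.8   3.2]]
```

In this example, `mask[np.newaxis, :]` has shape `(1, 5)`, and `np.broadcast_to` stretches it along the first axis so that `broadcasted_mask` matches the `(3, 5)` shape of `user_metrics`. Indexing with a full-shape Boolean mask returns the selected elements as a flat 1D array, so `reshape` restores one row per metric, leaving only the second and fourth columns.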
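Boolean indexing does not broadcast a mask by itself, but many NumPy functions do. A minimal sketch, assuming you want to keep the original `(3, 5)` shape and blank out unselected columns with `np.nan` (an arbitrary fill value chosen for illustration): `np.where` broadcasts the `(1, 5)` mask against `user_metrics` automatically. For plain column selection, indexing with `user_metrics[:, mask]` returns the selected columns directly.

```python
import numpy as np

user_metrics = np.array([
    [2.5, 5.1, 3.3, 7.8, 1.2],
    [180, 220, 140, 200, 80],
    [4.1, 3.8, 4.5, 3.2, 4.8]
])

# Same condition as before, derived from the data-usage row
mask = user_metrics[0] > 4

# The (1, 5) mask is broadcast to (3, 5); unselected entries become NaN.
# A plain (5,) mask would broadcast the same way, since NumPy aligns trailing axes.
masked_view = np.where(mask[np.newaxis, :], user_metrics, np.nan)
print(masked_view.shape)  # (3, 5)

# Selecting columns directly with the 1D mask drops the unselected columns
selected = user_metrics[:, mask]
print(selected)
```

The `np.where` version is useful when downstream code expects the original shape; the indexing version is more compact when you only need the matching values.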