Filter Null and not Null Values in Pandas query method

The query method allows you filter DataFrame rows based on a query expression.

In this tutorial, you will learn how to check for null and not null values in the Pandas query method in Python.

 

 

Check for Null Values

Firstly, let’s create a sample DataFrame with some data, including null values.

import pandas as pd
data = {'ID': [1, 2, 3, 4, 5],
        'Metric1': [32, 45, None, 21, 56],
        'Metric2': [None, 25, 43, 54, None]}
df = pd.DataFrame(data)
print(df)

Output:

   ID  Metric1  Metric2
0   1     32.0      NaN
1   2     45.0     25.0
2   3      NaN     43.0
3   4     21.0     54.0
4   5     56.0      NaN

The DataFrame contains two columns, Metric1 and Metric2, with some null values represented as NaN (Not a Number).

Now, let’s count the number of null values in the Metric1 column using isnull():

# Count null values in the Metric1 column
null_count = df['Metric1'].isnull().sum()
print("Number of null values in Metric1:", null_count)

Output:

Number of null values in Metric1: 1

 

Check for Not Null Values

You can use the notnull() function to find the number of not null values in a DataFrame column.

# Count not null values in the Metric1 column
not_null_count = df['Metric1'].notnull().sum()
print("Number of not null values in Metric1:", not_null_count)

Output:

Number of not null values in Metric1: 4

 

Filter Rows Based on Null Values

The query method can filter out null values from specific rows:

filtered_df = df.query("Metric1.isnull()")
print(filtered_df)

Output:

   ID  Metric1  Metric2
2   3      NaN     43.0

The filtered DataFrame displays only the row where Metric1 is null.

 

Filter Rows Based on Not Null Values

Just as you can filter rows based on null values, you can also do the opposite—filter rows where the values are not null.

Here’s how to do that using the query method.

filtered_not_null_df = df.query("Metric1.notnull()")
print(filtered_not_null_df)

Output:

   ID  Metric1  Metric2
0   1     32.0      NaN
1   2     45.0     25.0
3   4     21.0     54.0
4   5     56.0      NaN

The DataFrame now only includes rows where Metric1 is not null.

 

Filter Based on Null Values Across Multiple Columns

In many real-world scenarios, you may need to filter rows based on null values across multiple columns.

Let’s see how to filter rows based on null values in either the Metric1 or Metric2 columns.

# Filter rows where either Metric1 or Metric2 is null
filtered_multiple_cols_df = df.query("Metric1.isnull() | Metric2.isnull()")
print(filtered_multiple_cols_df)

Output:

   ID  Metric1  Metric2
0   1     32.0      NaN
2   3      NaN     43.0
4   5     56.0      NaN

The use of the pipe symbol (|) serves as a logical OR, enabling the filter to catch rows where either of the columns is null.

 

Combining isnull() and notnull()

The query method offers a flexible way to combine isnull() and notnull() conditions.

Let’s dive into an example.

# Filter rows where Metric1 is not null and Metric2 is null
combined_query_df = df.query("Metric1.notnull() & Metric2.isnull()")
print(combined_query_df)

Output:

   ID  Metric1  Metric2
0   1     32.0      NaN
4   5     56.0      NaN

The DataFrame now displays rows where Metric1 is not null and Metric2 is null.

The ampersand symbol (&) is used for logical AND, meaning both conditions must be met for a row to be included in the output.

 

Filtering on Null Values with Other Conditions

Let’s illustrate this with an example where you want to filter rows where Metric1 is greater than 30 and is not null.

# Filter rows where Metric1 is greater than 30 and not null
conditional_query_df = df.query("Metric1 > 30 & Metric1.notnull()")
print(conditional_query_df)

Output:

   ID  Metric1  Metric2
0   1     32.0      NaN
1   2     45.0     25.0
4   5     56.0      NaN
2 thoughts on “Filter Null and not Null Values in Pandas query method
  1. Thanks for the tutorial.

    When I try the suggested code for filtering rows which contain a null value:
    df.query(“metric1.isnull()”)
    I get the following error:
    TypeError: ‘Series’ objects are mutable, thus they cannot be hashed

    I can use bracket notation successfully:
    df.loc[df[‘Metric1’].isnull()]
    But I would like to understand the problem and why the .query() method isn’t working for me.

    1. You’re welcome!

      I tested both snippets and they return the same result without any issues.
      In the line you wrote, you used “metric1” and it should be “Metric1” Capital “M”.

      Regards,

Leave a Reply

Your email address will not be published. Required fields are marked *