Ignore Case Sensitivity in Pandas query Method

When working with Pandas DataFrames, you may need to filter rows based on text columns that contain the same values but with inconsistent capitalization.

Using Pandas query method for filtering rows becomes challenging in these situations.

In this tutorial, you will learn how to perform case-insensitive queries on a Pandas DataFrame in the query method using str.lower() and str.upper().

Additional methods covered include using str.contains() and str.match() to search for substrings and patterns while ignoring case sensitivity.

 

 

Using the str.lower() or str.upper()

To perform a case-insensitive search using the query method in Pandas, one approach is to use the str.lower() or str.upper() functions to convert all strings to a common case—either all lower case or all upper case.

Then you can compare them easily. First, let’s create a sample DataFrame.

import pandas as pd
data = {'CustomerID': [1, 2, 3, 4],
        'Name': ['John Doe', 'Jane Smith', 'Emily Clark', 'Mike Brown'],
        'City': ['New York', 'San Francisco', 'new york', 'NEW YORK']}
df = pd.DataFrame(data)
print(df)

Output:

   CustomerID        Name           City
0           1    John Doe       New York
1           2  Jane Smith  San Francisco
2           3 Emily Clark       new york
3           4  Mike Brown       NEW YORK

You can see that the ‘City’ column has the same city (‘New York’) in different cases.

Now, let’s perform a case-insensitive query using str.lower().

# Case-insensitive query using str.lower()
result = df.query("City.str.lower() == 'new york'")
print(result)

Output:

   CustomerID        Name      City
0           1    John Doe  New York
2           3 Emily Clark  new york
3           4  Mike Brown  NEW YORK

All the rows with the city ‘New York’, regardless of the case, are included in the result.

 

Using str.contains()

In some situations, you may want to filter rows based on whether a particular string exists within a column’s values, ignoring case sensitivity.

You can combine str.contains() with the case parameter set to False for a case-insensitive search.

Now, let’s say you’re interested in filtering all rows where the city contains the word ‘York’, irrespective of the case.

result_str_contains = df.query("City.str.contains('york', case=False)")
print(result_str_contains)

Output:

   CustomerID        Name      City
0           1    John Doe  New York
2           3 Emily Clark  new york
3           4  Mike Brown  NEW YORK

As you can see, the output includes all rows where the city contains the word ‘York’, regardless of how it’s capitalized.

 

Using str.match()

The str.match() method can also be applied within the query function for pattern matching, allowing for regular expression searches.

When combined with the case parameter set to False, it enables you to perform case-insensitive regular expression matching.

Suppose you want to find all the rows where the city starts with the word ‘New’, ignoring case.

You can do this by using the str.match() method and specifying a regular expression pattern:

# Using str.match() for a case-insensitive search
result_str_match = df.query("City.str.match('new', case=False)")
print(result_str_match)

Output:

   CustomerID        Name      City
0           1    John Doe  New York
2           3 Emily Clark  new york
3           4  Mike Brown  NEW YORK

Here, all the rows where the city starts with the word ‘New’ are included, regardless of the case.

Leave a Reply

Your email address will not be published. Required fields are marked *