Using the Pandas Query Method on Index Columns

While the query method in Pandas is used for column-based operations, it is capable of handling index-based queries.

In this tutorial, you will learn how to query index columns using query method.

 

 

Querying Rows Based on Index Column Using query

Let’s consider the same sample data:

import pandas as pd
data = {'Name': ['User1', 'User2', 'User3', 'User4'],
        'MonthlyRevenue': [2000, 1500, 3000, 4000],
        'ContractLength': [12, 24, 12, 18]}
df = pd.DataFrame(data)
df.set_index('Name', inplace=True)
print(df)

Output:

      MonthlyRevenue  ContractLength
Name                                
User1           2000              12
User2           1500              24
User3           3000              12
User4           4000              18

To query based on the index column, you need to use the index identifier in your query string:

result = df.query("index == 'User1'")
print(result)

Output:

      MonthlyRevenue  ContractLength
Name                                
User1           2000              12

 

Slicing the Index to Select a Range of Rows

To select a range of rows, you can use the index identifier and the slicing notation in the query method:

result = df.query("index >= 'User1' and index <= 'User3'")
print(result)

Output:

      MonthlyRevenue  ContractLength
Name                                
User1           2000              12
User2           1500              24
User3           3000              12

Here, we see that the query has successfully returned the rows with index labels from 'User1' to 'User3', inclusive.

 

Using Boolean Indexing on the Index

Boolean indexing is a powerful feature that allows you to filter rows based on certain conditions.

When paired with the query method, you can perform complex filtering criteria with ease.

Suppose you want to filter rows where the index label contains the digit1. To do this, you can use the str.contains method along with query

result = df.query("index.str.contains('1')")
print(result)

Output:

      MonthlyRevenue  ContractLength
Name                                
User1           2000              12

 

Querying with Inequality Operators

You can use the inequality operators like <, <=, >, and >= on index columns when they are lexicographically sortable (i.e., string-based or numeric).

Let’s say you want to filter rows where the index label is lexicographically greater than ‘User2’.

To do this, you can use the inequality operator >.

result = df.query("index > 'User2'")
print(result)

Output:

      MonthlyRevenue  ContractLength
Name                                
User3           3000              12
User4           4000              18

The output shows that the query method filters out rows where the index labels ‘User3’ and ‘User4’ are lexicographically greater than ‘User2’.

 

Combining Multiple Conditions When Querying the Index

The query method supports combining conditions using logical operators like and, or, and not.

Suppose you want to select rows where the index label is either ‘User1’ or ‘User3’ and the MonthlyRevenue is above 2000.

You can combine these conditions as follows:

result = df.query("(index == 'User1' or index == 'User3') and MonthlyRevenue > 2000")
print(result)

Output:

      MonthlyRevenue  ContractLength
Name                                
User3           3000              12

 

Select Rows Based on Index on a MultiIndex DataFrame

The query method can work efficiently with MultiIndex DataFrames as well.

Here’s some sample data with a MultiIndex:

import pandas as pd
arrays = [
    ['User1', 'User1', 'User2', 'User2'],
    ['Jan', 'Feb', 'Jan', 'Feb']
]
index = pd.MultiIndex.from_arrays(arrays, names=('Name', 'Month'))
data = {'MonthlyRevenue': [2000, 1800, 1500, 1600],
        'ContractLength': [12, 12, 24, 24]}
df = pd.DataFrame(data, index=index)
print(df)

Output:

           MonthlyRevenue  ContractLength
Name  Month                              
User1 Jan            2000              12
      Feb            1800              12
User2 Jan            1500              24
      Feb            1600              24

To query this MultiIndex DataFrame, you can use multiple levels of the index in your query string.

For instance, to select rows where the ‘Name’ is ‘User1’ and the ‘Month’ is ‘Jan’:

result = df.query("Name == 'User1' and Month == 'Jan'")
print(result)

Output:

           MonthlyRevenue  ContractLength
Name  Month                              
User1 Jan            2000              12
Leave a Reply

Your email address will not be published. Required fields are marked *