Using variables in Pandas query (Build Dynamic queries)

The query method allows you to filter and subset DataFrames in Pandas.

It enables you to write SQL-like queries to filter your data.

In this tutorial, we’ll go through the process of incorporating variables in your query statements to make dynamic queries.

 

 

Using ‘@’ Symbol for Direct Substitution

The @ symbol allows you to incorporate Python variables into your query string.

First, you’ll need to create a sample DataFrame.

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4, 5],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Enterprise', 'Premium'],
    'MonthlyCharge': [10, 30, 10, 50, 30]
}
df = pd.DataFrame(data)
print(df)

Output:

   CustomerID    PlanType  MonthlyCharge
0           1       Basic             10
1           2     Premium             30
2           3       Basic             10
3           4  Enterprise             50
4           5     Premium             30

Let’s say you want to filter rows where MonthlyCharge is 30.

You can store this value in a variable and then use it in the query method with the @ symbol.

charge_to_filter = 30
filtered_df = df.query('MonthlyCharge == @charge_to_filter')
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge
1           2  Premium             30
4           5  Premium             30

 

Using String Formatting Methods

Another way to use variables in the Pandas query method is by using Python’s string formatting.

This technique is useful when you’re dealing with more complex queries or when you need to build a query string dynamically.

Using f-strings

Python 3.6 introduced f-strings, a way to embed expressions inside string literals using curly braces {}. Let’s use f-strings to filter rows based on multiple conditions.

Suppose you have a variable for MonthlyCharge and another one for PlanType, and you want to filter data based on both.

charge_to_filter = 30
plan_to_filter = 'Premium'
query_str = f"MonthlyCharge == {charge_to_filter} and PlanType == '{plan_to_filter}'"
filtered_df = df.query(query_str)
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge
1           2  Premium             30
4           5  Premium             30

Notice how the query string was built using f-string formatting. The variables charge_to_filter and plan_to_filter are directly embedded into the query string.

 

Using .format() Method

If you’re using a version of Python prior to 3.6, or if you prefer the .format() method for string interpolation, you can get the same result.

query_str = "MonthlyCharge == {} and PlanType == '{}'".format(charge_to_filter, plan_to_filter)
filtered_df = df.query(query_str)
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge
1           2  Premium             30
4           5  Premium             30

 

Use of Logical Operators with Variables

In Pandas, you can use AND, OR, and NOT operators along with variables in the query method to apply multiple conditions.

Using AND Operator

You can combine multiple conditions using the AND operator. For example, to filter rows based on a specific PlanType and a range of MonthlyCharge values, you can do the following:

min_charge = 20
max_charge = 40
plan_to_filter = 'Premium'

# Using AND operator
filtered_df = df.query('@min_charge <= MonthlyCharge <= @max_charge and PlanType == @plan_to_filter')
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge
1           2  Premium             30
4           5  Premium             30

The output DataFrame contains rows where the MonthlyCharge is between 20 and 40 and the PlanType is ‘Premium’.

Using OR Operator

With the OR operator, you can filter rows that meet at least one of several conditions.

filtered_df = df.query('MonthlyCharge == @min_charge or PlanType == @plan_to_filter')
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge
1           2  Premium             30
4           5  Premium             30

The output DataFrame includes rows where either the MonthlyCharge is 20 or the PlanType is ‘Premium’.

Using NOT Operator

The NOT operator allows you to exclude rows that meet certain conditions.

filtered_df = df.query('not (MonthlyCharge > @min_charge and PlanType == @plan_to_filter)')
print(filtered_df)

Output:

   CustomerID     PlanType  MonthlyCharge
0           1        Basic             10
2           3        Basic             10
3           4  Enterprise             50



				
Leave a Reply

Your email address will not be published. Required fields are marked *