Get Single Value (Scalars) from Pandas query method

The query() method in Pandas allows you to filter DataFrame rows based on a query string.

However, what if you want to extract a specific value from the query result? This tutorial will guide you through the steps to do that.

 

 

Get a Single Value Using Row and Column Numbers

Suppose you have a sample DataFrame like this:

import pandas as pd
data = {'CustomerID': [1, 2, 3, 4, 5],
        'Name': ['John', 'Emily', 'Michael', 'Sarah', 'Jessica'],
        'Age': [28, 22, 35, 40, 18],
        'MonthlySpend': [200, 150, 100, 120, 300]}
df = pd.DataFrame(data)
print(df)

Output:

   CustomerID     Name  Age  MonthlySpend
0           1     John   28           200
1           2    Emily   22           150
2           3  Michael   35           100
3           4    Sarah   40           120
4           5  Jessica   18           300

Let’s say you want to find out the MonthlySpend of the customer whose CustomerID is 3.

You can use the query() method to filter this particular row, and then specify the row and column numbers to get the value:

value = df.query('CustomerID == 3').iloc[0, 3]
print(value)

Output:

100

The iloc[0, 3] specifies the row and column number to retrieve the single value after applying the query() method.

 

Using Brackets []

Let’s consider the same DataFrame used earlier.

If you want to know the Name of the customer who spends the most each month, you can perform this query:

max_spend_name = df.query('MonthlySpend == MonthlySpend.max()')['Name'].values[0]
print(max_spend_name)

Output:

Jessica

The query() method is used to filter the row where the MonthlySpend is maximum.

Then, the ['Name'].values[0] is used to extract the value from the Name column for that filtered row.

 

Using .at()

You can use the .at[] property when you need to get a single value from a DataFrame and you know the label of the row and the column.

Suppose you want to know the Age of the customer whose CustomerID is 2.

First, use the query() method to filter the row, and then use .at[] to get the value:

row_label = df.query('CustomerID == 2').index[0]
value = df.at[row_label, 'Age']
print(value)

Output:

22

The query() method filters the row first, and the .index[0] gets the index label of the filtered row.

 

Using .iat()

You can use the .iat[] property if you want to get a single value based on integer-based positions for rows and columns.

It’s useful when you’ve filtered the DataFrame with query() and you want to get a value based on its relative position in the new DataFrame slice.

To get the Name of the customer whose Age is 40, you can do the following:

filtered_df = df.query('Age == 40')
value = filtered_df.iat[0, 1]
print(value)

Output:

Sarah

The .iat[0, 1] specifies the row and column positions relative to the filtered DataFrame slice to retrieve the desired value.

 

Get Single Value from MultiIndex DataFrame

Let’s create a sample MultiIndex DataFrame that mimics a scenario where you have customer data and their monthly spend across different years:

import pandas as pd
arrays = [
    ['2019', '2019', '2020', '2020', '2021', '2021'],
    [1, 2, 1, 2, 1, 2],
]
index = pd.MultiIndex.from_arrays(arrays, names=('Year', 'CustomerID'))
data = {'Name': ['John', 'Emily', 'Michael', 'Sarah', 'Jessica', 'David'],
        'MonthlySpend': [200, 150, 100, 120, 300, 250]}
df = pd.DataFrame(data, index=index)
print(df)

Output:

                 Name  MonthlySpend
Year CustomerID                    
2019 1           John           200
     2          Emily           150
2020 1        Michael           100
     2          Sarah           120
2021 1        Jessica           300
     2          David           250

Here you see a MultiIndex DataFrame, where the indices are a combination of Year and CustomerID.

If you want to find the MonthlySpend for the customer with CustomerID 1 in the year 2020, you can use the following code:

row_label = df.query('CustomerID == 1 and Year == "2020"').index[0]
value = df.at[row_label, 'MonthlySpend']
print(value)

Output:

100

The .query() method filters the rows based on the specified conditions, and .at[] is used to get the scalar value.

 

Performance Comparison

To illustrate this, let’s time these methods using the Python timeit library on our example DataFrame:

import pandas as pd
import timeit
data = {'CustomerID': [1, 2, 3, 4, 5],
        'Name': ['John', 'Emily', 'Michael', 'Sarah', 'Jessica'],
        'Age': [28, 22, 35, 40, 18],
        'MonthlySpend': [200, 150, 100, 120, 300]}
df = pd.DataFrame(data)

def using_at():
    return df.at[2, 'Age']

def using_iat():
    return df.iat[2, 2]

def using_loc():
    return df.loc[2, 'Age']

def using_iloc():
    return df.iloc[2, 2]

# Measure time taken for each method
time_at = timeit.timeit(using_at, number=10000)
time_iat = timeit.timeit(using_iat, number=10000)
time_loc = timeit.timeit(using_loc, number=10000)
time_iloc = timeit.timeit(using_iloc, number=10000)

print(f".at: {time_at}")
print(f".iat: {time_iat}")
print(f".loc: {time_loc}")
print(f".iloc: {time_iloc}")

Output:

.at: 0.23045820009429008
.iat: 0.8481589999282733
.loc: 0.5063259000889957
.iloc: 0.9831687000114471

The output reveals the time taken for 10,000 runs of each method.

As you can see, .at[] and .loc[] are faster than .iat[] and .iloc[] for accessing single values.

Leave a Reply

Your email address will not be published. Required fields are marked *