Get Single Value (Scalars) from Pandas query method
The query()
method in Pandas allows you to filter DataFrame rows based on a query string.
However, what if you want to extract a specific value from the query result? This tutorial will guide you through the steps to do that.
Get a Single Value Using Row and Column Numbers
Suppose you have a sample DataFrame like this:
import pandas as pd data = {'CustomerID': [1, 2, 3, 4, 5], 'Name': ['John', 'Emily', 'Michael', 'Sarah', 'Jessica'], 'Age': [28, 22, 35, 40, 18], 'MonthlySpend': [200, 150, 100, 120, 300]} df = pd.DataFrame(data) print(df)
Output:
CustomerID Name Age MonthlySpend 0 1 John 28 200 1 2 Emily 22 150 2 3 Michael 35 100 3 4 Sarah 40 120 4 5 Jessica 18 300
Let’s say you want to find out the MonthlySpend
of the customer whose CustomerID
is 3.
You can use the query()
method to filter this particular row, and then specify the row and column numbers to get the value:
value = df.query('CustomerID == 3').iloc[0, 3] print(value)
Output:
100
The iloc[0, 3]
specifies the row and column number to retrieve the single value after applying the query()
method.
Using Brackets []
Let’s consider the same DataFrame used earlier.
If you want to know the Name
of the customer who spends the most each month, you can perform this query:
max_spend_name = df.query('MonthlySpend == MonthlySpend.max()')['Name'].values[0] print(max_spend_name)
Output:
Jessica
The query()
method is used to filter the row where the MonthlySpend
is maximum.
Then, the ['Name'].values[0]
is used to extract the value from the Name
column for that filtered row.
Using .at()
You can use the .at[]
property when you need to get a single value from a DataFrame and you know the label of the row and the column.
Suppose you want to know the Age
of the customer whose CustomerID
is 2.
First, use the query()
method to filter the row, and then use .at[]
to get the value:
row_label = df.query('CustomerID == 2').index[0] value = df.at[row_label, 'Age'] print(value)
Output:
22
The query()
method filters the row first, and the .index[0]
gets the index label of the filtered row.
Using .iat()
You can use the .iat[]
property if you want to get a single value based on integer-based positions for rows and columns.
It’s useful when you’ve filtered the DataFrame with query()
and you want to get a value based on its relative position in the new DataFrame slice.
To get the Name
of the customer whose Age
is 40, you can do the following:
filtered_df = df.query('Age == 40') value = filtered_df.iat[0, 1] print(value)
Output:
Sarah
The .iat[0, 1]
specifies the row and column positions relative to the filtered DataFrame slice to retrieve the desired value.
Get Single Value from MultiIndex DataFrame
Let’s create a sample MultiIndex DataFrame that mimics a scenario where you have customer data and their monthly spend across different years:
import pandas as pd arrays = [ ['2019', '2019', '2020', '2020', '2021', '2021'], [1, 2, 1, 2, 1, 2], ] index = pd.MultiIndex.from_arrays(arrays, names=('Year', 'CustomerID')) data = {'Name': ['John', 'Emily', 'Michael', 'Sarah', 'Jessica', 'David'], 'MonthlySpend': [200, 150, 100, 120, 300, 250]} df = pd.DataFrame(data, index=index) print(df)
Output:
Name MonthlySpend Year CustomerID 2019 1 John 200 2 Emily 150 2020 1 Michael 100 2 Sarah 120 2021 1 Jessica 300 2 David 250
Here you see a MultiIndex DataFrame, where the indices are a combination of Year
and CustomerID
.
If you want to find the MonthlySpend
for the customer with CustomerID
1 in the year 2020, you can use the following code:
row_label = df.query('CustomerID == 1 and Year == "2020"').index[0] value = df.at[row_label, 'MonthlySpend'] print(value)
Output:
100
The .query()
method filters the rows based on the specified conditions, and .at[]
is used to get the scalar value.
Performance Comparison
To illustrate this, let’s time these methods using the Python timeit
library on our example DataFrame:
import pandas as pd import timeit data = {'CustomerID': [1, 2, 3, 4, 5], 'Name': ['John', 'Emily', 'Michael', 'Sarah', 'Jessica'], 'Age': [28, 22, 35, 40, 18], 'MonthlySpend': [200, 150, 100, 120, 300]} df = pd.DataFrame(data) def using_at(): return df.at[2, 'Age'] def using_iat(): return df.iat[2, 2] def using_loc(): return df.loc[2, 'Age'] def using_iloc(): return df.iloc[2, 2] # Measure time taken for each method time_at = timeit.timeit(using_at, number=10000) time_iat = timeit.timeit(using_iat, number=10000) time_loc = timeit.timeit(using_loc, number=10000) time_iloc = timeit.timeit(using_iloc, number=10000) print(f".at: {time_at}") print(f".iat: {time_iat}") print(f".loc: {time_loc}") print(f".iloc: {time_iloc}")
Output:
.at: 0.23045820009429008 .iat: 0.8481589999282733 .loc: 0.5063259000889957 .iloc: 0.9831687000114471
The output reveals the time taken for 10,000 runs of each method.
As you can see, .at[]
and .loc[]
are faster than .iat[]
and .iloc[]
for accessing single values.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.