Add Row with Average in Pandas DataFrame
In this tutorial, we’ll explore step-by-step methods to add a row with average values in a Pandas DataFrame.
You will use key Pandas functions like mean() and groupby() to do this effectively.
Calculating the Average using mean()
Let’s start with the simplest approach: using the mean() function.
import pandas as pd

data = {'CustomerID': [1, 2, 3],
        'MonthlyCharges': [70.20, 45.30, 89.90],
        'TotalCharges': [492.15, 1450.60, 789.25]}
df = pd.DataFrame(data)

# Calculate the average using mean()
average_values = df.mean()
print(average_values)
Output:
CustomerID          2.000000
MonthlyCharges     68.466667
TotalCharges      910.666667
dtype: float64
The mean() function calculates the average for each numerical column and returns a Pandas Series with these average values.
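When a DataFrame also holds non-numeric columns, mean() has to be told to skip them. The following is a minimal sketch, assuming a hypothetical Plan column that is not part of the original data:

```python
import pandas as pd

# 'Plan' is a hypothetical text column, added only for illustration
df_mixed = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Plan': ['basic', 'premium', 'basic'],
    'MonthlyCharges': [70.20, 45.30, 89.90],
})

# numeric_only=True restricts the calculation to the numeric columns
numeric_means = df_mixed.mean(numeric_only=True)
print(numeric_means)
```

The resulting Series contains entries for CustomerID and MonthlyCharges only. This has the same effect as calling select_dtypes(include=['number']) before mean().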
Add Average Row Using loc[]
You can use the loc[] indexer to add a row with average values. Assigning to a label that does not yet exist in the index appends a new row.
First, calculate the average of numerical columns:
average_values = df.select_dtypes(include=['number']).mean()
Now add this row:
df.loc['Average'] = average_values
print(df)
Output:
         CustomerID  MonthlyCharges  TotalCharges
0               1.0       70.200000    492.150000
1               2.0       45.300000   1450.600000
2               3.0       89.900000    789.250000
Average         2.0       68.466667    910.666667
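Two side effects are visible in this output: the assignment modifies df in place, and the integer CustomerID values are now displayed as floats because the new row holds float averages. If the original DataFrame is needed later, one option is to add the row to a copy. This is a sketch only, and the name df_with_avg is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'MonthlyCharges': [70.20, 45.30, 89.90],
    'TotalCharges': [492.15, 1450.60, 789.25],
})

# Work on a copy so the original DataFrame keeps only the customer rows
df_with_avg = df.copy()
df_with_avg.loc['Average'] = df.select_dtypes(include=['number']).mean()

print(len(df), len(df_with_avg))        # the original is left unchanged
print(df['CustomerID'].dtype)           # dtype of the original column
print(df_with_avg['CustomerID'].dtype)  # dtype after the float row is added
```

Keeping the summary row out of df also prevents it from leaking into later calculations such as another mean() or a groupby().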
Add Average Row Using concat()
You can also use the concat() function to add a row with average values to your DataFrame.
First, let’s create a DataFrame containing the average values, which will then be concatenated to the original DataFrame.
# Calculate the average values for the DataFrame
average_values = df.select_dtypes(include=['number']).mean()

# Convert the Pandas Series to a DataFrame
average_df = pd.DataFrame([average_values])
average_df['CustomerID'] = 'Average'
print(average_df)
Output:
  CustomerID  MonthlyCharges  TotalCharges
0    Average       68.466667    910.666667
Here, we’ve transformed the average values into a single-row DataFrame.
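Now that you have a DataFrame with the average values, you can concatenate it with the original DataFrame. Note that df still contains the Average row added with loc[] in the previous section, so that row appears in the result as well.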
concatenated_df = pd.concat([df, average_df], ignore_index=True)
print(concatenated_df)
Output:
  CustomerID  MonthlyCharges  TotalCharges
0        1.0       70.200000    492.150000
1        2.0       45.300000   1450.600000
2        3.0       89.900000    789.250000
3        2.0       68.466667    910.666667
4    Average       68.466667    910.666667
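Row 3 in this output is the average row left over from the loc[] example. To get a single average row, the same steps can be run on a freshly created DataFrame. A minimal sketch, with result as an illustrative variable name:

```python
import pandas as pd

# Start from the original three customer rows only
df = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'MonthlyCharges': [70.20, 45.30, 89.90],
    'TotalCharges': [492.15, 1450.60, 789.25],
})

# Build a one-row DataFrame of averages and label it
average_values = df.select_dtypes(include=['number']).mean()
average_df = pd.DataFrame([average_values])
average_df['CustomerID'] = 'Average'

# ignore_index=True renumbers the rows 0..3 in the combined result
result = pd.concat([df, average_df], ignore_index=True)
print(result)
```

The result has four rows, and the last one carries the label Average in the CustomerID column. Because that column now mixes numbers and text, it is no longer a purely numeric column.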
Adding Multiple Average Rows Based on Grouping
Pandas provides the groupby() function to group your data. You can calculate the average of each group and then append these averages back to the original DataFrame for better insights.
First, let’s group the data based on CustomerID.
grouped_df = df.groupby('CustomerID')
print(grouped_df.size())
Output:
CustomerID
1.0    1
2.0    2
3.0    1
dtype: int64
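The groupby() function provides us with groups based on unique CustomerID values. The group for 2.0 contains two rows because df still includes the Average row added earlier, whose CustomerID is 2.0.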
Now, you can calculate the average for each group:
# Calculate the mean for each group
group_average = grouped_df.mean()
print(group_average)
Output:
            MonthlyCharges  TotalCharges
CustomerID
1.0              70.200000    492.150000
2.0              56.883333   1180.633333
3.0              89.900000    789.250000
Here, the mean values for MonthlyCharges and TotalCharges are calculated for each CustomerID group.
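The group averages can then be appended back to the data as one summary row per group. The sketch below is one possible way to do this, not the only one. It assumes a hypothetical Plan column to group on and a fourth customer, neither of which is in the original data, since grouping is most useful when several rows share a key:

```python
import pandas as pd

# 'Plan' and the fourth customer are hypothetical, added for illustration
df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Plan': ['basic', 'premium', 'basic', 'premium'],
    'MonthlyCharges': [70.20, 45.30, 89.90, 60.00],
})

# One average row per Plan; as_index=False keeps Plan as a regular column
group_average = df.groupby('Plan', as_index=False)['MonthlyCharges'].mean()

# Label the summary rows so they stand apart from the customer rows
group_average['CustomerID'] = 'Average'

# Append the summary rows, then use a stable sort so each average
# ends up directly after the rows of its own group
combined = pd.concat([df, group_average], ignore_index=True)
combined = combined.sort_values('Plan', kind='mergesort').reset_index(drop=True)
print(combined)
```

The stable sort keeps the original order within each Plan, so the customer rows come first and the Average row closes each group.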
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.