Add Total Row To Pandas DataFrame (Data Summarization)
Sometimes, you need to add a summary or total row to your Pandas DataFrame for better data visualization or for generating reports.
This tutorial will walk you through how to do that.
Calculating Totals for Each Column
First, let’s create a sample DataFrame to work with:
import pandas as pd data = { 'Plan_Type': ['Basic', 'Premium', 'Pro'], 'Monthly_Fee': [30, 50, 100], 'Subscribers': [200, 150, 50] } df = pd.DataFrame(data) print(df)
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30 200 1 Premium 50 150 2 Pro 100 50
Now, let’s calculate the totals for the numerical columns.
totals = df.sum(numeric_only=True) print(totals)
Output:
Monthly_Fee 180 Subscribers 400 dtype: int64
This operation sums up each numerical column and stores the result in a Pandas Series object.
Adding Totals Row using loc[]
After calculating the totals for each numerical column, you can add these totals as a new row in the DataFrame.
The DataFrame.loc[]
property allows you to access a group of rows and columns by label(s) or a boolean array.
Here’s how you can add a new row containing the calculated totals using loc
:
df.loc['Total'] = pd.Series(totals) print(df)
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30.0 200.0 1 Premium 50.0 150.0 2 Pro 100.0 50.0 Total NaN 180.0 400.0
Only Totaling Specific Columns
There may be instances when you only want to total specific columns and not every numerical column in the DataFrame.
For example, you might want to sum the ‘Subscribers’ column, but not the ‘Monthly_Fee’.
First, let’s get back our original DataFrame without the ‘Total’ row:
# Remove the 'Total' row to get back the original DataFrame df = df.drop('Total')
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30 200 1 Premium 50 150 2 Pro 100 50
Now, calculate the total for just the ‘Subscribers’ column:
total_subscribers = df['Subscribers'].sum() print(total_subscribers)
Output:
400
Next, add this total only to the ‘Subscribers’ column in a new row:
df.loc['Total', 'Subscribers'] = total_subscribers print(df)
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30.0 200.0 1 Premium 50.0 150.0 2 Pro 100.0 50.0 Total NaN NaN 400.0
Totaling Columns of a Specific Data Type
If you want to perform totals only on columns of a specific data type, you can do so by first filtering those columns.
Let’s reset our DataFrame to remove the ‘Total’ row:
df = df.drop('Total')
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30.0 200.0 1 Premium 50.0 150.0 2 Pro 100.0 50.0
Now, we’ll calculate and add the total row, but this time only for columns with numeric data types:
totals = df.select_dtypes(include=['number']).sum() # Add a new row with these totals df.loc['Total'] = pd.Series(totals) print(df)
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30.0 200.0 1 Premium 50.0 150.0 2 Pro 100.0 50.0 Total NaN 180.0 400.0
By using the select_dtypes
method, you filter only the numeric columns (‘Monthly_Fee’ and ‘Subscribers’ in this case) for the sum operation.
Skipping Specific Columns From Totals
You can use the select_dtypes
method by specifying which data types to exclude.
Let’s reset our DataFrame once again to remove the ‘Total’ row:
df = df.drop('Total')
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30.0 200.0 1 Premium 50.0 150.0 2 Pro 100.0 50.0
Now, let’s calculate the total row, but this time we’ll skip the columns that have string or date data types:
totals = df.select_dtypes(exclude=['object', 'datetime']).sum() # Add a new row with these totals df.loc['Total'] = pd.Series(totals) print(df)
Output:
Plan_Type Monthly_Fee Subscribers 0 Basic 30.0 200.0 1 Premium 50.0 150.0 2 Pro 100.0 50.0 Total NaN 180.0 400.0
Totaling by Groups if the DataFrame is Grouped
Grouping data is a commonly performed operation for segmenting a DataFrame into categories and applying a function like sum
to each group.
Pandas offers robust capabilities for this through its groupby
function. Let’s see how you can calculate totals for each group in a DataFrame.
First, we’ll create a sample DataFrame that includes a ‘Region’ column to represent different geographic areas.
# Sample DataFrame with 'Region' column data = { 'Region': ['West', 'West', 'East', 'East'], 'Plan_Type': ['Basic', 'Premium', 'Basic', 'Pro'], 'Monthly_Fee': [30, 50, 20, 100], 'Subscribers': [200, 150, 100, 50] } df = pd.DataFrame(data) print(df)
Output:
Region Plan_Type Monthly_Fee Subscribers 0 West Basic 30 200 1 West Premium 50 150 2 East Basic 20 100 3 East Pro 100 50
Now, let’s group this DataFrame by the ‘Region’ column and calculate the totals for each group:
# Group by 'Region' and calculate the sum for each group grouped_totals = df.groupby('Region').sum() print(grouped_totals)
Output:
Monthly_Fee Subscribers Region East 120 150 West 80 350
Here, the DataFrame is grouped by the ‘Region’ column, and the sum for each numeric column (‘Monthly_Fee’ and ‘Subscribers’) is calculated within each group.
The resulting DataFrame, grouped_totals
, shows these summed values.
You can append these totals back to the original DataFrame to have a complete view of individual and grouped totals:
# Append the grouped totals back to the original DataFrame final_df = pd.concat([df, grouped_totals.reset_index().assign(Plan_Type='Total')], ignore_index=True) print(final_df)
Output:
Region Plan_Type Monthly_Fee Subscribers 0 West Basic 30 200 1 West Premium 50 150 2 East Basic 20 100 3 East Pro 100 50 4 East Total 120 150 5 West Total 80 350
In this final DataFrame, rows labeled as ‘Total’ represent the sum of ‘Monthly_Fee’ and ‘Subscribers’ for each ‘Region’.
Totals with a MultiIndex DataFrame
Calculating totals for MultiIndex DataFrames can be slightly different than for regular DataFrames.
First, let’s create a MultiIndex DataFrame:
arrays = [ ['East', 'East', 'West', 'West'], ['Basic', 'Pro', 'Basic', 'Premium'] ] index = pd.MultiIndex.from_arrays(arrays, names=('Region', 'Plan_Type')) df = pd.DataFrame({ 'Monthly_Fee': [20, 100, 30, 50], 'Subscribers': [100, 50, 200, 150] }, index=index) print(df)
Output:
Monthly_Fee Subscribers Region Plan_Type East Basic 20 100 Pro 100 50 West Basic 30 200 Premium 50 150
Now, you may want to add totals for each ‘Region’. You can do this using groupby
and sum
as you would with a single-index DataFrame:
# Calculate the total for each 'Region' grouped_totals = df.groupby('Region').sum() print(grouped_totals)
Output:
Monthly_Fee Subscribers Region East 120 150 West 80 350
You can create a new DataFrame for the grouped totals and then concatenate it with the original DataFrame using the pd.concat()
function.
totals_df = pd.DataFrame(grouped_totals, columns=df.columns) totals_df.index = pd.MultiIndex.from_tuples([(region, 'Total') for region in grouped_totals.index]) # Concatenate the original DataFrame and the grouped totals DataFrame final_df = pd.concat([df, totals_df]) print(final_df)
Output:
Monthly_Fee Subscribers Region Plan_Type East Basic 20 100 Pro 100 50 West Basic 30 200 Premium 50 150 East Total 120 150 West Total 80 350
The resulting final_df
DataFrame includes a total row for each ‘Region’, under the ‘Plan_Type’ labeled as ‘Total’.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.