Add Total Row To Pandas DataFrame (Data Summarization)

Sometimes, you need to add a summary or total row to your Pandas DataFrame for better data visualization or for generating reports.

This tutorial will walk you through how to do that.

 

 

Calculating Totals for Each Column

First, let’s create a sample DataFrame to work with:

import pandas as pd
data = {
    'Plan_Type': ['Basic', 'Premium', 'Pro'],
    'Monthly_Fee': [30, 50, 100],
    'Subscribers': [200, 150, 50]
}
df = pd.DataFrame(data)
print(df)

Output:

  Plan_Type  Monthly_Fee  Subscribers
0     Basic           30          200
1   Premium           50          150
2       Pro          100           50

Now, let’s calculate the totals for the numerical columns.

totals = df.sum(numeric_only=True)
print(totals)

Output:

Monthly_Fee     180
Subscribers     400
dtype: int64

This operation sums up each numerical column and stores the result in a Pandas Series object.

 

Adding  Totals Row using loc[]

After calculating the totals for each numerical column, you can add these totals as a new row in the DataFrame.

The DataFrame.loc[] property allows you to access a group of rows and columns by label(s) or a boolean array.

Here’s how you can add a new row containing the calculated totals using loc:

df.loc['Total'] = pd.Series(totals)
print(df)

Output:

        Plan_Type  Monthly_Fee  Subscribers
0           Basic         30.0        200.0
1         Premium         50.0        150.0
2             Pro        100.0         50.0
Total         NaN        180.0        400.0

 

Only Totaling Specific Columns

There may be instances when you only want to total specific columns and not every numerical column in the DataFrame.

For example, you might want to sum the ‘Subscribers’ column, but not the ‘Monthly_Fee’.

First, let’s get back our original DataFrame without the ‘Total’ row:

# Remove the 'Total' row to get back the original DataFrame
df = df.drop('Total')

Output:

  Plan_Type  Monthly_Fee  Subscribers
0     Basic           30          200
1   Premium           50          150
2       Pro          100           50

Now, calculate the total for just the ‘Subscribers’ column:

total_subscribers = df['Subscribers'].sum()
print(total_subscribers)

Output:

400

Next, add this total only to the ‘Subscribers’ column in a new row:

df.loc['Total', 'Subscribers'] = total_subscribers
print(df)

Output:

  Plan_Type  Monthly_Fee  Subscribers
0     Basic         30.0        200.0
1   Premium         50.0        150.0
2       Pro        100.0         50.0
Total    NaN          NaN        400.0

 

Totaling Columns of a Specific Data Type

If you want to perform totals only on columns of a specific data type, you can do so by first filtering those columns.

Let’s reset our DataFrame to remove the ‘Total’ row:

df = df.drop('Total')

Output:

  Plan_Type  Monthly_Fee  Subscribers
0     Basic         30.0        200.0
1   Premium         50.0        150.0
2       Pro        100.0         50.0

Now, we’ll calculate and add the total row, but this time only for columns with numeric data types:

totals = df.select_dtypes(include=['number']).sum()

# Add a new row with these totals
df.loc['Total'] = pd.Series(totals)
print(df)

Output:

        Plan_Type  Monthly_Fee  Subscribers
0           Basic         30.0        200.0
1         Premium         50.0        150.0
2             Pro        100.0         50.0
Total         NaN        180.0        400.0

By using the select_dtypes method, you filter only the numeric columns (‘Monthly_Fee’ and ‘Subscribers’ in this case) for the sum operation.

 

Skipping Specific Columns From Totals

You can use the select_dtypes method by specifying which data types to exclude.

Let’s reset our DataFrame once again to remove the ‘Total’ row:

df = df.drop('Total')

Output:

  Plan_Type  Monthly_Fee  Subscribers
0     Basic         30.0        200.0
1   Premium         50.0        150.0
2       Pro        100.0         50.0

Now, let’s calculate the total row, but this time we’ll skip the columns that have string or date data types:

totals = df.select_dtypes(exclude=['object', 'datetime']).sum()

# Add a new row with these totals
df.loc['Total'] = pd.Series(totals)
print(df)

Output:

        Plan_Type  Monthly_Fee  Subscribers
0           Basic         30.0        200.0
1         Premium         50.0        150.0
2             Pro        100.0         50.0
Total         NaN        180.0        400.0

 

Totaling by Groups if the DataFrame is Grouped

Grouping data is a commonly performed operation for segmenting a DataFrame into categories and applying a function like sum to each group.

Pandas offers robust capabilities for this through its groupby function. Let’s see how you can calculate totals for each group in a DataFrame.

First, we’ll create a sample DataFrame that includes a ‘Region’ column to represent different geographic areas.

# Sample DataFrame with 'Region' column
data = {
    'Region': ['West', 'West', 'East', 'East'],
    'Plan_Type': ['Basic', 'Premium', 'Basic', 'Pro'],
    'Monthly_Fee': [30, 50, 20, 100],
    'Subscribers': [200, 150, 100, 50]
}
df = pd.DataFrame(data)
print(df)

Output:

  Region Plan_Type  Monthly_Fee  Subscribers
0   West     Basic           30          200
1   West   Premium           50          150
2   East     Basic           20          100
3   East       Pro          100           50

Now, let’s group this DataFrame by the ‘Region’ column and calculate the totals for each group:

# Group by 'Region' and calculate the sum for each group
grouped_totals = df.groupby('Region').sum()
print(grouped_totals)

Output:

        Monthly_Fee  Subscribers
Region                          
East            120          150
West             80          350

Here, the DataFrame is grouped by the ‘Region’ column, and the sum for each numeric column (‘Monthly_Fee’ and ‘Subscribers’) is calculated within each group.

The resulting DataFrame, grouped_totals, shows these summed values.

You can append these totals back to the original DataFrame to have a complete view of individual and grouped totals:

# Append the grouped totals back to the original DataFrame
final_df = pd.concat([df, grouped_totals.reset_index().assign(Plan_Type='Total')], ignore_index=True)
print(final_df)

Output:

  Region Plan_Type  Monthly_Fee  Subscribers
0   West     Basic           30          200
1   West   Premium           50          150
2   East     Basic           20          100
3   East       Pro          100           50
4   East     Total          120          150
5   West     Total           80          350

In this final DataFrame, rows labeled as ‘Total’ represent the sum of ‘Monthly_Fee’ and ‘Subscribers’ for each ‘Region’.

 

Totals with a MultiIndex DataFrame

Calculating totals for MultiIndex DataFrames can be slightly different than for regular DataFrames.

First, let’s create a MultiIndex DataFrame:

arrays = [
    ['East', 'East', 'West', 'West'],
    ['Basic', 'Pro', 'Basic', 'Premium']
]
index = pd.MultiIndex.from_arrays(arrays, names=('Region', 'Plan_Type'))
df = pd.DataFrame({
    'Monthly_Fee': [20, 100, 30, 50],
    'Subscribers': [100, 50, 200, 150]
}, index=index)
print(df)

Output:

                Monthly_Fee  Subscribers
Region Plan_Type                       
East   Basic             20          100
       Pro              100           50
West   Basic             30          200
       Premium           50          150

Now, you may want to add totals for each ‘Region’. You can do this using groupby and sum as you would with a single-index DataFrame:

# Calculate the total for each 'Region'
grouped_totals = df.groupby('Region').sum()
print(grouped_totals)

Output:

        Monthly_Fee  Subscribers
Region                          
East            120          150
West             80          350

You can create a new DataFrame for the grouped totals and then concatenate it with the original DataFrame using the pd.concat() function.

totals_df = pd.DataFrame(grouped_totals, columns=df.columns)
totals_df.index = pd.MultiIndex.from_tuples([(region, 'Total') for region in grouped_totals.index])

# Concatenate the original DataFrame and the grouped totals DataFrame
final_df = pd.concat([df, totals_df])
print(final_df)

Output:

                Monthly_Fee  Subscribers
Region Plan_Type                       
East   Basic             20          100
       Pro              100           50
West   Basic             30          200
       Premium           50          150
East   Total            120          150
West   Total             80          350

The resulting final_df DataFrame includes a total row for each ‘Region’, under the ‘Plan_Type’ labeled as ‘Total’.

Leave a Reply

Your email address will not be published. Required fields are marked *