Add Rows to Pandas MultiIndex DataFrame
In this tutorial, you will learn various methods to add rows to a Pandas MultiIndex DataFrame.
We will use real-world sample data to demonstrate the process step-by-step.
Using loc[]
First, create a sample MultiIndex DataFrame:
import pandas as pd arrays = [ ['2020-Q1', '2020-Q1', '2020-Q2', '2020-Q2'], ['AT&T', 'Verizon', 'AT&T', 'Verizon'] ] index = pd.MultiIndex.from_arrays(arrays, names=('Quarter', 'Company')) data = {'Revenue': [10, 12, 11, 14], 'Expenses': [5, 4, 6, 7]} df = pd.DataFrame(data, index=index) print(df)
Output:
Revenue Expenses Quarter Company 2020-Q1 AT&T 10 5 Verizon 12 4 2020-Q2 AT&T 11 6 Verizon 14 7
Now, you can add rows using the loc[]
method. Here’s how:
# Add a row for AT&T in '2020-Q3' new_row = pd.Series({'Revenue': 9, 'Expenses': 5}, name=('2020-Q3', 'AT&T')) df.loc[("2020-Q3", "AT&T"), :] = new_row # Add another row for Verizon in '2020-Q3' new_row2 = pd.Series({'Revenue': 13, 'Expenses': 6}, name=('2020-Q3', 'Verizon')) df.loc[("2020-Q3", "Verizon"), :] = new_row2 # Use .reindex to add the new index values as empty rows df = df.reindex(df.index.union([("2020-Q3", "AT&T"), ("2020-Q3", "Verizon")])) print(df)
Output:
Revenue Expenses Quarter Company 2020-Q1 AT&T 10.0 5.0 Verizon 12.0 4.0 2020-Q2 AT&T 11.0 6.0 Verizon 14.0 7.0 2020-Q3 AT&T 9.0 5.0 Verizon 13.0 6.0
First, we insert the index first using .reindex
, then assign the values.
We use .loc
to select those rows by index and assign the data.
Using concat()
The concat()
method is useful when you have to concatenate along an axis—either rows or columns.
Add Single Row
Here’s how you can use concat()
to add a single row for Verizon in ‘2021-Q1’:
# Create a new row as a DataFrame new_row = pd.DataFrame({'Revenue': [18], 'Expenses': [9]}, index=pd.MultiIndex.from_tuples([('2021-Q1', 'Verizon')], names=('Quarter', 'Company'))) # Concatenate the new row df = pd.concat([df, new_row]) print(df)
Output:
Revenue Expenses Quarter Company 2020-Q1 AT&T 10.0 5.0 Verizon 12.0 4.0 2020-Q2 AT&T 11.0 6.0 Verizon 14.0 7.0 2020-Q3 AT&T 9.0 5.0 Verizon 13.0 6.0 2021-Q1 Verizon 18.0 9.0
Add Multiple Rows
To concatenate multiple rows, you’ll create a DataFrame containing these rows and then use concat()
:
# Create new rows as a DataFrame new_rows = pd.DataFrame({'Revenue': [19, 20], 'Expenses': [10, 11]}, index=pd.MultiIndex.from_tuples([('2021-Q1', 'T-Mobile'), ('2021-Q2', 'AT&T')], names=('Quarter', 'Company'))) # Concatenate the new rows df = pd.concat([df, new_rows]) print(df)
Output:
Revenue Expenses Quarter Company 2020-Q1 AT&T 10.0 5.0 Verizon 12.0 4.0 2020-Q2 AT&T 11.0 6.0 Verizon 14.0 7.0 2020-Q3 AT&T 9.0 5.0 Verizon 13.0 6.0 2021-Q1 Verizon 18.0 9.0 T-Mobile 19.0 10.0 2021-Q2 AT&T 20.0 11.0
Adding Rows with Different Levels
Let’s first create a MultiIndex DataFrame that has an additional level called ‘Region’.
# Create sample MultiIndex DataFrame using telecom data with an additional level arrays = [ ['2020-Q1', '2020-Q1', '2020-Q2', '2020-Q2'], ['AT&T', 'Verizon', 'AT&T', 'Verizon'], ['East', 'West', 'East', 'West'] ] index = pd.MultiIndex.from_arrays(arrays, names=('Quarter', 'Company', 'Region')) data = {'Revenue': [10, 12, 11, 14], 'Expenses': [5, 4, 6, 7]} df_multi_level = pd.DataFrame(data, index=index) print(df_multi_level)
Output:
Revenue Expenses Quarter Company Region 2020-Q1 AT&T East 10 5 Verizon West 12 4 2020-Q2 AT&T East 11 6 Verizon West 14 7
If you want to add a row that doesn’t have all the index levels, you’ll need to manually create a MultiIndex with the missing level set to None
or some placeholder.
# Create custom index with a missing level custom_index = pd.MultiIndex.from_tuples([('2020-Q3', 'AT&T', None)], names=('Quarter', 'Company', 'Region')) # Create a DataFrame with the custom index custom_row = pd.DataFrame({'Revenue': [9], 'Expenses': [5]}, index=custom_index) # Append the custom row to the original DataFrame df_multi_level = df_multi_level.append(custom_row) print(df_multi_level)
Output:
Revenue Expenses Quarter Company Region 2020-Q1 AT&T East 10 5 Verizon West 12 4 2020-Q2 AT&T East 11 6 Verizon West 14 7 2020-Q3 AT&T NaN 9 5
In this example, we added a row for AT&T in ‘2020-Q3’ with a missing ‘Region’.
Benchmarking Row Addition Methods
To set up the benchmark, let’s first create a large MultiIndex DataFrame with 100,000 rows.
import pandas as pd import numpy as np # Set row length n = 100_000 # Create arrays for MultiIndex quarters = ['2020-Q1', '2020-Q2', '2020-Q3', '2020-Q4'] * 25_000 companies = ['AT&T', 'Verizon', 'T-Mobile'] * 33_333 + ['AT&T'] # Create MultiIndex index = pd.MultiIndex.from_arrays([quarters, companies], names=['Quarter', 'Company']) # Create DataFrame df_large = pd.DataFrame({ 'Revenue': np.random.randint(1, 20, n), 'Expenses': np.random.randint(1, 10, n) }, index=index) print(df_large.head())
Output:
Revenue Expenses Quarter Company 2020-Q1 AT&T 19 9 2020-Q2 Verizon 15 4 2020-Q3 T-Mobile 16 9 2020-Q4 AT&T 16 1 2020-Q1 Verizon 2 7
Benchmark loc[]
To add a row using loc[]
, let’s time the operation.
start_time = time.time() df_large_sorted = df_large.sort_index() # Sort the DataFrame by the MultiIndex for _ in range(n): row = pd.Series({ 'Revenue': np.random.randint(1, 20), 'Expenses': np.random.randint(1, 10) }, name=('2021-Q1', 'Google')) df_large_sorted.loc[row.name] = row loc_time = time.time() - start_time df_loc = df_large_sorted.sort_index() print(f"Time taken to add rows using loc: {loc_time} seconds")
Benchmark concat()
Lastly, let’s benchmark the concat()
method.
start_time = time.time() new_rows = pd.DataFrame({ 'Revenue': np.random.randint(1, 20, n), 'Expenses': np.random.randint(1, 10, n) }, index=index) df_concat = pd.concat([df_large, new_rows]) concat_time = time.time() - start_time print(f"Time taken to add rows using concat: {concat_time} seconds")
Output:
Time taken to add rows using concat: 0.003000497817993164 seconds Time taken to add rows using loc: 1.3989920616149902 seconds
The concat
method is faster than using loc
to add rows to a multiIndex DataFrame, especially when dealing with a large number of rows.
The reason is that concat
operates on entire DataFrames at once, whereas loc
adds rows one by one in a loop.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.