Add Multiple Rows to Pandas DataFrame (Different Methods)
Adding a row to DataFrame is one of the common tasks while working with Pandas, but sometimes we want to add multiple rows to a DataFrame.
This tutorial will guide you through the process of doing just that using different methods.
Using loc[]
By assigning values to a new index using loc[]
, we can add rows to the bottom of our DataFrame.
Let’s first create a sample Dataframe:
import pandas as pd data = { 'Customer_ID': [1, 2, 3], 'Monthly_Bill': [45.0, 55.0, 65.0], 'Plan': ['Basic', 'Standard', 'Premium'] } df = pd.DataFrame(data) print(df)
Output:
Customer_ID Monthly_Bill Plan 0 1 45.0 Basic 1 2 55.0 Standard 2 3 65.0 Premium
Next, let’s add more customer data using the loc[]
method.
df.loc[3] = [4, 75.0, 'Deluxe'] df.loc[4] = [5, 85.0, 'Ultimate'] print(df)
Output:
Customer_ID Monthly_Bill Plan 0 1 45.0 Basic 1 2 55.0 Standard 2 3 65.0 Premium 3 4 75.0 Deluxe 4 5 85.0 Ultimate
Using concat()
The pd.concat()
function provides yet another robust way to add multiple rows to a DataFrame.
It is useful when you have to concatenate multiple DataFrames along a particular axis, handle unmatched columns, or even create multi-indexed DataFrames.
Basic Concatenation Along the Row Axis
To concatenate along the row axis, set the axis
parameter to 0. This is also the default behavior.
data2 = { 'Customer_ID': [10, 11], 'Monthly_Bill': [135.0, 145.0], 'Plan': ['National', 'Continental'] } df2 = pd.DataFrame(data2) # Concatenate along the row axis result_df = pd.concat([df, df2], axis=0, ignore_index=True) print(result_df)
Output:
Customer_ID Monthly_Bill Plan 0 1 45.0 Basic 1 2 55.0 Standard 2 3 65.0 Premium 3 4 75.0 Deluxe 4 5 85.0 Ultimate 5 10 135.0 National 6 11 145.0 Continental
Handling Columns That Don’t Match Up
If the DataFrames have different columns, pd.concat()
can still handle it gracefully.
# DataFrame with an extra column data3 = { 'Customer_ID': [12], 'Monthly_Bill': [155.0], 'Plan': ['Global Plus'], 'Duration': [12] } df3 = pd.DataFrame(data3) # Concatenating result_df = pd.concat([df, df3], axis=0, ignore_index=True, sort=False) print(result_df)
Output:
Customer_ID Monthly_Bill Plan Duration 0 1 45.0 Basic NaN 1 2 55.0 Standard NaN 2 3 65.0 Premium NaN 3 4 75.0 Deluxe NaN 4 5 85.0 Ultimate NaN 5 12 155.0 Global Plus 12.0
Notice that for the rows from the original DataFrame where the new column “Duration” does not exist, Pandas fills it with NaN
.
Adding Multiple Rows in a Specified Position (Between Rows)
You can insert rows at a specific position by slicing and concatenating DataFrames.
First, slice the DataFrame into two parts: one before the position where you want to add new rows, and one after.
# Original DataFrame print("Original DataFrame:") print(df) # Slice DataFrame df1 = df.iloc[:3] df2 = df.iloc[3:]
Output:
Original DataFrame: Customer_ID Monthly_Bill Plan 0 1 45.0 Basic 1 2 55.0 Standard 2 3 65.0 Premium 3 4 75.0 Deluxe 4 5 85.0 Ultimate
Here, df1
contains the first three rows, and df2
contains the rest.
Now, create a DataFrame containing the rows you want to insert.
# DataFrame to insert data_to_insert = { 'Customer_ID': [15, 16], 'Monthly_Bill': [100.0, 110.0], 'Plan': ['Special', 'Special+'] } df_to_insert = pd.DataFrame(data_to_insert) print("DataFrame to Insert:") print(df_to_insert)
Output:
DataFrame to Insert: Customer_ID Monthly_Bill Plan 0 15 100.0 Special 1 16 110.0 Special+
Finally, concatenate df1
, the DataFrame to insert, and df2
in that order.
# Concatenate DataFrames df = pd.concat([df1, df_to_insert, df2], ignore_index=True) print("DataFrame After Insertion:") print(df)
Output:
DataFrame After Insertion: Customer_ID Monthly_Bill Plan 0 1 45.0 Basic 1 2 55.0 Standard 2 3 65.0 Premium 3 15 100.0 Special 4 16 110.0 Special+ 5 4 75.0 Deluxe 6 5 85.0 Ultimate
Performance Comparison
For performance comparison, we can use Python’s built-in time
module to measure the elapsed time for each operation.
import pandas as pd import numpy as np import time df = pd.DataFrame({'A': np.arange(1, 1000001), 'B': np.arange(1000001, 2000001)}) df_to_add = pd.DataFrame({'A': np.arange(2000001, 3000001), 'B': np.arange(3000001, 4000001)}) # Benchmark loc start_time = time.time() for _ in range(100): df.loc[len(df):len(df)+len(df_to_add)] = df_to_add elapsed_time_loc = time.time() - start_time print(f"Elapsed time for loc: {elapsed_time_loc:.4f} seconds") # Benchmark concat start_time = time.time() for _ in range(100): pd.concat([df, df_to_add], ignore_index=True) elapsed_time_concat = time.time() - start_time print(f"Elapsed time for concat: {elapsed_time_concat:.4f} seconds")
Output:
Elapsed time for loc: 0.0394 seconds Elapsed time for concat: 0.7206 seconds
In this example, you’ll notice that loc[]
is more time-efficient than concat()
for adding a large number of rows.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.