Add Multiple Rows to Pandas DataFrame (Different Methods)

Adding a row to DataFrame is one of the common tasks while working with Pandas, but sometimes we want to add multiple rows to a DataFrame.

This tutorial will guide you through the process of doing just that using different methods.

 

 

Using loc[]

By assigning values to a new index using loc[], we can add rows to the bottom of our DataFrame.

Let’s first create a sample Dataframe:

import pandas as pd
data = {
    'Customer_ID': [1, 2, 3],
    'Monthly_Bill': [45.0, 55.0, 65.0],
    'Plan': ['Basic', 'Standard', 'Premium']
}
df = pd.DataFrame(data)
print(df)

Output:

   Customer_ID  Monthly_Bill      Plan
0            1          45.0     Basic
1            2          55.0  Standard
2            3          65.0   Premium

Next, let’s add more customer data using the loc[] method.

df.loc[3] = [4, 75.0, 'Deluxe']
df.loc[4] = [5, 85.0, 'Ultimate']
print(df)

Output:

   Customer_ID  Monthly_Bill      Plan
0            1          45.0     Basic
1            2          55.0  Standard
2            3          65.0   Premium
3            4          75.0    Deluxe
4            5          85.0  Ultimate

 

Using concat()

The pd.concat() function provides yet another robust way to add multiple rows to a DataFrame.

It is useful when you have to concatenate multiple DataFrames along a particular axis, handle unmatched columns, or even create multi-indexed DataFrames.

Basic Concatenation Along the Row Axis

To concatenate along the row axis, set the axis parameter to 0. This is also the default behavior.

data2 = {
    'Customer_ID': [10, 11],
    'Monthly_Bill': [135.0, 145.0],
    'Plan': ['National', 'Continental']
}
df2 = pd.DataFrame(data2)

# Concatenate along the row axis
result_df = pd.concat([df, df2], axis=0, ignore_index=True)
print(result_df)

Output:

   Customer_ID  Monthly_Bill         Plan
0            1          45.0        Basic
1            2          55.0     Standard
2            3          65.0      Premium
3            4          75.0       Deluxe
4            5          85.0     Ultimate
5           10         135.0     National
6           11         145.0  Continental

Handling Columns That Don’t Match Up

If the DataFrames have different columns, pd.concat() can still handle it gracefully.

# DataFrame with an extra column
data3 = {
    'Customer_ID': [12],
    'Monthly_Bill': [155.0],
    'Plan': ['Global Plus'],
    'Duration': [12]
}
df3 = pd.DataFrame(data3)

# Concatenating
result_df = pd.concat([df, df3], axis=0, ignore_index=True, sort=False)
print(result_df)

Output:

   Customer_ID  Monthly_Bill         Plan  Duration
0            1          45.0        Basic       NaN
1            2          55.0     Standard       NaN
2            3          65.0      Premium       NaN
3            4          75.0       Deluxe       NaN
4            5          85.0     Ultimate       NaN
5           12         155.0  Global Plus      12.0

Notice that for the rows from the original DataFrame where the new column “Duration” does not exist, Pandas fills it with NaN.

 

Adding Multiple Rows in a Specified Position (Between Rows)

You can insert rows at a specific position by slicing and concatenating DataFrames.

First, slice the DataFrame into two parts: one before the position where you want to add new rows, and one after.

# Original DataFrame
print("Original DataFrame:")
print(df)

# Slice DataFrame
df1 = df.iloc[:3]
df2 = df.iloc[3:]

Output:

Original DataFrame:
   Customer_ID  Monthly_Bill         Plan
0            1          45.0        Basic
1            2          55.0     Standard
2            3          65.0      Premium
3            4          75.0       Deluxe
4            5          85.0     Ultimate

Here, df1 contains the first three rows, and df2 contains the rest.

Now, create a DataFrame containing the rows you want to insert.

# DataFrame to insert
data_to_insert = {
    'Customer_ID': [15, 16],
    'Monthly_Bill': [100.0, 110.0],
    'Plan': ['Special', 'Special+']
}
df_to_insert = pd.DataFrame(data_to_insert)
print("DataFrame to Insert:")
print(df_to_insert)

Output:

DataFrame to Insert:
   Customer_ID  Monthly_Bill      Plan
0           15         100.0   Special
1           16         110.0  Special+

Finally, concatenate df1, the DataFrame to insert, and df2 in that order.

# Concatenate DataFrames
df = pd.concat([df1, df_to_insert, df2], ignore_index=True)
print("DataFrame After Insertion:")
print(df)

Output:

DataFrame After Insertion:
   Customer_ID  Monthly_Bill       Plan
0            1          45.0      Basic
1            2          55.0   Standard
2            3          65.0    Premium
3           15         100.0    Special
4           16         110.0   Special+
5            4          75.0     Deluxe
6            5          85.0   Ultimate

 

Performance Comparison

For performance comparison, we can use Python’s built-in time module to measure the elapsed time for each operation.

import pandas as pd
import numpy as np
import time
df = pd.DataFrame({'A': np.arange(1, 1000001), 'B': np.arange(1000001, 2000001)})
df_to_add = pd.DataFrame({'A': np.arange(2000001, 3000001), 'B': np.arange(3000001, 4000001)})

# Benchmark loc
start_time = time.time()
for _ in range(100):
    df.loc[len(df):len(df)+len(df_to_add)] = df_to_add
elapsed_time_loc = time.time() - start_time
print(f"Elapsed time for loc: {elapsed_time_loc:.4f} seconds")

# Benchmark concat
start_time = time.time()
for _ in range(100):
    pd.concat([df, df_to_add], ignore_index=True)
elapsed_time_concat = time.time() - start_time
print(f"Elapsed time for concat: {elapsed_time_concat:.4f} seconds")

Output:

Elapsed time for loc: 0.0394 seconds
Elapsed time for concat: 0.7206 seconds

In this example, you’ll notice that loc[] is more time-efficient than concat() for adding a large number of rows.

Leave a Reply

Your email address will not be published. Required fields are marked *