Adding Rows to Empty Pandas DataFrame

This tutorial walks you through adding rows to an empty Pandas DataFrame.

You will learn how to add rows using the loc property and the concat function, and we’ll benchmark the two to see which one is faster.

 

 

Adding Rows to Empty DataFrame Using loc

Let’s start by creating an empty DataFrame with specified columns.

import pandas as pd

# Define columns for our DataFrame
columns = ['CustomerID', 'Name', 'Plan', 'MonthlyCharge']

# Create an empty DataFrame
df = pd.DataFrame(columns=columns)
print(df)

Output:

Empty DataFrame
Columns: [CustomerID, Name, Plan, MonthlyCharge]
Index: []

You’ll observe that we have successfully created an empty DataFrame with the specified columns.

To add a single row using the loc property, assign to an index label that does not exist yet and provide the row data as a list or a dictionary. Pandas creates the row for that label.

# Adding a single row using loc
df.loc[0] = [101, 'John Doe', 'Premium', 45.99]
print(df)

Output:

   CustomerID      Name     Plan  MonthlyCharge
0         101  John Doe  Premium          45.99

Notice that a row has been added to the DataFrame with the provided details.
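The dictionary form can be sketched in the same way. This is a minimal illustration that assumes a fresh empty DataFrame; the name dict_df is introduced here and is not part of the tutorial's code. Pandas matches the dictionary keys to the column names, so the keys do not have to follow the column order.

```python
import pandas as pd

columns = ['CustomerID', 'Name', 'Plan', 'MonthlyCharge']

# dict_df is a hypothetical name used only for this sketch
dict_df = pd.DataFrame(columns=columns)

# Keys are matched to column names, so they can appear in any order
dict_df.loc[0] = {
    'Name': 'Jane Smith',
    'CustomerID': 102,
    'MonthlyCharge': 25.99,
    'Plan': 'Standard',
}
print(dict_df)
```

The list form relies on position, so a dictionary can be the safer choice when the column order might change.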

If you have multiple rows to add, you can use a loop. Here’s how you can add multiple rows using the loc property:

# Sample data to add
rows = [
    [102, 'Jane Smith', 'Standard', 25.99],
    [103, 'Robert Brown', 'Basic', 15.99]
]

for idx, row in enumerate(rows, start=1):
    df.loc[idx] = row
print(df)

Output:

   CustomerID         Name      Plan  MonthlyCharge
0         101     John Doe   Premium          45.99
1         102   Jane Smith  Standard          25.99
2         103  Robert Brown    Basic          15.99

As illustrated above, we’ve successfully added multiple rows to our DataFrame using the loc property.
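One detail worth keeping in mind is that loc only adds a row when the label is new; assigning to an existing label replaces that row. The sketch below illustrates this with a separate DataFrame (demo_df and the customer values are made up for this example), and shows len(demo_df) as a way to pick the next label, which assumes the default 0, 1, 2, ... labels.

```python
import pandas as pd

# demo_df and its values are hypothetical, used only for this sketch
demo_df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])

demo_df.loc[0] = [101, 'John Doe', 'Premium', 45.99]

# Label 0 already exists, so this replaces the row instead of adding one
demo_df.loc[0] = [104, 'Emily Davis', 'Basic', 15.99]
print(len(demo_df))

# With default integer labels, len(demo_df) is the next unused label
demo_df.loc[len(demo_df)] = [105, 'Michael Lee', 'Standard', 25.99]
print(demo_df)
```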

 

Adding Rows to Empty DataFrame Using concat

Let’s start by initializing an empty DataFrame with desired column names:

empty_df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
print(empty_df)

Output:

Empty DataFrame
Columns: [CustomerID, Name, Plan, MonthlyCharge]
Index: []

We’ve successfully created an empty DataFrame with our desired columns.

Now, concatenate some data to this empty DataFrame:

data_df = pd.DataFrame({
    'CustomerID': [112, 113],
    'Name': ['Thomas Green', 'Natalie White'],
    'Plan': ['Standard', 'Basic'],
    'MonthlyCharge': [30.99, 20.99]
})

# Concatenate the data onto the empty DataFrame
result_df = pd.concat([empty_df, data_df], ignore_index=True)
print(result_df)

Output:

   CustomerID          Name      Plan  MonthlyCharge
0         112  Thomas Green  Standard          30.99
1         113  Natalie White    Basic          20.99

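Even though our starting DataFrame was empty, the concat function added the new data from data_df onto it. Note that concat returns a new DataFrame and leaves empty_df unchanged. Recent pandas versions (2.1 and later) may also emit a FutureWarning here, because the way empty entries are handled when determining the result dtypes is changing.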
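If you want the result to take its dtypes from the real data only, one option is to drop empty frames before concatenating. The following is a sketch under that assumption, not the only way to handle it; the frames list and the fallback to a copy of empty_df are choices made for this example.

```python
import pandas as pd

empty_df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
data_df = pd.DataFrame({
    'CustomerID': [112, 113],
    'Name': ['Thomas Green', 'Natalie White'],
    'Plan': ['Standard', 'Basic'],
    'MonthlyCharge': [30.99, 20.99]
})

# Keep only the frames that actually contain rows
frames = [frame for frame in (empty_df, data_df) if not frame.empty]

# pd.concat raises ValueError on an empty list, so fall back to the empty frame
result_df = pd.concat(frames, ignore_index=True) if frames else empty_df.copy()
print(result_df.dtypes)
```

Because only data_df is passed to concat in this case, the numeric columns keep the dtypes they had in data_df.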

 

Performance Difference between loc and concat

Let’s create a scenario where we add 1,000 rows to a DataFrame one at a time, using each method. We’ll use the timeit module from Python’s standard library to measure the execution time over 10 repetitions.

import timeit
import pandas as pd

# Sample row data
sample_data = {'CustomerID': 114, 'Name': 'Sample User', 'Plan': 'Basic', 'MonthlyCharge': 20.99}

def add_using_loc():
    df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
    for _ in range(1000):
        df.loc[len(df)] = list(sample_data.values())

def add_using_concat():
    df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
    for _ in range(1000):
        df = pd.concat([df, pd.DataFrame([sample_data])], ignore_index=True)

# Measure the execution time
loc_time = timeit.timeit(add_using_loc, number=10)
concat_time = timeit.timeit(add_using_concat, number=10)

print(f"Time taken using loc: {loc_time:.4f} seconds")
print(f"Time taken using concat: {concat_time:.4f} seconds")

Output:

Time taken using loc: 13.0261 seconds
Time taken using concat: 8.4665 seconds

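In this run, concat was faster: about 8.5 seconds versus 13.0 seconds for ten repetitions of 1,000 row insertions. Exact timings depend on your machine and your pandas version, so run the benchmark yourself before relying on these numbers.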
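Both functions in the benchmark grow the DataFrame one row at a time. The pandas documentation for concat advises against building a DataFrame this way and suggests collecting the rows in a list first. The sketch below shows that alternative; add_using_list is a name introduced here, it was not part of the benchmark above, and no timing result is claimed for it.

```python
import timeit
import pandas as pd

# Sample row data (same shape as in the benchmark)
sample_data = {'CustomerID': 114, 'Name': 'Sample User', 'Plan': 'Basic', 'MonthlyCharge': 20.99}

def add_using_list():
    # Accumulate plain dictionaries, then build the DataFrame in one step
    rows = [sample_data for _ in range(1000)]
    return pd.DataFrame(rows, columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])

# Measure the execution time with the same number of repetitions
list_time = timeit.timeit(add_using_list, number=10)
print(f"Time taken using a list of rows: {list_time:.4f} seconds")
```

When all rows are available before the DataFrame is needed, this avoids rebuilding the DataFrame on every insertion.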

 

Adding Rows to Empty MultiIndex DataFrame

Let’s start by setting up a MultiIndex DataFrame with two levels of indices – let’s say ‘Region’ and ‘CustomerID’, with columns ‘Name’, ‘Plan’, and ‘MonthlyCharge’.

# Set up the two index levels: Region and CustomerID
arrays = [['North', 'North', 'South', 'South'],
          [115, 116, 117, 118]]

index = pd.MultiIndex.from_arrays(arrays, names=('Region', 'CustomerID'))

# Create a MultiIndex DataFrame with no data yet (every cell is NaN)
multi_df = pd.DataFrame(columns=['Name', 'Plan', 'MonthlyCharge'], index=index)
print(multi_df)

Output:

                 Name Plan MonthlyCharge
Region CustomerID                      
North  115       NaN  NaN           NaN
       116       NaN  NaN           NaN
South  117       NaN  NaN           NaN
       118       NaN  NaN           NaN

The DataFrame now has four index entries but no data: every cell is NaN, ready to be populated.

You can use the loc property to fill in the rows of our MultiIndex DataFrame:

multi_df.loc[('North', 115), :] = ['Alice Brown', 'Basic', 20.99]
multi_df.loc[('North', 116), :] = ['Bob White', 'Premium', 45.99]
multi_df.loc[('South', 117), :] = ['Charlie Black', 'Basic', 18.99]
multi_df.loc[('South', 118), :] = ['David Green', 'Standard', 28.99]
print(multi_df)

Output:

                      Name      Plan MonthlyCharge
Region CustomerID                                
North  115     Alice Brown     Basic         20.99
       116       Bob White   Premium         45.99
South  117   Charlie Black     Basic         18.99
       118     David Green  Standard         28.99

With the loc property, each assignment fills a row whose index entry already exists. Specifying the full (Region, CustomerID) tuple targets exactly one row.
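When the index entries are not known in advance, another option is to collect the rows as records and set the MultiIndex afterwards. This is a sketch of that alternative rather than the approach used above; records and records_df are names introduced for this example.

```python
import pandas as pd

# Each record carries its own index values alongside the data
records = [
    {'Region': 'North', 'CustomerID': 115, 'Name': 'Alice Brown', 'Plan': 'Basic', 'MonthlyCharge': 20.99},
    {'Region': 'North', 'CustomerID': 116, 'Name': 'Bob White', 'Plan': 'Premium', 'MonthlyCharge': 45.99},
    {'Region': 'South', 'CustomerID': 117, 'Name': 'Charlie Black', 'Plan': 'Basic', 'MonthlyCharge': 18.99},
    {'Region': 'South', 'CustomerID': 118, 'Name': 'David Green', 'Plan': 'Standard', 'MonthlyCharge': 28.99},
]

# Build the DataFrame once, then promote two columns to index levels
records_df = pd.DataFrame(records).set_index(['Region', 'CustomerID'])
print(records_df)
```

A side effect of this route is that MonthlyCharge is stored as a float column from the start, since it never holds NaN placeholders.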
