Adding Rows to Empty Pandas DataFrame
This tutorial aims to guide you through the simple process of adding rows to an empty Pandas DataFrame.
You will learn how to add rows using the loc property and the concat function, and we’ll run a benchmark test to see which one is faster.
Adding Rows to Empty DataFrame Using loc
Let’s start by creating an empty DataFrame with specified columns.
import pandas as pd

# Define columns for our DataFrame
columns = ['CustomerID', 'Name', 'Plan', 'MonthlyCharge']

# Create an empty DataFrame
df = pd.DataFrame(columns=columns)
print(df)
Output:
Empty DataFrame
Columns: [CustomerID, Name, Plan, MonthlyCharge]
Index: []
You’ll observe that we have successfully created an empty DataFrame with the specified columns.
To add a single row using the loc property, specify the index (or label) and then provide the row data as a list or a dictionary.
# Adding a single row using loc
df.loc[0] = [101, 'John Doe', 'Premium', 45.99]
print(df)
Output:
   CustomerID      Name     Plan  MonthlyCharge
0         101  John Doe  Premium          45.99
Notice that a row has been added to the DataFrame with the provided details.
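The text above mentions that the row data can also come from a dictionary. A minimal sketch of that case, assuming we wrap the dictionary in a pd.Series so that values are matched to columns by name rather than by position (the customer values here are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
df.loc[0] = [101, 'John Doe', 'Premium', 45.99]

# Keys are deliberately in a different order than the columns
new_row = {
    'Name': 'Jane Smith',
    'MonthlyCharge': 25.99,
    'CustomerID': 102,
    'Plan': 'Standard',
}

# Wrapping the dict in a Series aligns each value with its column name
df.loc[1] = pd.Series(new_row)
print(df)
```

Because the assignment aligns on column names, the key order in the dictionary does not matter.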
If you have multiple rows to add, you can use a loop. Here’s how you can add multiple rows using the loc property:
# Sample data to add
rows = [
    [102, 'Jane Smith', 'Standard', 25.99],
    [103, 'Robert Brown', 'Basic', 15.99]
]

# Start at 1 because label 0 is already taken by the first row
for idx, row in enumerate(rows, start=1):
    df.loc[idx] = row

print(df)
Output:
   CustomerID          Name      Plan  MonthlyCharge
0         101      John Doe   Premium          45.99
1         102    Jane Smith  Standard          25.99
2         103  Robert Brown     Basic          15.99
As illustrated above, we’ve successfully added multiple rows to our DataFrame using the loc property.
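When the DataFrame keeps its default 0, 1, 2, ... labels, the next free label is simply len(df), so the loop does not need a hard-coded starting index. A small sketch of that idea, where append_row is a hypothetical helper name and the default integer index is an assumption:

```python
import pandas as pd

df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])

def append_row(frame, row):
    # Hypothetical helper: assumes labels run 0..n-1, so len(frame) is unused
    frame.loc[len(frame)] = row

rows = [
    [101, 'John Doe', 'Premium', 45.99],
    [102, 'Jane Smith', 'Standard', 25.99],
    [103, 'Robert Brown', 'Basic', 15.99],
]

for row in rows:
    append_row(df, row)

print(df.index.tolist())
```

If rows have been dropped or the index has custom labels, len(df) may match an existing label and overwrite that row, so this pattern is only safe with an untouched default index.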
Adding Rows to Empty DataFrame Using concat
Let’s start by initializing an empty DataFrame with desired column names:
empty_df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
print(empty_df)
Output:
Empty DataFrame
Columns: [CustomerID, Name, Plan, MonthlyCharge]
Index: []
We’ve successfully created an empty DataFrame with our desired columns.
Now, concatenate some data to this empty DataFrame:
data_df = pd.DataFrame({
    'CustomerID': [112, 113],
    'Name': ['Thomas Green', 'Natalie White'],
    'Plan': ['Standard', 'Basic'],
    'MonthlyCharge': [30.99, 20.99]
})

# Concatenate the data onto the empty DataFrame
result_df = pd.concat([empty_df, data_df], ignore_index=True)
print(result_df)
Output:
   CustomerID           Name      Plan  MonthlyCharge
0         112   Thomas Green  Standard          30.99
1         113  Natalie White     Basic          20.99
Even though our starting DataFrame was empty, the concat function added the new data from data_df onto it, and ignore_index=True renumbered the rows from 0.
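Recent pandas releases may emit a FutureWarning about result dtypes when one of the concatenated frames is empty. One way to sidestep that, sketched here under the assumption that the empty frame contributes nothing but its column names, is to drop empty frames before calling concat. The helper name concat_non_empty is hypothetical:

```python
import pandas as pd

def concat_non_empty(frames):
    """Hypothetical helper: concatenate only the frames that contain rows.

    Assumes `frames` holds at least one DataFrame.
    """
    non_empty = [frame for frame in frames if not frame.empty]
    if not non_empty:
        # Nothing to add: hand back a copy of the first (empty) frame
        return frames[0].copy()
    return pd.concat(non_empty, ignore_index=True)

empty_df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
data_df = pd.DataFrame({
    'CustomerID': [112, 113],
    'Name': ['Thomas Green', 'Natalie White'],
    'Plan': ['Standard', 'Basic'],
    'MonthlyCharge': [30.99, 20.99],
})

result_df = concat_non_empty([empty_df, data_df])
print(result_df.dtypes)
```

Since only data_df takes part in the concatenation, its column dtypes carry over to the result unchanged.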
Performance Difference between loc and concat
Let’s create a scenario where we’re adding multiple rows to a DataFrame, using each method. We’ll make use of Python’s built-in timeit library to measure the execution time.
import timeit
import pandas as pd

# Sample row data
sample_data = {'CustomerID': 114, 'Name': 'Sample User', 'Plan': 'Basic', 'MonthlyCharge': 20.99}

def add_using_loc():
    df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
    for _ in range(1000):
        df.loc[len(df)] = list(sample_data.values())

def add_using_concat():
    df = pd.DataFrame(columns=['CustomerID', 'Name', 'Plan', 'MonthlyCharge'])
    for _ in range(1000):
        df = pd.concat([df, pd.DataFrame([sample_data])], ignore_index=True)

# Measure the execution time (each function is run 10 times)
loc_time = timeit.timeit(add_using_loc, number=10)
concat_time = timeit.timeit(add_using_concat, number=10)

print(f"Time taken using loc: {loc_time:.4f} seconds")
print(f"Time taken using concat: {concat_time:.4f} seconds")
Output:
Time taken using loc: 13.0261 seconds
Time taken using concat: 8.4665 seconds
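As you can see, in this run the concat function was faster than loc at adding 1,000 rows to an empty DataFrame. Both approaches grow the DataFrame one row at a time, so the exact timings will vary with the pandas version, the number of rows, and the hardware.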
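Since both functions in the benchmark grow the DataFrame row by row, a common alternative is to collect the rows in a plain Python list and build the DataFrame once at the end. A minimal sketch under the same sample data; build_from_list is a hypothetical function name, and no timing result is claimed here because it depends on your environment:

```python
import timeit
import pandas as pd

COLUMNS = ['CustomerID', 'Name', 'Plan', 'MonthlyCharge']
sample_data = {'CustomerID': 114, 'Name': 'Sample User', 'Plan': 'Basic', 'MonthlyCharge': 20.99}

def build_from_list(n_rows=1000):
    # Accumulate plain dicts first; no DataFrame exists during the loop
    rows = []
    for _ in range(n_rows):
        rows.append(dict(sample_data))
    # Build the DataFrame in a single step
    return pd.DataFrame(rows, columns=COLUMNS)

result = build_from_list()
print(result.shape)

list_time = timeit.timeit(build_from_list, number=10)
print(f"Time taken building from a list: {list_time:.4f} seconds")
```

You can run this next to the two functions above to compare all three on your own machine.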
Adding Rows to Empty MultiIndex DataFrame
Let’s start by setting up a DataFrame with a two-level MultiIndex, ‘Region’ and ‘CustomerID’, and the columns ‘Name’, ‘Plan’, and ‘MonthlyCharge’. Because the index labels are defined up front, every row already exists and its cells start out as NaN.
# Set up the two-level row index
arrays = [['North', 'North', 'South', 'South'], [115, 116, 117, 118]]
index = pd.MultiIndex.from_arrays(arrays, names=('Region', 'CustomerID'))

# Create a MultiIndex DataFrame with no data yet (all cells are NaN)
multi_df = pd.DataFrame(columns=['Name', 'Plan', 'MonthlyCharge'], index=index)
print(multi_df)
Output:
                   Name  Plan  MonthlyCharge
Region CustomerID
North  115          NaN   NaN            NaN
       116          NaN   NaN            NaN
South  117          NaN   NaN            NaN
       118          NaN   NaN            NaN
We’ve set up a MultiIndex DataFrame whose rows hold only NaN values, ready for populating.
You can use the loc property to fill in the rows of our MultiIndex DataFrame:
multi_df.loc[('North', 115), :] = ['Alice Brown', 'Basic', 20.99]
multi_df.loc[('North', 116), :] = ['Bob White', 'Premium', 45.99]
multi_df.loc[('South', 117), :] = ['Charlie Black', 'Basic', 18.99]
multi_df.loc[('South', 118), :] = ['David Green', 'Standard', 28.99]
print(multi_df)
Output:
                            Name      Plan  MonthlyCharge
Region CustomerID
North  115           Alice Brown     Basic          20.99
       116             Bob White   Premium          45.99
South  117         Charlie Black     Basic          18.99
       118           David Green  Standard          28.99
With the help of the loc property, each row was filled in by its index label. By specifying the MultiIndex values as a tuple, we could directly target the row we wanted to fill in.
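If the index labels are not known ahead of time, another option is to collect complete records first and promote the key columns to a MultiIndex afterwards with set_index. This is a sketch of an alternative, not the approach used above; it reuses the same customer data:

```python
import pandas as pd

# Each record carries its own index values alongside the data
records = [
    {'Region': 'North', 'CustomerID': 115, 'Name': 'Alice Brown', 'Plan': 'Basic', 'MonthlyCharge': 20.99},
    {'Region': 'North', 'CustomerID': 116, 'Name': 'Bob White', 'Plan': 'Premium', 'MonthlyCharge': 45.99},
    {'Region': 'South', 'CustomerID': 117, 'Name': 'Charlie Black', 'Plan': 'Basic', 'MonthlyCharge': 18.99},
    {'Region': 'South', 'CustomerID': 118, 'Name': 'David Green', 'Plan': 'Standard', 'MonthlyCharge': 28.99},
]

# Build the frame once, then move the two key columns into the row index
multi_df = pd.DataFrame(records).set_index(['Region', 'CustomerID'])

print(multi_df)
print(multi_df.loc[('South', 117), 'Name'])
```

Because the frame is built from actual values, pandas infers the column types from the data, so MonthlyCharge ends up as a float column.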
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.