Add Rows to Pandas DataFrame in Loop

In this tutorial, you’ll learn different methods to add rows to a Pandas DataFrame using loops.

We’ll cover concat(), the loc[] and iloc[] indexers, iterrows(), and DataFrame.from_records().

 

 

Using concat

Let’s start with a sample DataFrame and assume we have multiple batches of new customers to add:

import pandas as pd

data = {'CustomerID': [1, 2, 3],
        'Name': ['John', 'Emily', 'Michael'],
        'Plan': ['Basic', 'Premium', 'Standard'],
        'Balance': [50, 120, 80]}
df = pd.DataFrame(data)

batch_1 = pd.DataFrame({'CustomerID': [4, 5],
                        'Name': ['Sarah', 'Alex'],
                        'Plan': ['Basic', 'Premium'],
                        'Balance': [60, 100]})
batch_2 = pd.DataFrame({'CustomerID': [6, 7],
                        'Name': ['Daniel', 'Emma'],
                        'Plan': ['Standard', 'Basic'],
                        'Balance': [70, 55]})
batches = [batch_1, batch_2]
print("Initial DataFrame:")
print(df)

Output:

Initial DataFrame:
   CustomerID     Name      Plan  Balance
0           1     John     Basic       50
1           2    Emily   Premium      120
2           3  Michael  Standard       80

Now you can loop over the batches and append each one with concat(). Passing ignore_index=True renumbers the rows from 0 instead of repeating each batch’s own index:

for batch in batches:
    df = pd.concat([df, batch], ignore_index=True)
print("DataFrame after adding batches:")
print(df)

Output:

DataFrame after adding batches:
   CustomerID     Name      Plan  Balance
0           1     John     Basic       50
1           2    Emily   Premium      120
2           3  Michael  Standard       80
3           4    Sarah     Basic       60
4           5     Alex   Premium      100
5           6   Daniel  Standard       70
6           7     Emma     Basic       55
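
Each call to pd.concat() copies all existing rows into a new DataFrame, so calling it once per batch repeats that copy on every iteration. When all batches are available up front, a single call should give the same result. A minimal sketch, reusing a shortened version of the sample data above:

```python
import pandas as pd

df = pd.DataFrame({'CustomerID': [1, 2, 3],
                   'Name': ['John', 'Emily', 'Michael']})
batches = [
    pd.DataFrame({'CustomerID': [4, 5], 'Name': ['Sarah', 'Alex']}),
    pd.DataFrame({'CustomerID': [6, 7], 'Name': ['Daniel', 'Emma']}),
]

# Concatenate the original DataFrame and every batch in one call
combined = pd.concat([df, *batches], ignore_index=True)
print(combined)
```

The loop version remains useful when batches arrive one at a time and you need the updated DataFrame after each step.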

 

Adding Rows using loc and iloc in a Loop

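The loc and iloc indexers assign values to rows directly. loc selects rows by label and creates a new row when the label does not exist yet. iloc selects rows by integer position and can only write to rows that already exist.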

Using loc

Let’s start with the initial DataFrame and a list of new customers:

data = {'CustomerID': [1, 2, 3],
        'Name': ['John', 'Emily', 'Michael'],
        'Plan': ['Basic', 'Premium', 'Standard'],
        'Balance': [50, 120, 80]}
df = pd.DataFrame(data)

# List of new customers as dictionaries
new_customers = [
    {'CustomerID': 4, 'Name': 'Sarah', 'Plan': 'Basic', 'Balance': 60},
    {'CustomerID': 5, 'Name': 'Alex', 'Plan': 'Premium', 'Balance': 100}
]
print("Initial DataFrame:")
print(df)

Output:

Initial DataFrame:
   CustomerID     Name      Plan  Balance
0           1     John     Basic       50
1           2    Emily   Premium      120
2           3  Michael  Standard       80

Now you can use the loc property within a for loop to add each new customer to the DataFrame:

for idx, customer in enumerate(new_customers, start=len(df)):
    df.loc[idx] = [customer['CustomerID'], customer['Name'], customer['Plan'], customer['Balance']]
print("DataFrame after adding rows using loc:")
print(df)

Output:

DataFrame after adding rows using loc:
   CustomerID     Name      Plan  Balance
0           1     John     Basic       50
1           2    Emily   Premium      120
2           3  Michael  Standard       80
3           4    Sarah     Basic       60
4           5     Alex   Premium      100
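
Keep in mind that loc works with labels, not positions. Using len(df) as the new label only appends a row when the index is the default 0 to n-1 range. The sketch below uses a hypothetical DataFrame whose index starts at 1 to show the difference. Taking the next label from df.index.max() + 1 is one possible workaround for integer indexes, not the only one:

```python
import pandas as pd

# Hypothetical DataFrame whose index labels start at 1 instead of 0
df = pd.DataFrame({'CustomerID': [1, 2, 3],
                   'Name': ['John', 'Emily', 'Michael']},
                  index=[1, 2, 3])

# len(df) is 3, and label 3 already exists, so this overwrites Michael's row
overwritten = df.copy()
overwritten.loc[len(overwritten)] = [4, 'Sarah']

# Using the next unused label appends a new row instead
appended = df.copy()
appended.loc[appended.index.max() + 1] = [4, 'Sarah']

print(len(overwritten))  # still 3 rows
print(len(appended))     # 4 rows
```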

Using iloc

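iloc cannot create rows: assigning to a position past the end of the DataFrame raises an IndexError. To add rows with iloc, you first need to enlarge the DataFrame with reindex(), which adds placeholder rows filled with NaN.

Then you can use iloc to place data directly into the new row positions. The example below starts again from the original three-row DataFrame. Because NaN is a floating-point value, reindex() converts the integer columns to float, which is why CustomerID and Balance print with decimals in the output: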

# Number of new rows to add
num_new_rows = 3

# Increase DataFrame index size
df_length = len(df)
df = df.reindex(df.index.tolist() + list(range(df_length, df_length + num_new_rows)))

for i in range(num_new_rows):
    new_row_index = df_length + i
    df.iloc[new_row_index] = [new_row_index + 1, f'Customer{new_row_index + 1}', 'Basic', 50 + new_row_index]
print("DataFrame after adding rows using iloc in a loop:")
print(df)

Output:

DataFrame after adding rows using iloc in a loop:
   CustomerID       Name      Plan  Balance
0         1.0       John     Basic     50.0
1         2.0      Emily   Premium    120.0
2         3.0    Michael  Standard     80.0
3         4.0  Customer4     Basic     53.0
4         5.0  Customer5     Basic     54.0
5         6.0  Customer6     Basic     55.0
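
If you want the numeric columns back as integers, one option is to cast them with astype() once every placeholder row has been filled. This is a minimal sketch that repeats the example above in condensed form. The choice of int64 as the target type is an assumption based on the original data:

```python
import pandas as pd

df = pd.DataFrame({'CustomerID': [1, 2, 3],
                   'Name': ['John', 'Emily', 'Michael'],
                   'Plan': ['Basic', 'Premium', 'Standard'],
                   'Balance': [50, 120, 80]})

# Add three placeholder rows; the integer columns become float to hold NaN
df = df.reindex(range(len(df) + 3))

# Fill the placeholder rows by position
for pos in range(3, 6):
    df.iloc[pos] = [pos + 1, f'Customer{pos + 1}', 'Basic', 50 + pos]

# Every row is filled now, so the numeric columns can be cast back to integers
df = df.astype({'CustomerID': 'int64', 'Balance': 'int64'})
print(df.dtypes)
```

The cast fails if any NaN values remain, so it only makes sense after the loop has filled all the new rows.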

 

Using iterrows()

The iterrows() function lets you loop over the existing rows of a DataFrame and derive new rows from them.

It returns an iterator that yields (index, row) pairs, where each row is a Series.

This method is useful when a new row depends on the index or the values of an existing row.

Our initial DataFrame:

data = {'CustomerID': [1, 2, 3],
        'Name': ['John', 'Emily', 'Michael'],
        'Plan': ['Basic', 'Premium', 'Standard'],
        'Balance': [50, 120, 80]}
df = pd.DataFrame(data)
print("Initial DataFrame:")
print(df)

Output:

Initial DataFrame:
   CustomerID     Name      Plan  Balance
0           1     John     Basic       50
1           2    Emily   Premium      120
2           3  Michael  Standard       80

Let’s create new rows where each new row’s balance is the corresponding original row’s balance minus a service charge of 5.

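Here’s how you can use iterrows() to do this. Note that the loop appends to the same DataFrame it is iterating over. It works in this example, but the pandas documentation advises against modifying an object while you iterate over it, so treat this pattern with care: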

for index, row in df.iterrows():
    new_row = row.copy()
    new_row['Balance'] = row['Balance'] - 5  # Apply a service charge
    df.loc[len(df)] = new_row
df.reset_index(drop=True, inplace=True)
print(df)

Output:

   CustomerID     Name      Plan  Balance
0           1     John     Basic       50
1           2    Emily   Premium      120
2           3  Michael  Standard       80
3           1     John     Basic       45
4           2    Emily   Premium      115
5           3  Michael  Standard       75
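
Because every new row here is derived from an existing row in the same way, the same result can also be built without a row-by-row loop. The following is a sketch of one alternative, not a replacement for the iterrows() approach when each row needs its own logic. It uses assign() to create the adjusted copies and a single concat() to add them:

```python
import pandas as pd

df = pd.DataFrame({'CustomerID': [1, 2, 3],
                   'Name': ['John', 'Emily', 'Michael'],
                   'Plan': ['Basic', 'Premium', 'Standard'],
                   'Balance': [50, 120, 80]})

# Copy every row with the service charge of 5 applied to the balance
charged = df.assign(Balance=df['Balance'] - 5)

# Add the adjusted copies below the original rows
result = pd.concat([df, charged], ignore_index=True)
print(result)
```

This version never modifies the DataFrame while iterating over it, and the Balance column keeps its integer type.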

 

Using DataFrame.from_records

You can use the DataFrame.from_records() method to turn rows created in a loop into a DataFrame, and then add that DataFrame to the original one.

Here, we’ll dynamically create a list of dictionaries and then convert it into a DataFrame.

Let’s start with the initial DataFrame:

data = {'CustomerID': [1, 2, 3],
        'Name': ['John', 'Emily', 'Michael'],
        'Plan': ['Basic', 'Premium', 'Standard'],
        'Balance': [50, 120, 80]}
df = pd.DataFrame(data)
print("Initial DataFrame:")
print(df)

Output:

Initial DataFrame:
   CustomerID     Name      Plan  Balance
0           1     John     Basic       50
1           2    Emily   Premium      120
2           3  Michael  Standard       80

Now, let’s suppose you want to add new customer rows dynamically, perhaps based on some condition or external data source. For demonstration, we’ll add 3 new rows in a for loop:

new_rows_list = []

# Loop to create new rows
for i in range(4, 7):
    new_row = {'CustomerID': i, 'Name': f'Customer{i}', 'Plan': 'Basic', 'Balance': 60 + i}
    new_rows_list.append(new_row)
new_rows_df = pd.DataFrame.from_records(new_rows_list)
print("New rows as DataFrame:")
print(new_rows_df)

Output:

New rows as DataFrame:
   CustomerID      Name   Plan  Balance
0           4  Customer4  Basic       64
1           5  Customer5  Basic       65
2           6  Customer6  Basic       66

Finally, you can concatenate this new DataFrame with the original one:

# Merge the new rows DataFrame with the original DataFrame
df = pd.concat([df, new_rows_df], ignore_index=True)
print("DataFrame after efficient append using DataFrame.from_records and a for loop:")
print(df)

Output:

DataFrame after efficient append using DataFrame.from_records and a for loop:
   CustomerID       Name      Plan  Balance
0           1       John     Basic       50
1           2      Emily   Premium      120
2           3    Michael  Standard       80
3           4  Customer4     Basic       64
4           5  Customer5     Basic       65
5           6  Customer6     Basic       66
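
from_records() also accepts a list of tuples when you pass the column names separately through its columns parameter. This can be convenient when the rows come from a source that returns plain tuples. A minimal sketch with the same three illustrative customers as above:

```python
import pandas as pd

columns = ['CustomerID', 'Name', 'Plan', 'Balance']

# Build each new row as a plain tuple inside the loop
records = []
for i in range(4, 7):
    records.append((i, f'Customer{i}', 'Basic', 60 + i))

# Column names are supplied once instead of being repeated in every row
new_rows_df = pd.DataFrame.from_records(records, columns=columns)
print(new_rows_df)
```

The resulting DataFrame can be concatenated with the original one in the same way as before.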

 

Performance Comparison

Let’s start by creating a sample DataFrame with 10,000 rows. We’ll time each method to append an additional 1,000 rows.

import pandas as pd
import time
data = {'CustomerID': list(range(1, 10001)),
        'Name': [f'Customer{i}' for i in range(1, 10001)],
        'Plan': ['Basic'] * 10000,
        'Balance': [50] * 10000}
df = pd.DataFrame(data)

Timing concat() Method

new_rows = pd.DataFrame({'CustomerID': list(range(11001, 12001)),
                          'Name': [f'Customer{i}' for i in range(11001, 12001)],
                          'Plan': ['Basic'] * 1000,
                          'Balance': [50] * 1000})
start_time = time.time()
df = pd.concat([df, new_rows], ignore_index=True)
end_time = time.time()
print(f"Time taken using concat(): {end_time - start_time} seconds")

Timing loc with For Loop

start_time = time.time()
for i in range(12001, 13001):
    df.loc[len(df.index)] = [i, f'Customer{i}', 'Basic', 50]
end_time = time.time()
print(f"Time taken using loc with for loop: {end_time - start_time} seconds")

Timing DataFrame.from_records with For Loop

new_rows_list = []
for i in range(13001, 14001):
    new_row = {'CustomerID': i, 'Name': f'Customer{i}', 'Plan': 'Basic', 'Balance': 50}
    new_rows_list.append(new_row)
new_rows_df = pd.DataFrame.from_records(new_rows_list)
start_time = time.time()
df = pd.concat([df, new_rows_df], ignore_index=True)
end_time = time.time()
print(f"Time taken using DataFrame.from_records with for loop: {end_time - start_time} seconds")

Output:

Time taken using concat(): 0.0020020008087158203 seconds
Time taken using loc with for loop: 1.9779589176177979 seconds
Time taken using DataFrame.from_records with for loop: 0.002157926559448242 seconds

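As you can see, the two approaches that prepare all new rows first and call concat() once finish in about 2 milliseconds, while adding rows one at a time with loc takes about 2 seconds, which is roughly 1,000 times slower. Note that the DataFrame.from_records() timing covers only the final concat() call, not the loop that builds the list of dictionaries.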
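
For a comparison that also counts the time spent building the rows, you could wrap each approach in a function and time the whole call. The sketch below is one way to do that. The function names, the smaller DataFrame size, and the use of time.perf_counter() are choices made for this illustration, and the timings will vary between machines:

```python
import time
import pandas as pd

def make_df(n):
    """Create a sample DataFrame with n customers."""
    return pd.DataFrame({'CustomerID': list(range(1, n + 1)),
                         'Name': [f'Customer{i}' for i in range(1, n + 1)],
                         'Plan': ['Basic'] * n,
                         'Balance': [50] * n})

def append_with_loc(df, start, count):
    """Add rows one at a time with loc."""
    df = df.copy()
    for i in range(start, start + count):
        df.loc[len(df)] = [i, f'Customer{i}', 'Basic', 50]
    return df

def append_with_records(df, start, count):
    """Build all rows in a list, then add them with a single concat()."""
    rows = [{'CustomerID': i, 'Name': f'Customer{i}', 'Plan': 'Basic', 'Balance': 50}
            for i in range(start, start + count)]
    new_rows_df = pd.DataFrame.from_records(rows)
    return pd.concat([df, new_rows_df], ignore_index=True)

base = make_df(1000)
for func in (append_with_loc, append_with_records):
    start_time = time.perf_counter()
    result = func(base, 1001, 200)
    elapsed = time.perf_counter() - start_time
    print(f"{func.__name__}: {elapsed:.4f} seconds, {len(result)} rows")
```

Both functions start from the same base DataFrame and return a new one, so the order in which they run does not affect the size of the data being extended.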
