Add Rows to Pandas DataFrame Based on Conditions

In this tutorial, you’ll learn how to add rows to a Pandas DataFrame based on specific conditions.

We’ll cover various scenarios, including adding rows based on simple criteria, multiple conditions with logical operators, and incorporating string matching and DateTime conditions.

 

 

Adding a Row Based on Specific Criteria

First, let’s create a sample DataFrame to work with.

import pandas as pd
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Plan': ['Basic', 'Premium', 'Basic', 'Standard'],
    'Monthly_Charge': [20, 50, 20, 30]
})
print(df)

Output:

   ID      Plan  Monthly_Charge
0   1     Basic              20
1   2   Premium              50
2   3     Basic              20
3   4  Standard              30

This output shows the DataFrame df containing information on subscriber IDs, their plans, and monthly charges.

Now, let’s say you want to add a row where the Plan is “Premium.”

new_row = {'ID': 5, 'Plan': 'Premium', 'Monthly_Charge': 50}

# Check if 'Premium' already exists in the 'Plan' column
if 'Premium' in df['Plan'].values:
    df.loc[len(df)] = new_row
print(df)

Output:

   ID      Plan  Monthly_Charge
0   1     Basic              20
1   2   Premium              50
2   3     Basic              20
3   4  Standard              30
4   5   Premium              50

 

Multiple Conditions for Row Addition

Let’s see how you can add a row based on multiple conditions, using logical operators like and, or, and not.

To begin, we’ll work with the same sample DataFrame from the previous example.

Using and Operator

Let’s say you want to add a new row to your DataFrame only if it already contains at least one ‘Premium’ plan with a monthly charge of 50.

new_row = {'ID': 6, 'Plan': 'Premium', 'Monthly_Charge': 50}
    
# Add the new row based on multiple conditions
if ('Premium' in df['Plan'].values) and (50 in df['Monthly_Charge'].values):
    df.loc[len(df)] = new_row
print(df)

Output:

   ID      Plan  Monthly_Charge
0   1     Basic              20
1   2   Premium              50
2   3     Basic              20
3   4  Standard              30
4   5   Premium              50
5   6   Premium              50

Using or Operator

You can use the or operator if you want to add a row when either of the conditions is met. Let’s try to add a row if the DataFrame contains either a ‘Premium’ plan or a monthly charge of 20.

new_row = {'ID': 7, 'Plan': 'Basic', 'Monthly_Charge': 25}

# Add the new row based on multiple conditions
if ('Premium' in df['Plan'].values) or (20 in df['Monthly_Charge'].values):
    df.loc[len(df)] = new_row
print(df)

Output:

   ID      Plan  Monthly_Charge
0   1     Basic              20
1   2   Premium              50
2   3     Basic              20
3   4  Standard              30
4   5   Premium              50
5   6   Premium              50
6   7     Basic              25

 

Nested Conditional Statements to Add Rows

Imagine a situation where you want to add a new row with a ‘Premium’ plan only if the DataFrame already contains a ‘Premium’ plan with a monthly charge of 50, or a ‘Basic’ plan with a monthly charge of 20.

new_row = {'ID': 8, 'Plan': 'Premium', 'Monthly_Charge': 55}

# Add the new row based on nested conditions
if 'Premium' in df['Plan'].values:
    if 50 in df[df['Plan'] == 'Premium']['Monthly_Charge'].values:
        df.loc[len(df)] = new_row
elif 'Basic' in df['Plan'].values:
    if 20 in df[df['Plan'] == 'Basic']['Monthly_Charge'].values:
        df.loc[len(df)] = new_row
print(df)

Output:

   ID      Plan  Monthly_Charge
0   1     Basic              20
1   2   Premium              50
2   3     Basic              20
3   4  Standard              30
4   5   Premium              50
5   6   Premium              50
6   7     Basic              25
7   8   Premium              55

 

Add Row Based on DateTime Condition

When you’re working with time-series data or timestamps, applying conditions based on DateTime objects becomes vital.

First, add a ‘StartDate’ column to the DataFrame with some sample DateTime data.

df['StartDate'] = pd.to_datetime(['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-15', 
                                  '2022-02-20', '2022-01-25', '2022-01-30', '2022-02-15'
                                  ])
print(df)

Output:

   ID      Plan  Monthly_Charge  StartDate
0   1     Basic              20 2022-01-01
1   2   Premium              50 2022-02-01
2   3     Basic              20 2022-03-01
3   4  Standard              30 2022-01-15
4   5   Premium              50 2022-02-20
5   6   Premium              50 2022-01-25
6   7     Basic              25 2022-01-30
7   8   Premium              55 2022-02-15

The ‘StartDate’ column is now part of your DataFrame.

Suppose you want to add a new row only if there are existing entries with a ‘StartDate’ in January 2022. You can do this as follows:

from datetime import datetime
new_row = {'ID': 22, 'Plan': 'Ultra', 'Monthly_Charge': 90, 'StartDate': datetime.strptime('2022-04-15', '%Y-%m-%d')}
if any(df['StartDate'].dt.month == 1) and any(df['StartDate'].dt.year == 2022):
    df.loc[len(df)] = new_row
print(df)

Output:

   ID      Plan  Monthly_Charge  StartDate
0   1     Basic              20 2022-01-01
1   2   Premium              50 2022-02-01
2   3     Basic              20 2022-03-01
3   4  Standard              30 2022-01-15
4   5   Premium              50 2022-02-20
5   6   Premium              50 2022-01-25
6   7     Basic              25 2022-01-30
7   8   Premium              55 2022-02-15
8  22     Ultra              90 2022-04-15

 

Adding Rows Based on String Matching Conditions

Sometimes, you might want to add rows based on string matching conditions.

For instance, you may want to append a new row only if a particular substring appears in a text column of your DataFrame.

Let’s see how this works by adding a new column called ‘Description’ to our DataFrame.

First, we’ll add a ‘Description’ column containing some textual data to our existing DataFrame.

df['Description'] = ['Low tier', 'Top tier', 'Low tier', 'Mid tier',
                      'Top tier', 'Top tier', 'Low tier', 'Top tier',
                      'Superior tier']
print(df)

Output:

   ID      Plan  Monthly_Charge  StartDate    Description
0   1     Basic              20 2022-01-01       Low tier
1   2   Premium              50 2022-02-01       Top tier
2   3     Basic              20 2022-03-01       Low tier
3   4  Standard              30 2022-01-15       Mid tier
4   5   Premium              50 2022-02-20       Top tier
5   6   Premium              50 2022-01-25       Top tier
6   7     Basic              25 2022-01-30       Low tier
7   8   Premium              55 2022-02-15       Top tier
8  22     Ultra              90 2022-04-15  Superior tier

Let’s say you want to add a new row only if there are any descriptions that contain the word “Superior.”

new_row = {'ID': 23, 'Plan': 'Elite', 'Monthly_Charge': 100, 'StartDate': datetime.strptime('2022-05-01', '%Y-%m-%d'), 'Description': 'Elite tier'}

# Add row based on string matching
if df['Description'].str.contains('Superior').any():
    df.loc[len(df)] = new_row
print(df)

Output:

   ID      Plan  Monthly_Charge  StartDate    Description
0   1     Basic              20 2022-01-01       Low tier
1   2   Premium              50 2022-02-01       Top tier
2   3     Basic              20 2022-03-01       Low tier
3   4  Standard              30 2022-01-15       Mid tier
4   5   Premium              50 2022-02-20       Top tier
5   6   Premium              50 2022-01-25       Top tier
6   7     Basic              25 2022-01-30       Low tier
7   8   Premium              55 2022-02-15       Top tier
8  22     Ultra              90 2022-04-15  Superior tier
9  23     Elite             100 2022-05-01     Elite tier

 

Add Row Based on Presence of NaN Values

First, we will add some NaN values to our ‘Monthly_Charge’ column to simulate a typical data issue.

import numpy as np
df.loc[[2, 5, 9], 'Monthly_Charge'] = np.nan
print(df)

Output:

   ID      Plan  Monthly_Charge  StartDate    Description
0   1     Basic            20.0 2022-01-01       Low tier
1   2   Premium            50.0 2022-02-01       Top tier
2   3     Basic             NaN 2022-03-01       Low tier
3   4  Standard            30.0 2022-01-15       Mid tier
4   5   Premium            50.0 2022-02-20       Top tier
5   6   Premium             NaN 2022-01-25       Top tier
6   7     Basic            25.0 2022-01-30       Low tier
7   8   Premium            55.0 2022-02-15       Top tier
8  22     Ultra            90.0 2022-04-15  Superior tier
9  23     Elite             NaN 2022-05-01     Elite tier

Now, let’s say you want to add a row only if there are any NaN values in the ‘Monthly_Charge’ column. Here’s how you can do it:

new_row = {'ID': 24, 'Plan': 'Basic', 'Monthly_Charge': 15, 'StartDate': datetime.strptime('2022-05-10', '%Y-%m-%d'), 'Description': 'Lowest tier'}
if df['Monthly_Charge'].isna().any():
    df.loc[len(df)] = new_row
print(df)

Output:

    ID      Plan  Monthly_Charge  StartDate    Description
0    1     Basic            20.0 2022-01-01       Low tier
1    2   Premium            50.0 2022-02-01       Top tier
2    3     Basic             NaN 2022-03-01       Low tier
3    4  Standard            30.0 2022-01-15       Mid tier
4    5   Premium            50.0 2022-02-20       Top tier
5    6   Premium             NaN 2022-01-25       Top tier
6    7     Basic            25.0 2022-01-30       Low tier
7    8   Premium            55.0 2022-02-15       Top tier
8   22     Ultra            90.0 2022-04-15  Superior tier
9   23     Elite             NaN 2022-05-01     Elite tier
10  24     Basic            15.0 2022-05-10    Lowest tier

 

Add Row Based on Previous Row Value

Consider a situation where you want to add a new row only if the ‘Monthly_Charge’ for the previous row is greater than a certain threshold. Here’s an example:

new_row = {'ID': 25, 'Plan': 'Standard', 'Monthly_Charge': 35}
if df.iloc[-1]['Monthly_Charge'] >= 15:
    df.loc[len(df)] = new_row
print(df)

Output:

    ID      Plan  Monthly_Charge  StartDate    Description
0    1     Basic            20.0 2022-01-01       Low tier
1    2   Premium            50.0 2022-02-01       Top tier
2    3     Basic             NaN 2022-03-01       Low tier
3    4  Standard            30.0 2022-01-15       Mid tier
4    5   Premium            50.0 2022-02-20       Top tier
5    6   Premium             NaN 2022-01-25       Top tier
6    7     Basic            25.0 2022-01-30       Low tier
7    8   Premium            55.0 2022-02-15       Top tier
8   22     Ultra            90.0 2022-04-15  Superior tier
9   23     Elite             NaN 2022-05-01     Elite tier
10  24     Basic            15.0 2022-05-10    Lowest tier
11  25  Standard            35.0        NaT            NaN

 

Leave a Reply

Your email address will not be published. Required fields are marked *