Add Row Names to Pandas DataFrame

In this tutorial, you’ll learn how to customize the row index in your Pandas DataFrames by adding row names.

Row names make your data easier to read and let you select rows by label instead of by position.

We’ll explore the index parameter to set row names on creation, the index attribute and the set_index() method to set them on an existing DataFrame, and the rename() method to change individual row names.

Using index Parameter

The index parameter in the pd.DataFrame() function allows you to specify the row names at the moment of DataFrame creation.

Here’s a sample code snippet:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}

# Create DataFrame with index
df = pd.DataFrame(data, index=['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4'])
print(df)

Output:

            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
Customer_2           2    Bob  Premium             50
Customer_3           3  Cindy    Basic             20
Customer_4           4  David  Premium             50
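
One practical benefit of named rows is label-based lookup. The following is a minimal sketch, reusing the sample data above, of how .loc can select rows by name. The variable names row and charge are only illustrative.

```python
import pandas as pd

data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data, index=['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4'])

# Select a whole row by its row name (the result is a Series)
row = df.loc['Customer_2']
print(row['Name'])    # Bob

# Select a single cell by row name and column name
charge = df.loc['Customer_3', 'MonthlyCharge']
print(charge)         # 20
```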

Using DataFrame.index

You can assign to the index attribute to add or change row names on an existing DataFrame. The new list must contain exactly one name per row.

Let’s start by creating a DataFrame without specifying the row names:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("DataFrame without row names:")
print(df)

Output:

DataFrame without row names:
   CustomerID   Name     Plan  MonthlyCharge
0           1  Alice    Basic             20
1           2    Bob  Premium             50
2           3  Cindy    Basic             20
3           4  David  Premium             50

As you can see, the row names are just the default integer indices. Now, let’s add meaningful row names.

df.index = ['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4']
print("DataFrame with new row names:")
print(df)

Output:

DataFrame with new row names:
            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
Customer_2           2    Bob  Premium             50
Customer_3           3  Cindy    Basic             20
Customer_4           4  David  Premium             50
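
Typing every label by hand does not scale to larger datasets. As a rough sketch, the labels could instead be built from an existing column; here we assume the CustomerID values are a sensible basis for the names. The sketch also shows that assigning the wrong number of labels is rejected with a ValueError. The names labels and mismatch_rejected are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David']
})

# Build one label per row from the CustomerID column
df.index = [f'Customer_{cid}' for cid in df['CustomerID']]
labels = df.index.tolist()
print(labels)

# Assigning the wrong number of labels raises a ValueError
try:
    df.index = ['Customer_1', 'Customer_2']
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("Rejected:", err)
```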

Using df.set_index()

Another way to add row names to your DataFrame is by promoting one of the existing columns to the index using the set_index() method.

This can be useful if your dataset includes a column that serves as a unique identifier.

Here’s how to do it:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
   CustomerID   Name     Plan  MonthlyCharge
0           1  Alice    Basic             20
1           2    Bob  Premium             50
2           3  Cindy    Basic             20
3           4  David  Premium             50

You can see the DataFrame starts with default integer indices. Let’s change that by setting the ‘Name’ column as the index.

df.set_index('Name', inplace=True)
print("DataFrame with 'Name' column as row names:")
print(df)

Output:

DataFrame with 'Name' column as row names:
       CustomerID     Plan  MonthlyCharge
Name                                    
Alice           1    Basic             20
Bob             2  Premium             50
Cindy           3    Basic             20
David           4  Premium             50

Note that inplace=True modifies the original DataFrame itself instead of returning a new one. The ‘Name’ column also moves out of the regular columns and into the index.
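
If you would rather keep the original DataFrame untouched, set_index() can be called without inplace=True. The sketch below, on a trimmed-down version of the sample data, also tries the drop and verify_integrity parameters; the variable names by_name, kept, and duplicates_rejected are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium']
})

# Without inplace=True, set_index() returns a new DataFrame
by_name = df.set_index('Name')
print(df.index.tolist())        # the original still has the default integers
print(by_name.index.tolist())

# drop=False keeps 'Name' as a regular column as well as the index
kept = df.set_index('Name', drop=False)
print(kept.columns.tolist())

# verify_integrity=True raises a ValueError if the new index has duplicates
try:
    df.set_index('Plan', verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
```

The verify_integrity check is worth considering when the column is supposed to be a unique identifier, because set_index() accepts duplicate values by default.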

Resetting Row Names Using df.reset_index()

You can use the reset_index() method to move your current row names back into a column and restore the default integer index.

Let’s start with a DataFrame where the ‘Name’ column is serving as row names:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
df.set_index('Name', inplace=True)
print("DataFrame with 'Name' as row names:")
print(df)

Output:

DataFrame with 'Name' as row names:
       CustomerID     Plan  MonthlyCharge
Name                                    
Alice           1    Basic             20
Bob             2  Premium             50
Cindy           3    Basic             20
David           4  Premium             50

Now, let’s reset the row names to their default integer values:

# Reset row names and move the existing row names back into a column
df.reset_index(inplace=True)
print("DataFrame after resetting row names:")
print(df)

Output:

DataFrame after resetting row names:
    Name  CustomerID     Plan  MonthlyCharge
0  Alice           1    Basic             20
1    Bob           2  Premium             50
2  Cindy           3    Basic             20
3  David           4  Premium             50
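
Sometimes the old row names are not worth keeping. As a small sketch on a made-up two-row DataFrame, reset_index(drop=True) discards them instead of moving them into a column. The variable names moved and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'CustomerID': [1, 2], 'Plan': ['Basic', 'Premium']},
    index=['Customer_1', 'Customer_2']
)

# Default: the old row names move into a column
# (called 'index' here because this index has no name)
moved = df.reset_index()
print(moved.columns.tolist())

# drop=True discards the old row names instead
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())
```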

Adding Multiple Levels of Row Names

In some complex data analysis tasks, you may need to categorize your data across multiple dimensions.

For such use cases, Pandas supports hierarchical indexing, also called multi-level indexing.

This enables you to have multiple levels of row names, adding depth to your DataFrame.

Let’s extend our dataset with a ‘State’ column and see how to set hierarchical row names:

import pandas as pd
data = {
    'State': ['CA', 'CA', 'NY', 'NY'],
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  State  CustomerID   Name     Plan  MonthlyCharge
0    CA           1  Alice    Basic             20
1    CA           2    Bob  Premium             50
2    NY           3  Cindy    Basic             20
3    NY           4  David  Premium             50

Now, let’s set both ‘State’ and ‘Name’ as hierarchical row names.

df.set_index(['State', 'Name'], inplace=True)
print("DataFrame with hierarchical row names:")
print(df)

Output:

DataFrame with hierarchical row names:
             CustomerID     Plan  MonthlyCharge
State Name                                     
CA    Alice           1    Basic             20
      Bob             2  Premium             50
NY    Cindy           3    Basic             20
      David           4  Premium             50

Adding Row Names using df.rename()

The rename() method provides a flexible way to rename some or all of your row names without altering the DataFrame’s other data.

First, let’s prepare a DataFrame with default integer row names:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
   CustomerID   Name     Plan  MonthlyCharge
0           1  Alice    Basic             20
1           2    Bob  Premium             50
2           3  Cindy    Basic             20
3           4  David  Premium             50

Add Single Row Name

If you want to rename the row with index 0 to ‘Customer_1’, pass a dictionary that maps the old name to the new one as the index parameter of the rename() method:

df.rename(index={0: 'Customer_1'}, inplace=True)
print("DataFrame after renaming a specific row:")
print(df)

Output:

DataFrame after renaming a specific row:
            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
1                    2    Bob  Premium             50
2                    3  Cindy    Basic             20
3                    4  David  Premium             50

Only the row labeled 0 changed. Rows that are not listed in the dictionary keep their existing names.
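
One thing to watch for: if a key in the dictionary does not match any existing row name, rename() ignores it by default. The sketch below, on a made-up two-row DataFrame, illustrates this and shows how errors='raise' could be used to surface the problem. The label 99 and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob']})

# Label 99 does not exist, so by default nothing changes and no error is raised
unchanged = df.rename(index={99: 'Customer_100'})
print(unchanged.index.tolist())

# errors='raise' turns a missing label into a KeyError
try:
    df.rename(index={99: 'Customer_100'}, errors='raise')
    missing_reported = False
except KeyError:
    missing_reported = True
print(missing_reported)
```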

Add Multiple Row Names

Now, let’s rename multiple rows in one go:

df.rename(index={1: 'Customer_2', 2: 'Customer_3', 3: 'Customer_4'}, inplace=True)
print("DataFrame after renaming multiple rows:")
print(df)

Output:

DataFrame after renaming multiple rows:
            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
Customer_2           2    Bob  Premium             50
Customer_3           3  Cindy    Basic             20
Customer_4           4  David  Premium             50

By supplying a dictionary to the index parameter, you can rename multiple rows in one command, which makes your data more contextual and easier to interpret.
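
For larger DataFrames, writing out a dictionary entry per row becomes tedious. As an alternative sketch, the index parameter of rename() also accepts a function that is applied to every existing row name. The lambda and the variable name renamed are illustrative, and the example assumes the default integer index.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cindy', 'David']})

# The function receives each existing label and returns its replacement
renamed = df.rename(index=lambda i: f'Customer_{i + 1}')
print(renamed.index.tolist())
```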
