Add Row Names to Pandas DataFrame

In this tutorial, you’ll learn how to customize the row index in your Pandas DataFrames by adding row names.

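Row names make a DataFrame easier to read and let you select rows by meaningful labels instead of integer positions.

We’ll explore the index parameter to set row names on creation, the index attribute to assign them afterward, the set_index() method to promote an existing column to the index, the reset_index() method to restore the default index, and the rename() method to change individual labels in an existing DataFrame.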

Table of Contents

1 Using index Parameter
2 Using DataFrame.index
3 Using df.set_index()
4 Resetting Row Names Using df.reset_index()
5 Adding Multiple Levels of Row Names
6 Adding Row Names using df.rename()
- 6.1 Add Single Row Name
- 6.2 Add Multiple Row Names

Using index Parameter

The index parameter in the pd.DataFrame() function allows you to specify the row names at the moment of DataFrame creation.

Here’s a sample code snippet:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}

# Create DataFrame with index
df = pd.DataFrame(data, index=['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4'])
print(df)

Output:

            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
Customer_2           2    Bob  Premium             50
Customer_3           3  Cindy    Basic             20
Customer_4           4  David  Premium             50
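Once rows have names, you can select them by label. The sketch below rebuilds the same DataFrame and looks up a row and a single cell with .loc; the variable names labels, row, charge, and mismatch_rejected are only illustrative. It also shows that the index list is expected to contain one label per row.

```python
import pandas as pd

data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
labels = ['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4']
df = pd.DataFrame(data, index=labels)

# Look up one row by its name; the result is a Series of that row's values
row = df.loc['Customer_2']
print(row['Name'])  # Bob

# Look up a single cell by row name and column name
charge = df.loc['Customer_3', 'MonthlyCharge']
print(charge)  # 20

# The index needs one label per row; a length mismatch raises ValueError
try:
    pd.DataFrame(data, index=['Customer_1', 'Customer_2'])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```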

Using DataFrame.index

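You can assign a list of labels to the index attribute to add or replace the row names of an existing DataFrame. The list must contain exactly one label per row.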

Let’s start by creating a DataFrame without specifying the row names:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("DataFrame without row names:")
print(df)

Output:

DataFrame without row names:
   CustomerID   Name     Plan  MonthlyCharge
0           1  Alice    Basic             20
1           2    Bob  Premium             50
2           3  Cindy    Basic             20
3           4  David  Premium             50

As you can see, the row names are just the default integer indices. Now, let’s add meaningful row names.

df.index = ['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4']
print("DataFrame with new row names:")
print(df)

Output:

DataFrame with new row names:
            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
Customer_2           2    Bob  Premium             50
Customer_3           3  Cindy    Basic             20
Customer_4           4  David  Premium             50
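Typing every label by hand does not scale to larger DataFrames. As a minimal sketch, the labels can be generated from an existing column with a list comprehension; the Customer_ prefix and the length_checked variable are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
})

# Build the labels from the CustomerID column instead of typing them out
df.index = [f'Customer_{cid}' for cid in df['CustomerID']]
print(df.index.tolist())

# Assigning the wrong number of labels raises ValueError
# and leaves the existing index untouched
try:
    df.index = ['Customer_1', 'Customer_2']
    length_checked = False
except ValueError:
    length_checked = True
print(length_checked)
```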

Using df.set_index()

Another way to add row names to your DataFrame is to promote one of the existing columns to the index using the set_index() method.

This can be useful if your dataset includes a column that serves as a unique identifier.

Here’s how to do it:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
   CustomerID   Name     Plan  MonthlyCharge
0           1  Alice    Basic             20
1           2    Bob  Premium             50
2           3  Cindy    Basic             20
3           4  David  Premium             50

You can see the DataFrame starts with default integer indices. Let’s change that by setting the ‘Name’ column as the index.

df.set_index('Name', inplace=True)
print("DataFrame with 'Name' column as row names:")
print(df)

Output:

DataFrame with 'Name' column as row names:
       CustomerID     Plan  MonthlyCharge
Name                                    
Alice           1    Basic             20
Bob             2  Premium             50
Cindy           3    Basic             20
David           4  Premium             50

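Note that inplace=True modifies the original DataFrame instead of returning a new one. Also notice that ‘Name’ is no longer listed among the columns, because set_index() moves the column into the index by default.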
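If you would rather keep the original DataFrame unchanged, you can leave out inplace=True and work with the returned copy. The sketch below also shows two optional set_index() parameters, drop and verify_integrity; the variable names by_name, by_name_kept, and duplicates_rejected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
})

# Without inplace=True, set_index() returns a new DataFrame and leaves df alone
by_name = df.set_index('Name')

# drop=False keeps 'Name' as a regular column in addition to using it as the index
by_name_kept = df.set_index('Name', drop=False)

# verify_integrity=True raises ValueError when the new index contains duplicates,
# which is the case for 'Plan' ('Basic' and 'Premium' each appear twice)
try:
    df.set_index('Plan', verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True

print(by_name.columns.tolist())
print(by_name_kept.columns.tolist())
print(duplicates_rejected)
```

Checking for duplicates is worth considering whenever the column is meant to act as a unique identifier, since pandas otherwise accepts repeated row names without complaint.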

Resetting Row Names Using df.reset_index()

The reset_index() method undoes a custom index: it moves the current row names back into a regular column and restores the default integer index.

Let’s start with a DataFrame where the ‘Name’ column is serving as row names:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
df.set_index('Name', inplace=True)
print("DataFrame with 'Name' as row names:")
print(df)

Output:

DataFrame with 'Name' as row names:
       CustomerID     Plan  MonthlyCharge
Name                                    
Alice           1    Basic             20
Bob             2  Premium             50
Cindy           3    Basic             20
David           4  Premium             50

Now, let’s reset the row names to their default integer values:

# Reset row names and move the existing row names back into a column
df.reset_index(inplace=True)
print("DataFrame after resetting row names:")
print(df)

Output:

DataFrame after resetting row names:
    Name  CustomerID     Plan  MonthlyCharge
0  Alice           1    Basic             20
1    Bob           2  Premium             50
2  Cindy           3    Basic             20
3  David           4  Premium             50
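By default, reset_index() keeps the old row names as a column. If the labels are no longer needed, the drop parameter discards them instead. A short sketch comparing the two behaviors, with restored and discarded as illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
}).set_index('Name')

# Default behavior: 'Name' comes back as the first column
restored = df.reset_index()

# drop=True: the 'Name' labels are thrown away entirely
discarded = df.reset_index(drop=True)

print(restored.columns.tolist())
print(discarded.columns.tolist())
```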

Adding Multiple Levels of Row Names

In some complex data analysis tasks, you may need to categorize your data across multiple dimensions.

For such use cases, Pandas supports hierarchical (multi-level) indexing through the MultiIndex type.

This enables you to have multiple levels of row names, adding depth to your DataFrame.

Let’s extend our dataset with a ‘State’ column and see how to set hierarchical row names:

import pandas as pd
data = {
    'State': ['CA', 'CA', 'NY', 'NY'],
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  State  CustomerID   Name     Plan  MonthlyCharge
0    CA           1  Alice    Basic             20
1    CA           2    Bob  Premium             50
2    NY           3  Cindy    Basic             20
3    NY           4  David  Premium             50

Now, let’s set both ‘State’ and ‘Name’ as hierarchical row names.

df.set_index(['State', 'Name'], inplace=True)
print("DataFrame with hierarchical row names:")
print(df)

Output:

DataFrame with hierarchical row names:
             CustomerID     Plan  MonthlyCharge
State Name                                     
CA    Alice           1    Basic             20
      Bob             2  Premium             50
NY    Cindy           3    Basic             20
      David           4  Premium             50
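With hierarchical row names in place, rows can be selected by the outer level, by a full (State, Name) pair, or by an inner level alone. The following is a minimal sketch using .loc and xs(); the variable names ca_rows, alice, and cindy are illustrative.

```python
import pandas as pd

data = {
    'State': ['CA', 'CA', 'NY', 'NY'],
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data).set_index(['State', 'Name'])

# Select every row under the outer label 'CA';
# the result is indexed by the remaining 'Name' level
ca_rows = df.loc['CA']
print(ca_rows)

# Select exactly one row by giving both levels as a tuple
alice = df.loc[('CA', 'Alice')]
print(alice['Plan'])

# Select by the inner level only with xs();
# the result is indexed by the remaining 'State' level
cindy = df.xs('Cindy', level='Name')
print(cindy)
```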

Adding Row Names using df.rename()

The rename() method provides a flexible way to change some or all of your row names without altering the rest of the DataFrame’s data.

First, let’s prepare a DataFrame with default integer row names:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
   CustomerID   Name     Plan  MonthlyCharge
0           1  Alice    Basic             20
1           2    Bob  Premium             50
2           3  Cindy    Basic             20
3           4  David  Premium             50

Add Single Row Name

If you want to rename the row with index 0 to ‘Customer_1’, pass a dictionary that maps the old label to the new one to the index parameter of the rename() method:

df.rename(index={0: 'Customer_1'}, inplace=True)
print("DataFrame after renaming a specific row:")
print(df)

Output:

DataFrame after renaming a specific row:
            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
1                    2    Bob  Premium             50
2                    3  Cindy    Basic             20
3                    4  David  Premium             50

We renamed the row with index 0 to ‘Customer_1’ using df.rename(index={0: 'Customer_1'}, inplace=True).

Add Multiple Row Names

Now, let’s rename multiple rows in one go:

df.rename(index={1: 'Customer_2', 2: 'Customer_3', 3: 'Customer_4'}, inplace=True)
print("DataFrame after renaming multiple rows:")
print(df)

Output:

DataFrame after renaming multiple rows:
            CustomerID   Name     Plan  MonthlyCharge
Customer_1           1  Alice    Basic             20
Customer_2           2    Bob  Premium             50
Customer_3           3  Cindy    Basic             20
Customer_4           4  David  Premium             50

By supplying a dictionary to the index parameter, you can rename multiple rows in one command, making your data more contextual and easier to interpret.
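Besides a dictionary, the index parameter of rename() also accepts a function that is applied to every existing label, which saves you from listing each row. The sketch below shows this, along with how rename() treats dictionary keys that do not match any row; the variable names renamed, unchanged, and missing_label_raised are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
})

# A function is called once per existing label (0, 1, 2, 3 here)
renamed = df.rename(index=lambda i: f'Customer_{i + 1}')
print(renamed.index.tolist())

# Dictionary keys that match no row are silently ignored by default
unchanged = df.rename(index={99: 'Customer_100'})
print(unchanged.index.tolist())

# errors='raise' turns an unmatched key into a KeyError instead
try:
    df.rename(index={99: 'Customer_100'}, errors='raise')
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(missing_label_raised)
```

Passing errors='raise' can help catch typos in the old labels, since the default behavior gives no indication that nothing was renamed.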

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
