Add Row Names to Pandas DataFrame
In this tutorial, you’ll learn how to customize the row index in your Pandas DataFrames by adding row names.
The addition of row names enhances the readability and context of your data.
We’ll explore the index parameter to set row names on creation, the set_index() method to promote existing columns to the index, and the rename() method to relabel rows in existing DataFrames.
Using index Parameter
The index parameter in the pd.DataFrame() function allows you to specify the row names at the moment of DataFrame creation.
Here’s a sample code snippet:
    import pandas as pd

    data = {
        'CustomerID': [1, 2, 3, 4],
        'Name': ['Alice', 'Bob', 'Cindy', 'David'],
        'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
        'MonthlyCharge': [20, 50, 20, 50]
    }

    # Create DataFrame with index
    df = pd.DataFrame(data, index=['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4'])
    print(df)
Output:
                CustomerID   Name     Plan  MonthlyCharge
    Customer_1           1  Alice    Basic             20
    Customer_2           2    Bob  Premium             50
    Customer_3           3  Cindy    Basic             20
    Customer_4           4  David  Premium             50
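Once rows have names, you can look them up by label with .loc. The snippet below is a minimal sketch of that idea using the same sample data; the variable names labels, row, and charge are only illustrative.

```python
import pandas as pd

data = {
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}
labels = ['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4']
df = pd.DataFrame(data, index=labels)

# Look up a whole row by its row name
row = df.loc['Customer_2']
print(row['Name'])

# Look up a single cell by row name and column name
charge = df.loc['Customer_4', 'MonthlyCharge']
print(charge)
```

The list passed to index needs one label per row, so it should be as long as the columns in data.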
Using DataFrame.index
You can use the index attribute to add or change row names.
Let’s start by creating a DataFrame without specifying the row names:
    import pandas as pd

    data = {
        'CustomerID': [1, 2, 3, 4],
        'Name': ['Alice', 'Bob', 'Cindy', 'David'],
        'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
        'MonthlyCharge': [20, 50, 20, 50]
    }
    df = pd.DataFrame(data)
    print("DataFrame without row names:")
    print(df)
Output:
    DataFrame without row names:
       CustomerID   Name     Plan  MonthlyCharge
    0           1  Alice    Basic             20
    1           2    Bob  Premium             50
    2           3  Cindy    Basic             20
    3           4  David  Premium             50
As you can see, the row names are just the default integer indices. Now, let’s add meaningful row names.
    df.index = ['Customer_1', 'Customer_2', 'Customer_3', 'Customer_4']
    print("DataFrame with new row names:")
    print(df)
Output:
    DataFrame with new row names:
                CustomerID   Name     Plan  MonthlyCharge
    Customer_1           1  Alice    Basic             20
    Customer_2           2    Bob  Premium             50
    Customer_3           3  Cindy    Basic             20
    Customer_4           4  David  Premium             50
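For larger DataFrames, typing every label by hand is impractical. As a rough sketch, the labels can be generated from an existing column instead. The example also shows that assigning the wrong number of labels raises a ValueError. The name length_error is just a placeholder used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David']
})

# Build one label per row from the CustomerID column
df.index = [f"Customer_{cid}" for cid in df['CustomerID']]
print(list(df.index))

# Assigning three labels to four rows is rejected with a ValueError
length_error = None
try:
    df.index = ['Customer_1', 'Customer_2', 'Customer_3']
except ValueError as err:
    length_error = err
    print("ValueError:", err)
```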
Using df.set_index()
Another way to add row names to your DataFrame is by promoting one of the existing columns to the index using the set_index() method.
This can be useful if your dataset includes a column that serves as a unique identifier.
Here’s how to do it:
    import pandas as pd

    data = {
        'CustomerID': [1, 2, 3, 4],
        'Name': ['Alice', 'Bob', 'Cindy', 'David'],
        'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
        'MonthlyCharge': [20, 50, 20, 50]
    }
    df = pd.DataFrame(data)
    print("Original DataFrame:")
    print(df)
Output:
    Original DataFrame:
       CustomerID   Name     Plan  MonthlyCharge
    0           1  Alice    Basic             20
    1           2    Bob  Premium             50
    2           3  Cindy    Basic             20
    3           4  David  Premium             50
You can see the DataFrame starts with default integer indices. Let’s change that by setting the ‘Name’ column as the index.
    df.set_index('Name', inplace=True)
    print("DataFrame with 'Name' column as row names:")
    print(df)
Output:
    DataFrame with 'Name' column as row names:
           CustomerID     Plan  MonthlyCharge
    Name
    Alice           1    Basic             20
    Bob             2  Premium             50
    Cindy           3    Basic             20
    David           4  Premium             50
Note that inplace=True modifies the original DataFrame directly instead of returning a new one.
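If you would rather keep the original DataFrame untouched, set_index() can be called without inplace=True, in which case it returns a new DataFrame. The sketch below also shows two other set_index() options, drop and verify_integrity. The variable names by_name, by_id, and duplicates_rejected are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium']
})

# Without inplace=True, set_index() returns a new DataFrame
by_name = df.set_index('Name')
print(list(df.index))        # the original keeps its default integer index
print(list(by_name.index))

# drop=False keeps the column in the data as well as in the index
by_id = df.set_index('CustomerID', drop=False)
print('CustomerID' in by_id.columns)

# verify_integrity=True raises ValueError if the new index has duplicates;
# 'Plan' repeats 'Basic' and 'Premium', so it is rejected here
try:
    df.set_index('Plan', verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)
```

Checking for duplicates is worthwhile when the index is meant to act as a unique identifier, since pandas allows repeated row names by default.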
Resetting Row Names Using df.reset_index()
The reset_index() method is useful when you want to revert your DataFrame to its original state: it moves the current row names back into a column and sets the index to the default integer values.
Let’s start with a DataFrame where the ‘Name’ column is serving as row names:
    import pandas as pd

    data = {
        'CustomerID': [1, 2, 3, 4],
        'Name': ['Alice', 'Bob', 'Cindy', 'David'],
        'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
        'MonthlyCharge': [20, 50, 20, 50]
    }
    df = pd.DataFrame(data)
    df.set_index('Name', inplace=True)
    print("DataFrame with 'Name' as row names:")
    print(df)
Output:
    DataFrame with 'Name' as row names:
           CustomerID     Plan  MonthlyCharge
    Name
    Alice           1    Basic             20
    Bob             2  Premium             50
    Cindy           3    Basic             20
    David           4  Premium             50
Now, let’s reset the row names to their default integer values:
    # Reset row names and move the existing row names back into a column
    df.reset_index(inplace=True)
    print("DataFrame after resetting row names:")
    print(df)
Output:
    DataFrame after resetting row names:
        Name  CustomerID     Plan  MonthlyCharge
    0  Alice           1    Basic             20
    1    Bob           2  Premium             50
    2  Cindy           3    Basic             20
    3  David           4  Premium             50
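Sometimes the old row names are not worth keeping as a column. As a small sketch, passing drop=True to reset_index() discards them instead. The variable names kept and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Cindy'], 'MonthlyCharge': [20, 50, 20]},
    index=['Customer_1', 'Customer_2', 'Customer_3']
)

# Default: the old row names become a new column
# (named 'index' here because the index itself has no name)
kept = df.reset_index()
print(list(kept.columns))

# drop=True throws the old row names away
dropped = df.reset_index(drop=True)
print(list(dropped.columns))
print(list(dropped.index))
```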
Adding Multiple Levels of Row Names
In some complex data analysis tasks, you may need to categorize your data across multiple dimensions.
For such use cases, Pandas supports hierarchical indexing or multi-level indexing.
This enables you to have multiple levels of row names, adding depth to your DataFrame.
Let’s extend our dataset with a ‘State’ column and see how to set hierarchical row names:
    import pandas as pd

    data = {
        'State': ['CA', 'CA', 'NY', 'NY'],
        'CustomerID': [1, 2, 3, 4],
        'Name': ['Alice', 'Bob', 'Cindy', 'David'],
        'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
        'MonthlyCharge': [20, 50, 20, 50]
    }
    df = pd.DataFrame(data)
    print("Original DataFrame:")
    print(df)
Output:
    Original DataFrame:
       State  CustomerID   Name     Plan  MonthlyCharge
    0     CA           1  Alice    Basic             20
    1     CA           2    Bob  Premium             50
    2     NY           3  Cindy    Basic             20
    3     NY           4  David  Premium             50
Now, let’s set both ‘State’ and ‘Name’ as hierarchical row names.
    df.set_index(['State', 'Name'], inplace=True)
    print("DataFrame with hierarchical row names:")
    print(df)
Output:
    DataFrame with hierarchical row names:
                 CustomerID     Plan  MonthlyCharge
    State Name
    CA    Alice           1    Basic             20
          Bob             2  Premium             50
    NY    Cindy           3    Basic             20
          David           4  Premium             50
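With hierarchical row names in place, rows can be selected by the outer level alone or by a full (State, Name) pair. The following is a minimal sketch of both lookups with .loc; the variable names ca and plan are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'CA', 'NY', 'NY'],
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 50]
}).set_index(['State', 'Name'])

# All rows for one outer label; the result is indexed by 'Name' only
ca = df.loc['CA']
print(list(ca.index))

# One cell, addressed by a (State, Name) tuple and a column name
plan = df.loc[('NY', 'Cindy'), 'Plan']
print(plan)

# The level names come from the columns that were promoted
print(list(df.index.names))
```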
Adding Row Names using df.rename()
The rename() method provides a flexible way to rename some or all of your row names without altering the DataFrame’s other data.
First, let’s prepare a DataFrame with default integer row names:
    import pandas as pd

    data = {
        'CustomerID': [1, 2, 3, 4],
        'Name': ['Alice', 'Bob', 'Cindy', 'David'],
        'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
        'MonthlyCharge': [20, 50, 20, 50]
    }
    df = pd.DataFrame(data)
    print("Original DataFrame:")
    print(df)
Output:
    Original DataFrame:
       CustomerID   Name     Plan  MonthlyCharge
    0           1  Alice    Basic             20
    1           2    Bob  Premium             50
    2           3  Cindy    Basic             20
    3           4  David  Premium             50
Add Single Row Name
If you want to rename the row with index 0 to ‘Customer_1’, pass a dictionary that maps the old label to the new one to the index parameter of the rename() method:
    df.rename(index={0: 'Customer_1'}, inplace=True)
    print("DataFrame after renaming a specific row:")
    print(df)
Output:
    DataFrame after renaming a specific row:
                CustomerID   Name     Plan  MonthlyCharge
    Customer_1           1  Alice    Basic             20
    1                    2    Bob  Premium             50
    2                    3  Cindy    Basic             20
    3                    4  David  Premium             50
We renamed the row with index 0 to ‘Customer_1’ using df.rename(index={0: 'Customer_1'}, inplace=True). The other rows keep their integer labels.
Add Multiple Row Names
Now, let’s rename multiple rows in one go:
    df.rename(index={1: 'Customer_2', 2: 'Customer_3', 3: 'Customer_4'}, inplace=True)
    print("DataFrame after renaming multiple rows:")
    print(df)
Output:
    DataFrame after renaming multiple rows:
                CustomerID   Name     Plan  MonthlyCharge
    Customer_1           1  Alice    Basic             20
    Customer_2           2    Bob  Premium             50
    Customer_3           3  Cindy    Basic             20
    Customer_4           4  David  Premium             50
By supplying a dictionary to the index parameter, you can rename multiple rows in one command, making your data more contextual and easier to interpret.
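Besides a dictionary, the index parameter of rename() also accepts a function, which is applied to every existing label. The sketch below uses a lambda to build the same Customer_N labels, and shows that dictionary keys matching no existing label are ignored by default. The variable names renamed and unchanged are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Cindy', 'David'],
    'MonthlyCharge': [20, 50, 20, 50]
})

# A function is applied to every existing label (0, 1, 2, 3 here)
renamed = df.rename(index=lambda i: f"Customer_{i + 1}")
print(list(renamed.index))

# Keys that match no existing label are ignored by default,
# so a typo in the dictionary silently leaves the index as it was
unchanged = df.rename(index={99: 'Customer_100'})
print(list(unchanged.index))
```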
Mokhtar is the founder of LikeGeeks.com.