Seaborn Line Plot with Categorical Data (Visualize Trends)

To create a Seaborn line plot with categorical data, follow these steps:

Put your data in a Pandas DataFrame with a categorical column (e.g., ‘Month’) and a numerical column (e.g., ‘CustomerCount’).
Convert the categorical column to a ‘category’ data type and ensure it’s in the desired order (if necessary).
Call Seaborn’s lineplot() function with the categorical column as the x-axis and the numerical column as the y-axis.

import pandas as pd
import seaborn as sns

# Assumes `data` is a DataFrame with 'Month' (full month names) and 'CustomerCount' columns.
months_in_order = ['January', 'February', 'March', 'April', 'May', 'June', 
                   'July', 'August', 'September', 'October', 'November', 'December']
# pd.Categorical sets the category dtype and the calendar order in one step.
data['Month'] = pd.Categorical(data['Month'], categories=months_in_order, ordered=True)
sns.lineplot(data=data, x='Month', y='CustomerCount')

In this tutorial, you’ll learn the essential steps of working with categorical data in line plots using Python Seaborn and Pandas libraries.

Table of Contents

1 Load Data Using Pandas
2 Categorical Variables
3 Create Line Plot with Categorical Data
4 Using the x and hue Arguments

Load Data Using Pandas

Load Data from Excel

To load data from an Excel file, use the pandas.read_excel() function (reading .xlsx files requires the openpyxl package to be installed):

import pandas as pd
data_excel = pd.read_excel('sample_data.xlsx')
print(data_excel.head())

Output:

   Month  CustomerCount
0    Jan            250
1    Feb            300
2    Mar            275
3    Apr            320
4    May            350

Load Data from JSON

For JSON files, use the pandas.read_json() function:

data_json = pd.read_json('sample_data.json')
print(data_json.head())

Output:

   Month  CustomerCount
0    Jan            250
1    Feb            300
2    Mar            275
3    Apr            320
4    May            350
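If you want to try read_json() without a file on disk, the following sketch parses an in-memory JSON string instead. The JSON layout (a list of records) and the names json_text and data_json_demo are assumptions made for illustration; your own sample_data.json may be structured differently.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON content: a list of records, one per month.
json_text = """
[
  {"Month": "Jan", "CustomerCount": 250},
  {"Month": "Feb", "CustomerCount": 300},
  {"Month": "Mar", "CustomerCount": 275}
]
"""

# Wrap the string in StringIO so pandas treats it as a file-like object.
data_json_demo = pd.read_json(StringIO(json_text))
print(data_json_demo)
```

Each record becomes one row, and the keys become the column names.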

Load Data from XML

To load data from XML, you can use the pandas.read_xml() function (available in pandas 1.3 and later; the default parser requires the lxml package):

data_xml = pd.read_xml('sample_data.xml')
print(data_xml.head())

Output:

   Month  CustomerCount
0    Jan            250
1    Feb            300
2    Mar            275
3    Apr            320
4    May            350
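The same idea works for XML. This is a minimal sketch that assumes a flat layout with one <row> element per record; the element names, xml_text, and data_xml_demo are hypothetical. It passes parser='etree' so that only Python’s standard library XML parser is needed.

```python
from io import StringIO

import pandas as pd

# Hypothetical XML content: each <row> child of the root becomes one DataFrame row.
xml_text = """<data>
  <row><Month>Jan</Month><CustomerCount>250</CustomerCount></row>
  <row><Month>Feb</Month><CustomerCount>300</CustomerCount></row>
  <row><Month>Mar</Month><CustomerCount>275</CustomerCount></row>
</data>
"""

# parser='etree' uses the standard library instead of lxml.
data_xml_demo = pd.read_xml(StringIO(xml_text), parser='etree')
print(data_xml_demo)
```

If your XML nests the records more deeply, you would likely need the xpath argument of read_xml() to point at the repeating element.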

Categorical Variables

First, let’s inspect the data types in your DataFrame using the .dtypes attribute:

print(data_excel.dtypes)

Output:

Month            object
CustomerCount     int64
dtype: object

This output indicates that ‘Month’ is an object (typically strings in Pandas) and ‘CustomerCount’ is an integer (int64).

Categorical variables are often represented as object types in Pandas.

In our example, ‘Month’ is a categorical variable as it represents distinct categories.

To explicitly convert an object type to a categorical type, use the astype('category') method:

data_excel['Month'] = data_excel['Month'].astype('category')
print(data_excel.dtypes)

Output:

Month           category
CustomerCount      int64
dtype: object

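Now, ‘Month’ has the category data type and is ready for categorical data operations and visualizations. Note that astype('category') on its own sorts the categories alphabetically, so month names need an explicit order before plotting.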
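To see why the category order matters, here is a small sketch comparing a plain astype('category') conversion with an explicitly ordered one. The Series and variable names are made up for illustration.

```python
import pandas as pd

months = pd.Series(['Jan', 'Feb', 'Mar', 'Apr', 'May'])

# A plain conversion infers the categories and sorts them lexically.
unordered = months.astype('category')
print(list(unordered.cat.categories))  # ['Apr', 'Feb', 'Jan', 'Mar', 'May']

# An explicit CategoricalDtype keeps the calendar order and marks it as ordered.
calendar_dtype = pd.CategoricalDtype(
    categories=['Jan', 'Feb', 'Mar', 'Apr', 'May'], ordered=True
)
ordered = months.astype(calendar_dtype)
print(list(ordered.cat.categories))  # ['Jan', 'Feb', 'Mar', 'Apr', 'May']

# Ordered categoricals support comparisons such as min() and max().
print(ordered.min(), ordered.max())  # Jan May
```

Plotting libraries that respect the category order will then place the months in calendar order rather than alphabetical order.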

Create Line Plot with Categorical Data

You can use the sns.lineplot() function and specify your categorical and numerical columns:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(0)
# 'ME' (month end) requires pandas 2.2 or later; use freq='M' on older versions.
months = pd.date_range('2021-01', periods=12, freq='ME').month_name()
data = pd.DataFrame({
    'Month': np.repeat(months, 5),
    'CustomerCount': np.random.randint(100, 1000, size=60)
})
# Use an ordered categorical so the x-axis follows calendar order, not alphabetical order.
months_in_order = ['January', 'February', 'March', 'April', 'May', 'June', 
                   'July', 'August', 'September', 'October', 'November', 'December']
data['Month'] = pd.Categorical(data['Month'], categories=months_in_order, ordered=True)
sns.lineplot(data=data, x='Month', y='CustomerCount')
plt.title('Monthly Customer Count')
plt.xlabel('Month')
plt.ylabel('Number of Customers')
plt.xticks(rotation=45)
plt.show()

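Output: a line plot of the mean customer count per month, with a shaded band for the 95% confidence interval (Seaborn’s default when an x value has several observations).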
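Because the sample data holds five observations per month, lineplot() aggregates them before drawing the line; its default estimator is the mean. As a rough check, the sketch below computes those monthly means directly with pandas. It rebuilds similar random data from a plain list of month names, which is an assumption for illustration rather than the exact DataFrame used above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
months_in_order = ['January', 'February', 'March', 'April', 'May', 'June',
                   'July', 'August', 'September', 'October', 'November', 'December']

# Five random observations per month, as in the plotting example.
data = pd.DataFrame({
    'Month': np.repeat(months_in_order, 5),
    'CustomerCount': np.random.randint(100, 1000, size=60),
})
data['Month'] = pd.Categorical(data['Month'], categories=months_in_order, ordered=True)

# The mean per month is the value the plotted line passes through.
monthly_mean = data.groupby('Month', observed=True)['CustomerCount'].mean()
print(monthly_mean)
```

The result is indexed in calendar order because the grouping key is an ordered categorical.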

Using the x and hue Arguments

The x argument specifies the categorical variable for the x-axis, while hue allows you to group data by another categorical variable, creating multiple lines in the plot.

This method is useful for comparing trends across different categories.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(0)
# 'ME' (month end) requires pandas 2.2 or later; use freq='M' on older versions.
months = pd.date_range('2021-01', periods=12, freq='ME').month_name()
data_excel = pd.DataFrame({
    'Month': np.repeat(months, 5),
    'CustomerCount': np.random.randint(100, 1000, size=60)
})
region_sequence = ['North', 'South', 'East', 'West', 'North']
repeat_times = len(data_excel) // len(region_sequence) + 1
data_excel['Region'] = (region_sequence * repeat_times)[:len(data_excel)]
data_excel['Region'] = data_excel['Region'].astype('category')
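# astype('category') alone would sort the months alphabetically, so set the calendar order explicitly.
months_in_order = ['January', 'February', 'March', 'April', 'May', 'June', 
                   'July', 'August', 'September', 'October', 'November', 'December']
data_excel['Month'] = pd.Categorical(data_excel['Month'], categories=months_in_order, ordered=True)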
sns.lineplot(data=data_excel, x='Month', y='CustomerCount', hue='Region')
plt.title('Monthly Customer Count by Region')
plt.xlabel('Month')
plt.ylabel('Number of Customers')
plt.xticks(rotation=45)
plt.show()

Output:

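The output is a line plot with one line per ‘Region’. The x-axis shows the months, and the y-axis shows the number of customers. Because ‘North’ appears twice in each month in this sample data, its line is drawn through the mean of the two values.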
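To inspect the numbers behind each line, you can tabulate the mean customer count per month and region with a pivot table. This is a sketch under the same assumptions as the plotting example (random sample data, the same repeating region pattern); the names demo and region_means are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
months_in_order = ['January', 'February', 'March', 'April', 'May', 'June',
                   'July', 'August', 'September', 'October', 'November', 'December']

# Five observations per month, each tagged with a region.
demo = pd.DataFrame({
    'Month': np.repeat(months_in_order, 5),
    'CustomerCount': np.random.randint(100, 1000, size=60),
})
demo['Region'] = ['North', 'South', 'East', 'West', 'North'] * 12
demo['Month'] = pd.Categorical(demo['Month'], categories=months_in_order, ordered=True)

# One row per month, one column per region: the values each hue line follows.
region_means = demo.pivot_table(
    index='Month', columns='Region', values='CustomerCount',
    aggfunc='mean', observed=True
)
print(region_means)
```

Each column of the table corresponds to one line in the hue plot, which can make it easier to verify what the chart shows.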

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
