Seaborn Line Plot with Categorical Data (Visualize Trends)

To create a Seaborn line plot with categorical data, follow these steps:

  1. Put your data in a Pandas DataFrame with a categorical column (e.g., ‘Month’) and a numerical column (e.g., ‘CustomerCount’).
  2. Convert the categorical column to a ‘category’ data type and ensure it’s in the desired order (if necessary).
  3. Call Seaborn’s lineplot() function, passing the categorical column as x and the numerical column as y.
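import pandas as pd
import seaborn as sns
# `data` is assumed to be a DataFrame with 'Month' and 'CustomerCount' columns
months_in_order = ['January', 'February', 'March', 'April', 'May', 'June', 
                   'July', 'August', 'September', 'October', 'November', 'December']
# An ordered categorical makes the x-axis follow calendar order instead of alphabetical order
data['Month'] = pd.Categorical(data['Month'], categories=months_in_order, ordered=True)
sns.lineplot(data=data, x='Month', y='CustomerCount')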

In this tutorial, you’ll learn the essential steps of working with categorical data in line plots using the Python libraries Seaborn and Pandas.

Load Data Using Pandas

Load Data from Excel

To load data from an Excel file, use the pandas.read_excel() function (reading .xlsx files requires the openpyxl package to be installed):

import pandas as pd
data_excel = pd.read_excel('sample_data.xlsx')
print(data_excel.head())

Output:

   Month  CustomerCount
0    Jan            250
1    Feb            300
2    Mar            275
3    Apr            320
4    May            350

Load Data from JSON

For JSON files, use the pandas.read_json() function:

data_json = pd.read_json('sample_data.json')
print(data_json.head())

Output:

   Month  CustomerCount
0    Jan            250
1    Feb            300
2    Mar            275
3    Apr            320
4    May            350
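
If you don’t have a sample file at hand, a similar DataFrame can be built from an in-memory JSON string. This is only a minimal sketch: the record layout below (a list of objects with Month and CustomerCount keys) is an assumption about how sample_data.json might be structured, not its actual contents.

```python
from io import StringIO

import pandas as pd

# Hypothetical contents of sample_data.json: a list of records
json_text = """
[
  {"Month": "Jan", "CustomerCount": 250},
  {"Month": "Feb", "CustomerCount": 300},
  {"Month": "Mar", "CustomerCount": 275},
  {"Month": "Apr", "CustomerCount": 320},
  {"Month": "May", "CustomerCount": 350}
]
"""

# Wrap the text in StringIO so pandas treats it as a file-like object
data_json = pd.read_json(StringIO(json_text))
print(data_json.head())
```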

Load Data from XML

To load data from XML, you can use the pandas.read_xml() function (available since pandas 1.3; its default parser requires the lxml package):

data_xml = pd.read_xml('sample_data.xml')
print(data_xml.head())

Output:

   Month  CustomerCount
0    Jan            250
1    Feb            300
2    Mar            275
3    Apr            320
4    May            350
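
For reference, here is a sketch of an XML layout that read_xml() can turn into the same two columns. The element names (data, row) are hypothetical, since the actual structure of sample_data.xml isn’t shown. The example passes parser="etree" so that it relies on Python’s built-in XML parser rather than lxml.

```python
from io import StringIO

import pandas as pd

# Hypothetical XML layout: one <row> element per record
xml_text = """<data>
  <row><Month>Jan</Month><CustomerCount>250</CustomerCount></row>
  <row><Month>Feb</Month><CustomerCount>300</CustomerCount></row>
  <row><Month>Mar</Month><CustomerCount>275</CustomerCount></row>
</data>
"""

# Each child of the root element becomes a row; its children become columns
data_xml = pd.read_xml(StringIO(xml_text), parser="etree")
print(data_xml.head())
```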

 

Categorical Variables

First, let’s inspect the data types in your DataFrame using the .dtypes attribute:

print(data_excel.dtypes)

Output:

Month            object
CustomerCount     int64
dtype: object

This output indicates that ‘Month’ is an object (typically strings in Pandas) and ‘CustomerCount’ is an integer (int64).

Categorical variables are often represented as object types in Pandas.

In our example, ‘Month’ is a categorical variable as it represents distinct categories.

To explicitly convert an object type to a categorical type, use the astype('category') method:

data_excel['Month'] = data_excel['Month'].astype('category')
print(data_excel.dtypes)

Output:

Month           category
CustomerCount      int64
dtype: object

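Now, ‘Month’ has the category data type and is ready for categorical data operations. Its categories are still in alphabetical order, however, so an explicit calendar order has to be set before plotting.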
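
A minimal sketch of how category order behaves, using a small hand-built Series as a stand-in for the loaded ‘Month’ column (the variable names are illustrative): astype('category') infers the categories and sorts them alphabetically, while a pd.CategoricalDtype with an explicit list keeps calendar order.

```python
import pandas as pd

# Hypothetical stand-in for the 'Month' column loaded earlier
months = pd.Series(['Jan', 'Feb', 'Mar', 'Apr', 'May'])

# astype('category') infers the categories and sorts them alphabetically
inferred = months.astype('category')
print(list(inferred.cat.categories))  # ['Apr', 'Feb', 'Jan', 'Mar', 'May']

# An explicit, ordered dtype keeps the calendar order
calendar_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
month_dtype = pd.CategoricalDtype(categories=calendar_order, ordered=True)
ordered = months.astype(month_dtype)
print(list(ordered.cat.categories))  # ['Jan', 'Feb', 'Mar', 'Apr', 'May']
print(ordered.min(), ordered.max())  # Jan May
```

Plotting functions that respect category order will then place the months from left to right in the order given in calendar_order.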

 

Create Line Plot with Categorical Data

You can use the sns.lineplot() function and specify your categorical and numerical columns:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(0)
# freq='ME' (month end) requires pandas 2.2+; use freq='M' on older versions
months = pd.date_range('2021-01', periods=12, freq='ME').month_name()
data = pd.DataFrame({
    'Month': np.repeat(months, 5),
    'CustomerCount': np.random.randint(100, 1000, size=60)
})
# Use an ordered categorical so the x-axis follows calendar order, not alphabetical order
months_in_order = ['January', 'February', 'March', 'April', 'May', 'June', 
                   'July', 'August', 'September', 'October', 'November', 'December']
data['Month'] = pd.Categorical(data['Month'], categories=months_in_order, ordered=True)
sns.lineplot(data=data, x='Month', y='CustomerCount')
plt.title('Monthly Customer Count')
plt.xlabel('Month')
plt.ylabel('Number of Customers')
plt.xticks(rotation=45)
plt.show()

Output:

[Figure: Create Line Plot with Categorical Data]
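
Each month in this dataset has five observations, so lineplot() does not draw 60 separate points. By default it aggregates repeated x values with the mean and shades a 95% confidence interval around the line. The sketch below recomputes those means with a pandas groupby to show what the line traces. It builds the month names from a plain list rather than date_range(), an assumption made to avoid differences between pandas versions.

```python
import numpy as np
import pandas as pd

months_in_order = ['January', 'February', 'March', 'April', 'May', 'June',
                   'July', 'August', 'September', 'October', 'November', 'December']

np.random.seed(0)
data = pd.DataFrame({
    'Month': np.repeat(months_in_order, 5),  # five observations per month
    'CustomerCount': np.random.randint(100, 1000, size=60),
})
data['Month'] = pd.Categorical(data['Month'], categories=months_in_order, ordered=True)

# One mean per month: these are the y-values the plotted line passes through
monthly_mean = data.groupby('Month', observed=True)['CustomerCount'].mean()
print(monthly_mean)
```

In Seaborn 0.12 and later, passing errorbar=None to lineplot() hides the shaded band if only the mean line is wanted.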

 

Using the x and hue Arguments

The x argument specifies the categorical variable for the x-axis, while hue allows you to group data by another categorical variable, creating multiple lines in the plot.

This method is useful for comparing trends across different categories.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(0)
# freq='ME' (month end) requires pandas 2.2+; use freq='M' on older versions
months = pd.date_range('2021-01', periods=12, freq='ME').month_name()
data_excel = pd.DataFrame({
    'Month': np.repeat(months, 5),
    'CustomerCount': np.random.randint(100, 1000, size=60)
})
region_sequence = ['North', 'South', 'East', 'West', 'North']
repeat_times = len(data_excel) // len(region_sequence) + 1
data_excel['Region'] = (region_sequence * repeat_times)[:len(data_excel)]
data_excel['Region'] = data_excel['Region'].astype('category')
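# Order months by the calendar; astype('category') alone would sort them alphabetically
data_excel['Month'] = pd.Categorical(data_excel['Month'], categories=months, ordered=True)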
sns.lineplot(data=data_excel, x='Month', y='CustomerCount', hue='Region')
plt.title('Monthly Customer Count by Region')
plt.xlabel('Month')
plt.ylabel('Number of Customers')
plt.xticks(rotation=45)
plt.show()

Output:

[Figure: Using the x and hue Arguments]

The output is a line plot with different lines for each ‘Region’. The x-axis shows the months, and the y-axis represents the number of customers.
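
To check what each line represents, the same grouping can be reproduced as a table. This sketch pivots the data so that each column corresponds to one hue level. Because ‘North’ appears twice in region_sequence, its value for each month is the mean of two observations, while the other regions have one observation each. The month names come from a plain list rather than date_range(), which is an assumption made to keep the example independent of the pandas version.

```python
import numpy as np
import pandas as pd

months_in_order = ['January', 'February', 'March', 'April', 'May', 'June',
                   'July', 'August', 'September', 'October', 'November', 'December']

np.random.seed(0)
df = pd.DataFrame({
    'Month': np.repeat(months_in_order, 5),
    'CustomerCount': np.random.randint(100, 1000, size=60),
})
region_sequence = ['North', 'South', 'East', 'West', 'North']
df['Region'] = region_sequence * 12  # 5 entries x 12 months = 60 rows
df['Month'] = pd.Categorical(df['Month'], categories=months_in_order, ordered=True)

# Rows are months, columns are regions, cells are the mean customer count
per_region = df.pivot_table(index='Month', columns='Region',
                            values='CustomerCount', aggfunc='mean', observed=True)
print(per_region.round(1))
```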
