Seaborn Estimators: Visualize Data Aggregation

In Seaborn, the estimator parameter controls how multiple observations at the same x value are aggregated into a single plotted value, so statistics such as the mean, median, or standard deviation can be computed directly within a visualization.

In this tutorial, we will explore the various types of estimators available in Seaborn, such as mean, median, sum, and more, including how to create custom estimators.

Then, we will implement these estimators in different Seaborn plots such as line plots and bar plots.

Next, we will discuss grouping data for more refined estimations and conclude with methods to visualize aggregated data.

Types of Estimators in Seaborn

Built-in Estimators

Mean Estimator

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Sample call data: two observations per month
data = {
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Calls': [200, 150, 180, 210, 190, 200],
    'Duration': [60, 55, 65, 70, 60, 75]
}
df = pd.DataFrame(data)

# Mean estimator: plot the average Duration of the observations in each month
sns.lineplot(x='Month', y='Duration', data=df, estimator='mean')
plt.show()

Output:

Mean Estimator

Median Estimator

sns.lineplot(x='Month', y='Calls', data=df, estimator='median')
plt.show()

Output:

Median Estimator

Sum Estimator

sns.lineplot(x='Month', y='Duration', data=df, estimator=np.sum)
plt.show()

Output:

Sum Estimator
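To check the values these three plots draw, the same per-month aggregation can be reproduced with a pandas groupby. This is a minimal sketch that assumes the estimator is applied to the observations sharing each Month value; the names grouped and summary are illustrative and not part of the original tutorial.

```python
import pandas as pd

# Same sample data as above: two observations per month
df = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Calls': [200, 150, 180, 210, 190, 200],
    'Duration': [60, 55, 65, 70, 60, 75],
})

# sort=False keeps the months in order of appearance (Jan, Feb, Mar)
grouped = df.groupby('Month', sort=False)

# One column per statistic, one row per month (summary is a hypothetical name)
summary = pd.DataFrame({
    'mean_duration': grouped['Duration'].mean(),
    'median_calls': grouped['Calls'].median(),
    'sum_duration': grouped['Duration'].sum(),
})
print(summary)
```

With only two observations per month, the mean and median of a column coincide; the two estimators start to differ once a group holds three or more unevenly spread values.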

Standard Deviation Estimator

sns.lineplot(x='Month', y='Calls', data=df, estimator=np.std)
plt.show()

Output:

Standard Deviation Estimator

Variance Estimator

sns.lineplot(x='Month', y='Duration', data=df, estimator=np.var)
plt.show()

Output:

Variance Estimator
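One detail worth knowing when passing np.std or np.var: NumPy's defaults compute the population statistics (ddof=0), whereas the pandas std and var methods default to the sample statistics (ddof=1). The sketch below compares the two on the January call counts from the data above. The helper sample_std is hypothetical, shown only as one way to request the sample version from a callable.

```python
import numpy as np

# January call counts from the sample data above
jan_calls = np.array([200, 150])

# NumPy defaults to the population formulas (ddof=0)
pop_std = np.std(jan_calls)
pop_var = np.var(jan_calls)

def sample_std(values):
    """Hypothetical estimator: sample standard deviation (ddof=1)."""
    return np.std(values, ddof=1)

print(pop_std, pop_var, sample_std(jan_calls))
```

A function like sample_std could then be passed as estimator=sample_std in the same way as np.std, since both take a vector of values and return a single number.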

Custom Estimators

If the built-in estimators don’t meet your analytical needs, you can pass a custom estimator: any callable that takes a vector of values and returns a single number.

def custom_range_estimator(arr):
    return np.max(arr) - np.min(arr)

sns.lineplot(x='Month', y='Calls', data=df, estimator=custom_range_estimator)
plt.show()

Output:

Custom Estimators

This custom estimator computes the range of calls (maximum minus minimum) within each month.
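The per-month ranges can be verified outside the plot by applying the same function to each group with pandas. This is a sketch under the assumption that the estimator receives the Calls values of one month at a time; the variable name ranges is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Calls': [200, 150, 180, 210, 190, 200],
    'Duration': [60, 55, 65, 70, 60, 75],
})

def custom_range_estimator(arr):
    # Range = largest value minus smallest value
    return np.max(arr) - np.min(arr)

# Apply the estimator to each month's calls, keeping Jan, Feb, Mar order
ranges = df.groupby('Month', sort=False)['Calls'].agg(custom_range_estimator)
print(ranges)
```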


Implement Estimators in Seaborn Plots

The same estimator parameter is available in Seaborn's aggregating plot functions, such as lineplot and barplot, so a given statistic can be shown in whichever plot type suits the data.

Estimators in Line Plots

Line plots are ideal for observing trends over time. Here’s how to implement various estimators in a line plot:

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = {
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Calls': [200, 150, 180, 210, 190, 200],
    'Duration': [60, 55, 65, 70, 60, 75]
}
df = pd.DataFrame(data)
sns.lineplot(x='Month', y='Calls', data=df, estimator='mean')
plt.show()

Output:

Estimators in Line Plots

This plot tracks the average number of calls in each month.

Estimators in Bar Plots

Bar plots are effective for comparing quantities across different categories. Here’s how to use estimators in bar plots:

sns.barplot(x='Month', y='Duration', data=df, estimator=np.sum)
plt.show()

Output:

Estimators in Bar Plots

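This visualization shows the total duration of calls in each month.

Bar plots also accept custom estimators. The function below computes the median absolute deviation (MAD) of calls in each month: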

def custom_median_absolute_deviation(arr):
    return np.median(np.abs(arr - np.median(arr)))

sns.barplot(x='Month', y='Calls', data=df, estimator=custom_median_absolute_deviation)
plt.show()

Output:

Custom median absolute deviation
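The bar heights in the two plots above can be cross-checked with a pandas groupby. This is a minimal sketch, assuming each bar is the estimator applied to one month's observations; total_duration and mad_calls are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Calls': [200, 150, 180, 210, 190, 200],
    'Duration': [60, 55, 65, 70, 60, 75],
})

def custom_median_absolute_deviation(arr):
    # Median of the absolute distances from the group's median
    return np.median(np.abs(arr - np.median(arr)))

grouped = df.groupby('Month', sort=False)

# Total call duration per month (the np.sum bar plot)
total_duration = grouped['Duration'].sum()

# Median absolute deviation of calls per month (the custom bar plot)
mad_calls = grouped['Calls'].agg(custom_median_absolute_deviation)

print(total_duration)
print(mad_calls)
```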

Grouping Data for Estimation

Grouping data allows you to perform comparative analyses across different subsets of your data.

First, consider grouping data by a single additional variable. For instance, passing hue='ServiceTier' computes the average call duration per month separately for each service tier, so the tiers can be compared directly.

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'ServiceTier': ['Basic', 'Premium', 'Basic', 'Premium', 'Basic', 'Premium', 'Basic', 'Premium', 'Basic', 'Premium', 'Basic', 'Premium'],
    'Calls': [200, 220, 180, 240, 190, 210, 230, 250, 200, 260, 210, 220],
    'Duration': [60, 80, 65, 90, 60, 85, 70, 95, 75, 100, 80, 90]
}
df = pd.DataFrame(data)

# Line plot of mean Duration per month, grouped by 'ServiceTier' via hue
sns.lineplot(x='Month', y='Duration', data=df, estimator='mean', hue='ServiceTier')
plt.show()

Output:

Grouping by a Single Variable
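The grouped means behind this plot can be tabulated with a two-key groupby. This is a sketch assuming the mean is taken over the rows that share both a Month and a ServiceTier; tier_means is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar',
              'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'ServiceTier': ['Basic', 'Premium'] * 6,
    'Calls': [200, 220, 180, 240, 190, 210, 230, 250, 200, 260, 210, 220],
    'Duration': [60, 80, 65, 90, 60, 85, 70, 95, 75, 100, 80, 90],
})

# Mean duration for every (Month, ServiceTier) pair,
# reshaped so that each service tier becomes a column
tier_means = (
    df.groupby(['Month', 'ServiceTier'])['Duration']
      .mean()
      .unstack('ServiceTier')
)
print(tier_means)
```

Each column of tier_means should correspond to one of the lines drawn with hue='ServiceTier'.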
