Fill Area Between Seaborn Line Plots: Highlight Data Trends

In this tutorial, you’ll learn how to fill areas between Seaborn line plots.

We’ll explore methods ranging from simple fills against a constant value to filling between multiple line plots, handling overlaps, and applying gradient fills.

 

 

Fill Between Single Line Plot and Constant Value

Imagine you have a dataset that tracks the average monthly data usage of users over a year.

You want to visualize this data and highlight the months where data usage exceeded a certain threshold.

First, let’s import the necessary libraries and prepare a sample dataset:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Data_Usage_GB': [12, 15, 18, 20, 23, 25, 27, 30, 28, 24, 19, 16]
}
df = pd.DataFrame(data)

Next, create a line plot and fill the area between the line and the 20 GB threshold wherever data usage exceeds it:

sns.lineplot(x='Month', y='Data_Usage_GB', data=df, marker='o')
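# interpolate=True extends the fill to the points where the line crosses the threshold
plt.fill_between(x=df['Month'], y1=df['Data_Usage_GB'], y2=20, where=(df['Data_Usage_GB'] > 20), interpolate=True, color='red', alpha=0.3)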
plt.title('Monthly Data Usage with Highlighted Overuse')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.show()

Output:

[Figure: Fill Between Single Line Plot and Constant Value]
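To see exactly which months the where mask selects, it can help to compute the mask on its own before plotting. The following is a minimal sketch that uses Matplotlib's Axes interface directly instead of Seaborn; the names threshold, above, and months_above are illustrative and not part of the original example, and the numeric x positions are an assumption made to keep the sketch self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
usage = np.array([12, 15, 18, 20, 23, 25, 27, 30, 28, 24, 19, 16])
threshold = 20  # illustrative name for the constant used in the tutorial

# Boolean mask: True only for months strictly above the threshold
above = usage > threshold
months_above = [m for m, flag in zip(months, above) if flag]
print(months_above)

fig, ax = plt.subplots()
x = np.arange(len(months))  # numeric positions, labelled with month names below
ax.plot(x, usage, marker='o')
ax.axhline(threshold, color='gray', linestyle='--')
# The same mask drives the fill; interpolate=True extends it to the crossings
ax.fill_between(x, usage, threshold, where=above,
                interpolate=True, color='red', alpha=0.3)
ax.set_xticks(x)
ax.set_xticklabels(months)
```

Printing the mask first makes it easy to confirm that a month sitting exactly on the threshold, such as April here, is not counted as exceeding it.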

 

Fill Area Between Two Line Plots

Suppose you now have data for the average monthly data usage for two different user groups over the same period.

You want to compare these two groups to identify patterns.

First, let’s include this additional information:

data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 25, 27, 30, 28, 24, 19, 16],
    'Group_B_Data_Usage_GB': [14, 17, 15, 22, 24, 26, 28, 27, 29, 23, 20, 18]
}
df = pd.DataFrame(data)

Now, create a line plot for each user group and fill the area between the two lines:

sns.lineplot(x='Month', y='Group_A_Data_Usage_GB', data=df, marker='o', color='blue', label='Group A')
sns.lineplot(x='Month', y='Group_B_Data_Usage_GB', data=df, marker='o', color='green', label='Group B')

# Fill the area between the two line plots
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], color='purple', alpha=0.3)
plt.title('Comparison of Monthly Data Usage Between Two User Groups')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.legend()
plt.show()

Output:

[Figure: Fill Area Between Two Line Plots]
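The shaded band shows the gap between the groups visually; the same gap can be tabulated to back up what the plot suggests. This is a small sketch using the dataset from this section; the column names Gap_GB and Higher_Group and the variable names are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 25, 27, 30, 28, 24, 19, 16],
    'Group_B_Data_Usage_GB': [14, 17, 15, 22, 24, 26, 28, 27, 29, 23, 20, 18],
})

# Signed gap: positive where Group A uses more data, negative where Group B does
gap = df['Group_A_Data_Usage_GB'] - df['Group_B_Data_Usage_GB']

summary = pd.DataFrame({
    'Month': df['Month'],
    'Gap_GB': gap,
    'Higher_Group': gap.map(lambda g: 'A' if g > 0 else ('B' if g < 0 else 'tie')),
})

months_a_higher = summary.loc[summary['Gap_GB'] > 0, 'Month'].tolist()
largest_gap = int(gap.abs().max())
print(summary)
```

The height of the filled band in any month equals the absolute value of Gap_GB for that month.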

 

Handle Overlapping Lines and Intersection Points

Suppose the two groups’ lines cross at several points during the year.

You’ll want to not only fill the area between these lines but also handle and highlight their intersection points.

First, let’s create a dataset in which the lines cross:

data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16],
    'Group_B_Data_Usage_GB': [14, 17, 15, 22, 21, 22, 28, 27, 29, 23, 20, 18]
}
df = pd.DataFrame(data)

Now, let’s plot the lines, fill the areas, and highlight the intersection points:

sns.lineplot(x='Month', y='Group_A_Data_Usage_GB', data=df, marker='o', color='blue', label='Group A')
sns.lineplot(x='Month', y='Group_B_Data_Usage_GB', data=df, marker='o', color='green', label='Group B')
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], where=(df['Group_A_Data_Usage_GB'] > df['Group_B_Data_Usage_GB']), color='lightblue', alpha=0.3, interpolate=True)
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], where=(df['Group_A_Data_Usage_GB'] <= df['Group_B_Data_Usage_GB']), color='lightgreen', alpha=0.3, interpolate=True)
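# A crossing occurs where the sign of (A - B) flips between consecutive months
a = df['Group_A_Data_Usage_GB'].to_numpy(dtype=float)
gap = a - df['Group_B_Data_Usage_GB'].to_numpy(dtype=float)
for i in range(1, len(df)):
    if gap[i - 1] * gap[i] < 0:
        t = gap[i - 1] / (gap[i - 1] - gap[i])  # fraction of the way from month i-1 to month i
        # Categorical months sit at x positions 0, 1, 2, ..., so i - 1 + t is the crossing's x
        plt.plot(i - 1 + t, a[i - 1] + t * (a[i] - a[i - 1]), 'ro')
    elif gap[i] == 0:
        plt.plot(i, a[i], 'ro')  # the lines meet exactly at month i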
plt.title('Overlap and Intersection in Monthly Data Usage Between Two User Groups')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.legend()
plt.show()

Output:

[Figure: Handle Overlapping Lines and Intersection Points]

Intersection points, where the lines cross, are marked with red dots.
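The crossing positions can also be computed without plotting, which makes them easy to check. The sketch below assumes the two series are sampled at the same evenly spaced x positions (0, 1, 2, ...) and are joined by straight segments, as in a line plot; find_crossings is a hypothetical helper name, not a Seaborn or Matplotlib function.

```python
import numpy as np

def find_crossings(y1, y2):
    """Return (x, y) points where two piecewise-linear series meet.

    Assumes both series share x positions 0, 1, 2, ... (hypothetical helper).
    Points where the series are exactly equal are reported too, even if the
    lines only touch there.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    gap = y1 - y2
    points = []
    for i in range(len(gap)):
        if gap[i] == 0:
            # The series meet exactly on a sample point
            points.append((float(i), float(y1[i])))
        elif i > 0 and gap[i - 1] * gap[i] < 0:
            # Sign flip: interpolate linearly to where the gap reaches zero
            t = gap[i - 1] / (gap[i - 1] - gap[i])
            points.append((i - 1 + t, float(y1[i - 1] + t * (y1[i] - y1[i - 1]))))
    return points

group_a = [12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16]
group_b = [14, 17, 15, 22, 21, 22, 28, 27, 29, 23, 20, 18]
crossings = find_crossings(group_a, group_b)
for x_pos, y_val in crossings:
    print(f"x = {x_pos:.2f}, usage = {y_val:.2f} GB")
```

An x value of 1.4, for example, means the crossing lies 40% of the way from February (position 1) to March (position 2).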

 

Fill Areas Based on Conditions

Imagine you want to highlight the months where one user group’s data usage exceeds the other’s.

Let’s use the following sample data:

data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16],
    'Group_B_Data_Usage_GB': [14, 17, 15, 22, 21, 22, 28, 27, 29, 23, 20, 18]
}
df = pd.DataFrame(data)

Now, let’s create the plot and fill the areas based on our condition:

sns.lineplot(x='Month', y='Group_A_Data_Usage_GB', data=df, marker='o', color='blue', label='Group A')
sns.lineplot(x='Month', y='Group_B_Data_Usage_GB', data=df, marker='o', color='green', label='Group B')

# Fill the area where Group A's data usage is more than Group B's
# interpolate=True extends each fill to the exact crossing points instead of stopping at the nearest month
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], where=(df['Group_A_Data_Usage_GB'] > df['Group_B_Data_Usage_GB']), color='lightblue', alpha=0.3, interpolate=True)

# Fill the area where Group B's data usage is more than Group A's
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], where=(df['Group_B_Data_Usage_GB'] > df['Group_A_Data_Usage_GB']), color='lightgreen', alpha=0.3, interpolate=True)
plt.title('Conditional Area Filling Based on Data Usage Comparison')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.legend()
plt.show()

Output:

[Figure: Fill Areas Based on Conditions]

In the resulting plot, you will see that the areas where Group A’s data usage is higher than Group B’s are filled with light blue, and conversely, where Group B’s is higher, the area is filled with light green.
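When a where condition is true for only a single month between two crossings, the interpolate argument decides whether that region gets any visible area. The following sketch compares the horizontal extent of the first filled region with and without interpolation. It assumes numeric x positions, and it reads the polygon vertices from the returned collection with get_paths(), which is an inspection trick for illustration and not something the tutorial's plots require; first_region_extent is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

group_a = np.array([12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16], dtype=float)
group_b = np.array([14, 17, 15, 22, 21, 22, 28, 27, 29, 23, 20, 18], dtype=float)
x = np.arange(len(group_a), dtype=float)
a_higher = group_a > group_b  # True only in Mar, May, and Oct for this data

def first_region_extent(interpolate):
    """Return (min x, max x) of the first polygon drawn by fill_between."""
    fig, ax = plt.subplots()
    band = ax.fill_between(x, group_a, group_b, where=a_higher,
                           interpolate=interpolate)
    xs = band.get_paths()[0].vertices[:, 0]
    plt.close(fig)
    return float(xs.min()), float(xs.max())

extent_plain = first_region_extent(False)
extent_interp = first_region_extent(True)
print("without interpolate:", extent_plain)
print("with interpolate:   ", extent_interp)
```

Without interpolation the first region collapses to a single x position and therefore has no area, while with interpolation it spans from one crossing to the next.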

 

Custom Fill Patterns

Continuing with our existing dataset:

data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16],
    'Group_B_Data_Usage_GB': [14, 17, 15, 22, 21, 22, 28, 27, 29, 23, 20, 18]
}
df = pd.DataFrame(data)

Now, let’s apply custom fill patterns to our plot:

sns.lineplot(x='Month', y='Group_A_Data_Usage_GB', data=df, marker='o', color='blue', label='Group A')
sns.lineplot(x='Month', y='Group_B_Data_Usage_GB', data=df, marker='o', color='green', label='Group B')

# Custom fill patterns
# facecolor='none' leaves the band transparent so only the hatch (drawn in edgecolor) shows
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], 
                 where=(df['Group_A_Data_Usage_GB'] >= df['Group_B_Data_Usage_GB']), 
                 facecolor='none', hatch='///', edgecolor='blue', linewidth=0.0, alpha=0.5, interpolate=True)
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], 
                 where=(df['Group_A_Data_Usage_GB'] < df['Group_B_Data_Usage_GB']), 
                 facecolor='none', hatch='\\\\\\', edgecolor='green', linewidth=0.0, alpha=0.5, interpolate=True)
plt.title('Data Usage with Custom Fill Patterns')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.legend()
plt.show()

Output:

[Figure: Custom Fill Patterns]

The areas where Group A’s data usage is higher than Group B’s are marked with a blue forward slash pattern (‘///’), while the areas where Group B’s data usage is higher are marked with a green backslash pattern (‘\\\’, written as '\\\\\\' in Python source because each backslash must be escaped).

Unlike solid fills, hatch patterns remain distinguishable in grayscale and leave the lines underneath fully visible.
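Two details of hatch strings are easy to get wrong: repeating a hatch character makes the pattern denser, and backslashes must be escaped in ordinary Python strings. The sketch below illustrates both with made-up bands; the arrays lower and upper and the variable names are hypothetical and chosen only for the demonstration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(4)
lower = np.array([1.0, 2.0, 1.5, 2.5])  # hypothetical data for illustration
upper = lower + 1.0

# Six characters in source, three backslashes at runtime.
# A raw string such as r'\\\' is not an option: raw strings cannot end
# with an odd number of backslashes.
backslash_hatch = '\\\\\\'
print(len(backslash_hatch))

fig, ax = plt.subplots()
# One slash: widely spaced diagonal lines
sparse = ax.fill_between(x, lower, upper, facecolor='none',
                         edgecolor='blue', hatch='/', linewidth=0.0)
# Three backslashes: denser lines in the opposite direction
dense = ax.fill_between(x, upper, upper + 1.0, facecolor='none',
                        edgecolor='green', hatch=backslash_hatch, linewidth=0.0)
```

Other hatch characters accepted by Matplotlib include '|', '-', '+', 'x', 'o', 'O', '.', and '*', and they can be combined in one string.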

 

Fill Multiple Areas Between Different Sets of Lines

Imagine you want to compare data usage trends across multiple user groups, not just two.

Let’s add another group to our comparison:

data['Group_C_Data_Usage_GB'] = [10, 13, 17, 19, 20, 21, 26, 24, 23, 22, 18, 15]
df = pd.DataFrame(data)

Now, let’s fill multiple areas between these different sets of lines:

sns.lineplot(x='Month', y='Group_A_Data_Usage_GB', data=df, marker='o', color='blue', label='Group A')
sns.lineplot(x='Month', y='Group_B_Data_Usage_GB', data=df, marker='o', color='green', label='Group B')
sns.lineplot(x='Month', y='Group_C_Data_Usage_GB', data=df, marker='o', color='red', label='Group C')

# Fill areas between the different sets of lines
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_B_Data_Usage_GB'], color='lightblue', alpha=0.3)
plt.fill_between(x=df['Month'], y1=df['Group_B_Data_Usage_GB'], y2=df['Group_C_Data_Usage_GB'], color='lightgreen', alpha=0.3)
plt.fill_between(x=df['Month'], y1=df['Group_A_Data_Usage_GB'], y2=df['Group_C_Data_Usage_GB'], color='lightcoral', alpha=0.3)
plt.title('Comparative Data Usage Across Multiple Groups')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.legend()
plt.show()

Output:

[Figure: Fill Multiple Areas Between Different Sets of Lines]

In the plot, you’ll see three shaded bands in different colors, one for each pair of lines. The fills are semi-transparent, so the colors blend wherever the bands overlap:

  • The light blue area shows the difference between Group A and Group B.
  • The light green area highlights the difference between Group B and Group C.
  • The light coral area illustrates the difference between Group A and Group C.
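With three or more groups, pairwise bands overlap and can become hard to read. One alternative, offered here as a sketch and not as the tutorial's method, is to shade a single envelope between the lowest and highest group in each month. The names group_cols, lower, upper, and spread are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16],
    'Group_B_Data_Usage_GB': [14, 17, 15, 22, 21, 22, 28, 27, 29, 23, 20, 18],
    'Group_C_Data_Usage_GB': [10, 13, 17, 19, 20, 21, 26, 24, 23, 22, 18, 15],
})

group_cols = [c for c in df.columns if c.startswith('Group_')]
lower = df[group_cols].min(axis=1)  # lowest group in each month
upper = df[group_cols].max(axis=1)  # highest group in each month

# Width of the envelope per month, indexed by month name
spread = pd.Series((upper - lower).to_numpy(), index=df['Month'])
widest_month = spread.idxmax()
print(widest_month, spread.max())

fig, ax = plt.subplots()
x = np.arange(len(df))
for col in group_cols:
    ax.plot(x, df[col], marker='o', label=col)
# One band covering the full range across all groups
ax.fill_between(x, lower, upper, color='lightgray', alpha=0.5)
ax.set_xticks(x)
ax.set_xticklabels(df['Month'])
ax.legend()
```

The envelope trades pairwise detail for a quick view of how far apart the groups are in each month.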

 

Gradient Fills

Let’s use the following dataset:

data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16]
}
df = pd.DataFrame(data)

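Now, let’s implement the gradient fill. Matplotlib’s fill_between has no built-in gradient option, so the code below approximates one by filling each segment between consecutive months with its own color from the viridis colormap: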

import numpy as np
sns.lineplot(x='Month', y='Group_A_Data_Usage_GB', data=df, marker='o', color='blue', label='Group A')

# Creating a gradient fill
num_points = len(df['Month'])
gradient = np.linspace(0, 1, num_points)
colors = plt.cm.viridis(gradient)
for i in range(num_points - 1):
    plt.fill_between(df['Month'][i:i+2], df['Group_A_Data_Usage_GB'][i:i+2], color=colors[i], alpha=0.5)
plt.title('Data Usage with Gradient Fill')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.legend()
plt.show()

Output:

[Figure: Gradient Fills]
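For a smooth gradient instead of one color per segment, a common approach is to draw a gradient image with imshow and clip it to the shape of the filled area. The sketch below is one possible version of that technique under some assumptions: numeric x positions, a baseline of 0, and a horizontal gradient. It is not the method used in the tutorial's code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
usage = np.array([12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16], dtype=float)
x = np.arange(len(months), dtype=float)

fig, ax = plt.subplots()
ax.plot(x, usage, marker='o', color='blue', label='Group A')

# Invisible polygon under the line; only its outline is needed for clipping
band = ax.fill_between(x, usage, 0, facecolor='none')

# A 1 x 256 image whose values run from 0 to 1 left to right
gradient = np.linspace(0, 1, 256).reshape(1, -1)
im = ax.imshow(gradient, extent=[x.min(), x.max(), 0, usage.max()],
               origin='lower', aspect='auto', cmap='viridis', alpha=0.5)

# Show the image only inside the area under the line
im.set_clip_path(band.get_paths()[0], transform=ax.transData)

ax.set_xticks(x)
ax.set_xticklabels(months)
ax.set_ylabel('Data Usage (GB)')
ax.legend()
```

Reshaping the gradient to a single column (256 x 1) instead of a single row would make the color vary vertically.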

 

Implement Dynamic Upper and Lower Bounds for Filling

Instead of hard-coding the bounds, you can compute them from the data, so the filled area reflects the data’s statistical properties.

Let’s illustrate this by filling areas within one standard deviation above and below the mean of a user group’s data usage.

This will highlight the range within which most of the data points fall.

We’ll use the dataset for Group A:

data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    'Group_A_Data_Usage_GB': [12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16]
}
df = pd.DataFrame(data)

Now, let’s calculate the mean and standard deviation, and create a plot with dynamic bounds:

# Calculating mean and standard deviation
mean_usage = df['Group_A_Data_Usage_GB'].mean()
std_deviation = df['Group_A_Data_Usage_GB'].std()
sns.lineplot(x='Month', y='Group_A_Data_Usage_GB', data=df, marker='o', color='blue', label='Group A')

# Fill between mean ± standard deviation
plt.fill_between(x=df['Month'], y1=mean_usage - std_deviation, y2=mean_usage + std_deviation, color='lightblue', alpha=0.3)
plt.title('Data Usage with Dynamic Bounds Based on Statistical Measures')
plt.xlabel('Month')
plt.ylabel('Data Usage (GB)')
plt.axhline(y=mean_usage, color='blue', linestyle='--')
plt.legend()
plt.show()

Output:

[Figure: Implement Dynamic Upper and Lower Bounds for Filling]

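The shaded band spans one standard deviation on either side of the mean, which is drawn as the dashed line; for this dataset, 8 of the 12 monthly values fall inside it. Because both bounds are computed from the data, the band updates automatically when the dataset changes. Note that the pandas std() method returns the sample standard deviation (ddof=1) by default.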
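The mean and standard deviation above are single numbers, so the band has a constant height. If the bounds should vary from month to month, one option is a rolling window. This is a sketch under assumptions that are not in the original example: a centered 3-month window and at least two observations per window.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
usage = pd.Series([12, 15, 18, 20, 23, 22, 27, 25, 28, 24, 19, 16],
                  index=months, name='Group_A_Data_Usage_GB', dtype=float)

# Centered 3-month window; min_periods=2 keeps the edge months defined
window = usage.rolling(window=3, center=True, min_periods=2)
rolling_mean = window.mean()
rolling_std = window.std()  # sample standard deviation (ddof=1)

lower = rolling_mean - rolling_std
upper = rolling_mean + rolling_std

fig, ax = plt.subplots()
x = np.arange(len(months))
ax.plot(x, usage.to_numpy(), marker='o', color='blue', label='Group A')
ax.plot(x, rolling_mean.to_numpy(), color='blue', linestyle='--', label='Rolling mean')
# The band now changes height and position with the local data
ax.fill_between(x, lower.to_numpy(), upper.to_numpy(), color='lightblue', alpha=0.3)
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.legend()
```

A wider window gives a smoother band that reacts more slowly to month-to-month changes.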
