Handle NaN values in Seaborn heatmap

One common challenge faced when creating heatmaps is the presence of NaN (Not a Number) values.

These NaNs can impact the visual output, leading to misleading interpretations or an unclear representation of the data.

In this tutorial, you’ll learn how to handle NaN values in Seaborn heatmaps.

Whether you are dealing with sparse NaNs that can be easily dropped, or require more nuanced approaches like filling, interpolating, or visually distinguishing NaNs, this tutorial covers it all.

Table of Contents hide

1 Remove NaN Values
2 Fill NaNs with a Specific Value
3 Interpolate NaNs
4 Mask NaNs in the Heatmap
5 Use a Different Color for NaNs

Remove NaN Values

In scenarios where NaNs are sparse and not critical to your analysis, one effective method is to remove them using the dropna() method.

First, you’ll need to import the necessary libraries and create a sample dataset.

import seaborn as sns
import pandas as pd
import numpy as np
data = {
    'Call Quality': [3.5, np.nan, 4.2, 3.8],
    'Internet Speed': [50, 45, np.nan, 60],
    'Customer Satisfaction': [4.0, 3.6, np.nan, 4.5]
}
df = pd.DataFrame(data)
print(df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0           3.5            50.0                    4.0
1           NaN            45.0                    3.6
2           4.2             NaN                    NaN
3           3.8            60.0                    4.5

Note the NaN values in the dataset.

Next, let’s remove these NaN values:

cleaned_df = df.dropna()
print(cleaned_df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0           3.5            50.0                    4.0
3           3.8            60.0                    4.5

In the above step, dropna() removes rows with any NaN values.

Let’s plot the heatmap using Seaborn after we’ve dropped the NaN values.

This will help visualize how the removal of NaN values impacts the heatmap representation.

import matplotlib.pyplot as plt
plt.figure(figsize=(8, 4))
sns.heatmap(cleaned_df, annot=True, cmap='viridis')
plt.title('Heatmap (Without NaN Values)')
plt.show()

Output:

Fill NaNs with a Specific Value

In certain situations, you want to retain the original size of your dataset, especially when the presence of each row is significant for your analysis.

In such cases, rather than removing NaN values, you can fill them with a specific value.

The .fillna() function in Pandas allows you to do this.

filled_df = df.fillna(0)
print(filled_df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0           3.5            50.0                    4.0
1           0.0            45.0                    3.6
2           4.2             0.0                    0.0
3           3.8            60.0                    4.5

In the code above, fillna(0) replaces all NaN values in the dataset with 0.

Now, let’s create a heatmap with the NaN values filled:

plt.figure(figsize=(8, 4))
sns.heatmap(filled_df, annot=True, cmap='viridis')
plt.title('Heatmap (NaNs Filled with 0)')
plt.show()

Output:

Interpolate NaNs

Interpolation is another method to handle NaN values in datasets, especially when a linear relationship can be assumed between data points.

By interpolating, you estimate the NaN values based on neighboring data points.

This method is useful in time-series data or when the data points have a logical sequence.

interpolated_df = df.interpolate()
print(interpolated_df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0          3.50            50.0                   4.00
1          3.85            45.0                   3.60
2          4.20            52.5                   4.05
3          3.80            60.0                   4.50

Here, interpolate() function calculates the NaN values by estimating them based on adjacent values. For example, in the ‘Call Quality’ column, the NaN value is replaced with an average of its neighboring values.

Now, let’s visualize this interpolated data using a heatmap:

# Plotting the heatmap with interpolated values
plt.figure(figsize=(8, 4))
sns.heatmap(interpolated_df, annot=True, cmap='viridis')
plt.title('Heatmap (Interpolated NaN Values)')
plt.show()

Output:

Mask NaNs in the Heatmap

Masking NaNs in a heatmap allows you to visually distinguish these values from the rest of the data.

This method is useful when you want to maintain the original dataset’s structure, including the NaN values, while still providing a clear visual representation of where data is missing.

nan_mask = df.isna()
plt.figure(figsize=(8, 4))
sns.heatmap(df, annot=True, cmap='viridis', mask=nan_mask)
plt.title('Heatmap (NaNs Masked)')
plt.show()

Output:

The masked areas do not have annotations and color, clearly indicating the absence of data.

Use a Different Color for NaNs

Another method to handle NaNs in heatmaps is using a different color for these values.

This method highlights the NaNs distinctly and makes them easily identifiable.

First, we’ll prepare a colormap that distinguishes NaNs:

from matplotlib.colors import ListedColormap

# Custom colormap: NaNs will be shown in grey
cmap = ListedColormap(sns.color_palette("viridis", as_cmap=True).colors + [(0.75, 0.75, 0.75)])

# Preparing the data: NaNs are set to a unique number
unique_number_for_nans = -1
heatmap_data = df.fillna(unique_number_for_nans)
plt.figure(figsize=(8, 4))
sns.heatmap(heatmap_data, annot=True, cmap=cmap, cbar=False)
plt.title('Heatmap (Different Color for NaNs)')
plt.show()

Output:

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.

Remove NaN Values

Fill NaNs with a Specific Value

Interpolate NaNs

Mask NaNs in the Heatmap

Use a Different Color for NaNs

Related posts

Leave a Reply Cancel reply