Handle NaN values in Seaborn heatmap

One common challenge faced when creating heatmaps is the presence of NaN (Not a Number) values.

These NaNs can impact the visual output, leading to misleading interpretations or an unclear representation of the data.

In this tutorial, you’ll learn how to handle NaN values in Seaborn heatmaps.

It covers dropping sparse NaNs, filling or interpolating them, and visually distinguishing them with a mask or a dedicated color.

Remove NaN Values

In scenarios where NaNs are sparse and not critical to your analysis, one effective method is to remove them using the dropna() method.

First, you’ll need to import the necessary libraries and create a sample dataset.

import seaborn as sns
import pandas as pd
import numpy as np
data = {
    'Call Quality': [3.5, np.nan, 4.2, 3.8],
    'Internet Speed': [50, 45, np.nan, 60],
    'Customer Satisfaction': [4.0, 3.6, np.nan, 4.5]
}
df = pd.DataFrame(data)
print(df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0           3.5            50.0                    4.0
1           NaN            45.0                    3.6
2           4.2             NaN                    NaN
3           3.8            60.0                    4.5

Note the NaN values in the dataset.

Next, let’s remove these NaN values:

cleaned_df = df.dropna()
print(cleaned_df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0           3.5            50.0                    4.0
3           3.8            60.0                    4.5

In the above step, dropna() removes rows with any NaN values.
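Dropping every row that contains a NaN can discard more data than necessary. The sketch below rebuilds the same sample data and shows a few other dropna() parameters (subset, thresh, axis) that narrow or change what gets removed. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Call Quality': [3.5, np.nan, 4.2, 3.8],
    'Internet Speed': [50, 45, np.nan, 60],
    'Customer Satisfaction': [4.0, 3.6, np.nan, 4.5],
})

# Drop a row only when 'Call Quality' is missing
by_subset = df.dropna(subset=['Call Quality'])

# Keep rows that have at least 2 non-NaN values
by_thresh = df.dropna(thresh=2)

# Drop columns (instead of rows) that contain any NaN
by_column = df.dropna(axis=1)

print(by_subset.index.tolist())    # rows kept: [0, 2, 3]
print(by_thresh.index.tolist())    # rows kept: [0, 1, 3]
print(by_column.columns.tolist())  # every column has a NaN, so: []
```

With this dataset, dropping by column leaves nothing to plot, which is why the tutorial drops rows instead.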

Let’s plot the heatmap using Seaborn after we’ve dropped the NaN values.

This will help visualize how the removal of NaN values impacts the heatmap representation.

import matplotlib.pyplot as plt
plt.figure(figsize=(8, 4))
sns.heatmap(cleaned_df, annot=True, cmap='viridis')
plt.title('Heatmap (Without NaN Values)')
plt.show()

Output:

[Figure: heatmap of the data with NaN rows removed]


Fill NaNs with a Specific Value

In certain situations, you want to retain the original size of your dataset, especially when the presence of each row is significant for your analysis.

In such cases, rather than removing NaN values, you can fill them with a specific value.

The fillna() method in pandas allows you to do this.

filled_df = df.fillna(0)
print(filled_df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0           3.5            50.0                    4.0
1           0.0            45.0                    3.6
2           4.2             0.0                    0.0
3           3.8            60.0                    4.5

In the code above, fillna(0) replaces all NaN values in the dataset with 0.
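One thing to keep in mind is that 0 is lower than every real value in this dataset, so the filled cells stretch the color scale and look like genuine low readings. As a possible alternative, the sketch below fills each column with its own mean. The choice of the mean is an assumption made for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Call Quality': [3.5, np.nan, 4.2, 3.8],
    'Internet Speed': [50, 45, np.nan, 60],
    'Customer Satisfaction': [4.0, 3.6, np.nan, 4.5],
})

# Filling with 0 pulls the minimum of the data down to 0
zero_filled_df = df.fillna(0)

# df.mean() returns one mean per column; fillna() matches them by column name
mean_filled_df = df.fillna(df.mean())

print(zero_filled_df.min().min())  # smallest value after filling with 0
print(mean_filled_df.min().min())  # smallest value after filling with column means
print(mean_filled_df)
```

Passing mean_filled_df to sns.heatmap() instead of filled_df keeps the color range limited to values that actually occur in the data.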

Now, let’s create a heatmap with the NaN values filled:

plt.figure(figsize=(8, 4))
sns.heatmap(filled_df, annot=True, cmap='viridis')
plt.title('Heatmap (NaNs Filled with 0)')
plt.show()

Output:

[Figure: heatmap of the data with NaNs filled with 0]


Interpolate NaNs

Interpolation is another method to handle NaN values in datasets, especially when a linear relationship can be assumed between data points.

By interpolating, you estimate the NaN values based on neighboring data points.

This method is useful in time-series data or when the data points have a logical sequence.

interpolated_df = df.interpolate()
print(interpolated_df)

Output:

   Call Quality  Internet Speed  Customer Satisfaction
0          3.50            50.0                   4.00
1          3.85            45.0                   3.60
2          4.20            52.5                   4.05
3          3.80            60.0                   4.50

Here, the interpolate() method estimates each NaN from its adjacent values. By default it uses linear interpolation, so in the ‘Call Quality’ column the NaN is replaced with the average of its two neighbors: (3.5 + 4.2) / 2 = 3.85.
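To make the arithmetic concrete, the sketch below checks the ‘Call Quality’ result by hand. It also uses a small hypothetical series (not part of the tutorial's dataset) to show a case worth knowing about: with the default settings, a NaN at the very start of a column has no earlier neighbor and is left as NaN, while limit_direction='both' fills it too.

```python
import numpy as np
import pandas as pd

# The 'Call Quality' column from the tutorial
call_quality = pd.Series([3.5, np.nan, 4.2, 3.8])
interpolated = call_quality.interpolate()

# Linear interpolation of a single gap is the midpoint of its neighbors
midpoint = (call_quality[0] + call_quality[2]) / 2
print(interpolated[1], midpoint)

# Hypothetical series with NaNs at both edges
edges = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default: the leading NaN stays NaN
default_fill = edges.interpolate()

# limit_direction='both' also fills NaNs before the first valid value
both_fill = edges.interpolate(limit_direction='both')

print(default_fill.tolist())
print(both_fill.tolist())
```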

Now, let’s visualize this interpolated data using a heatmap:

# Plotting the heatmap with interpolated values
plt.figure(figsize=(8, 4))
sns.heatmap(interpolated_df, annot=True, cmap='viridis')
plt.title('Heatmap (Interpolated NaN Values)')
plt.show()

Output:

[Figure: heatmap of the data with NaNs interpolated]


Mask NaNs in the Heatmap

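Masking NaNs in a heatmap allows you to visually distinguish these values from the rest of the data. Seaborn already leaves cells with missing values blank by default, but passing an explicit mask makes the intent clear and lets you hide other cells with the same mechanism.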

This method is useful when you want to maintain the original dataset’s structure, including the NaN values, while still providing a clear visual representation of where data is missing.

nan_mask = df.isna()
plt.figure(figsize=(8, 4))
sns.heatmap(df, annot=True, cmap='viridis', mask=nan_mask)
plt.title('Heatmap (NaNs Masked)')
plt.show()

Output:

[Figure: heatmap of the data with NaNs masked]

The masked areas have neither annotations nor color, clearly indicating the absence of data.
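The mask itself is just a boolean DataFrame with the same shape as the data, where True means "do not draw this cell". The sketch below inspects it and then extends it. Hiding the whole ‘Internet Speed’ column is a hypothetical choice made only to show that a mask can cover more than NaNs.

```python
import numpy as np
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Call Quality': [3.5, np.nan, 4.2, 3.8],
    'Internet Speed': [50, 45, np.nan, 60],
    'Customer Satisfaction': [4.0, 3.6, np.nan, 4.5],
})

# True wherever a value is missing
nan_mask = df.isna()
print(nan_mask)
print(nan_mask.sum())  # number of masked cells per column

# Hypothetical extension: additionally hide one entire column
combined_mask = nan_mask.copy()
combined_mask['Internet Speed'] = True
print(combined_mask.sum().sum())  # total number of hidden cells
```

Either mask can be passed as mask=... to sns.heatmap(), as long as its shape matches the data.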


Use a Different Color for NaNs

Another method to handle NaNs in heatmaps is using a different color for these values.

This method highlights the NaNs distinctly and makes them easily identifiable.

First, we’ll prepare a colormap that distinguishes NaNs:

from matplotlib.colors import ListedColormap

# Custom colormap: grey goes first so that the lowest value in the data
# (the placeholder used for NaNs) is the one shown in grey
viridis = sns.color_palette("viridis", as_cmap=True)
cmap = ListedColormap([(0.75, 0.75, 0.75)] + list(viridis.colors))

# Preparing the data: NaNs are set to a unique number below every real value
unique_number_for_nans = -1
heatmap_data = df.fillna(unique_number_for_nans)
plt.figure(figsize=(8, 4))
# Note: the annotation in the former NaN cells will read -1
sns.heatmap(heatmap_data, annot=True, cmap=cmap, cbar=False)
plt.title('Heatmap (Different Color for NaNs)')
plt.show()

Output:

[Figure: heatmap with NaN cells shown in a different color]
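The placeholder approach changes the data and prints the placeholder in the annotations. A possible alternative, sketched below, leaves the NaNs in place and colors the axes background instead. Seaborn does not draw NaN cells, so the background shows through them. The helper name heatmap_with_nan_color and the 'lightgrey' default are assumptions made for this example.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Call Quality': [3.5, np.nan, 4.2, 3.8],
    'Internet Speed': [50, 45, np.nan, 60],
    'Customer Satisfaction': [4.0, 3.6, np.nan, 4.5],
})

def heatmap_with_nan_color(data, nan_color='lightgrey'):
    """Draw a heatmap in which NaN cells show the axes background color."""
    fig, ax = plt.subplots(figsize=(8, 4))
    # The background is visible wherever Seaborn leaves a cell undrawn
    ax.set_facecolor(nan_color)
    sns.heatmap(data, annot=True, cmap='viridis', ax=ax)
    ax.set_title('Heatmap (Background Color for NaNs)')
    return ax

ax = heatmap_with_nan_color(df)
```

Call plt.show() afterwards to display the figure. The NaN cells carry no annotation, and the colorbar can stay enabled because the color scale only covers real values.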
