Convert NumPy array to Pandas Series

Imagine you have a dataset stored in a NumPy array. Converting it to a Pandas Series gives you labeled indexing and access to Pandas’ analysis methods.

By the end of this tutorial, you’ll be able to convert NumPy arrays to Pandas Series, using sample telecom data as the running example.

 

 

Using pd.Series()

You can use the pd.Series() function to convert a NumPy array to a Pandas Series:

import numpy as np
import pandas as pd
numpy_array = np.array([120, 450, 200, 300, 180])

# Convert NumPy array to Pandas Series
pandas_series = pd.Series(numpy_array)
print(pandas_series)

Output:

0    120
1    450
2    200
3    300
4    180
dtype: int64
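
Depending on your Pandas version and settings, a Series built from an array may or may not share memory with that array. If you want the Series to be independent of the original data, the copy parameter of pd.Series() is one way to ask for that. The following is a minimal sketch; the variable name independent_series is only illustrative:

```python
import numpy as np
import pandas as pd

numpy_array = np.array([120, 450, 200, 300, 180])

# copy=True asks Pandas to copy the data rather than reuse the array's memory
independent_series = pd.Series(numpy_array, copy=True)

# Changing the Series leaves the original array untouched
independent_series.iloc[0] = 999
print(independent_series.iloc[0])
print(numpy_array[0])
```

The first print shows the updated Series value, while the second still shows the original value stored in the array.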

 

Preserve the Index while Converting

Imagine you have a list of user IDs corresponding to each call duration. Let’s see how you can preserve this user index during the conversion:

# Sample data
user_ids = ["U101", "U102", "U103", "U104", "U105"]

# Convert NumPy array to Pandas Series while preserving the user IDs as indices
pandas_series_with_index = pd.Series(numpy_array, index=user_ids)
print(pandas_series_with_index)

Output:

U101    120
U102    450
U103    200
U104    300
U105    180
dtype: int64

Now, you not only have the call durations as values but also the respective user IDs as indices.
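
With user IDs as the index, you can look up values by label instead of by position. Here is a short sketch of that idea, rebuilding the same sample data so it runs on its own (the names call_series, duration_u102, and selected are illustrative):

```python
import numpy as np
import pandas as pd

numpy_array = np.array([120, 450, 200, 300, 180])
user_ids = ["U101", "U102", "U103", "U104", "U105"]
call_series = pd.Series(numpy_array, index=user_ids)

# Look up one user's call duration by label
duration_u102 = call_series.loc["U102"]
print(duration_u102)

# Select several users at once; the result is again a Series
selected = call_series.loc[["U101", "U105"]]
print(selected)
```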

 

Convert 2D NumPy Array to Multiple Pandas Series

A 2D NumPy array can hold several measurements at once. When you convert it to a Pandas DataFrame, each column can be viewed as an individual Series.

Let’s explore this using a 2D array containing both call durations and data usage:

# Sample data: call durations and data usage (in MB) from a telecom company
numpy_2d_array = np.array([[120, 450, 200, 300, 180],  # Call durations
                          [50, 120, 300, 250, 100]])  # Data usage

# Convert 2D NumPy array to Pandas DataFrame
pandas_dataframe = pd.DataFrame(numpy_2d_array.T, columns=["Call Duration", "Data Usage (MB)"], index=user_ids)
print(pandas_dataframe)

Output:

     Call Duration  Data Usage (MB)
U101           120               50
U102           450              120
U103           200              300
U104           300              250
U105           180              100

The .T attribute transposes the original 2D array so that each row of the array becomes a column, which matches the DataFrame structure we want. Each column in the DataFrame is a Pandas Series.
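
To see that each column really is a Series, you can select one by its column name. This sketch rebuilds the DataFrame from the example so it can run independently; call_duration is just an illustrative variable name:

```python
import numpy as np
import pandas as pd

user_ids = ["U101", "U102", "U103", "U104", "U105"]
numpy_2d_array = np.array([[120, 450, 200, 300, 180],  # Call durations
                           [50, 120, 300, 250, 100]])  # Data usage
pandas_dataframe = pd.DataFrame(numpy_2d_array.T,
                                columns=["Call Duration", "Data Usage (MB)"],
                                index=user_ids)

# Selecting a single column returns a Series named after that column
call_duration = pandas_dataframe["Call Duration"]
print(type(call_duration).__name__)
print(call_duration.name)
```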

 

Convert Multi-dimensional Array to Pandas Series

You can convert a multi-dimensional NumPy array by flattening it to one dimension and then passing the result to pd.Series(). By default, flatten() reads the array row by row:

# Sample data: call durations and data usage (in MB) for two days from a telecom company
numpy_2d_array = np.array([[120, 200, 300],  # Day 1 call durations
                          [50, 70, 100],    # Day 1 data usage
                          [140, 220, 280],  # Day 2 call durations
                          [60, 80, 90]])    # Day 2 data usage

# Flatten the 2D NumPy array
flattened_array = numpy_2d_array.flatten()

# Convert flattened NumPy array to Pandas Series
pandas_series_flattened = pd.Series(flattened_array)
print(pandas_series_flattened)

Output:

0     120
1     200
2     300
3      50
4      70
5     100
6     140
7     220
8     280
9      60
10     80
11     90
dtype: int64
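
Flattening discards the information about which row and column each value came from. If you need to keep that, one option is to attach a MultiIndex to the flattened values. The sketch below assumes hypothetical row and user labels (row_labels and user_labels are not part of the original data):

```python
import numpy as np
import pandas as pd

numpy_2d_array = np.array([[120, 200, 300],  # Day 1 call durations
                           [50, 70, 100],    # Day 1 data usage
                           [140, 220, 280],  # Day 2 call durations
                           [60, 80, 90]])    # Day 2 data usage

# Hypothetical labels, chosen only for illustration
row_labels = ["Day 1 calls", "Day 1 data", "Day 2 calls", "Day 2 data"]
user_labels = ["U101", "U102", "U103"]

# from_product yields labels in row-major order, matching flatten()'s default order
position_index = pd.MultiIndex.from_product([row_labels, user_labels],
                                            names=["measure", "user"])
labelled_series = pd.Series(numpy_2d_array.flatten(), index=position_index)

# Look up a value by its (row, column) labels
print(labelled_series.loc[("Day 2 calls", "U102")])
```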

 

Setting Column Names during Conversion

When you have a 2D NumPy array and you want to convert each row (or column) into individual Pandas Series, naming each Series can provide clarity about the data it represents.

Here’s how you can achieve this using our telecom dataset:

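# Recreate the 2 x 5 telecom array: row 0 is call durations, row 1 is data usage
numpy_2d_array = np.array([[120, 450, 200, 300, 180],
                          [50, 120, 300, 250, 100]])

# Convert each row of the 2D NumPy array to a separate Pandas Series and set names
series_names = ["Call Duration", "Data Usage (MB)"]
series_list = [pd.Series(row, name=name) for row, name in zip(numpy_2d_array, series_names)]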

# Displaying the series
for series in series_list:
    print(series.name)
    print(series)
    print("\n")

Output:

Call Duration
0    120
1    450
2    200
3    300
4    180
Name: Call Duration, dtype: int64


Data Usage (MB)
0     50
1    120
2    300
3    250
4    100
Name: Data Usage (MB), dtype: int64
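
One practical benefit of naming each Series is that Pandas uses the name as the column label when Series are combined. A minimal sketch, assuming the two-row telecom array with call durations and data usage (combined is an illustrative variable name):

```python
import numpy as np
import pandas as pd

numpy_2d_array = np.array([[120, 450, 200, 300, 180],  # Call durations
                           [50, 120, 300, 250, 100]])  # Data usage
series_names = ["Call Duration", "Data Usage (MB)"]
series_list = [pd.Series(row, name=name)
               for row, name in zip(numpy_2d_array, series_names)]

# Combining named Series side by side turns each name into a column label
combined = pd.concat(series_list, axis=1)
print(list(combined.columns))
print(combined.shape)
```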

 

Data Type Preservation

When you convert a NumPy array to a Pandas Series, Pandas keeps the array’s data type by default. It’s worth checking the resulting dtype, because it affects both memory use and how later calculations behave.

Let’s start with a simple example where our telecom data, representing the number of SMS sent by users, is stored as integers:

# Sample data: SMS counts from a telecom company
numpy_array_int = np.array([20, 25, 30, 15, 28])

# Convert NumPy array to Pandas Series
sms_series = pd.Series(numpy_array_int)
print(sms_series)
print("Data type:", sms_series.dtype)

Output:

0    20
1    25
2    30
3    15
4    28
dtype: int64
Data type: int64

As you can see, the integer data type of the NumPy array (int64 here; the default integer type can be int32 on some platforms) was carried over unchanged to the Pandas Series.
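
The same holds for other numeric types. The sketch below builds a few small arrays with explicit dtypes and compares each array’s dtype with that of the resulting Series; the sample values and the arrays and results dictionaries are illustrative:

```python
import numpy as np
import pandas as pd

arrays = {
    "int32": np.array([20, 25, 30], dtype=np.int32),
    "float32": np.array([50.5, 120.7, 300.3], dtype=np.float32),
    "bool": np.array([True, False, True]),
}

# Record the dtype before and after conversion for each array
results = {}
for label, array in arrays.items():
    series = pd.Series(array)
    results[label] = (array.dtype, series.dtype)
    print(label, array.dtype, series.dtype)
```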

Explicitly Setting the Data Type during Conversion

You can use the dtype parameter to set the data type of the Pandas Series:

# Convert NumPy array to Pandas Series with explicit data type
sms_series_float = pd.Series(numpy_array_int, dtype='float64')

print(sms_series_float)
print("Data type:", sms_series_float.dtype)

Output:

0    20.0
1    25.0
2    30.0
3    15.0
4    28.0
dtype: float64
Data type: float64

Here the integers were converted to floating-point numbers in the Pandas Series.

 

Common Errors During Conversion

When transforming NumPy arrays to Pandas Series, developers might face a few common challenges.

Shape Mismatch

If you’re trying to provide custom indices during conversion, ensure that the length of indices matches the length of the NumPy array.

# Sample data: call durations from a telecom company
numpy_array = np.array([120, 450, 200])

# Incorrect user IDs list
user_ids = ["U101", "U102"]

# This will throw an error
try:
    series_with_index = pd.Series(numpy_array, index=user_ids)
except ValueError as e:
    print(f"Error: {e}")

Output:

Error: Length of values (3) does not match length of index (2)
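
If the index comes from a separate source, you may prefer to check the lengths yourself before building the Series so that you control the error message. This is only a sketch; to_series_checked is a hypothetical helper, not part of Pandas:

```python
import numpy as np
import pandas as pd

numpy_array = np.array([120, 450, 200])
user_ids = ["U101", "U102"]

def to_series_checked(values, labels):
    """Hypothetical helper: build a Series only if the lengths agree."""
    if len(values) != len(labels):
        raise ValueError(f"Expected {len(values)} index labels, got {len(labels)}")
    return pd.Series(values, index=labels)

# The mismatched list is rejected with our own message
try:
    to_series_checked(numpy_array, user_ids)
except ValueError as error:
    message = str(error)
    print(message)

# A list of the right length works as expected
fixed_series = to_series_checked(numpy_array, ["U101", "U102", "U103"])
print(fixed_series)
```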

Inappropriate Data Type Conversion

Directly setting an inappropriate data type might lead to data loss or errors.

# Sample data: data usage (in MB) from a telecom company
numpy_array_float = np.array([50.5, 120.7, 300.3])

# Setting the data type to integer raises an error, because the decimals would be lost
series_int = pd.Series(numpy_array_float, dtype='int64')
print(series_int)

Output:

ValueError: Trying to coerce float values to integers

To keep the decimal values, keep the data type as float64:

series_float = pd.Series(numpy_array_float, dtype='float64')
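
If you do need integers, you can drop or round the decimals explicitly in NumPy before creating the Series, so the loss of precision is a deliberate choice. A minimal sketch using np.trunc and np.rint (truncated_series and rounded_series are illustrative names):

```python
import numpy as np
import pandas as pd

numpy_array_float = np.array([50.5, 120.7, 300.3])

# Drop the fractional part, then convert to integers
truncated_series = pd.Series(np.trunc(numpy_array_float).astype("int64"))
print(truncated_series.tolist())

# Round to the nearest integer; np.rint rounds exact halves to the nearest even number
rounded_series = pd.Series(np.rint(numpy_array_float).astype("int64"))
print(rounded_series.tolist())
```

Truncation gives 50, 120, and 300, while rounding gives 50, 121, and 300 (50.5 rounds to 50 because halves go to the nearest even integer).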