Convert NumPy array to Pandas Series
Imagine you have a dataset stored in a NumPy array. Converting it to a Pandas Series gives you labeled indexing and access to Pandas’ analysis methods.
In this tutorial, you’ll learn how to convert NumPy arrays to Pandas Series using sample telecom data.
Using pd.Series()
You can use the pd.Series() function to convert a NumPy array to a Pandas Series:
import numpy as np
import pandas as pd

numpy_array = np.array([120, 450, 200, 300, 180])

# Convert NumPy array to Pandas Series
pandas_series = pd.Series(numpy_array)
print(pandas_series)
Output:
0    120
1    450
2    200
3    300
4    180
dtype: int64
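The conversion also works in the other direction. The following is a minimal sketch, assuming the same sample values as above (the variable names are illustrative), that turns the Series back into a NumPy array with to_numpy():

```python
import numpy as np
import pandas as pd

call_durations = np.array([120, 450, 200, 300, 180])
series = pd.Series(call_durations)

# to_numpy() returns the values of the Series as a NumPy array again
round_trip = series.to_numpy()

print(type(round_trip).__name__)
print(np.array_equal(round_trip, call_durations))
```

The values survive the round trip unchanged; only the index labels are dropped when going back to a plain array.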
Preserve the Index while Converting
Suppose the values in the array are call durations and you have a list of user IDs, one for each value. You can attach these IDs as the index during the conversion:
# Sample data
user_ids = ["U101", "U102", "U103", "U104", "U105"]

# Convert NumPy array to Pandas Series while preserving the user IDs as indices
pandas_series_with_index = pd.Series(numpy_array, index=user_ids)
print(pandas_series_with_index)
Output:
U101    120
U102    450
U103    200
U104    300
U105    180
dtype: int64
Now, you not only have the call durations as values but also the respective user IDs as indices.
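Once the user IDs are the index, you can look values up by label instead of by position. A short sketch, assuming the same sample data (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

durations = pd.Series(
    np.array([120, 450, 200, 300, 180]),
    index=["U101", "U102", "U103", "U104", "U105"],
)

# Label-based lookup with .loc
print(durations.loc["U102"])

# Position-based lookup with .iloc still works
print(durations.iloc[1])

# Select several users at once by passing a list of labels
subset = durations.loc[["U101", "U105"]]
print(subset)
```

Both .loc["U102"] and .iloc[1] return the same element here, because U102 is the second label in the index.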
Convert 2D NumPy Array to Multiple Pandas Series
A 2D NumPy array can hold several variables at once. When you convert it to a Pandas DataFrame, each column can be viewed as an individual Series.
Let’s explore this using a 2D array containing both call durations and data usage:
# Sample data: call durations and data usage (in MB) from a telecom company
numpy_2d_array = np.array([[120, 450, 200, 300, 180],   # Call durations
                           [50, 120, 300, 250, 100]])   # Data usage

# Convert 2D NumPy array to Pandas DataFrame
pandas_dataframe = pd.DataFrame(numpy_2d_array.T, columns=["Call Duration", "Data Usage (MB)"], index=user_ids)
print(pandas_dataframe)
Output:
      Call Duration  Data Usage (MB)
U101            120               50
U102            450              120
U103            200              300
U104            300              250
U105            180              100
The .T attribute transposes the original 2D array from shape (2, 5) to shape (5, 2), so each user becomes a row and each variable becomes a column. Each column in the DataFrame is a Pandas Series.
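To confirm that a DataFrame column really is a Series, you can select one by name. This is a small sketch using the same sample data (the variable names usage and call_series are illustrative):

```python
import numpy as np
import pandas as pd

user_ids = ["U101", "U102", "U103", "U104", "U105"]
array_2d = np.array([[120, 450, 200, 300, 180],   # Call durations
                     [50, 120, 300, 250, 100]])   # Data usage

usage = pd.DataFrame(array_2d.T,
                     columns=["Call Duration", "Data Usage (MB)"],
                     index=user_ids)

# Selecting a single column returns a Series named after the column
call_series = usage["Call Duration"]
print(type(call_series).__name__)
print(call_series.name)
print(call_series.loc["U103"])
```

The selected Series keeps the user IDs as its index and takes the column label as its name.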
Convert Multi-dimensional Array to Pandas Series
You can convert a multi-dimensional NumPy array to a single Pandas Series by flattening it first. By default, flatten() reads the array row by row:
# Sample data: call durations and data usage (in MB) for two days from a telecom company
numpy_2d_array = np.array([[120, 200, 300],    # Day 1 call durations
                           [50, 70, 100],      # Day 1 data usage
                           [140, 220, 280],    # Day 2 call durations
                           [60, 80, 90]])      # Day 2 data usage

# Flatten the 2D NumPy array
flattened_array = numpy_2d_array.flatten()

# Convert flattened NumPy array to Pandas Series
pandas_series_flattened = pd.Series(flattened_array)
print(pandas_series_flattened)
Output:
0     120
1     200
2     300
3      50
4      70
5     100
6     140
7     220
8     280
9      60
10     80
11     90
dtype: int64
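Flattening discards the information about which row and column each value came from. One way to keep it is to give the flattened Series a MultiIndex. The sketch below assumes hypothetical row labels and user IDs for the columns; neither is part of the original data:

```python
import numpy as np
import pandas as pd

values = np.array([[120, 200, 300],    # Day 1 call durations
                   [50, 70, 100],      # Day 1 data usage
                   [140, 220, 280],    # Day 2 call durations
                   [60, 80, 90]])      # Day 2 data usage

# Hypothetical labels, assumed here only for illustration
row_labels = ["Day 1 calls", "Day 1 data", "Day 2 calls", "Day 2 data"]
col_labels = ["U101", "U102", "U103"]

# from_product varies the last level fastest, which matches the
# row-by-row order produced by flatten()
index = pd.MultiIndex.from_product([row_labels, col_labels],
                                   names=["measure", "user"])
labeled = pd.Series(values.flatten(), index=index)

print(labeled.loc[("Day 2 calls", "U102")])
print(labeled.loc["Day 1 data"])
```

Each value can then be addressed by its (row label, column label) pair rather than by its position in the flattened sequence.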
Setting Column Names during Conversion
When you have a 2D NumPy array and you want to convert each row (or column) into individual Pandas Series, naming each Series can provide clarity about the data it represents.
Here’s how you can achieve this using our telecom dataset:
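# Recreate the 2 x 5 telecom array (numpy_2d_array was reassigned in the previous section)
numpy_2d_array = np.array([[120, 450, 200, 300, 180],   # Call durations
                           [50, 120, 300, 250, 100]])   # Data usage

# Convert each row of the 2D NumPy array to a separate Pandas Series and set names
series_names = ["Call Duration", "Data Usage (MB)"]
series_list = [pd.Series(row, name=name) for row, name in zip(numpy_2d_array, series_names)]

# Display each Series under its name
for series in series_list:
    print(series.name)
    print(series)
    print("\n")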
Output:
Call Duration
0    120
1    450
2    200
3    300
4    180
Name: Call Duration, dtype: int64


Data Usage (MB)
0     50
1    120
2    300
3    250
4    100
Name: Data Usage (MB), dtype: int64
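Naming each Series pays off when you combine them later. As a brief sketch with the same sample values, pd.concat() with axis=1 uses the Series names as column labels:

```python
import numpy as np
import pandas as pd

calls = pd.Series(np.array([120, 450, 200, 300, 180]), name="Call Duration")
data = pd.Series(np.array([50, 120, 300, 250, 100]), name="Data Usage (MB)")

# With axis=1, each named Series becomes a column labeled with its name
combined = pd.concat([calls, data], axis=1)
print(combined.columns.tolist())
print(combined.shape)
```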
Data Type Preservation
When you convert a NumPy array to a Pandas Series, the Series keeps the array’s data type unless you request a different one.
Let’s start with a simple example where our telecom data, representing the number of SMS sent by users, is stored as integers:
# Sample data: SMS counts from a telecom company
numpy_array_int = np.array([20, 25, 30, 15, 28])

# Convert NumPy array to Pandas Series
sms_series = pd.Series(numpy_array_int)
print(sms_series)
print("Data type:", sms_series.dtype)
Output:
0    20
1    25
2    30
3    15
4    28
dtype: int64
Data type: int64
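As you can see, the integer data type of the NumPy array (int64) was carried over to the Pandas Series. The default integer type depends on the platform: on Windows with NumPy versions before 2.0 it is int32, and the Series then reports int32 as well.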
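The same holds for other numeric types. A quick sketch, using made-up values, that checks a few explicit NumPy dtypes:

```python
import numpy as np
import pandas as pd

# Made-up arrays with explicit dtypes, for illustration only
arrays = {
    "float32": np.array([1.5, 2.5], dtype=np.float32),
    "uint8": np.array([1, 2], dtype=np.uint8),
    "bool": np.array([True, False]),
}

for label, arr in arrays.items():
    converted = pd.Series(arr)
    # The Series dtype equals the dtype of the source array
    print(label, converted.dtype, converted.dtype == arr.dtype)
```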
Explicitly Setting the Data Type during Conversion
You can use the dtype parameter to set the data type of the Pandas Series.
# Convert NumPy array to Pandas Series with explicit data type
sms_series_float = pd.Series(numpy_array_int, dtype='float64')
print(sms_series_float)
print("Data type:", sms_series_float.dtype)
Output:
0    20.0
1    25.0
2    30.0
3    15.0
4    28.0
dtype: float64
Data type: float64
Here the integers are converted to floating-point numbers in the Pandas Series.
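If the Series already exists, you can change its type afterwards with astype() instead of passing dtype to the constructor. A minimal sketch with the same SMS counts (the array dtype is set explicitly here so the result does not depend on the platform):

```python
import numpy as np
import pandas as pd

sms = pd.Series(np.array([20, 25, 30, 15, 28], dtype=np.int64))

# astype() returns a new Series; the original keeps its dtype
sms_float = sms.astype("float64")

print(sms.dtype)
print(sms_float.dtype)
```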
Common Errors During Conversion
When transforming NumPy arrays to Pandas Series, developers might face a few common challenges.
Shape Mismatch
If you’re trying to provide custom indices during conversion, ensure that the length of indices matches the length of the NumPy array.
# Sample data: call durations from a telecom company
numpy_array = np.array([120, 450, 200])

# Incorrect user IDs list
user_ids = ["U101", "U102"]

# This will throw an error
try:
    series_with_index = pd.Series(numpy_array, index=user_ids)
except ValueError as e:
    print(f"Error: {e}")
Output:
Error: Length of values (3) does not match length of index (2)
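If the index labels come from a separate source, it can help to check the lengths yourself before converting. The helper below, to_series_checked, is a hypothetical function written for this example and not part of Pandas:

```python
import numpy as np
import pandas as pd

def to_series_checked(values, labels):
    """Hypothetical helper: build a Series only if the lengths match."""
    if len(values) != len(labels):
        raise ValueError(
            f"Got {len(values)} values but {len(labels)} index labels"
        )
    return pd.Series(values, index=labels)

# Matching lengths: the conversion succeeds
ok_series = to_series_checked(np.array([120, 450, 200]), ["U101", "U102", "U103"])
print(ok_series)

# Mismatched lengths: the helper reports both counts
try:
    to_series_checked(np.array([120, 450, 200]), ["U101", "U102"])
except ValueError as e:
    print(f"Error: {e}")
```

This does the same check Pandas already performs, but lets you decide on the error message or handle the mismatch before any Series is created.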
Inappropriate Data Type Conversion
Directly setting an inappropriate data type might lead to data loss or errors.
# Sample data: data usage (in MB) from a telecom company
numpy_array_float = np.array([50.5, 120.7, 300.3])

# Requesting an integer dtype for floats with decimals raises a ValueError
series_int = pd.Series(numpy_array_float, dtype='int64')
print(series_int)
Output:
ValueError: Trying to coerce float values to integers
Pandas refuses this conversion because it would discard the decimal part. To keep the values intact, use float64:
series_float = pd.Series(numpy_array_float, dtype='float64')
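If you do want integers, one option is to convert after the Series is created and decide explicitly how to handle the decimals. The following sketch, assuming the same sample values, contrasts truncating with rounding first:

```python
import numpy as np
import pandas as pd

usage_mb = np.array([50.5, 120.7, 300.3])

# astype() on an existing Series drops the decimal part (truncation)
truncated = pd.Series(usage_mb).astype("int64")

# Rounding first gives the nearest integer; np.round rounds halves
# to the nearest even number, so 50.5 becomes 50
rounded = pd.Series(np.round(usage_mb)).astype("int64")

print(truncated.tolist())  # [50, 120, 300]
print(rounded.tolist())    # [50, 121, 300]
```

Either way the decimals are lost, so this is only appropriate when whole numbers are what the analysis needs.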
Mokhtar is the founder of LikeGeeks.com.