Convert Pandas DataFrame Column to List
One common task that developers face is extracting data from a DataFrame column and converting it into a list.
For example, when integrating with legacy code or libraries that don’t support DataFrames, having the data in list format can bridge the compatibility gap.
In this tutorial, you’ll learn how to convert a Pandas DataFrame column into a list.
Using the tolist() Method
The tolist() method, called on a single column (a Pandas Series), converts that DataFrame column to a Python list.
First, let’s start by importing Pandas and creating a sample DataFrame:
import pandas as pd

data = {
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
}
df = pd.DataFrame(data)
print(df)
Output:
   CustomerID   PlanType  MonthlyCharge
0         101      Basic             30
1         102    Premium             50
2         103      Basic             30
3         104  Unlimited             70
To convert the ‘MonthlyCharge’ column into a list, select the column and call tolist() on it:
charges_list = df['MonthlyCharge'].tolist()
print(charges_list)
Output:
[30, 50, 30, 70]
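Because tolist() returns a plain Python list of built-in Python values, the result can be handed to code that knows nothing about Pandas. As a small illustration (the use of the standard json module here is an assumption made for the example, not part of the original walkthrough), the list can be serialized directly:

```python
import json

import pandas as pd

df = pd.DataFrame({'MonthlyCharge': [30, 50, 30, 70]})
charges_list = df['MonthlyCharge'].tolist()

# The result is an ordinary Python list holding built-in ints
print(type(charges_list))

# So standard-library code such as json can consume it as-is
payload = json.dumps(charges_list)
print(payload)
```

The same idea applies to any function that expects a regular list rather than a Pandas object.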
Using the Python list() Constructor
Python’s built-in list() constructor can also convert a Pandas DataFrame column to a list.
To convert the 'PlanType' column into a list using the list() constructor:
plan_list = list(df['PlanType'])
print(plan_list)
Output:
['Basic', 'Premium', 'Basic', 'Unlimited']
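One thing to watch with list() is what it is applied to. Iterating over a whole DataFrame yields its column labels rather than its rows, so the column must be selected first. A minimal sketch of the difference, reusing the sample data from this tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70],
})

# list() on the DataFrame itself returns the column names
column_names = list(df)
print(column_names)

# list() on a selected column returns that column's values
plan_list = list(df['PlanType'])
print(plan_list)
```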
Converting Multiple Columns to Separate Lists
You can use either the tolist() method or the native list() constructor.
To convert both the ‘CustomerID’ and ‘PlanType’ columns:
customer_ids = df['CustomerID'].tolist()
plan_types = df['PlanType'].tolist()
print(customer_ids)
print(plan_types)
Output:
[101, 102, 103, 104]
['Basic', 'Premium', 'Basic', 'Unlimited']
Using the list() Constructor
Similarly, with the list() constructor:
customer_ids_native = list(df['CustomerID'])
plan_types_native = list(df['PlanType'])
print(customer_ids_native)
print(plan_types_native)
Output:
[101, 102, 103, 104]
['Basic', 'Premium', 'Basic', 'Unlimited']
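When several columns are needed at once, calling tolist() column by column gets repetitive. One possible alternative, shown here as a sketch rather than as the approach used above, is DataFrame.to_dict() with orient='list', which returns a dictionary mapping each column name to a list of its values:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70],
})

# One call produces a separate list per selected column
columns_as_lists = df[['CustomerID', 'PlanType']].to_dict(orient='list')

customer_ids = columns_as_lists['CustomerID']
plan_types = columns_as_lists['PlanType']
print(customer_ids)
print(plan_types)
```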
Converting Multiple Columns to a List of Tuples or Lists
To convert the ‘CustomerID’ and ‘MonthlyCharge’ columns into a list of tuples:
tuples_list = list(
    df[['CustomerID', 'MonthlyCharge']].itertuples(index=False, name=None)
)
print(tuples_list)
Output:
[(101, 30), (102, 50), (103, 30), (104, 70)]
This method uses itertuples(), which by default iterates over DataFrame rows as namedtuples.
Setting index=False excludes the index from the result, and setting name=None returns plain tuples instead of namedtuples.
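To make the effect of these two parameters concrete, the sketch below compares the default behavior of itertuples() with the index=False, name=None variant on a two-row version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'CustomerID': [101, 102], 'MonthlyCharge': [30, 50]})

# Defaults: each row is a namedtuple called "Pandas",
# with the index as its first field
default_rows = list(df.itertuples())
print(default_rows[0])            # Pandas(Index=0, CustomerID=101, MonthlyCharge=30)
print(default_rows[0].CustomerID)  # fields are accessible by name

# index=False drops the index, name=None yields plain tuples
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
```

The namedtuple form is convenient when fields are read by name; plain tuples are the simpler choice when the data is only passed along.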
Creating a List of Lists from DataFrame Columns
You can use the values attribute to extract the selected DataFrame columns as a NumPy array, and then call tolist() to convert this array into a list of lists, one inner list per row:
lists_list = df[['CustomerID', 'MonthlyCharge']].values.tolist()
print(lists_list)
Output:
[[101, 30], [102, 50], [103, 30], [104, 70]]
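The Pandas documentation recommends the to_numpy() method over the values attribute for getting the underlying array. The following sketch shows that it gives the same list of lists here, and that columns of different types can be combined as well:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70],
})

# to_numpy() is the documented alternative to the values attribute
numeric_rows = df[['CustomerID', 'MonthlyCharge']].to_numpy().tolist()
print(numeric_rows)

# Mixing integer and string columns also works; each inner list is one row
mixed_rows = df[['CustomerID', 'PlanType']].to_numpy().tolist()
print(mixed_rows)
```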
Converting DataFrame Index to List
The most straightforward way to convert the index of a DataFrame to a list is to use the tolist() method directly on the index object.
Considering our dataset:
import pandas as pd

data = {
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
}
df = pd.DataFrame(data)
To convert the DataFrame’s index to a list:
index_list = df.index.tolist()
print(index_list)
Output:
[0, 1, 2, 3]
As expected, since we didn’t specify a custom index for our DataFrame, the default integer index is returned.
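With a custom index, the same call returns the custom labels instead. As a short sketch, assuming we promote the CustomerID column to the index with set_index():

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70],
})

# Use CustomerID as the index instead of the default 0..3 range
df_indexed = df.set_index('CustomerID')

custom_index_list = df_indexed.index.tolist()
print(custom_index_list)
```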
Performance Comparison
Let’s run a benchmark to compare the efficiency of tolist() and list() on a column of 100 million random integers:
import pandas as pd
import numpy as np
import time

# Build a large single-column DataFrame (100 million random integers)
df_large = pd.DataFrame({
    'Numbers': np.random.randint(1, 100, 100_000_000)
})

# Time the tolist() method
start_time_tolist = time.time()
list_using_tolist = df_large['Numbers'].tolist()
end_time_tolist = time.time()
print(f"Time taken using tolist(): {end_time_tolist - start_time_tolist:.6f} seconds")

# Time the built-in list() constructor
start_time_list = time.time()
list_using_list = list(df_large['Numbers'])
end_time_list = time.time()
print(f"Time taken using list() constructor: {end_time_list - start_time_list:.6f} seconds")
Output:
Time taken using tolist(): 1.100703 seconds
Time taken using list() constructor: 10.370464 seconds
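In this run, the tolist() method is roughly nine times faster than the list() constructor. Exact timings will vary with hardware and library versions.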
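A single time.time() measurement can be noisy, and 100 million rows needs a lot of memory. For a quicker check on your own machine, a sketch using the standard timeit module on a smaller column might look like this (the column size and repeat count are arbitrary choices for illustration):

```python
import timeit

import numpy as np
import pandas as pd

# A smaller column keeps the benchmark quick and light on memory
rng = np.random.default_rng(0)
numbers = pd.Series(rng.integers(1, 100, size=200_000))

# Run each conversion several times and report the total time
repeats = 5
time_tolist = timeit.timeit(numbers.tolist, number=repeats)
time_list = timeit.timeit(lambda: list(numbers), number=repeats)

print(f"tolist(): {time_tolist:.4f} seconds for {repeats} runs")
print(f"list():   {time_list:.4f} seconds for {repeats} runs")
```

Both approaches produce the same values, so the choice between them is mainly about speed and readability.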
Real-World Use Case
Imagine you’re a data analyst at a large telecom company. The marketing department has noticed an uptick in the number of customers leaving the company.
They need a detailed report analyzing the last communication the company had with these customers, specifically:
- The list of customer IDs who have left in the last three months.
- The dates of the last communication with these customers.
- A list of offers or packages discussed during these communications.
The data is stored in a DataFrame with the following columns: CustomerID, ChurnDate, LastCommunicationDate, and OffersDiscussed (which contains a list of offers discussed during the communication).
Filtering Relevant Data: First, select only the customers who have left in the last three months.
current_date = pd.Timestamp.now()
three_months_ago = current_date - pd.DateOffset(months=3)
churned_customers = df[df['ChurnDate'] > three_months_ago]
Converting Columns to Lists: Convert the CustomerID and LastCommunicationDate columns to lists.
churned_ids = churned_customers['CustomerID'].tolist()
last_communication_dates = churned_customers['LastCommunicationDate'].dt.strftime('%Y-%m-%d').tolist()
Handling Complex Data Types: Flatten the OffersDiscussed column to get a consolidated list of all offers discussed with churned customers.
all_offers = [offer for sublist in churned_customers['OffersDiscussed'] for offer in sublist]
With this data at hand, the analyst can now provide the marketing department with insights into the last interactions the company had with churned customers.
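The three steps can be tried end to end on a small made-up dataset. In the sketch below, the sample rows, the churn_df name, and the fixed reference_date (used instead of pd.Timestamp.now() so the result is reproducible) are all assumptions for illustration, not real company data:

```python
import pandas as pd

# Hypothetical sample data with the columns described above
churn_df = pd.DataFrame({
    'CustomerID': [101, 102, 103],
    'ChurnDate': pd.to_datetime(['2024-01-10', '2024-05-20', '2024-06-02']),
    'LastCommunicationDate': pd.to_datetime(['2024-01-05', '2024-05-15', '2024-05-30']),
    'OffersDiscussed': [['Discount'], ['Upgrade', 'Discount'], []],
})

# Fixed reference date for reproducibility (assumption)
reference_date = pd.Timestamp('2024-06-15')
three_months_ago = reference_date - pd.DateOffset(months=3)

# Step 1: keep customers who churned within the last three months
churned_customers = churn_df[churn_df['ChurnDate'] > three_months_ago]

# Step 2: convert the ID and date columns to lists
churned_ids = churned_customers['CustomerID'].tolist()
last_communication_dates = (
    churned_customers['LastCommunicationDate'].dt.strftime('%Y-%m-%d').tolist()
)

# Step 3: flatten the per-customer offer lists into one list
all_offers = [
    offer
    for sublist in churned_customers['OffersDiscussed']
    for offer in sublist
]

print(churned_ids)
print(last_communication_dates)
print(all_offers)
```

Customer 101 churned before the three-month cutoff and is excluded, while the empty offer list for customer 103 simply contributes nothing to the flattened result.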
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.