Convert Pandas DataFrame Column to List

One common task that developers face is extracting data from a DataFrame column and converting it into a list.

For example, when you integrate with legacy code or libraries that don’t support DataFrames, having the data in list format bridges the compatibility gap.

In this tutorial, you’ll learn how to convert a Pandas DataFrame column into a list.

Table of Contents

1 Using the tolist() Method
2 Using Python list() Method
3 Converting Multiple Columns to Separate Lists
- 3.1 Using list() constructor
4 Converting Multiple Columns to a List of Tuples or Lists
- 4.1 Creating a List of Lists from DataFrame Columns
5 Converting DataFrame Index to List
6 Performance Comparison
7 Real-world use case

Using the `tolist()` Method

Each DataFrame column is a Pandas Series, and its tolist() method returns the column’s values as a Python list.

First, let’s start by importing Pandas and creating a sample DataFrame:

import pandas as pd
data = {
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
}
df = pd.DataFrame(data)
print(df)

Output:

   CustomerID   PlanType  MonthlyCharge
0         101      Basic             30
1         102    Premium             50
2         103      Basic             30
3         104  Unlimited             70

To convert the ‘MonthlyCharge’ column into a list, select the column and call tolist() on it:

charges_list = df['MonthlyCharge'].tolist()
print(charges_list)

Output:

[30, 50, 30, 70]
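As a small check of what tolist() hands back, the sketch below rebuilds only the 'MonthlyCharge' column from the sample data and inspects the types. The result is a plain Python list of native Python integers, which is usually what non-Pandas code expects.

```python
import pandas as pd

# Only the column needed for this check, taken from the sample data
df = pd.DataFrame({'MonthlyCharge': [30, 50, 30, 70]})

charges_list = df['MonthlyCharge'].tolist()

# tolist() returns a plain Python list of native Python ints
print(type(charges_list))     # <class 'list'>
print(type(charges_list[0]))  # <class 'int'>
```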

Using Python `list()` Method

Python’s built-in list() constructor can also convert a Pandas DataFrame column to a list, because a column (a Series) is iterable.

To convert the 'PlanType' column into a list using the list() constructor:

plan_list = list(df['PlanType'])
print(plan_list)

Output:

['Basic', 'Premium', 'Basic', 'Unlimited']
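One thing to watch for with list(): it behaves differently on a whole DataFrame than on a single column. The sketch below, using the same sample data, shows that iterating over a DataFrame yields its column labels rather than its rows, so select the column first.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
})

# list() on a single column gives that column's values
plan_list = list(df['PlanType'])
print(plan_list)     # ['Basic', 'Premium', 'Basic', 'Unlimited']

# list() on the whole DataFrame gives the column labels, not the data
column_names = list(df)
print(column_names)  # ['CustomerID', 'PlanType', 'MonthlyCharge']
```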

Converting Multiple Columns to Separate Lists

To convert several columns into separate lists, call either the tolist() method or the native list() constructor on each column.

To convert both the ‘CustomerID’ and ‘PlanType’ columns:

customer_ids = df['CustomerID'].tolist()
plan_types = df['PlanType'].tolist()
print(customer_ids)
print(plan_types)

Output:

[101, 102, 103, 104]
['Basic', 'Premium', 'Basic', 'Unlimited']

Using `list()` constructor

Similarly, with the list() constructor:

customer_ids_native = list(df['CustomerID'])
plan_types_native = list(df['PlanType'])
print(customer_ids_native)
print(plan_types_native)

Output:

[101, 102, 103, 104]
['Basic', 'Premium', 'Basic', 'Unlimited']
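If you need many columns at once, converting them one by one gets repetitive. One possible alternative, which is not part of the walkthrough above, is DataFrame.to_dict() with orient='list'. It returns a dictionary that maps each column name to a list of its values. A minimal sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
})

# One dictionary entry per selected column, each holding a list of values
columns_as_lists = df[['CustomerID', 'PlanType']].to_dict(orient='list')

customer_ids = columns_as_lists['CustomerID']
plan_types = columns_as_lists['PlanType']
print(customer_ids)  # [101, 102, 103, 104]
print(plan_types)    # ['Basic', 'Premium', 'Basic', 'Unlimited']
```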

Converting Multiple Columns to a List of Tuples or Lists

To convert the ‘CustomerID’ and ‘MonthlyCharge’ columns into a list of tuples:

tuples_list = list(df[['CustomerID', 'MonthlyCharge']].itertuples(index=False, name=None))
print(tuples_list)

Output:

[(101, 30), (102, 50), (103, 30), (104, 70)]

This method uses itertuples(), which by default iterates over DataFrame rows as namedtuples.

By setting index=False, we exclude the index from the result, and with name=None, we get plain tuples.
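To illustrate the effect of the name parameter, the sketch below compares the default namedtuples with the plain tuples produced by name=None. It uses a two-row subset of the sample data. With namedtuples, each value can be read by column name.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102],
    'MonthlyCharge': [30, 50]
})

# Default behaviour: each row is a namedtuple with one field per column
named_rows = list(df.itertuples(index=False))
first = named_rows[0]
print(first.CustomerID, first.MonthlyCharge)  # 101 30

# name=None: each row is a plain tuple
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)  # [(101, 30), (102, 50)]
```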

Creating a List of Lists from DataFrame Columns

You can use the values attribute to extract the selected data as a NumPy array, and then call tolist() to convert this array into a list of lists:

lists_list = df[['CustomerID', 'MonthlyCharge']].values.tolist()
print(lists_list)

Output:

[[101, 30], [102, 50], [103, 30], [104, 70]]
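The Pandas documentation recommends to_numpy() over the values attribute for new code, and the two are interchangeable here. The sketch below uses a two-row subset of the sample data. It also shows that the same approach works when the selected columns mix numbers and strings.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102],
    'PlanType': ['Basic', 'Premium'],
    'MonthlyCharge': [30, 50]
})

# Numeric columns only: each inner list is one row
numeric_rows = df[['CustomerID', 'MonthlyCharge']].to_numpy().tolist()
print(numeric_rows)  # [[101, 30], [102, 50]]

# Mixed numeric and string columns also convert row by row
mixed_rows = df[['CustomerID', 'PlanType']].to_numpy().tolist()
print(mixed_rows)    # [[101, 'Basic'], [102, 'Premium']]
```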

Converting DataFrame Index to List

The most straightforward way to convert the index of a DataFrame to a list is to use the tolist() method directly on the index object.

Considering our dataset:

import pandas as pd
data = {
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
}
df = pd.DataFrame(data)

To convert the DataFrame’s index to a list:

index_list = df.index.tolist()
print(index_list)

Output:

[0, 1, 2, 3]

As expected, since we didn’t specify a custom index for our DataFrame, the default integer index is returned.
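For comparison, here is a sketch of the same call on a custom index. Using 'CustomerID' as the index is an assumption made only for this illustration. Once the column becomes the index, tolist() returns its values instead of the default integers.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
})

# Illustrative choice: promote CustomerID to the index
df_indexed = df.set_index('CustomerID')

custom_index_list = df_indexed.index.tolist()
print(custom_index_list)  # [101, 102, 103, 104]
```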

Performance Comparison

Let’s perform a benchmark test to compare the efficiency of tolist() and list():

import pandas as pd
import numpy as np
import time

df_large = pd.DataFrame({
    'Numbers': np.random.randint(1, 100, 100_000_000)  # 100 million rows
})

start_time_tolist = time.time()
list_using_tolist = df_large['Numbers'].tolist()
end_time_tolist = time.time()
print(f"Time taken using tolist(): {end_time_tolist - start_time_tolist:.6f} seconds")

start_time_list = time.time()
list_using_list = list(df_large['Numbers'])
end_time_list = time.time()
print(f"Time taken using list() constructor: {end_time_list - start_time_list:.6f} seconds")

Output:

Time taken using tolist(): 1.100703 seconds
Time taken using list() constructor: 10.370464 seconds

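In this run, tolist() was roughly nine times faster than the list() constructor (about 1.1 seconds versus 10.4 seconds). A likely reason is that tolist() converts the underlying NumPy array in compiled code, while list() iterates over the Series one element at a time. Exact timings will vary with your hardware and library versions.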
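A single time.time() measurement can be noisy. One way to get steadier numbers is the standard library's timeit module, sketched below. The series size of one million rows and the five repetitions are arbitrary choices to keep the run short, and no timings are shown because they depend on the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller series than the benchmark above, so the script finishes quickly
series = pd.Series(np.random.randint(1, 100, 1_000_000))

# Run each conversion five times and keep the best (lowest) time
t_tolist = min(timeit.repeat(series.tolist, number=1, repeat=5))
t_list = min(timeit.repeat(lambda: list(series), number=1, repeat=5))

print(f"tolist(): {t_tolist:.4f} seconds")
print(f"list():   {t_list:.4f} seconds")
```

Taking the minimum of several runs reduces the influence of background activity on the comparison.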

Real-world use case

Imagine you’re a data analyst at a large telecom company. The marketing department has noticed an uptick in the number of customers leaving the company.

They need a detailed report analyzing the last communication the company had with these customers, specifically:

- The IDs of customers who have left in the last three months.
- The dates of the last communication with these customers.
- A list of offers or packages discussed during these communications.

The data is stored in a DataFrame with the following columns: CustomerID, ChurnDate, LastCommunicationDate, and OffersDiscussed (which contains a list of offers discussed during the communication).

Filtering Relevant Data: First, keep only the customers who have left in the last three months. This assumes the ChurnDate column holds datetime values.

current_date = pd.Timestamp.now()
three_months_ago = current_date - pd.DateOffset(months=3)
churned_customers = df[df['ChurnDate'] > three_months_ago]

Converting Columns to Lists: Convert the CustomerID and LastCommunicationDate columns to lists.

churned_ids = churned_customers['CustomerID'].tolist()
last_communication_dates = churned_customers['LastCommunicationDate'].dt.strftime('%Y-%m-%d').tolist()

Handling Complex Data Types: Flatten the OffersDiscussed column to get a consolidated list of all offers discussed with churned customers.

all_offers = [offer for sublist in churned_customers['OffersDiscussed'] for offer in sublist]

With this data at hand, the analyst can now provide the marketing department with insights into the last interactions the company had with churned customers.
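The three steps above can be tried end to end with the sketch below. The DataFrame contents, the offer names, and the fixed reference_date (standing in for pd.Timestamp.now() so that the result is reproducible) are all invented for illustration. The final deduplication step is an optional extra and not part of the steps above.

```python
import pandas as pd

# Hypothetical sample data; real data would come from the company's systems
df_churn = pd.DataFrame({
    'CustomerID': [201, 202, 203],
    'ChurnDate': pd.to_datetime(['2024-06-01', '2024-05-10', '2024-01-15']),
    'LastCommunicationDate': pd.to_datetime(['2024-05-20', '2024-05-02', '2024-01-03']),
    'OffersDiscussed': [['Discount10', 'FreeRouter'], ['Discount10'], ['FamilyPlan']]
})

# Fixed stand-in for pd.Timestamp.now(), an assumption for reproducibility
reference_date = pd.Timestamp('2024-06-30')
three_months_ago = reference_date - pd.DateOffset(months=3)

# Step 1: keep only customers who churned in the last three months
churned_customers = df_churn[df_churn['ChurnDate'] > three_months_ago]

# Step 2: convert columns to lists
churned_ids = churned_customers['CustomerID'].tolist()
last_communication_dates = (
    churned_customers['LastCommunicationDate'].dt.strftime('%Y-%m-%d').tolist()
)

# Step 3: flatten the per-customer offer lists into one list
all_offers = [
    offer
    for sublist in churned_customers['OffersDiscussed']
    for offer in sublist
]

# Optional extra: drop duplicate offers while preserving their order
unique_offers = list(dict.fromkeys(all_offers))

print(churned_ids)               # [201, 202]
print(last_communication_dates)  # ['2024-05-20', '2024-05-02']
print(all_offers)                # ['Discount10', 'FreeRouter', 'Discount10']
print(unique_offers)             # ['Discount10', 'FreeRouter']
```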

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
