Convert Pandas DataFrame Column to List

One common task that developers face is extracting data from a DataFrame column and converting it into a list.

For example, when integrating with legacy code or libraries that don’t support DataFrames, having the data in list format can bridge the compatibility gap.

In this tutorial, you’ll learn how to convert a Pandas DataFrame column into a list.

 

 

Using the tolist() Method

The tolist() method allows you to convert a DataFrame column to a list.

First, let’s start by importing Pandas and creating a sample DataFrame:

import pandas as pd
data = {
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
}
df = pd.DataFrame(data)
print(df)

Output:

   CustomerID   PlanType  MonthlyCharge
0         101      Basic             30
1         102    Premium             50
2         103      Basic             30
3         104  Unlimited             70

To convert the ‘MonthlyCharge’ column into a list, call tolist() on the column:

charges_list = df['MonthlyCharge'].tolist()
print(charges_list)

Output:

[30, 50, 30, 70]
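It can be useful to confirm what kind of objects end up in the result. The short check below is a minimal sketch that rebuilds only the 'MonthlyCharge' column and inspects the types. It relies on the documented behavior that Series.tolist() returns Python scalars for numeric data.

```python
import pandas as pd

df = pd.DataFrame({'MonthlyCharge': [30, 50, 30, 70]})
charges_list = df['MonthlyCharge'].tolist()

# The result is a plain Python list, not a Series or NumPy array
print(type(charges_list))

# Each element is a built-in int rather than a NumPy integer
print(type(charges_list[0]))
```

Plain Python integers are convenient when the list is later passed to code that does not understand NumPy types, such as some JSON serializers.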

 

Using the Python list() Constructor

The built-in list() constructor can also convert a Pandas DataFrame column to a list, because a column (a Series) is iterable.

To convert the 'PlanType' column into a list using the list() constructor:

plan_list = list(df['PlanType'])
print(plan_list)

Output:

['Basic', 'Premium', 'Basic', 'Unlimited']
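One detail worth checking is what list() does when it receives the whole DataFrame instead of a single column. The sketch below, using a one-column version of the sample data, compares the two calls. Iterating over a DataFrame yields its column labels, not its rows.

```python
import pandas as pd

df = pd.DataFrame({'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited']})

# list() on a column (Series) collects the column's values
via_constructor = list(df['PlanType'])
via_method = df['PlanType'].tolist()
print(via_constructor == via_method)

# list() on the DataFrame itself collects the column names instead
column_names = list(df)
print(column_names)
```

If you see column names where you expected values, the column selection was probably left out.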

 

Converting Multiple Columns to Separate Lists

To convert several columns into separate lists, apply either the tolist() method or the native list() constructor to each column in turn.

To convert both the ‘CustomerID’ and ‘PlanType’ columns:

customer_ids = df['CustomerID'].tolist()
plan_types = df['PlanType'].tolist()
print(customer_ids)
print(plan_types)

Output:

[101, 102, 103, 104]
['Basic', 'Premium', 'Basic', 'Unlimited']

Using the list() Constructor

Similarly, with the list() constructor:

customer_ids_native = list(df['CustomerID'])
plan_types_native = list(df['PlanType'])
print(customer_ids_native)
print(plan_types_native)

Output:

[101, 102, 103, 104]
['Basic', 'Premium', 'Basic', 'Unlimited']
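When many columns are involved, writing one line per column gets repetitive. As an alternative sketch (not part of the approach above), DataFrame.to_dict() with orient='list' returns a dictionary that maps each column name to a list of its values:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
})

# One dictionary entry per selected column, each holding a plain list
columns_as_lists = df[['CustomerID', 'PlanType']].to_dict(orient='list')

customer_ids = columns_as_lists['CustomerID']
plan_types = columns_as_lists['PlanType']
print(customer_ids)
print(plan_types)
```

This keeps the lists grouped under their column names, which can be easier to pass around than several separate variables.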

 

Converting Multiple Columns to a List of Tuples or Lists

To convert the ‘CustomerID’ and ‘MonthlyCharge’ columns into a list of tuples:

tuples_list = list(df[['CustomerID', 'MonthlyCharge']].itertuples(index=False, name=None))
print(tuples_list)

Output:

[(101, 30), (102, 50), (103, 30), (104, 70)]

This method uses itertuples(), which by default iterates over DataFrame rows as namedtuples.

Setting index=False excludes the index from each row, and name=None returns plain tuples instead of namedtuples.
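To illustrate the effect of the name parameter, the sketch below passes a string instead of None. The name 'Charge' is an arbitrary label chosen for this example. Each row then becomes a namedtuple whose fields can be read by column name.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'MonthlyCharge': [30, 50, 30, 70]
})

# name='Charge' is a hypothetical label; it becomes the namedtuple class name
rows = list(df[['CustomerID', 'MonthlyCharge']].itertuples(index=False, name='Charge'))

first = rows[0]
# Fields are accessible by column name as well as by position
print(first.CustomerID, first.MonthlyCharge)
print(first[0], first[1])
```

Named fields make downstream code more readable, while name=None keeps the rows as lightweight plain tuples.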

Creating a List of Lists from DataFrame Columns

You can use the values attribute to extract the data in the DataFrame as a NumPy array, and then call tolist() to convert this array into a list of lists:

lists_list = df[['CustomerID', 'MonthlyCharge']].values.tolist()
print(lists_list)

Output:

[[101, 30], [102, 50], [103, 30], [104, 70]]
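The Pandas documentation recommends DataFrame.to_numpy() over the values attribute for new code. The sketch below shows the same conversion with to_numpy(), plus a second call on columns of mixed types, where the intermediate array has object dtype:

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
})

# Same result as .values.tolist(), using the recommended accessor
lists_list = df[['CustomerID', 'MonthlyCharge']].to_numpy().tolist()
print(lists_list)

# Mixing integer and string columns still works; each inner list holds both types
mixed_lists = df[['CustomerID', 'PlanType']].to_numpy().tolist()
print(mixed_lists)
```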

 

Converting DataFrame Index to List

The most straightforward way to convert the index of a DataFrame to a list is to use the tolist() method directly on the index object.

Considering our dataset:

import pandas as pd
data = {
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
}
df = pd.DataFrame(data)

To convert the DataFrame’s index to a list:

index_list = df.index.tolist()
print(index_list)

Output:

[0, 1, 2, 3]

As expected, since we didn’t specify a custom index for our DataFrame, the default integer index is returned.
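For contrast, here is a small sketch with a custom index. It assumes you want 'CustomerID' to act as the index, which is set with set_index(). The same tolist() call then returns the customer IDs.

```python
import pandas as pd

df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Unlimited'],
    'MonthlyCharge': [30, 50, 30, 70]
})

# Promote CustomerID from a regular column to the index
df_indexed = df.set_index('CustomerID')

custom_index_list = df_indexed.index.tolist()
print(custom_index_list)
```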

 

Performance Comparison

Let’s perform a benchmark test to compare the efficiency of tolist() and list():

import pandas as pd
import numpy as np
import time

# 100 million random integers; reduce the size if memory is limited
df_large = pd.DataFrame({
    'Numbers': np.random.randint(1, 100, 100_000_000)
})

# perf_counter() is a monotonic clock intended for measuring elapsed time
start_time_tolist = time.perf_counter()
list_using_tolist = df_large['Numbers'].tolist()
end_time_tolist = time.perf_counter()
print(f"Time taken using tolist(): {end_time_tolist - start_time_tolist:.6f} seconds")

start_time_list = time.perf_counter()
list_using_list = list(df_large['Numbers'])
end_time_list = time.perf_counter()
print(f"Time taken using list() constructor: {end_time_list - start_time_list:.6f} seconds")

Output:

Time taken using tolist(): 1.100703 seconds
Time taken using list() constructor: 10.370464 seconds

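In this run, the tolist() method was roughly nine times faster than the list() constructor (about 1.1 seconds versus 10.4 seconds). Exact timings depend on your hardware and Pandas version.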
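A single wall-clock measurement can be noisy. The sketch below repeats each conversion a few times with the standard-library timeit module and keeps the best time. The Series size of 1,000,000 and the repeat count of 3 are assumptions chosen to keep the run short, so the absolute numbers will not match the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Seeded generator so the data is reproducible between runs
rng = np.random.default_rng(0)
series = pd.Series(rng.integers(1, 100, size=1_000_000), name='Numbers')

# Run each conversion three times and keep the fastest run
tolist_time = min(timeit.repeat(series.tolist, number=1, repeat=3))
list_time = min(timeit.repeat(lambda: list(series), number=1, repeat=3))

print(f"Best of 3 using tolist(): {tolist_time:.6f} seconds")
print(f"Best of 3 using list() constructor: {list_time:.6f} seconds")
```

Taking the minimum of several runs reduces the influence of background activity on the machine.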

 

Real-World Use Case

Imagine you’re a data analyst at a large telecom company. The marketing department has noticed an uptick in the number of customers leaving the company.

They need a detailed report analyzing the last communication the company had with these customers, specifically:

  1. The list of customer IDs who have left in the last three months.
  2. The dates of the last communication with these customers.
  3. A list of offers or packages discussed during these communications.

The data is stored in a DataFrame with the following columns: CustomerID, ChurnDate, LastCommunicationDate, and OffersDiscussed (which contains a list of offers discussed during the communication).

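Filtering Relevant Data: First, keep only the customers who have left in the last three months. This assumes df now holds the churn data described above and that ChurnDate is a datetime column.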

current_date = pd.Timestamp.now()
three_months_ago = current_date - pd.DateOffset(months=3)
churned_customers = df[df['ChurnDate'] > three_months_ago]

Converting Columns to Lists: Convert the CustomerID and LastCommunicationDate columns to lists.

churned_ids = churned_customers['CustomerID'].tolist()
last_communication_dates = churned_customers['LastCommunicationDate'].dt.strftime('%Y-%m-%d').tolist()

Handling Complex Data Types: Flatten the OffersDiscussed column to get a consolidated list of all offers discussed with churned customers.

all_offers = [offer for sublist in churned_customers['OffersDiscussed'] for offer in sublist]
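The three steps can be tried end to end on a small example. Everything in the sample DataFrame below is hypothetical data invented for illustration, and a fixed reference date replaces pd.Timestamp.now() so that the result is reproducible.

```python
import pandas as pd

# Hypothetical records; real data would come from the company's systems
df = pd.DataFrame({
    'CustomerID': [201, 202, 203],
    'ChurnDate': pd.to_datetime(['2024-05-20', '2023-11-02', '2024-06-01']),
    'LastCommunicationDate': pd.to_datetime(['2024-05-10', '2023-10-15', '2024-05-28']),
    'OffersDiscussed': [['Discount', 'Upgrade'], ['Upgrade'], []],
})

# Fixed date (an assumption for this sketch) instead of pd.Timestamp.now()
current_date = pd.Timestamp('2024-06-15')
three_months_ago = current_date - pd.DateOffset(months=3)

# Step 1: keep customers who churned after the cutoff
churned_customers = df[df['ChurnDate'] > three_months_ago]

# Step 2: convert the columns of interest to lists
churned_ids = churned_customers['CustomerID'].tolist()
last_communication_dates = (
    churned_customers['LastCommunicationDate'].dt.strftime('%Y-%m-%d').tolist()
)

# Step 3: flatten the per-customer offer lists into one list
all_offers = [offer for sublist in churned_customers['OffersDiscussed'] for offer in sublist]

print(churned_ids)
print(last_communication_dates)
print(all_offers)
```

Customer 202 churned before the cutoff and is excluded, and customer 203 contributes no offers because its list is empty.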

With this data at hand, the analyst can now provide the marketing department with insights into the last interactions the company had with churned customers.
