
Convert Pandas DataFrame to NumPy array

Pandas and NumPy are two of the most widely used data analysis libraries in Python for handling large amounts of structured, tabular data. In Pandas, we usually work with the DataFrame, a two-dimensional data structure.

Sometimes we have to convert the data in a DataFrame into another type so that we can use the functions and methods of NumPy.

In that case, we convert the DataFrame to a NumPy array (ndarray), the format those functions expect.

A DataFrame is a two-dimensional data structure that can be seen as a collection of Series (one-dimensional data structures), whereas a NumPy array can have one, two, or more dimensions.

In a previous tutorial, we learned how to convert a NumPy array to a Pandas DataFrame. In this tutorial, we will take a closer look at some of the common approaches we can use to convert a DataFrame into a NumPy array. We will also look at some common tricks to convert a column or portion of the DataFrame to a NumPy array.

This tutorial will also explain some of the easiest built-in methods and techniques that can ease this DataFrame to NumPy array conversion for you.

 

 

Creating Pandas DataFrame

A DataFrame is a two-dimensional data structure that stores data in a tabular fashion, with labeled rows and columns. It is the data structure analysts most often use to represent data loaded from CSV, XLSX, TSV, JSON, and other file formats.

We can use the DataFrame constructor to create a DataFrame in pandas. Here is a code snippet showing how to create an empty DataFrame.

import pandas as pd
# Using DataFrame constructor to create an empty DataFrame
df = pd.DataFrame()
print(df)

Output

This output shows how to create a dataframe using Pandas in Python

To create a non-empty DataFrame, we pass data as an argument to the DataFrame() constructor. Here is a code snippet showing how to create a DataFrame with a single column.

import pandas as pd
li1 = ['Karlos', 'Ray', 'Iris', 'Gaurav',
       'Sue', 'Dee', 'Mohit']
dfr = pd.DataFrame(li1)
print(dfr)

Output

This output shows how to create a dataframe using Pandas in Python

We can use a nested list or a dictionary of lists to create a DataFrame with multiple columns. The code snippet is shown below.

import pandas as pd
dict1 = {'Employee_Name': ['Karlos', 'Ray', 'Iris', 'Gaurav', 'Sue', 'Dee', 'Mohit'], 'Emp_Age': [29, 27, 26, 28, 25, 29, 36]}
df = pd.DataFrame(dict1)
print(df)

Output

This output shows how to create a dataframe using Pandas in Python

 

Converting using DataFrame.to_numpy()

The to_numpy() method is the most common and direct way to convert a DataFrame into a NumPy array. It is a method of the DataFrame object itself and accepts three optional parameters:

  • dtype: The data type of the resulting array. If omitted, Pandas picks a type that can hold the values of every column.
  • copy: If set to True, the returned array is guaranteed to be a new copy. The default is False, in which case Pandas may return a view of the existing data where possible, although a copy can still be made.
  • na_value: The value to use for missing entries in the resulting array.
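
As a minimal sketch of the three parameters listed above, the following snippet uses a small made-up DataFrame (the column names and values are assumptions for illustration only). It requests a float array, substitutes 0.0 for the missing entry, and asks for an independent copy.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({"height": [1.5, np.nan, 1.8], "weight": [60.0, 72.5, 80.0]})

# dtype sets the element type, na_value replaces missing entries,
# and copy=True guarantees the result does not share memory with df
arr = df.to_numpy(dtype="float64", copy=True, na_value=0.0)

arr[0, 0] = 99.0  # changing the copy leaves the DataFrame untouched
print(arr)
print(df.loc[0, "height"])
```

Because copy=True was requested, the assignment to arr[0, 0] does not change the value stored in the DataFrame.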

Here is a Python script explaining the basic conversion of a DataFrame to a NumPy array.

import pandas as pd
dict1 = {'Employee_Name': ['Karlos', 'Ray', 'Iris', 'Gaurav', 'Sue', 'Dee', 'Mohit'], 'Emp_Age': [29, 27, 26, 28, 25, 29, 36]}
dfr = pd.DataFrame(dict1)
ndarr = dfr.to_numpy()
print(dfr)
print()
print(ndarr)

Output

This output shows how to convert a DataFrame using DataFrame.to_numpy() in Python

We can also extract a single row by indexing the array returned by to_numpy() with square brackets []. Here is a code snippet showing how to use it.

import pandas as pd
d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns = ['Name', 'Emp_ID', 'Comments', 'Annual Package'], index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
print(df.to_numpy()[1])
print(df.to_numpy()[3])

Output

This output shows how to convert a dataframe DataFrame.to_numpy() in Python

 

Filling missing values using to_numpy(na_value) while converting

We can pass a value to the na_value parameter of the to_numpy() method. That value is placed in every position where the DataFrame has a missing value.

Here is a code example showing how to replace NA values with some other fixed value during DataFrame to NumPy array conversion.

import pandas as pd
ary = [['India', 91, 1985], ['UK', 2, ], ['France', 33, 1991], ['Pak', 92, ], ['Germany', 38, 1998], ['Russia', 5, 1996]]
df = pd.DataFrame(ary, columns = ['Country Name', 'Phone Code', 'TeleMarket Estd.'])
# Use the number 0 rather than the string '0' so the filled values stay numeric
array = df.to_numpy(na_value = 0)
print(df)
print(array)

Output

This output shows how to convert a dataframe to_numpy(na_value) in Python
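
The na_value parameter applies one replacement to the whole array. If different columns need different fill values, one possible alternative (not part of the original example) is to call fillna() with a dictionary before converting. The column names below are hypothetical.

```python
import pandas as pd

# Hypothetical data: the second row has no founding year
df = pd.DataFrame(
    [["India", 91, 1985], ["UK", 2, None]],
    columns=["country", "phone_code", "established"],
)

# Fill only the 'established' column, then convert to a NumPy array
filled = df.fillna({"established": 0})
arr = filled.to_numpy()
print(arr)
```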

 

Check whether the object is a NumPy array or a DataFrame

We can use the built-in type() function to determine whether an object is a DataFrame or a NumPy array.

We pass the object to type() as an argument. Here is a code snippet showing how to get the type of an object.

import pandas as pd
ary = [['India', 91, 1985], ['UK', 2, ], ['France', 33, 1991], ['Pak', 92, ], ['Germany', 38, 1998], ['Russia', 5, 1996]]
df = pd.DataFrame(ary, columns = ['Country Name', 'Phone Code', 'TeleMarket Estd.'])
# Use the number 0 rather than the string '0' so the filled values stay numeric
array = df.to_numpy(na_value = 0)
print(type(df))
print(type(array))
print(array)

Output

This output shows how to Check whether the object is a NumPy array or a DataFrame in Python
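
When the check is used inside a condition rather than printed, isinstance() is a common alternative to type(). The sketch below uses a small made-up DataFrame to show it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
arr = df.to_numpy()

# isinstance() returns a boolean, which is convenient in if-statements
print(isinstance(df, pd.DataFrame))
print(isinstance(arr, np.ndarray))
print(arr.shape)
```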

 

Converting DataFrame to Ndarray having heterogeneous data type

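A heterogeneous DataFrame, one whose columns hold different data types, can also be converted to an ndarray with the to_numpy() method. Because a NumPy array has a single data type, the result uses the object dtype when the values have no common numeric type.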

Here is a Python script showing the conversion technique of a heterogeneous DataFrame to a NumPy array.

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 96000, 28],
    'Dee': ['4.5/5.0', 70000, 29],
    'Paul': ['3.8/5.0', 50000, 21],
    'Joe': ['3.5/5.0', 35000, 24],
    'Lee': ['4.2/5.0', 20000, 31],
    'Ray': ['4.1/5.0', 27000, 33],
    'Steve': ['3.2/5.0', 31000, 29],
    'Dev': ['3.6/5.0', 44000, 29]
})
print(df)
print()
datf = df.to_numpy()
print(datf)

Output

This output shows how to convert DataFrame to Ndarray having heterogeneous data type in Python
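
To see how the column types affect the result, the following sketch compares a purely numeric DataFrame with one that mixes strings and numbers. Both DataFrames are made up for illustration.

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({"age": [28, 29], "rating": [4.4, 4.5]})
mixed = pd.DataFrame({"name": ["Karl", "Dee"], "age": [28, 29]})

num_arr = numeric.to_numpy()
mix_arr = mixed.to_numpy()

print(num_arr.dtype)  # integers and floats share a common float type
print(mix_arr.dtype)  # strings and integers fall back to the object dtype
```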

 

Converting a portion (column-wise) of DataFrame to NumPy array

We can convert specific columns by selecting them with their column labels.

Passing a list of labels to the DataFrame returns a new DataFrame with only those columns, which we then convert to a NumPy array using the to_numpy() method.

Here is the code showing how to implement this column-wise conversion.

import pandas as pd
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 96000, 28],
    'Dee': ['4.5/5.0', 70000, 29],
    'Paul': ['3.8/5.0', 50000, 21],
    'Joe': ['3.5/5.0', 35000, 24],
    'Lee': ['4.2/5.0', 20000, 31],
    'Ray': ['4.1/5.0', 27000, 33],
    'Steve': ['3.2/5.0', 31000, 29],
    'Dev': ['3.6/5.0', 44000, 29]
})
col_conversion = df[["Karl", "Dee", "Paul", "Ray"]]
ndarr = col_conversion.to_numpy()
print(col_conversion)
print(ndarr)

Output

This output shows how to convert a portion (column-wise) of DataFrame to NumPy array in Python
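
A related detail is the shape of the result when only one column is selected. As a small sketch with made-up data, selecting with a single label gives a Series and a one-dimensional array, while selecting with a one-element list keeps the result two-dimensional.

```python
import pandas as pd

df = pd.DataFrame({"age": [28, 29, 21], "salary": [96000, 70000, 50000]})

one_d = df["age"].to_numpy()    # Series -> shape (3,)
two_d = df[["age"]].to_numpy()  # one-column DataFrame -> shape (3, 1)

print(one_d.shape, two_d.shape)
```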

 

Converting an empty DataFrame to a NumPy array

We can create an empty DataFrame and convert it to a NumPy array using the to_numpy() method. Here is a code snippet showing how to implement it.

import pandas as pd
df = pd.DataFrame({
    'Karl': [],
    'Dee': [],
    'Ray': []
})
print(df)
ndarr = df.to_numpy()
print("The empty NumPy array is: ", ndarr)

Output

This output shows how to convert an empty DataFrame to NumPy array in Python
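
Printing an empty array shows very little, so inspecting its shape and size can be more informative. The sketch below repeats the empty DataFrame from the example above and checks both attributes.

```python
import pandas as pd

df = pd.DataFrame({"Karl": [], "Dee": [], "Ray": []})
arr = df.to_numpy()

print(arr.shape)  # zero rows, three columns
print(arr.size)   # total number of elements
```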

If a DataFrame contains only NaN values, we can replace them with zeros during conversion. Here is a code snippet showing how to implement it.

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Karl': [np.nan, np.nan],
    'Dee': [np.nan, np.nan],
    'Ray': [np.nan, np.nan]
})
print(df)
ndarr = df.to_numpy(na_value = 0)
print("The NumPy array with NaN replaced by 0 is: \n", ndarr)

Output

This output shows how to convert an empty DataFrame to NumPy array in Python

 

Convert a specific portion of the DataFrame dataset to a NumPy array

We can extract a specific portion of a large DataFrame through various filtering mechanisms and convert it to a NumPy array.

We can use the head() and tail() methods to extract the topmost or bottom-most rows and then apply the conversion method to get the NumPy array. Here is the code snippet showing how to use it.

import pandas as pd
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
})
# Label the four rows so the printed DataFrame is easier to read
df.index = ['Rating', 'Salary', 'Designation', 'Age']
print(df)
x = df.head(2)
print(x.to_numpy())

Output

This output shows how to convert specific portion of the DataFrame dataset to NumPy array in Python

Here is another example, in which we filter the rows and convert the remaining portion of the DataFrame to a NumPy array.

import pandas as pd
d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns = ['Name', 'Emp_ID', 'Comments', 'Annual Package'], index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
# Keep only the rows whose Name does not start with 'S'
x = df[~df.Name.str.startswith('S')]
print(x.to_numpy())

Output

This output shows how to convert specific portion of the DataFrame dataset to NumPy array in Python

NOTE: There are various other filtering mechanisms we can use to convert a portion of the DataFrame to NumPy array.
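
As a rough illustration of a few such mechanisms, the sketch below applies a boolean mask, a label-based slice with loc[], and isin() to a shortened, made-up version of the employee data before converting each result.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Dee", "Sue"], "Annual Package": [900000, 870000, 560000]},
    index=["A", "B", "C"],
)

# Boolean mask: rows with a package above 800000
high_paid = df[df["Annual Package"] > 800000].to_numpy()

# Label-based slice: rows B through C (inclusive), Name column only
by_label = df.loc["B":"C", ["Name"]].to_numpy()

# Membership test: rows whose Name is in a given list
by_isin = df[df["Name"].isin(["John", "Sue"])].to_numpy()

print(high_paid)
print(by_label)
print(by_isin)
```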

 

Convert DataFrame to Ndarrays using DataFrame.to_records()

Pandas also comes with another method to convert a DataFrame into a NumPy array. This is the to_records() method, which converts a DataFrame into a NumPy record array.

It takes the following parameters:

  • index: Whether to include the index as a field in the resulting record array. It is True by default.
  • column_dtypes: The data type, or a mapping of column names to data types, to use for the columns in the resulting record array.
  • index_dtypes: The data type, or a mapping of index names to data types, to use for the index fields.

Here is a code snippet showing its implementation.

import pandas as pd
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
})
# Label the four rows; to_records() stores these labels in the 'index' field
df.index = ['Rating', 'Salary', 'Designation', 'Age']
print(df.to_records())

Output

This output shows how to convert DataFrame to Ndarrays using DataFrame.to_records() in Python
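
To illustrate the index and column_dtypes parameters described in this section, here is a minimal sketch with a made-up numeric DataFrame. It compares the default call with one that drops the index and stores one column as 32-bit integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [28, 29], "rating": [4.4, 4.5]})

# Default: the index is stored in a field named 'index'
with_index = df.to_records()

# Drop the index and store the 'age' column as int32
no_index = df.to_records(index=False, column_dtypes={"age": "int32"})

print(with_index.dtype.names)
print(no_index.dtype.names)
print(no_index["age"].dtype)
```

Fields of a record array can be read by name, for example no_index["age"], which is the main difference from a plain two-dimensional ndarray.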

 

Convert DataFrame to Ndarrays using asarray() method

The NumPy module also provides a function called asarray() that converts a DataFrame to a NumPy array. Here is a code snippet showing how to use it.

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
})
# Label the four rows; the labels are not part of the converted array
df.index = ['Rating', 'Salary', 'Designation', 'Age']
z = np.asarray(df)
print(z)

Output

This output shows how to convert DataFrame to Ndarrays using asarray() method in Python
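
np.asarray() also accepts a dtype argument, which can be handy when the DataFrame is purely numeric. The following is a small sketch with made-up data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [28, 29], "salary": [96000, 70000]})

# Convert and cast to float in a single call
as_float = np.asarray(df, dtype=float)

print(as_float.dtype, as_float.shape)
```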

 

Convert to NumPy array using dataframe.values

Another way to convert a DataFrame to a NumPy array is the DataFrame attribute “.values”. For a DataFrame, it returns an array equivalent to the one to_numpy() returns with its default arguments.

However, the official documentation recommends using to_numpy() instead, because the behavior of the .values property is less consistent and can depend on the data types involved. Here is a Python script showing how to use it.

import pandas as pd
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
})
print(df.values)

Output

This output shows how to convert DataFrame to NumPy array using dataframe.values in Python
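
One place where the difference becomes visible is a Series with a Pandas-specific data type. As a sketch with a made-up categorical Series, .values returns a Pandas Categorical object, while to_numpy() returns a NumPy array.

```python
import numpy as np
import pandas as pd

s = pd.Series(["low", "high", "low"], dtype="category")

print(type(s.values))      # a pandas Categorical, not an ndarray
print(type(s.to_numpy()))  # a NumPy ndarray
```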

 

Extracting a single DataFrame row as ndarray using DataFrame.values[]

We can also use the .values property to extract a single row of a DataFrame and directly obtain it as a NumPy array. We have to specify the row number in square brackets [].

The row number starts from zero, meaning the first row is 0, followed by 1, 2, and so on. Here is a code snippet showing how to use it.

import pandas as pd
d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns = ['Name', 'Emp_ID', 'Comments', 'Annual Package'], index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
print(df.values[1])
print(df.values[3])

Output

This output shows how to extract a single DataFrame row as ndarray using DataFrame.values[] in Python

 

Convert DataFrame to NumPy Array using tolist()

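There are also indirect ways to convert a DataFrame to a NumPy array. We can convert a DataFrame to a list of rows and then convert the list to a NumPy array. Note that when the rows mix strings and numbers, np.asarray() coerces every element to a string unless dtype=object is passed. Here is a code snippet showing how to implement it.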

import numpy as np
import pandas as pd
d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns = ['Name', 'Emp_ID', 'Comments', 'Annual Package'], index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
x = df.values.tolist()
ndarr = np.asarray(x)
print(ndarr)

Output

This output shows how to convert DataFrame to NumPy Array using tolist() in Python
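
One thing to watch for with this route is the data type of the result. In the sketch below, which uses a shortened, made-up version of the data, the default conversion turns the numbers into strings, while passing dtype=object keeps the original Python values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Dee"], "Annual Package": [900000, 870000]})
rows = df.values.tolist()

as_text = np.asarray(rows)                   # numbers are coerced to strings
as_object = np.asarray(rows, dtype=object)   # original Python types are kept

print(as_text.dtype.kind, as_object.dtype)
print(type(as_object[0, 1]))
```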

 

Convert DataFrame to NumPy array using reset_index()

We can also use the reset_index() method when converting a DataFrame to a NumPy array or record array. It moves the index into a regular column, so the index labels are included in the result. Here are two different examples of how we can use this method to perform the conversion.

Example 1:

import pandas as pd
arry = [[25, 'Karlos', 2015], [21, 'Ray', 2016], [22, 'Dee', 2018]]
df = pd.DataFrame(arry, columns = ['Age', 'Student_Name', 'Passing Year'], index = [1, 2, 3])
# reset_index() turns the index into a column; ravel() flattens the 2D array to 1D
v = df.reset_index().to_numpy().ravel()
print(v)
print(type(v))

Output

This output shows how to convert DataFrame to NumPy array using reset_index() in Python

Example 2:

import numpy as np
import pandas as pd
d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns = ['Name', 'Emp_ID', 'Comments', 'Annual Package'], index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
x = df.reset_index()
x = np.rec.fromrecords(x, names = x.columns.tolist())
print(x)
print(type(x))

Output

This output shows how to convert DataFrame to NumPy array using reset_index() in Python
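
The effect of reset_index() is easiest to see by comparing array shapes. In this minimal sketch with made-up data, the plain conversion drops the index labels, whereas converting after reset_index() keeps them as the first column.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 21]}, index=["s1", "s2"])

without_index = df.to_numpy()             # shape (2, 1): labels are dropped
with_index = df.reset_index().to_numpy()  # shape (2, 2): labels become column 0

print(without_index.shape, with_index.shape)
print(with_index[:, 0])
```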

 

Convert DataFrame columns to Ndarray using iloc[]

We can also use .values together with iloc[] to convert a portion of a DataFrame, or the complete DataFrame, to a NumPy array. iloc[] selects rows and columns by integer position. Here is a code snippet showing how to implement it.

import pandas as pd
d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns = ['Name', 'Emp_ID', 'Comments', 'Annual Package'], index = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
x = df.iloc[:,1:]
b = df.iloc[:,1:].values
print(type(x))
print(type(b))
print(b)

Output

This output shows how to convert DataFrame columns to Ndarray using iloc[] in Python
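
The same positional selection also works with to_numpy(). As a sketch on a shortened, made-up version of the data, the following selects the first rows, the last column, and a rectangular block.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Dee", "Sue"],
    "Emp_ID": ["E1", "E2", "E3"],
    "Package": [900000, 870000, 560000],
})

first_two_rows = df.iloc[:2].to_numpy()   # rows 0 and 1, all columns
last_column = df.iloc[:, -1].to_numpy()   # a single column gives a 1D array
block = df.iloc[1:, 1:].to_numpy()        # rows 1 onward, columns 1 onward

print(first_two_rows.shape, last_column.shape, block.shape)
```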

 

Visualizing a converted NumPy array from DataFrame

We can visualize the converted NumPy array from DataFrame using the Matplotlib library. Here is a code sample showing how to implement one.

import matplotlib.pyplot as mplib
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
})
no_of_days = np.array([20, 10, 23, 28, 50, 60, 70, 89])
# Row 1 holds the salaries; cast from object to float so Matplotlib plots numbers
zz = df.to_numpy()[1].astype(float)
mplib.plot(no_of_days, zz)
mplib.xlabel("Number of Days")
mplib.ylabel("Salary")
mplib.show()

Output

This output shows how to visualize a converted NumPy array from DataFrame in Python

 

Conclusion

Converting data from a DataFrame to a NumPy array is a routine task for data analysts. We often need NumPy arrays to perform Fourier transforms, linear algebra, determinant calculations, and other matrix operations.

A DataFrame is less suited to these operations, and NumPy arrays are the better fit. This tutorial therefore highlighted several ways to convert a complete DataFrame, or a portion of it, to a NumPy array.

It covered methods such as to_numpy(), to_records(), asarray(), and tolist(), along with techniques such as to_numpy(na_value), .values, and iloc[], that help in NumPy array conversion.
