Convert Pandas DataFrame to NumPy array
Sometimes we have to convert the data within a DataFrame into another data type so that we can leverage the functions and methods of NumPy.
In that case, we convert the DataFrame to a NumPy array (ndarray), the format that NumPy and many libraries built on it expect.
A DataFrame is a two-dimensional data structure and can be seen as a collection of multiple Series (one-dimensional data structures). A NumPy array, by contrast, can have any number of dimensions; converting a DataFrame yields a two-dimensional array, and converting a single Series yields a one-dimensional one.
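To make the dimensions concrete, here is a minimal sketch. The tiny DataFrame and the names col and cube are illustrative assumptions, not data used elsewhere in this tutorial.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Selecting one column gives a one-dimensional Series.
col = df["a"]
print(type(col).__name__, col.ndim)

# The DataFrame as a whole is two-dimensional.
print(df.ndim, df.shape)

# A NumPy array can have more than two dimensions.
cube = np.zeros((2, 3, 4))
print(cube.ndim)
```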
In a previous tutorial, we learned how to convert a NumPy array to a Pandas DataFrame. In this tutorial, we will take a closer look at some of the common approaches we can use to convert a DataFrame into a NumPy array. We will also look at some common tricks to convert a column or portion of the DataFrame to a NumPy array.
This tutorial will also explain some of the easiest built-in methods and techniques that can ease this DataFrame to NumPy array conversion for you.
Creating Pandas DataFrame
A DataFrame is a two-dimensional data structure that stores data in a tabular fashion, with labeled rows and columns. It is the data structure analysts most often use to represent data loaded from CSV, XLSX, TSV, JSON, and other file formats.
We can use the DataFrame constructor to create a DataFrame in pandas. Here is a code snippet showing how to create an empty DataFrame.
import pandas as pd

# Using DataFrame constructor to create an empty DataFrame
df = pd.DataFrame()
print(df)
Output
To create a non-empty DataFrame, we pass the data as an argument to the DataFrame() constructor. Here is a code snippet showing how to create a DataFrame with a single column.
import pandas as pd

li1 = ['Karlos', 'Ray', 'Iris', 'Gaurav', 'Sue', 'Dee', 'Mohit']
dfr = pd.DataFrame(li1)
print(dfr)
Output
We can use a nested list or a dictionary with a nested list to create a DataFrame. Let us now create a DataFrame with multiple columns. The code snippet is shown below.
import pandas as pd

dict1 = {'Employee_Name': ['Karlos', 'Ray', 'Iris', 'Gaurav', 'Sue', 'Dee', 'Mohit'],
         'Emp_Age': [29, 27, 26, 28, 25, 29, 36]}
df = pd.DataFrame(dict1)
print(df)
Output
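The snippet above uses a dictionary. For the nested-list route, a minimal sketch could look like the following; the shortened data and the name df_rows are illustrative assumptions.

```python
import pandas as pd

# Each inner list is one row; the column names are passed separately.
rows = [['Karlos', 29], ['Ray', 27], ['Iris', 26]]
df_rows = pd.DataFrame(rows, columns=['Employee_Name', 'Emp_Age'])
print(df_rows)
```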
Converting using DataFrame.to_numpy()
The to_numpy() method is the most common way to convert a DataFrame into a NumPy array, and it is the one the pandas documentation recommends. It is a method of the DataFrame itself. It accepts three optional parameters:
- dtype: The data type of the values in the resulting array. If omitted, pandas picks a common type that can hold every column.
- copy: Setting copy to True guarantees that the returned array is a new copy and not a view on the DataFrame's data. The default is False, which lets pandas return a view when that is possible but does not guarantee one.
- na_value: The value to use in place of missing values in the array. If omitted, the default missing-value marker depends on the dtype of the data.
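The three parameters above can be sketched as follows. The small numeric DataFrame and the variable names are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.nan]})

# dtype: request a specific element type for the result.
as_f32 = df.to_numpy(dtype="float32")

# na_value: the value written in place of missing entries.
filled = df.to_numpy(na_value=-1.0)

# copy=True: the result does not share memory with df.
independent = df.to_numpy(copy=True)
independent[0, 0] = 99.0  # df itself is left unchanged

print(as_f32.dtype)
print(filled)
print(df.loc[0, "x"])
```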
Here is a Python script explaining the basic conversion of a DataFrame to a NumPy array.
import pandas as pd

dict1 = {'Employee_Name': ['Karlos', 'Ray', 'Iris', 'Gaurav', 'Sue', 'Dee', 'Mohit'],
         'Emp_Age': [29, 27, 26, 28, 25, 29, 36]}
dfr = pd.DataFrame(dict1)
ndarr = dfr.to_numpy()
print(dfr)
print()
print(ndarr)
Output
We can also extract a single row by indexing the result of to_numpy() with square brackets []. Here is a code snippet showing how to use it.
import pandas as pd

d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Comments', 'Annual Package'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
print(df.to_numpy()[1])
print(df.to_numpy()[3])
Output
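Indexing the result of to_numpy() converts the whole DataFrame first. As an alternative sketch (the shortened data and the names by_position and by_label are assumptions), the row can be selected with iloc or loc before converting:

```python
import pandas as pd

df = pd.DataFrame(
    [['John', 'E1', 900000], ['Dee', 'E2', 870000], ['Daizy', 'E3', 660000]],
    columns=['Name', 'Emp_ID', 'Annual Package'],
    index=['A', 'B', 'C'],
)

# Select the row first (by position or by label), then convert it.
by_position = df.iloc[1].to_numpy()
by_label = df.loc['B'].to_numpy()

print(by_position)
print(by_label)
```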
You can read more about to_numpy() in the official pandas documentation.
Filling missing values using to_numpy(na_value) while converting
We can pass a fill value through the na_value parameter of the to_numpy() method. Every missing entry in the resulting array is replaced with that value.
Here is a code example showing how to replace NA values with some other fixed value during DataFrame to NumPy array conversion.
import pandas as pd

ary = [['India', 91, 1985],
       ['UK', 2, ],
       ['France', 33, 1991],
       ['Pak', 92, ],
       ['Germany', 38, 1998],
       ['Russia', 5, 1996]]
df = pd.DataFrame(ary, columns=['Country Name', 'Phone Code', 'TeleMarket Estd.'])
array = df.to_numpy(na_value='0')
print(df)
print(array)
Output
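The fill value does not have to be a string. As a small sketch (the shortened data and the name numeric are illustrative assumptions), selecting only the numeric columns and passing a numeric na_value should keep the result numeric:

```python
import pandas as pd

ary = [['India', 91, 1985], ['UK', 2, None], ['France', 33, 1991]]
df = pd.DataFrame(ary, columns=['Country Name', 'Phone Code', 'TeleMarket Estd.'])

# Only the numeric columns, filled with a numeric value.
numeric = df[['Phone Code', 'TeleMarket Estd.']].to_numpy(na_value=0)

print(numeric.dtype)
print(numeric)
```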
Check whether the object is a NumPy array or a DataFrame
We can simply use the type() function to determine whether a specific object we are working on is a DataFrame or a NumPy array.
We pass the object to the type() function as an argument. Here is a code snippet showing how to get the type of an object.
import pandas as pd

ary = [['India', 91, 1985],
       ['UK', 2, ],
       ['France', 33, 1991],
       ['Pak', 92, ],
       ['Germany', 38, 1998],
       ['Russia', 5, 1996]]
df = pd.DataFrame(ary, columns=['Country Name', 'Phone Code', 'TeleMarket Estd.'])
array = df.to_numpy(na_value='0')
print(type(df))
print(type(array))
print(array)
Output
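When the check is needed inside a condition rather than for printing, isinstance() is a common alternative to type(). A minimal sketch, with a small DataFrame assumed only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
arr = df.to_numpy()

print(isinstance(df, pd.DataFrame))   # True
print(isinstance(arr, np.ndarray))    # True
print(isinstance(arr, pd.DataFrame))  # False
```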
Converting DataFrame to Ndarray having heterogeneous data type
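It is certainly possible to convert a heterogeneous DataFrame into an ndarray. We can use the to_numpy() method to do so. Because a NumPy array has a single data type, the result uses the object dtype when the columns mix strings and numbers.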
Here is a Python script showing the conversion technique of a heterogeneous DataFrame to a NumPy array.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Karl': ['4.4/5.0', 96000, 28],
    'Dee': ['4.5/5.0', 70000, 29],
    'Paul': ['3.8/5.0', 50000, 21],
    'Joe': ['3.5/5.0', 35000, 24],
    'Lee': ['4.2/5.0', 20000, 31],
    'Ray': ['4.1/5.0', 27000, 33],
    'Steve': ['3.2/5.0', 31000, 29],
    'Dev': ['3.6/5.0', 44000, 29]
})
print(df)
print()
datf = df.to_numpy()
print(datf)
Output
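The effect of mixed column types shows up in the dtype of the resulting array. The two small DataFrames below (mixed and numbers) are assumed purely for illustration.

```python
import pandas as pd

mixed = pd.DataFrame({'name': ['Karl', 'Dee'], 'age': [28, 29]})
numbers = pd.DataFrame({'age': [28, 29], 'salary': [96000.0, 70000.0]})

# Strings and integers together fall back to the generic object dtype.
print(mixed.to_numpy().dtype)     # object

# Integers and floats share a common numeric type.
print(numbers.to_numpy().dtype)   # float64
```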
Converting a portion (column-wise) of DataFrame to NumPy array
We can convert some specific columns based on the column labels.
We select the columns by passing a list of their labels in square brackets, which returns a new DataFrame, and then convert that DataFrame into a NumPy array using the to_numpy() method.
Here is a code snippet showing how to implement this column-wise conversion.
import pandas as pd

df = pd.DataFrame({
    'Karl': ['4.4/5.0', 96000, 28],
    'Dee': ['4.5/5.0', 70000, 29],
    'Paul': ['3.8/5.0', 50000, 21],
    'Joe': ['3.5/5.0', 35000, 24],
    'Lee': ['4.2/5.0', 20000, 31],
    'Ray': ['4.1/5.0', 27000, 33],
    'Steve': ['3.2/5.0', 31000, 29],
    'Dev': ['3.6/5.0', 44000, 29]
})
col_conversion = df[["Karl", "Dee", "Paul", "Ray"]]
ndarr = col_conversion.to_numpy()
print(col_conversion)
print(ndarr)
Output
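The number of brackets matters when only one column is needed. In this sketch (the reduced data and the names one_d and two_d are assumptions), single brackets give a Series and therefore a one-dimensional array, while a list of labels keeps the result two-dimensional:

```python
import pandas as pd

df = pd.DataFrame({'Karl': [96000, 28], 'Dee': [70000, 29], 'Paul': [50000, 21]})

one_d = df['Karl'].to_numpy()     # Series -> shape (2,)
two_d = df[['Karl']].to_numpy()   # one-column DataFrame -> shape (2, 1)

print(one_d.shape, two_d.shape)
```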
Converting an empty DataFrame to a NumPy array
We can create an empty DataFrame and convert it to a NumPy array using the to_numpy() method. Here is a code snippet showing how to implement it.
import pandas as pd

df = pd.DataFrame({
    'Karl': [],
    'Dee': [],
    'Ray': []
})
print(df)
ndarr = df.to_numpy()
print("The empty NumPy array is: ", ndarr)
Output
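An empty result still carries the column count in its shape. A short sketch of how this could be checked:

```python
import pandas as pd

df = pd.DataFrame({'Karl': [], 'Dee': [], 'Ray': []})
ndarr = df.to_numpy()

print(ndarr.shape)   # (0, 3): zero rows, three columns
print(ndarr.size)    # 0
print(df.empty)      # True
```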
We can also replace the NaN values of a DataFrame that contains only NaN with zeros during conversion. Here is a code snippet showing how to implement it.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Karl': [np.nan, np.nan],
    'Dee': [np.nan, np.nan],
    'Ray': [np.nan, np.nan]
})
print(df)
# Every NaN is replaced with 0 in the resulting array
ndarr = df.to_numpy(na_value=0)
print("The NumPy array is: \n", ndarr)
Output
Convert a specific portion of the DataFrame dataset to a NumPy array
We can extract a specific portion of a large DataFrame through various filtering mechanisms and convert it to a NumPy array.
We can use the head() and tail() methods to extract the topmost or bottom-most rows and then apply the conversion method to get the NumPy array. Here is a code snippet showing how to use it.
import pandas as pd

# Each key is a column; the index labels name the four rows
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
}, index=['Rating', 'Salary', 'Designation', 'Age'])
print(df)
# Keep only the first two rows, then convert them
x = df.head(2)
print(x.to_numpy())
Output
Here is another example, in which we filter out the rows whose Name starts with 'S' and convert the remaining portion of the DataFrame to a NumPy array.
import pandas as pd

d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Comments', 'Annual Package'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
# ~ negates the mask, so rows whose Name starts with 'S' are dropped
x = df[~df.Name.str.startswith('S')]
print(x.to_numpy())
Output
NOTE: There are various other filtering mechanisms we can use to convert a portion of the DataFrame to NumPy array.
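A few of those other filtering mechanisms are sketched below. The reduced data, the threshold values, and the names high, names and picked are assumptions chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Dee', 'Sue', 'Iris'],
    'Annual Package': [900000, 870000, 870000, 990000],
})

# Boolean mask: rows with a package above 880000.
high = df[df['Annual Package'] > 880000].to_numpy()

# loc: filter the rows and pick one column in a single step.
names = df.loc[df['Annual Package'] == 870000, 'Name'].to_numpy()

# isin: keep the rows whose Name appears in a given list.
picked = df[df['Name'].isin(['John', 'Iris'])].to_numpy()

print(high)
print(names)
print(picked)
```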
Convert DataFrame to Ndarrays using DataFrame.to_records()
Pandas also comes with another method to convert a DataFrame into a NumPy array. This is the to_records() method, which converts a DataFrame into a NumPy record array.
It takes the following optional parameters:
- index: Whether to include the index as a field in the resulting record array. It is True by default.
- column_dtypes: The data type to use for the columns in the resulting record array.
- index_dtypes: The data type to use for the index fields.
Here is a code snippet showing its implementation.
import pandas as pd

# Each key is a column; the index labels name the four rows
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
}, index=['Rating', 'Salary', 'Designation', 'Age'])
# The index labels become the first field of each record
print(df.to_records())
Output
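A record array differs from a plain ndarray in that its fields can be addressed by name. A minimal sketch, assuming a small two-column DataFrame and the names rec and no_index:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Karl', 'Dee'], 'Age': [28, 29]},
    index=['a', 'b'],
)

rec = df.to_records()                  # index kept as a field named 'index'
no_index = df.to_records(index=False)  # index left out

print(rec.dtype.names)        # ('index', 'Name', 'Age')
print(no_index.dtype.names)   # ('Name', 'Age')
print(rec['Age'])             # field access by name
print(rec.Age)                # attribute-style access on a record array
```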
Convert DataFrame to NumPy using asarray() method
The NumPy module also provides a function called asarray() that converts a DataFrame to a NumPy array. Here is a code snippet showing how to use it.
import numpy as np
import pandas as pd

# Each key is a column; the index labels name the four rows
df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
}, index=['Rating', 'Salary', 'Designation', 'Age'])
z = np.asarray(df)
print(z)
Output
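asarray() also accepts a dtype argument, which can be handy when the DataFrame is purely numeric. The small DataFrame and the names as_is and as_float below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

as_is = np.asarray(df)                  # keeps the common dtype (integer here)
as_float = np.asarray(df, dtype=float)  # converts while building the array

print(as_is.dtype.kind, as_float.dtype)
```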
Convert to NumPy array using dataframe.values
Another way to convert a DataFrame to a NumPy array is by using the DataFrame attribute “.values”. It returns the same kind of NumPy array as the methods above.
However, the official documentation recommends using to_numpy() instead of this attribute. Unlike to_numpy(), .values offers no control over the dtype, copying, or the handling of missing values. Here is a Python script showing how to use it.
import pandas as pd

df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
})
print(df.values)
Output
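For an ordinary DataFrame the two spellings should produce the same array; the practical difference is that only to_numpy() takes conversion options. A small sketch under that assumption, with illustrative data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

# Both spellings give the same array for this DataFrame.
same = np.array_equal(df.values, df.to_numpy())
print(same)

# Only to_numpy() accepts conversion options such as dtype.
as_f32 = df.to_numpy(dtype='float32')
print(as_f32.dtype)
```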
Extracting a single DataFrame row as ndarray using DataFrame.values[]
We can also use the .values property to extract a single row of a DataFrame as a NumPy array. We have to specify the row position in square brackets [].
The row number starts from zero, meaning the first row will be 0, followed by 1, 2, and so on. Here is a code snippet showing how to use it.
import pandas as pd

d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Comments', 'Annual Package'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
print(df.values[1])
print(df.values[3])
Output
Convert DataFrame to NumPy Array using tolist()
There are indirect ways to convert a DataFrame to NumPy arrays. We can convert a DataFrame to a list and then convert the list to a NumPy array. Here is a code snippet showing how to implement it.
import numpy as np
import pandas as pd

d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Comments', 'Annual Package'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
x = df.values.tolist()
ndarr = np.asarray(x)
print(ndarr)
Output
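One thing to watch with the list route: when NumPy builds an array from a plain list that mixes strings and numbers, it picks a string dtype, so the numbers come back as text. The sketch below (reduced data; the names via_list and direct are assumptions) compares this with a direct to_numpy() call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Dee'], 'Annual Package': [900000, 870000]})

via_list = np.asarray(df.values.tolist())
direct = df.to_numpy()

print(via_list.dtype.kind)   # 'U': every element became a string
print(direct.dtype)          # object: the numbers stay numbers
print(repr(via_list[0, 1]), repr(direct[0, 1]))
```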
Convert DataFrame to NumPy array using reset_index()
The reset_index() method does not perform the conversion by itself; it moves the index into a regular column, so the index values become part of the resulting NumPy array or record array. Here are two different examples of how we can use this method in the conversion. Example 1:
import pandas as pd

arry = [[25, 'Karlos', 2015], [21, 'Ray', 2016], [22, 'Dee', 2018]]
df = pd.DataFrame(arry, columns=['Age', 'Student_Name', 'Passing Year'],
                  index=[1, 2, 3])
# reset_index() adds the index as a column; ravel() flattens to 1-D
v = df.reset_index().values.ravel().view()
print(v)
print(type(v))
Output
Example 2:
import numpy as np
import pandas as pd

d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Comments', 'Annual Package'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
x = df.reset_index()
x = np.rec.fromrecords(x, names=x.columns.tolist())
print(x)
print(type(x))
Output
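The effect of reset_index() is easiest to see by comparing shapes with and without it. A minimal sketch with an assumed one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 21]}, index=[10, 20])

without_index = df.to_numpy()
with_index = df.reset_index().to_numpy()

print(without_index.shape)  # (2, 1)
print(with_index.shape)     # (2, 2): the index is now the first column
print(with_index)
```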
Convert DataFrame columns to Ndarray using iloc[]
We can also use the .values along with iloc[] to convert a DataFrame portion or a complete DataFrame to a NumPy array. Here is a code snippet showing how to implement it.
import pandas as pd

d = [['John', "E1", 'Very nice seminar', 900000],
     ['Dee', "E2", 'Nice seminar', 870000],
     ['Daizy', "E3", 'A good exposure', 660000],
     ['Karlos', "E4", 'Very amazing', 560000],
     ['Sue', "E5", 'Learned a lot', 870000],
     ['Iris', "E6", 'It was exiting', 990000],
     ['Stefen', "E7", 'Had fun', 810000],
     ['Iris', "E8", 'Informative seminar', 820000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Comments', 'Annual Package'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
x = df.iloc[:, 1:]
b = df.iloc[:, 1:].values
print(type(x))
print(type(b))
print(b)
Output
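iloc[] can restrict rows and columns at the same time. The following sketch uses reduced data and the name block as illustrative assumptions, and calls to_numpy() on the selection instead of .values:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Dee', 'Daizy', 'Karlos'],
    'Emp_ID': ['E1', 'E2', 'E3', 'E4'],
    'Annual Package': [900000, 870000, 660000, 560000],
})

# Rows at positions 1 and 2 (the stop position 3 is excluded),
# columns at positions 0 and 2.
block = df.iloc[1:3, [0, 2]].to_numpy()

print(block.shape)
print(block)
```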
Visualizing a converted NumPy array from DataFrame
We can visualize the converted NumPy array from DataFrame using the Matplotlib library. Here is a code sample showing how to implement one.
import matplotlib.pyplot as mplib
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Karl': ['4.4/5.0', 99500, 'CTO', 28],
    'Dee': ['4.5/5.0', 78000, 'CFO', 29],
    'Paul': ['3.8/5.0', 80000, 'Security Head', 21],
    'Joe': ['3.9/5.0', 85000, 'IT Manager', 24],
    'Lee': ['4.3/5.0', 69000, 'Recruiter Head', 31],
    'Ray': ['4.5/5.0', 87000, 'Principal ML engineer', 33],
    'Steve': ['4.2/5.0', 71000, 'Security Head', 29],
    'Dev': ['3.9/5.0', 44000, 'Lead Architect', 29]
})
no_of_days = np.array([20, 10, 23, 28, 50, 60, 70, 89])
# Row 1 holds the salaries; cast from object to float for plotting
zz = df.values[1].astype(float)
mplib.plot(no_of_days, zz)
mplib.xlabel("Number of Days")
mplib.ylabel("Salary Count")
mplib.show()
Output
Conclusion
Converting data from a DataFrame to NumPy is a routine task that we as data analysts perform almost every day. Often we need to work on NumPy arrays to perform Fourier transforms, linear algebra, determinant calculations, and other matrix operations.
For such numerical work, NumPy arrays are a better fit than a DataFrame. Thus, this tutorial highlighted some significant ways and methods through which we can convert a complete DataFrame or a portion of it to a NumPy array.
This tutorial highlighted some notable methods like to_numpy(), to_records(), asarray(), tolist(), etc., and techniques like to_numpy(na_value), .values, iloc[], etc., that help in NumPy array conversion.
Gaurav is a Full-stack (Sr.) Tech Content Engineer (6.5 years exp.) & has a sumptuous passion for curating articles, blogs, e-books, tutorials, infographics, and other web content. Apart from that, he is into security research and found bugs for many govt. & private firms across the globe. He has authored two books and contributed to more than 500+ articles and blogs. He is a Computer Science trainer and loves to spend time with efficient programming, data science, Information privacy, and SEO. Apart from writing, he loves to play foosball, read novels, and dance.