Export Python Pandas DataFrame to Excel using to_excel
The to_excel
function allows you to export a Pandas DataFrame to an Excel file. It’s highly beneficial for every data scientist.
This tutorial will give you a comprehensive understanding of how the to_excel
function works, and how you can customize its behavior to suit your needs.
- 1 Basic Syntax and Parameters
- 2 Exporting Pandas DataFrame to Excel file
- 3 Selecting the Engine for Writing
- 4 Write Multiple Pandas DataFrames to Multiple sheets
- 5 Handling Missing Data with na_rep parameter
- 6 Formatting Floating Point Numbers
- 7 Choosing DataFrame Columns for Excel Export
- 8 Customizing Column Headers
- 9 Manipulating Indexes
- 10 Controlling Excel Output Location
- 11 Controlling Cell Merging using to_excel
- 12 Excel Export Savior
- 13 Further Reading
Basic Syntax and Parameters
The basic syntax for the to_excel
function in Pandas is as follows:
DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)
Let’s briefly discuss the most commonly used parameters:
excel_writer
: This is the target file name or ExcelWriter object.sheet_name
: Name of the sheet in the Excel file. Defaults to ‘Sheet1’.na_rep
: This is the string representation of NaN to use.float_format
: Format to use for floating point numbers.columns
: Columns to write to the Excel file.header
: Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.index
: Write row names (index).startrow
&startcol
: Upper left cell row and column to dump DataFrame.engine
: Write engine to use, ‘openpyxl’ or ‘xlsxwriter’.
The to_excel
function provides robust capabilities for customizing your data export to Excel. Next, we will dive into examples of how to use it effectively.
Exporting Pandas DataFrame to Excel file
First, let’s import Pandas and create a DataFrame as a starting point:
import pandas as pd # import numpy for sample data import numpy as np df = pd.DataFrame({ 'A': 1., 'B': pd.Timestamp('20130620'), 'C': pd.Series(1, index=list(range(4)), dtype='float32'), 'D': np.array([3] * 4, dtype='int32'), 'E': pd.Categorical(["test", "train", "test", "train"]), 'F': 'foo' }) print(df)
Output:
A B C D E F 0 1.0 2013-06-20 1.0 3 test foo 1 1.0 2013-06-20 1.0 3 train foo 2 1.0 2013-06-20 1.0 3 test foo 3 1.0 2013-06-20 1.0 3 train foo
Here, we’ve created a DataFrame with a mixture of numeric types, strings, and categorical data.
We can use the to_excel
function to write the DataFrame to an Excel sheet like this:
df.to_excel("example.xlsx")
This will write the DataFrame df
to an Excel file named example.xlsx
. By default, both the header and index are included in the exported Excel file. Here’s what the resulting Excel file looks like:
| A B C D E F --|-------------------------------------- 0 | 1.0 2013-06-20 1.0 3 test foo 1 | 1.0 2013-06-20 1.0 3 train foo 2 | 1.0 2013-06-20 1.0 3 test foo 3 | 1.0 2013-06-20 1.0 3 train foo
It’s important to note that the index of the DataFrame has been written as the first column in the Excel file.
The rest of the columns in the Excel file (A
to F
) correspond to the DataFrame’s columns.
In this case, the to_excel
function creates a new file. If the file name that already exists, the function will overwrite the contents of the existing file.
Selecting the Engine for Writing
Pandas to_excel
function allows you to choose the engine used to write to Excel files. You can use the engine
parameter to specify the engine you want to use.
Here are the engines:
- openpyxl
- xlsxwriter
Let’s write our DataFrame using the xlsxwriter
engine:
df.to_excel("example_xlsxwriter.xlsx", engine='xlsxwriter')
This will create an Excel file named example_xlsxwriter.xlsx
using the xlsxwriter
engine.
The choice of engine can affect the capabilities available for writing to Excel files, like adding charts, conditional formatting, etc.
Remember, the chosen engine needs to be installed in your Python environment, so make sure to install it using pip if it’s not already installed.
pip install xlsxwriter
Write Multiple Pandas DataFrames to Multiple sheets
To write multiple pandas DataFrames to multiple sheets of an Excel file, you can use the ExcelWriter
object, which allows you to specify the name of the sheet for each DataFrame.
Let’s create two different DataFrames and write them to two different sheets in the same Excel workbook:
df_sheet1 = pd.DataFrame({ 'A': ['foo', 'bar', 'baz', 'qux'], 'B': ['alpha', 'beta', 'gamma', 'delta'], 'C': np.random.rand(4), 'D': np.random.randint(10, size=4) }) df_sheet2 = pd.DataFrame({ 'E': ['apple', 'banana', 'cherry', 'date'], 'F': ['elephant', 'fox', 'giraffe', 'hippo'], 'G': np.random.rand(4), 'H': np.random.randint(10, size=4) }) # create an ExcelWriter object with pd.ExcelWriter("example_multiple_dataframes.xlsx") as writer: df_sheet1.to_excel(writer, sheet_name='FirstSheet') df_sheet2.to_excel(writer, sheet_name='SecondSheet')
In the resulting Excel file, you will find two sheets named ‘FirstSheet’ and ‘SecondSheet’, each containing data from the respective DataFrame:
FirstSheet:
| A | B | C | D -- |--------------------------------------- 0 | foo | alpha | 0.618012313202 | 6 1 | bar | beta | 0.912971647978 | 8 2 | baz | gamma | 0.344552180436 | 0 3 | qux | delta | 0.982718734163 | 9
SecondSheet:
| E | F | G | H -- |------------------------------------------ 0 | apple | elephant| 0.329873331859 | 7 1 | banana| fox | 0.751442750918 | 3 2 | cherry| giraffe | 0.510241492448 | 2 3 | date | hippo | 0.920831290307 | 0
Handling Missing Data with na_rep parameter
You can use the na_rep
parameter to represent the missing values when exporting a DataFrame to an Excel file.
Let’s create a DataFrame with some missing values to demonstrate this:
df_na = pd.DataFrame({ 'A': 1., 'B': pd.Timestamp('20130620'), 'C': pd.Series(1, index=list(range(4)), dtype='float32'), 'D': np.array([3] * 4, dtype='int32'), 'E': pd.Categorical(["test", "train", np.nan, "train"]), 'F': 'foo' }) print(df_na)
Output:
A B C D E F 0 1.0 2013-06-20 1.0 3 test foo 1 1.0 2013-06-20 1.0 3 train foo 2 1.0 2013-06-20 1.0 3 NaN foo 3 1.0 2013-06-20 1.0 3 train foo
You can see that there is a NaN value in the ‘E’ column. When we write this DataFrame to an Excel file, we can choose how to represent this NaN value.
df_na.to_excel("example_na_rep.xlsx", na_rep='NULL')
In the Excel file, the NaN value in the ‘E’ column is replaced with ‘NULL’.
Formatting Floating Point Numbers
The float_format
parameter of the to_excel
function provides control over formatting floating point numbers.
This parameter takes a formatting string or a function that will dictate how floating point numbers are converted to strings. Let’s create a DataFrame with floating numbers:
df_float = pd.DataFrame({ 'A': 1.123456789, 'B': np.array([3.1415926535] * 4, dtype='float64'), }, index=list('abcd')) print(df_float)
Output:
A B a 1.123457 3.141593 b 1.123457 3.141593 c 1.123457 3.141593 d 1.123457 3.141593
For example, if we want all floating point numbers in our DataFrame to be rounded to two decimal places, we can use a formatting string as follows:
df_float.to_excel("example_float_format.xlsx", float_format="%.2f")
In the resulting Excel file, all floating point numbers are represented with two decimal places.
The data in the Excel file should look like this:
| A B --|----------------- a | 1.12 3.14 b | 1.12 3.14 c | 1.12 3.14 d | 1.12 3.14
Choosing DataFrame Columns for Excel Export
You can use the columns
parameter of the to_excel
function to export specific columns of the DataFrame. Consider a DataFrame with several columns:
df_multi = pd.DataFrame({ 'A': pd.date_range(start='2020-01-01', periods=4), 'B': pd.Series(range(4), dtype='float32'), 'C': np.array([3] * 4, dtype='int32'), 'D': pd.Categorical(["test", "train", "check", "go"]), 'E': 'foo' }) print(df_multi)
Output:
A B C D E 0 2020-01-01 0.0 3 test foo 1 2020-01-02 1.0 3 train foo 2 2020-01-03 2.0 3 check foo 3 2020-01-04 3.0 3 go foo
If we want to export only the ‘A’ and ‘D’ columns to the Excel file, we can specify these columns in the to_excel
function:
df_multi.to_excel("example_select_columns.xlsx", columns=['A', 'D'])
The resulting Excel file will only contain the ‘A’ and ‘D’ columns from the DataFrame:
| A D --|----------------- 0 | 2020-01-01 test 1 | 2020-01-02 train 2 | 2020-01-03 check 3 | 2020-01-04 go
Customizing Column Headers
The header
parameter of the to_excel
function allows you to customize the column headers of your Excel file.
You can pass a list of strings to the header
parameter, which are assumed to be aliases for the column names.
Let’s consider our previously created DataFrame, and say we want to change the column headers while exporting to an Excel file:
print(df_multi)
Output:
A B C D E 0 2020-01-01 0.0 3 test foo 1 2020-01-02 1.0 3 train foo 2 2020-01-03 2.0 3 check foo 3 2020-01-04 3.0 3 go foo
We can change the column headers as follows:
df_multi.to_excel("example_custom_headers.xlsx", header=['Date', 'Value_1', 'Value_2', 'Category', 'Label'])
In the resulting Excel file, the column headers are replaced with the names provided in the list:
| Date Value_1 Value_2 Category Label --|------------------------------------------- 0 | 2020-01-01 0.0 3 test foo 1 | 2020-01-02 1.0 3 train foo 2 | 2020-01-03 2.0 3 check foo 3 | 2020-01-04 3.0 3 go foo
Manipulating Indexes
The index
parameter allows you to control whether the DataFrame’s indexes should be included in the resulting Excel file.
By default, the index is written to the Excel file. If you want to exclude it, you can set index=False
.
Let’s take an example DataFrame:
df_index = pd.DataFrame({ 'A': ['foo', 'bar', 'baz', 'qux'], 'B': ['alpha', 'beta', 'gamma', 'delta'], 'C': np.random.rand(4) }, index=['I1', 'I2', 'I3', 'I4']) print(df_index)
Output:
A B C I1 foo alpha 0.647687 I2 bar beta 0.377563 I3 baz gamma 0.013572 I4 qux delta 0.658623
We can export this DataFrame to an Excel file without the index:
df_index.to_excel("example_no_index.xlsx", index=False)
The resulting Excel file will just contain the data, without the index:
A B C foo alpha 0.647687 bar beta 0.377563 baz gamma 0.013572 qux delta 0.658623
Controlling Excel Output Location
The to_excel
function provides options to control where in the Excel sheet your DataFrame starts being written.
You can use the startrow
and startcol
parameters to adjust the position where your DataFrame is.
Let’s write a DataFrame to an Excel file, but this time we will start at the third row and the second column:
df_position = pd.DataFrame({ 'A': ['foo', 'bar', 'baz'], 'B': ['alpha', 'beta', 'gamma'] }) df_position.to_excel("example_output_location.xlsx", startrow=2, startcol=1, index=False)
In the resulting Excel file, the data from the DataFrame starts from the third row (rows are 0-indexed) and the second column (columns are 0-indexed):
| 1 | 2 | 3 | --|------------------------ 1 | | | | 2 | | | | 3 | | A | B | 4 | | foo | alpha | 5 | | bar | beta | 6 | | baz | gamma |
As you can see, the DataFrame df_position
is written starting from the cell B3 in Excel notation (row 3, column 2).
Controlling Cell Merging using to_excel
By default, cells are merged when dealing with a DataFrame that uses MultiIndex.
However, you can use the merge_cells
parameter to control whether cells with the same value are merged in the resulting Excel file or not. Let’s create a DataFrame with MultiIndex:
df_merge = pd.DataFrame({ 'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12] }, index=[['a', 'a', 'b', 'b'], ['foo', 'bar', 'foo', 'bar']]) print(df_merge)
Output:
A B C a foo 1 5 9 bar 2 6 10 b foo 3 7 11 bar 4 8 12
If we write this DataFrame to an Excel file with merge_cells=True
, the cells with the same index value will be merged:
df_merge.to_excel("example_merge_cells_true.xlsx", merge_cells=True)
However, if we set merge_cells=False
, the cells will not be merged:
df_merge.to_excel("example_merge_cells_false.xlsx", merge_cells=False)
Output:
| 1 | 2 | 3 | 4 | --|--------------------------- a | foo | 1 | 5 | 9 | a | bar | 2 | 6 | 10 | b | foo | 3 | 7 | 11 | b | bar | 4 | 8 | 12 |
In the first case, the ‘a’ and ‘b’ index values will span two rows each in the resulting Excel file, as they are repeated for the ‘foo’ and ‘bar’ sub-index. In the second case, the ‘a’ and ‘b’ values will appear in each row without being merged.
Excel Export Savior
I was contracted by a fintech company that had introduced a new financial product to the market. The company, after six months of operation, wanted to analyze the performance data to draw insights and shape their future strategies.
The task was to perform the complex analysis in Python, then present the findings in an Excel report to our non-technical stakeholders. I used Python’s psycopg2 to connect with our PostgreSQL database and Pandas for data manipulation.
My job was to translate the analysis into Excel while preserving the data’s structure and formatting. This is where the to_excel
function in Pandas became invaluable.
Consider a DataFrame df_customer_demographics
containing a detailed demographic-based analysis. I exported it to an Excel file with:
df_customer_demographics.to_excel("analysis_report.xlsx", sheet_name='Customer Demographics', index=False)
Next, I had another DataFrame df_product_performance
, which I needed to put in the same Excel file, but on a different sheet:
with pd.ExcelWriter('analysis_report.xlsx', engine='openpyxl', mode='a') as writer: df_product_performance.to_excel(writer, sheet_name='Product Performance', index=False)
In a few hours, I created a comprehensive, multi-sheet Excel report from our large-scale data using Pandas to_excel
.
Further Reading
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.