Reverse Pandas DataFrame Rows and Columns
While working with DataFrames, you may need to reverse the order of rows, columns, or even values to visualize data from a different perspective or simply to alter the sequence.
This tutorial will guide you through different techniques to reverse DataFrames in various contexts.
We’ll explore benchmarks and real-world applications of reversing DataFrames.
- 1 Reversing Row Order
- 2 Reversing Column Order
- 3 Reversing Both Rows and Columns
- 4 Reversing rows with sort_index
- 5 Reversing columns with sort_index
- 6 Custom Reversing
- 7 Reversing levels in multi-index DataFrames
- 8 Reverse String Column Content Using .str[]
- 9 Benchmark of Reversal Methods
- 10 Reversing DataFrames in Real-world Applications
Reversing Row Order
Reversing row order can be done using various methods:
Using the [::-1] slicing technique
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

reversed_df = df[::-1]
print(reversed_df)
Output:
   A  B   C
3  4  8  12
2  3  7  11
1  2  6  10
0  1  5   9
Using iloc property for row-wise reversal
reversed_df_iloc = df.iloc[::-1]
print(reversed_df_iloc)
Output:
   A  B   C
3  4  8  12
2  3  7  11
1  2  6  10
0  1  5   9
The iloc property provides integer-location based indexing, so it selects rows by position rather than by label.
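Both approaches keep the original index labels, which is why the outputs above show the index running 3, 2, 1, 0. If you would rather have a fresh 0-based index after reversing, one option is to chain reset_index. The following is a minimal sketch; the variable name renumbered is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# Reverse the rows, then drop the old labels and renumber from 0
renumbered = df.iloc[::-1].reset_index(drop=True)
print(renumbered)
```

Passing drop=True discards the old labels instead of keeping them as a new column.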
Reversing Column Order
You can use the loc property combined with the [::-1] slicing technique to reverse the column order, in the same way we reversed the row order.
reversed_columns_df = df.loc[:, ::-1]
print(reversed_columns_df)
Output:
    C  B  A
0   9  5  1
1  10  6  2
2  11  7  3
3  12  8  4
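An alternative that some readers find more explicit is to reverse the column labels themselves and select with the result. This is a small sketch, assuming unique column names; reversed_by_labels is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# Reverse the column Index, then use it to select the columns in that order
reversed_by_labels = df[df.columns[::-1]]
print(reversed_by_labels.columns.tolist())
```

For this DataFrame the result matches df.loc[:, ::-1].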
Reversing Both Rows and Columns
Reversing both rows and columns flips the DataFrame along both axes, which is equivalent to rotating it by 180 degrees.
reversed_rows_and_columns = df.iloc[::-1, ::-1]
print(reversed_rows_and_columns)
Output:
    C  B  A
3  12  8  4
2  11  7  3
1  10  6  2
0   9  5  1
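One way to convince yourself of what the double reversal does is to compare it with a 180-degree rotation of the underlying values. The sketch below uses NumPy's rot90 for that check; it is only a sanity test, not part of the reversal itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

flipped = df.iloc[::-1, ::-1]

# Rotate the raw values by 180 degrees (two 90-degree turns)
rotated = np.rot90(df.to_numpy(), k=2)

same_values = np.array_equal(flipped.to_numpy(), rotated)
print(same_values)
```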
Reversing rows with sort_index
The sort_index method, when used with the axis=0 argument, sorts the DataFrame by its row index labels.
reversed_rows_sort_index = df.sort_index(axis=0, ascending=False)
print(reversed_rows_sort_index)
Output:
   A  B   C
3  4  8  12
2  3  7  11
1  2  6  10
0  1  5   9
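Setting ascending=False sorts the rows in descending order of their index labels.
For an index that is already in ascending order, such as the default integer index, this reverses the rows.
If the index is not sorted, the result is a label-based sort rather than a reversal of the current row order, so use iloc[::-1] when you need a purely positional reversal.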
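The difference between sorting by label and reversing by position only shows up when the index is not already in ascending order. Here is a minimal sketch with a made-up unsorted index; unsorted_df, by_position and by_label are illustrative names.

```python
import pandas as pd

# Hypothetical DataFrame whose index labels are not in ascending order
unsorted_df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 30, 20])

# Positional reversal: the last row becomes the first
by_position = unsorted_df.iloc[::-1]

# Label-based descending sort: the largest label comes first
by_label = unsorted_df.sort_index(ascending=False)

print(by_position.index.tolist())
print(by_label.index.tolist())
```

The two results differ here because sort_index looks only at the labels and ignores the current row order.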
Reversing columns with sort_index
The sort_index method, when applied with axis=1, sorts the DataFrame by its column labels.
This technique is useful when you want the columns in descending label order, regardless of how they are currently arranged.
reversed_columns_sort_index = df.sort_index(axis=1, ascending=False)
print(reversed_columns_sort_index)
Output:
    C  B  A
0   9  5  1
1  10  6  2
2  11  7  3
3  12  8  4
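Setting ascending=False arranges the columns in descending order of their labels. Because the columns here were already in ascending order (A, B, C), this reverses them; for columns that are not sorted, use loc[:, ::-1] for a positional reversal.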
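The same caveat as for rows can be checked on columns. The sketch below uses a hypothetical DataFrame whose columns are deliberately out of alphabetical order.

```python
import pandas as pd

# Hypothetical DataFrame with columns that are not in sorted order
mixed_df = pd.DataFrame({'B': [1, 2], 'C': [3, 4], 'A': [5, 6]})

# Positional reversal of the current column order
cols_by_position = mixed_df.loc[:, ::-1].columns.tolist()

# Descending sort by column label
cols_by_label = mixed_df.sort_index(axis=1, ascending=False).columns.tolist()

print(cols_by_position)
print(cols_by_label)
```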
Custom Reversing
You can use the sort_values method to order a DataFrame in descending order of a chosen column, which reverses the usual ascending sort.
Imagine you have a DataFrame with sales data, and you want the rows ordered from the highest to the lowest value in the Sales column.
sales_df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Sales': [100, 250, 75, 300]
})

# Reverse based on Sales column
reversed_sales_df = sales_df.sort_values(by='Sales', ascending=False)
print(reversed_sales_df)
Output:
  Product  Sales
3       D    300
1       B    250
0       A    100
2       C     75
By using the sort_values method with the by parameter set to ‘Sales’ and ascending=False, we’ve ordered the DataFrame by its sales values from highest to lowest.
Reverse based on multiple criteria
To sort on several criteria, pass a list of column names to the by parameter and a matching list to ascending to set the direction for each column.
# Add a row with a duplicate sales figure for demonstration
# (pd.concat is used instead of the private DataFrame._append method)
new_row = pd.DataFrame([{'Product': 'E', 'Sales': 100}])
sales_df = pd.concat([sales_df, new_row], ignore_index=True)

# Sort by Sales (descending), then by Product (ascending) to break ties
reversed_sales_multi_df = sales_df.sort_values(by=['Sales', 'Product'], ascending=[False, True])
print(reversed_sales_multi_df)
Output:
  Product  Sales
3       D    300
1       B    250
0       A    100
4       E    100
2       C     75
Reversing levels in multi-index DataFrames
Let’s first create a sample multi-index DataFrame for context.
arrays = [
    ['A', 'A', 'B', 'B'],
    [1, 2, 1, 2]
]
index = pd.MultiIndex.from_arrays(arrays, names=('letters', 'numbers'))
multi_df = pd.DataFrame({
    'Data': [10, 20, 30, 40]
}, index=index)
print(multi_df)
Output:
                 Data
letters numbers
A       1          10
        2          20
B       1          30
        2          40
Our DataFrame multi_df is indexed by two levels: ‘letters’ and ‘numbers’.
Reversing Multiindex levels
To reverse the levels of a multi-index DataFrame:
reversed_levels_df = multi_df.swaplevel('letters', 'numbers')
print(reversed_levels_df)
Output:
                 Data
numbers letters
1       A          10
2       A          20
1       B          30
2       B          40
The swaplevel method swaps the positions of two index levels. Here, we’ve swapped the ‘letters’ and ‘numbers’ levels, reversing their order. The rows themselves stay in their original order.
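swaplevel handles exactly two levels at a time. For an index with more than two levels, one possible approach is to reverse the list of level names and pass it to reorder_levels. The sketch below assumes a made-up three-level index; the ‘tags’ level and all variable names are illustrative.

```python
import pandas as pd

# Hypothetical three-level index for demonstration
index = pd.MultiIndex.from_tuples(
    [('A', 1, 'x'), ('A', 2, 'y'), ('B', 1, 'x')],
    names=('letters', 'numbers', 'tags'),
)
three_level_df = pd.DataFrame({'Data': [10, 20, 30]}, index=index)

# Reverse the order of all level names, then reorder the levels accordingly
reversed_order = list(three_level_df.index.names)[::-1]
all_reversed_df = three_level_df.reorder_levels(reversed_order)

print(list(all_reversed_df.index.names))
```

As with swaplevel, only the level order changes; call sort_index afterwards if you also want the rows regrouped by the new outer level.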
Reverse String Column Content Using .str[]
To begin with, let’s set up a sample DataFrame that contains string data.
str_df = pd.DataFrame({
    'Names': ['Alice', 'Bob', 'Charlie', 'David']
})
print(str_df)
Output:
     Names
0    Alice
1      Bob
2  Charlie
3    David
You can use str[::-1] to reverse each string in the ‘Names’ column:
str_df['Reversed_Names'] = str_df['Names'].str[::-1]
print(str_df)
Output:
     Names Reversed_Names
0    Alice          ecilA
1      Bob            boB
2  Charlie        eilrahC
3    David          divaD
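Real columns often contain missing values. The .str accessor leaves them missing instead of raising an error, as this small sketch with a hypothetical Series shows.

```python
import pandas as pd

# Hypothetical column with a missing entry
names = pd.Series(['Alice', None, 'Bob'])

# Missing values stay missing; the other strings are reversed
reversed_names = names.str[::-1]
print(reversed_names)
```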
Benchmark of Reversal Methods
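For this benchmark, consider a large DataFrame with 500 million rows. Two float64 columns of this size need roughly 8 GB of memory, so reduce the row count if your machine has less available.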
Row Reversal
import numpy as np
import pandas as pd
import time

# Create a large sample DataFrame
large_df = pd.DataFrame({
    'A': np.random.rand(500000000),
    'B': np.random.rand(500000000)
})

# Measure time for [::-1] slicing technique
start_time = time.time()
reversed_df_slice = large_df[::-1]
slice_duration = time.time() - start_time

# Measure time for iloc method
start_time = time.time()
reversed_df_iloc = large_df.iloc[::-1]
iloc_duration = time.time() - start_time

# Measure time for sort_index method
start_time = time.time()
reversed_df_sort = large_df.sort_index(ascending=False)
sort_duration = time.time() - start_time

print(f"Slicing [::-1] took: {slice_duration:.5f} seconds")
print(f"iloc method took: {iloc_duration:.5f} seconds")
print(f"sort_index method took: {sort_duration:.5f} seconds")
Output:
Slicing [::-1] took: 0.14657 seconds
iloc method took: 0.01631 seconds
sort_index method took: 190.51526 seconds
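Slicing and iloc are significantly faster than sort_index for reversing rows. A likely reason is that they only re-slice the existing data, whereas sort_index has to compute a new ordering from the index labels and rearrange the data accordingly.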
Column Reversal
import numpy as np
import time
import pandas as pd

# Create a large sample DataFrame with many columns
data = {f'col_{i}': np.random.rand(1000) for i in range(1000000)}
large_df_cols = pd.DataFrame(data)

# Measure time for loc method
start_time = time.time()
reversed_df_cols_loc = large_df_cols.loc[:, ::-1]
loc_duration = time.time() - start_time

# Measure time for sort_index method on columns
start_time = time.time()
reversed_df_cols_sort = large_df_cols.sort_index(axis=1, ascending=False)
sort_duration = time.time() - start_time

print(f"Using loc directly took: {loc_duration:.5f} seconds")
print(f"sort_index method on columns took: {sort_duration:.5f} seconds")
Output:
Using loc directly took: 0.00100 seconds
sort_index method on columns took: 48.66070 seconds
Using loc is much faster than sort_index for column reversal.
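If the DataFrames above are too large for your machine, the comparison can be repeated at a smaller scale. The sketch below is one way to do it with the standard timeit module, which averages over several runs. The size of 100,000 rows and the 20 repetitions are arbitrary assumptions, and the timings will vary between machines and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller, hypothetical test frame so the benchmark fits in memory
small_df = pd.DataFrame({
    'A': np.random.rand(100_000),
    'B': np.random.rand(100_000)
})

runs = 20
timings = {
    'iloc[::-1]': timeit.timeit(lambda: small_df.iloc[::-1], number=runs),
    'sort_index': timeit.timeit(lambda: small_df.sort_index(ascending=False), number=runs),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds / runs:.6f} seconds per call")
```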
Reversing DataFrames in Real-world Applications
One of the applications where reversing is crucial is in the area of Sequence-to-Sequence (Seq2Seq) models, especially in machine translation.
Sutskever et al. (2014) reported that reversing the order of words in the input sequences (but not the target sequences) led to better overall performance in their LSTM-based Seq2Seq translation model.
The hypothesis is that this makes it easier for the model to establish a strong connection between the input’s early parts (now at the end of the reversed sequence) and the beginning of the target sequence during training.
import pandas as pd

# Sample DataFrame with English and French sentence pairs
data = {
    'English': ['Hello world', 'I love Python', 'Pandas is great'],
    'French': ['Bonjour le monde', 'J’aime Python', 'Pandas est génial']
}
df = pd.DataFrame(data)

# Reverse the word order of the English sentences
df['English_Reversed'] = df['English'].str.split().apply(lambda x: ' '.join(x[::-1]))
print(df[['English_Reversed', 'French']])
Output:
  English_Reversed             French
0      world Hello   Bonjour le monde
1    Python love I      J’aime Python
2  great is Pandas  Pandas est génial
This reversed English sequence can be fed into the Seq2Seq model during training.
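Many training pipelines work with token lists rather than joined strings. As a rough sketch of that variant, the reversed tokens can be kept as lists. Whitespace tokenization and the column names source_tokens and target_tokens are assumptions made for illustration, not a prescribed preprocessing step.

```python
import pandas as pd

pairs = pd.DataFrame({
    'English': ['Hello world', 'I love Python'],
    'French': ['Bonjour le monde', "J'aime Python"]
})

# Tokenize on whitespace and reverse only the source (English) tokens
pairs['source_tokens'] = pairs['English'].str.split().apply(lambda tokens: tokens[::-1])

# Target (French) tokens keep their original order
pairs['target_tokens'] = pairs['French'].str.split()

print(pairs[['source_tokens', 'target_tokens']])
```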
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.