20+ Examples of filtering Pandas DataFrame
Data filtering is an integral phase of data analysis in which we study the data to exclude, sort, rearrange, extract, or partition it according to certain criteria.
In other words, data filtering is the process of extracting a smaller part of the larger dataset to perform specific operations on it. Pandas provides a wide range of tools, methods, and techniques to perform filtering on data.
We often perform data filtering to extract or operate on a small set of data for a particular time, calculate certain operations for a group of data as per interest, exclude bad, missing, or junk data for data cleansing, or validate statistical models.
This tutorial will give you a comprehensive idea of various data filtering techniques and methods one can use on DataFrames in Pandas.
Examples of Data filtering
In data analysis, data filtering, also known as ‘subsetting data,’ is an initial phase of predictive modeling and of most data reporting projects.
Before digging deep into the practical aspects of applying data filtering through various techniques, let us first explore some practical examples of data filtering.
- We can filter out data of all active customers who opened their accounts on the first day of April 2022.
- Extract specific details of the customer as per region, geolocation, address, etc.
- Fetch details of employees who have spent more than 4 years in the company.
- Analyze complaint data or identify customers who have complained more than five times a week using filtering.
- Filter out details of metro cities where per-capita income is greater than 40,000.
Using filter() method
The filter() method is an easy way to filter data from a DataFrame. It returns the subset of rows or columns whose labels match those passed in the items parameter.
We can also use filter() for regex (regular expression) in Python. Its syntax is:
DataFrame.filter(items=None, like=None, regex=None, axis=None)
Here is a code snippet showing how to use it.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jacky', "A104", 'Security Head', 320000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

print(df.filter(items=['Name', 'Designation']))
Output
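Beyond items, the like and regex parameters of filter() select columns whose labels contain a substring or match a pattern. A minimal sketch with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({
    'emp_name': ['Ray', 'Dee'],
    'emp_id': ['A101', 'A103'],
    'salary': [530000, 370000],
})

# like keeps column labels that contain the given substring
by_like = df.filter(like='emp', axis=1)

# regex keeps column labels that match the given pattern
by_regex = df.filter(regex='^sal', axis=1)

print(list(by_like.columns))   # ['emp_name', 'emp_id']
print(list(by_regex.columns))  # ['salary']
```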
Filtering using head() and tail()
The most basic approach to filtering the top-most or bottom-most rows of a DataFrame is to use the head() and tail() functions.
Let’s understand what the head() and tail() do and how we can use them by passing and without passing parameters.
Using head()
The head(n) method is a well-known Pandas DataFrame function that filters a DataFrame by returning its first n rows.
If the parameter is left blank, it returns the first five rows of the DataFrame by default.
To return more or fewer rows, pass the desired number explicitly. The code shown below is an example of how to use head().
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jack', "A104", 'Security Head', 400000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stephen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp ID', 'Designation', 'Monthly Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# calling head() without a parameter (first five rows)
print(df.head())
print()

# calling head() with a parameter
print(df.head(2))
Output
Using tail()
The tail(n) method is another well-known Pandas DataFrame function that filters a DataFrame by returning its last n rows.
If the parameter is left blank, it returns the last five rows of the DataFrame by default.
To return more or fewer rows from the end of the DataFrame, pass the desired number explicitly. The code shown below is an example of how to use tail().
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jack', "A104", 'Security Head', 400000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stephen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp ID', 'Designation', 'Monthly Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# calling tail() without a parameter (last five rows)
print(df.tail())
print()

# calling tail() with a parameter
print(df.tail(2))
Output
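head() and tail() also accept a negative n, which excludes rows from the opposite end rather than selecting a count. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'val': [10, 20, 30, 40, 50]})

# Negative n returns everything EXCEPT n rows from the other end
all_but_last_two = df.head(-2)   # rows with 10, 20, 30
all_but_first_two = df.tail(-2)  # rows with 30, 40, 50

print(all_but_last_two)
print(all_but_first_two)
```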
Slicing and dicing DataFrame using .loc and .iloc
Slicing and dicing a large dataset into smaller ones makes it clear which portion of the data to work with and which to set aside.
It is a filtering mechanism to extract the precise value that can help in identifying a concept or getting granular insight from massive data.
In Pandas, slicing is possible using the .loc[ ] and .iloc[ ] methods.
Using .loc[]:
The loc property, accessed through .loc[], selects and filters groups of rows and columns by label, by a slice of labels, or with a Boolean array.
By passing "start_label : end_label" within the square brackets, we specify the range of rows or columns to slice. Note that, unlike positional slicing, label slices with .loc include both endpoints.
Syntax –
df.loc[row_label, column_label]
df.iloc[row_position, column_position]
Here is the code snippet that shows how to implement .loc[]
import pandas as pd

a = pd.DataFrame({'Brand': ['Royal Enfield', 'Honda', 'Orbea', 'SCOTT', 'Hero',
                            'BMC', 'Suzuki', 'KTM', 'Giant', 'Kona'],
                  'Year': [2016, 2018, 2014, 2015, 2018, 2018, 2019, 2020, 2022, 2021],
                  'Kms Driven': [22000, 34000, 23000, 20000, 25000,
                                 24000, 23000, 26000, 29000, 30000],
                  'Price': ['Rs. 1.4K', 'Rs. 1.8K', 'Rs. 2.1K', 'Rs. 1.5K', 'Rs. 0.9K',
                            'Rs. 1.4K', 'Rs. 2.2K', 'Rs. 2.1K', 'Rs. 1.45K', 'Rs. 1.65K'],
                  'Mileage': [21, 22, 21, 23, 24, 21, 24, 23, 24, 27]})

# Slice rows with labels 3 through 8 (both endpoints included with .loc)
print(a.loc[3:8])
Output
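.loc[] also accepts a Boolean array for the row selector together with a list of column labels, combining condition-based filtering and column selection in one step. A minimal sketch with illustrative data:

```python
import pandas as pd

a = pd.DataFrame({
    'Brand': ['Honda', 'SCOTT', 'Suzuki', 'KTM'],
    'Year': [2018, 2015, 2019, 2020],
    'Mileage': [22, 23, 24, 23],
})

# Boolean mask for the rows, label list for the columns
recent = a.loc[a['Year'] >= 2019, ['Brand', 'Year']]
print(recent)
```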
Using .iloc[] :
The iloc[] property is an index-based (positional) data selection method in Pandas. It fetches the value at the intersection of the row and column positions specified within the brackets.
In other words, we pass two integers (a row position and a column position, both starting from 0) inside the [] to select a specific value.
Syntax:
df.iloc[row_position, column_position]
Here is the code snippet that shows how to implement .iloc[]
import pandas as pd

x = pd.DataFrame({'Brand': ['Royal Enfield', 'Honda', 'Orbea', 'SCOTT', 'Hero',
                            'BMC', 'Suzuki', 'KTM', 'Giant', 'Kona'],
                  'Year': [2016, 2018, 2014, 2015, 2018, 2018, 2019, 2020, 2022, 2021],
                  'Kms Driven': [22000, 34000, 23000, 20000, 25000,
                                 24000, 23000, 26000, 29000, 30000],
                  'Price': ['Rs. 1.4K', 'Rs. 1.8K', 'Rs. 2.1K', 'Rs. 1.5K', 'Rs. 0.9K',
                            'Rs. 1.4K', 'Rs. 2.2K', 'Rs. 2.1K', 'Rs. 1.45K', 'Rs. 1.65K'],
                  'Mileage': [21, 22, 21, 23, 24, 21, 24, 23, 24, 27]})

# Value at row position 6 and column position 2 ('Kms Driven' of 'Suzuki')
print(x.iloc[6, 2])
Output
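.iloc[] is not limited to a single cell: it also takes position slices (end position excluded, unlike .loc) and lists of positions. A minimal sketch with illustrative data:

```python
import pandas as pd

x = pd.DataFrame({
    'Brand': ['Honda', 'SCOTT', 'Suzuki', 'KTM'],
    'Year': [2018, 2015, 2019, 2020],
    'Mileage': [22, 23, 24, 23],
})

# Position slices: rows 1-2 and the first two columns (end excluded)
block = x.iloc[1:3, 0:2]

# Lists of row and column positions also work
picked = x.iloc[[0, 3], [0, 2]]

print(block)
print(picked)
```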
Filtering Pandas DataFrame using Operators
We can also filter specific data by building Boolean expressions with operators.
The following snippets show how relational, logical, and negation operators work in practice.
Method 1: Using Relational operators
We can also use relational operators like >, >=, <, <=, etc. to perform the filtering of specific rows from a DataFrame.
Here is the code snippet that shows how to implement the relational operator.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jack', "A104", 'Security Head', 400000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stephen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Rows where Monthly_Salary is greater than 320000
print(df[df.Monthly_Salary > 320000])
Output
Relational operators also work with strings: the comparison is lexicographic, i.e., based on alphabetical order.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jack', "A104", 'Security Head', 400000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stephen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Rows whose Name comes after 'Jack' in alphabetical order
print(df[df.Name > 'Jack'])
Output
Method 2: Using logical operators
We can also use the logical operators & (and) and | (or) to combine conditions when filtering rows from a DataFrame. Here is the code snippet that shows how to implement them.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jack', "A104", 'Security Head', 400000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stephen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Rows where EITHER the salary condition OR the name condition holds
print(df[(df.Monthly_Salary > 320000) | (df.Name == 'Iris')])
Output
Method 3: Using the Not logic (~)
The ~ (not) operator negates an expression or Boolean mask, so that every row except those matching the negated condition is fetched from the DataFrame.
Here is the code snippet that shows how to implement not logic.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jack', "A104", 'Security Head', 400000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Rows whose Name does NOT start with 'S'
print(df[~df.Name.str.startswith('S')])
Output
Filtering using isin() method
The isin() method is another significant way of filtering data against a set of allowed values. It extracts the rows whose column value is contained in the list passed to isin().
Here is the code snippet that shows how to implement isin() method.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jack', "A104", 'Security Head', 320000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

print(df)
print()

# Rows whose Monthly_Salary is one of the listed values
f = df[df["Monthly_Salary"].isin([320000])]
print(f)
Output
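isin() can take several values at once, and combining it with ~ inverts the selection. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ray', 'Dee', 'Sue', 'Iris'],
    'Designation': ['CTO', 'Marketing Head', 'CFO', 'IT Manager'],
})

# Rows whose Designation is any of the listed values
execs = df[df['Designation'].isin(['CTO', 'CFO'])]

# ~ inverts the mask: everyone NOT in the list
non_execs = df[~df['Designation'].isin(['CTO', 'CFO'])]

print(execs)
print(non_execs)
```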
Filtering using str accessor
Pandas is known for its efficient handling of textual data. The str accessor provides vectorized string methods, such as startswith(), endswith(), and contains(), for filtering rows based on text.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jacky', "A104", 'Security Head', 320000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Rows whose Name ends with 'y'
print(df[df.Name.str.endswith('y')])
Output
Filter DataFrame using query() method
The query() method lets us express filtering conditions as a string, which keeps multi-condition filters clear and readable. Comparisons can be combined inside the string with and/or.
Here is a code snippet showing how to implement it.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jacky', "A104", 'Security Head', 320000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Conditions are passed as a single string
print(df.query('Emp_ID == "A102" or Monthly_Salary > 400000'))
Output
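query() can also reference ordinary Python variables by prefixing them with @, which keeps the condition string reusable. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ray', 'Dee', 'Sue'],
    'Monthly_Salary': [530000, 370000, 320000],
})

# @threshold refers to the local Python variable inside the query string
threshold = 350000
high = df.query('Monthly_Salary > @threshold')
print(high)
```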
Using nlargest or nsmallest for filtering
Often we come across situations where filtering by a range is not what we need; instead, we want the rows holding the largest or smallest values in a column.
For that, Pandas provides the dedicated methods nlargest() and nsmallest().
These methods will fetch the ‘n’ largest or smallest records from the DataFrame based on the parameters passed.
The first parameter is the number of rows to extract, while the second names the column whose values the ranking is based on.
- nlargest(): Fetches the largest value-based rows as per the parameters given. The syntax is:
nlargest(number_of_rows, column_name)
Here is a code snippet showing the use of both.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jacky', "A104", 'Security Head', 320000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# The three rows with the highest Monthly_Salary
print(df.nlargest(3, 'Monthly_Salary'))
Output
- nsmallest(): Fetches the smallest value-based rows as per the parameters given. The syntax is:
nsmallest(number_of_rows, column_name)
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jacky', "A104", 'Security Head', 320000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# The three rows with the lowest Monthly_Salary
print(df.nsmallest(3, 'Monthly_Salary'))
Output
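When several rows tie at the cutoff value, the optional keep parameter of nlargest()/nsmallest() decides whether the ties are kept. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ray', 'Jacky', 'Sue', 'Iris'],
    'Monthly_Salary': [530000, 320000, 320000, 210000],
})

# keep='all' retains every row tied at the cutoff value,
# so asking for 2 rows can return 3 when two salaries tie
top = df.nsmallest(2, 'Monthly_Salary', keep='all')
print(top)
```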
Filtering data using the between()
The between() method is another essential filtering tool: it returns a Boolean mask marking the values that fall within the specified range, which we can then use to select the matching rows. Here is a code snippet showing how to use it.
import pandas as pd

d = [['Ray', "A101", 'CTO', 530000],
     ['Gaurav', "A102", 'CMO', 410000],
     ['Dee', "A103", 'Marketing Head', 370000],
     ['Jacky', "A104", 'Security Head', 320000],
     ['Sue', "A105", 'CFO', 320000],
     ['Iris', "A106", 'IT Manager', 210000],
     ['Stefen', "A107", 'Lead Developer', 310000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Rows whose Monthly_Salary lies between 300000 and 400000 (inclusive)
print(df[df["Monthly_Salary"].between(300000, 400000)])
Output
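By default, between() includes both endpoints of the range; the inclusive parameter changes that (string values such as 'neither' require pandas 1.3 or newer). A minimal sketch:

```python
import pandas as pd

s = pd.Series([300000, 320000, 370000, 400000, 410000])

# Both endpoints are included by default
default_mask = s.between(320000, 400000)

# inclusive='neither' excludes the boundary values (pandas >= 1.3)
strict_mask = s.between(320000, 400000, inclusive='neither')

print(s[default_mask].tolist())
print(s[strict_mask].tolist())
```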
Filter non-missing data from DataFrame
We can filter out those rows that have missing data within a DataFrame. We can do it using the dropna() method.
It is also possible to control how aggressively rows are dropped via the thresh parameter. df.dropna(thresh = 2) keeps only the rows that contain at least two non-missing values and drops the rest.
Now, we can print out the filtered DataFrame after dropping all NaN values. Here is a code snippet that shows how to use it.
import pandas as pd

d = [['Ray', 101, 'CTO', 530000],
     ['Gaurav', 102, 'CMO', 410000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['Jacky', None, 'Security Head'],
     ['Sue', 105, 'CFO', 320000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer']]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Drop every row that contains at least one missing value
z = df.dropna()
print(z)
Output
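The thresh behaviour described above can be sketched as follows, with illustrative data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ray', 'Jacky', 'Stefen'],
    'Emp_ID': [101, np.nan, 107],
    'Monthly_Salary': [530000, np.nan, np.nan],
})

# thresh = the minimum number of NON-missing values a row needs to survive;
# 'Jacky' has only one non-missing value, so that row is dropped
kept = df.dropna(thresh=2)
print(kept)
```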
Filter missing values using the notnull() method
The notnull() method detects non-missing values in objects like Series and DataFrame. It returns a Boolean mask, from which we can filter out the rows that have missing values.
Here is an example of how to use it.
import pandas as pd

d = [['Ray', 101, 'CTO', 530000],
     ['Gaurav', 102, 'CMO', 410000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['Jacky', 104, 'Security Head'],
     ['Sue', 105, 'CFO', 320000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer']]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Keep only the rows where Monthly_Salary is not missing
z = df[df.Monthly_Salary.notnull()]
print(z)
Output
Filtering DataFrame value using .at[]
We can use .at[] to pinpoint a single value in the DataFrame by its row label and column label, so we can inspect what sits at that particular intersection. Here is a code example showing how to use it in a loop.
import pandas as pd

d = [['Ray', 101, 'Chief Technical Officer', 530000],
     ['Gaurav', 102, 'Chief Marketing Officer', 450000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['Jacky', 104, 'Security Head', 390000],
     ['Sue', 105, 'Chief Finance Officer', 410000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer', 250000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=[1, 2, 3, 4, 5, 6, 7])

# Read the 'Designation' value of each row by its index label
for row in range(1, 8):
    z = df.at[row, 'Designation']
    print(z)
print()

# Read the 'Monthly_Salary' value of each row by its index label
for row in range(1, 8):
    z = df.at[row, 'Monthly_Salary']
    print(z)
Output
Filter strings and substrings
We can also filter specific rows of a DataFrame based on strings and substrings. For this, we can use the str.contains() method, which plays a role similar to LIKE in SQL.
Here is a code example of filtering rows from a DataFrame using substrings.
import pandas as pd

d = [['Ray', 101, 'Chief Technical Officer', 530000],
     ['Gaurav', 102, 'Chief Marketing Officer', 410000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['Jacky', 104, 'Security Head'],
     ['Sue', 105, 'Chief Finance Officer', 320000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer']]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Rows whose Designation contains the substring "Officer"
z = df[df['Designation'].str.contains("Officer")]
print(z)
Output
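str.contains() also accepts case and na parameters, which help when the text has mixed casing or missing values. A minimal sketch with illustrative data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Ray', 'Dee', 'Sue'],
    'Designation': ['Chief Technical Officer', np.nan, 'chief finance officer'],
})

# case=False ignores letter case; na=False treats missing values as no-match
mask = df['Designation'].str.contains('officer', case=False, na=False)
print(df[mask])
```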
Filter DataFrame by rows position and column names
We can specify the index locations followed by the column names to filter out the exact set of rows from the DataFrame.
In this filtering technique, we will need the .loc[] and .index[] to perform the operation. Here is an example showing how to use it.
import pandas as pd

d = [['Ray', 101, 'Chief Technical Officer', 530000],
     ['Gaurav', 102, 'Chief Marketing Officer', 410000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['Jacky', 104, 'Security Head'],
     ['Sue', 105, 'Chief Finance Officer', 320000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer']]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# First four positional rows, restricted to the 'Name' and 'Emp_ID' columns
z = df.loc[df.index[0:4], ["Name", "Emp_ID"]]
print(z)
Output
Use Lambda method for Filtering
A lambda is an anonymous, single-expression function. Combined with apply(), it lets us express a row-wise filtering condition in a single line of code. Here is an example showing how to use it.
import pandas as pd

d = [['Ray', 101, 'Chief Technical Officer', 530000],
     ['Gaurav', 102, 'Chief Marketing Officer', 450000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['Jacky', 104, 'Security Head'],
     ['Sue', 105, 'Chief Finance Officer', 410000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer']]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Apply a row-wise condition with a lambda (axis=1 passes each row)
z = df[df.apply(lambda x: x["Designation"] == 'Chief Marketing Officer'
                and x["Monthly_Salary"] >= 400000, axis=1)]
print(z)

z = df[df.apply(lambda x: x["Designation"] == 'Chief Finance Officer'
                and x["Monthly_Salary"] >= 400000, axis=1)]
print(z)
Output
Filtering through the list comprehension technique
List comprehension is a syntactic construct that helps to extract data from an array-type object into a list. It uses the set-builder notation of mathematics to generate a new data object from an existing one.
It is another essential way to filter out a certain set of data.
We can also perform other basic operations on those filtered data within that list comprehension. Here is an example showing how to use it.
import pandas as pd

data = {'name': ['Ray', 'Gaurav', 'Sue', 'Iris', 'Dee'],
        'year': [2018, 2019, 2022, 2020, 2021],
        'joining': [4, 2, 3, 1, 4]}
df = pd.DataFrame(data, index=['New Delhi', 'Noida', 'Mumbai', 'Bangaluru', 'Chennai'])
print(df)
print()

# Build a new column from an existing one with a list comprehension
next_year = [year + 1 for year in df['year']]
df['next_year'] = next_year
print("They have/will complete one year in:", next_year, "respectively")
Output
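For simple arithmetic like the example above, a vectorized column operation produces the same result as the list comprehension and is usually faster on large frames. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ray', 'Gaurav'], 'year': [2018, 2019]})

# List comprehension version
next_year_list = [y + 1 for y in df['year']]

# Vectorized version: same values, computed column-wise by pandas
df['next_year'] = df['year'] + 1
print(df)
```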
Filter DataFrame using regex (Regular expressions)
A regular expression is a sequence of characters that forms a search pattern over strings.
Regular expression helps us determine whether a string contains the specified search pattern. We can also use it to filter specific rows from a DataFrame. Here is a code snippet showing how to implement it.
import pandas as pd

d = [['Ray', 101, 'Chief Technical Officer', 530000],
     ['Gaurav', 102, 'Chief Marketing Officer', 450000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['George', 104, 'Security Head', 390000],
     ['Sue', 105, 'Chief Finance Officer', 410000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer', 250000]]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=[1, 2, 3, 4, 5, 6, 7])

regex = 'G.*'
print("Applying", regex, "on DataFrame to extract names starting with G:\n",
      df[df.Name.str.match(regex)])
print()

regex = 'S.*'
print("Applying", regex, "on DataFrame to extract names starting with S:\n",
      df[df.Name.str.match(regex)])
Output
Visualizing Filtered data
In data science, apart from data filtering, it is also essential to visualize the data so that professionals can get a clear picture of how the data is communicating.
For data visualization, we can use the Matplotlib library. Here is a code snippet showing how to use it.
import pandas as pd
import matplotlib.pyplot as plt

d = [['Ray', 101, 'Chief Technical Officer', 530000],
     ['Gaurav', 102, 'Chief Marketing Officer', 450000],
     ['Dee', 103, 'Marketing Head', 370000],
     ['Jacky', 104, 'Security Head', 390000],
     ['Sue', 105, 'Chief Finance Officer', 410000],
     ['Iris', 106, 'IT Manager', 210000],
     ['Stefen', 107, 'Lead Developer']]
df = pd.DataFrame(d, columns=['Name', 'Emp_ID', 'Designation', 'Monthly_Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Filter first, then plot the filtered rows as a bar chart
df = df[df.apply(lambda x: x["Designation"] == 'Chief Marketing Officer'
                 or x["Monthly_Salary"] >= 300000, axis=1)]
print(df)

plt.bar(df.Name, df.Monthly_Salary, width=0.4)
plt.show()
Output
Conclusion
We hope this tutorial has given you a crisp idea of the various ways to filter data in a Pandas DataFrame. Filtering a large dataset is essential so that we can precisely analyze and work with the right subset of data.
Here we have highlighted some methods that can help filter out the data based on certain conditions. We have also used various operators, techniques, and expressions to perform data filtering.
Lastly, we have seen how to visualize the filtered data of a DataFrame using a visualization library.
Further Reading
For further reading about filtering DataFrames, check this:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html
Gaurav is a Full-stack (Sr.) Tech Content Engineer (6.5 years exp.) & has a sumptuous passion for curating articles, blogs, e-books, tutorials, infographics, and other web content. Apart from that, he is into security research and found bugs for many govt. & private firms across the globe. He has authored two books and contributed to more than 500+ articles and blogs. He is a Computer Science trainer and loves to spend time with efficient programming, data science, Information privacy, and SEO. Apart from writing, he loves to play foosball, read novels, and dance.