Filtering rows in a pandas DataFrame by column value is a common data manipulation task. Whether you’re cleaning data, analyzing trends, or preparing data for visualization, understanding how to efficiently filter your DataFrame is crucial. In this guide, we’ll explore various methods to filter rows based on column values in a pandas DataFrame.
Why Filter Rows in Pandas?
Filtering is essential for:
- Extracting specific data points.
- Cleaning and preprocessing data.
- Reducing data size for optimization.
- Preparing data for analysis or visualization.
Prerequisites
Make sure you have pandas installed. You can install it using pip:
    pip install pandas
Creating a Sample DataFrame
Let’s start by creating a sample DataFrame:
    import pandas as pd

    data = {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 30, 22, 35, 29],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
    }

    df = pd.DataFrame(data)
Method 1: Filter Using Boolean Indexing
This is one of the most straightforward ways to filter DataFrame rows.
Example: Filter by Age
Suppose you want to keep only the people older than 25:
    filtered_df = df[df['Age'] > 25]
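To see what boolean indexing does under the hood, here is a minimal sketch that rebuilds the sample DataFrame and splits the filter into two steps. The variable names mask, older, and younger are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# The comparison returns a boolean Series aligned with df's index.
mask = df['Age'] > 25

# Indexing with the mask keeps only the rows where it is True.
older = df[mask]

# The ~ operator inverts the mask, keeping everyone aged 25 or younger.
younger = df[~mask]

print(older['Name'].tolist())    # ['Bob', 'David', 'Eve']
print(younger['Name'].tolist())  # ['Alice', 'Charlie']
```

Storing the mask in a variable is optional, but it makes the condition easy to reuse or invert.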
Method 2: Using the query() Method
The query() method provides an expressive SQL-like syntax.
Example: Filter by City
Let’s filter by people living in New York:
    filtered_df = df.query('City == "New York"')
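When the value to filter on lives in a Python variable, query() can reference it with the @ prefix instead of hard-coding it in the string. The sketch below assumes the same sample data; target_city and min_age are hypothetical variable names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

target_city = 'New York'  # illustrative variable, not from the original
min_age = 25              # illustrative variable, not from the original

# @name refers to a variable in the surrounding Python scope.
by_city = df.query('City == @target_city')
older = df.query('Age > @min_age')

print(by_city['Name'].tolist())  # ['Alice']
print(older['Name'].tolist())    # ['Bob', 'David', 'Eve']
```

This avoids building the query string by hand, which tends to get awkward once quoting is involved.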
Method 3: Using the loc Indexer
The loc indexer selects by label and also accepts a boolean mask, so you can filter rows and pick columns in a single step.
Example: Filter by Name
Filter records where the name is ‘Alice’:
    filtered_df = df.loc[df['Name'] == 'Alice']
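One reason to reach for loc over plain boolean indexing is that it takes a row selector and a column selector together. The following sketch, again on the sample data, selects two columns for the matching row and then updates a value on a copy; the names alice and updated are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# Row mask first, column labels second.
alice = df.loc[df['Name'] == 'Alice', ['Name', 'City']]
print(alice)

# The same pattern works for assignment; a copy keeps df unchanged.
updated = df.copy()
updated.loc[updated['Name'] == 'Alice', 'Age'] = 25
print(updated.loc[0, 'Age'])  # 25
```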
Advanced Filtering
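You can also combine multiple conditions using the element-wise logical operators & (and) and | (or). Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.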
Example: Combined Conditions
Filter records where Age is more than 25 and City is either ‘New York’ or ‘Chicago’:
    filtered_df = df[(df['Age'] > 25) & ((df['City'] == 'New York') | (df['City'] == 'Chicago'))]
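Note that with the sample data this filter returns an empty DataFrame, since Alice (New York) is 24 and Charlie (Chicago) is 22. The sketch below confirms that and shows isin() as a shorter way to write the "either city" part. The lower threshold of 23 is an assumption added here only to produce a non-empty result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

cities = ['New York', 'Chicago']

# The combined condition from the example above: no row satisfies it.
combined = df[(df['Age'] > 25) & ((df['City'] == 'New York') | (df['City'] == 'Chicago'))]
print(combined.empty)  # True

# isin() replaces the chain of == comparisons joined by |.
with_isin = df[(df['Age'] > 25) & df['City'].isin(cities)]
print(with_isin.empty)  # True

# With a lower, illustrative age threshold, Alice matches.
relaxed = df[(df['Age'] > 23) & df['City'].isin(cities)]
print(relaxed['Name'].tolist())  # ['Alice']
```

isin() scales better than chained comparisons as the list of accepted values grows.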
By mastering these filtering techniques, you can efficiently manage and manipulate your data within pandas DataFrames. Whether you’re conducting data analysis or preparing data sets for machine learning, these methods will significantly enhance your data processing capabilities.