How to Filter Rows in a pandas DataFrame by Column Value

Filtering rows in a pandas DataFrame by column value is a common data manipulation task. Whether you’re cleaning data, analyzing trends, or preparing data for visualization, understanding how to efficiently filter your DataFrame is crucial. In this guide, we’ll explore various methods to filter rows based on column values in a pandas DataFrame.

Why Filter Rows in Pandas?

Filtering is essential for:

  • Extracting specific data points.
  • Cleaning and preprocessing data.
  • Reducing the amount of data to process, which cuts memory use and speeds up later steps.
  • Preparing data for analysis or visualization.

Prerequisites

Make sure you have pandas installed. You can install it using pip:

pip install pandas

Creating a Sample DataFrame

Let’s start by creating a sample DataFrame:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)

Method 1: Filter Using Boolean Indexing

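Boolean indexing is one of the most straightforward ways to filter DataFrame rows: a comparison on a column produces a Series of True/False values, and indexing the DataFrame with that Series keeps only the rows marked True.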

Example: Filter by Age

Suppose you want to keep only the people older than 25:

filtered_df = df[df['Age'] > 25]
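
To make the mechanics visible, here is a minimal sketch that rebuilds the sample DataFrame and splits the one-liner into two steps. The names mask and younger_df are introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# The comparison returns a boolean Series aligned with df's index.
mask = df['Age'] > 25
print(mask.tolist())  # [False, True, False, True, True]

# Indexing with the mask keeps only the rows where it is True.
filtered_df = df[mask]
print(filtered_df['Name'].tolist())  # ['Bob', 'David', 'Eve']

# The ~ operator inverts the mask, selecting everyone aged 25 or younger.
younger_df = df[~mask]
print(younger_df['Name'].tolist())  # ['Alice', 'Charlie']
```

Note that the filtered rows keep their original index labels (1, 3, 4 here); call reset_index(drop=True) if you need a fresh 0-based index.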

Method 2: Using the query() Method

The query() method takes the filter condition as a string, which often reads much like a SQL WHERE clause.

Example: Filter by City

Let’s filter by people living in New York:

filtered_df = df.query('City == "New York"')
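
In practice the value to match often lives in a Python variable rather than in the query string itself. The sketch below shows how query() can reference local variables with the @ prefix; target_city and min_age are illustrative names, not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# Hypothetical filter values held in ordinary Python variables.
target_city = 'New York'
min_age = 20

# Inside the query string, @name refers to a local variable,
# and conditions can be joined with the words "and" / "or".
filtered_df = df.query('City == @target_city and Age > @min_age')
print(filtered_df['Name'].tolist())  # ['Alice']
```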

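Method 3: Using the loc Indexer

loc is a label-based indexer that also accepts a boolean mask, which makes it a good fit when you want to filter rows and pick specific columns in a single step.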

Example: Filter by Name

Filter records where the name is ‘Alice’:

filtered_df = df.loc[df['Name'] == 'Alice']
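
Used with a row condition alone, loc returns the same rows as plain boolean indexing. Its advantage shows when a column selection is added as a second argument, as in this minimal sketch (the name subset is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# First argument: boolean mask choosing the rows.
# Second argument: list of column labels to keep.
subset = df.loc[df['Age'] > 25, ['Name', 'City']]

print(subset.columns.tolist())  # ['Name', 'City']
print(subset['Name'].tolist())  # ['Bob', 'David', 'Eve']
```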

Advanced Filtering

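You can also combine multiple conditions using the element-wise operators & (and), | (or), and ~ (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as > and ==. Python’s and/or keywords do not work on a pandas Series and raise a ValueError.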

Example: Combined Conditions

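Filter records where Age is more than 25 and City is either ‘New York’ or ‘Chicago’. With the sample data above no row satisfies both conditions (Alice and Charlie are both under 25), so the result is an empty DataFrame: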

filtered_df = df[(df['Age'] > 25) & ((df['City'] == 'New York') | (df['City'] == 'Chicago'))]
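
When one column is compared against several allowed values, chaining == with | gets long quickly. One alternative, sketched here rather than taken from the example above, is Series.isin(); the names cities, with_isin, and looser are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 30, 22, 35, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

cities = ['New York', 'Chicago']

# isin() is True wherever the City value appears in the list,
# replacing the chain of == comparisons joined by |.
with_isin = df[(df['Age'] > 25) & df['City'].isin(cities)]
print(len(with_isin))  # 0 (nobody over 25 lives in these cities)

# Relaxing the age threshold shows the filter selecting rows.
looser = df[(df['Age'] > 20) & df['City'].isin(cities)]
print(looser['Name'].tolist())  # ['Alice', 'Charlie']
```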

Conclusion

By mastering these filtering techniques, you can efficiently manage and manipulate your data within pandas DataFrames. Whether you’re conducting data analysis or preparing data sets for machine learning, these methods will significantly enhance your data processing capabilities.