How do I select rows from a DataFrame based on column values?

In pandas, you can select rows from a DataFrame based on column values using Boolean indexing, `loc`, the `query()` method, or helper methods such as `isin()`, `between()`, and the `.str` accessor. Here are the most common approaches, with examples:

1. Boolean Indexing

Use logical conditions to create a Boolean mask and filter rows.
Syntax:

python

df[df['column'] condition]

Examples:

python

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX']
}
df = pd.DataFrame(data)

# Select rows where Age > 30
result = df[df['Age'] > 30]
print(result)
# Output:
#      Name  Age City
# 2  Charlie   35   LA
# 3    David   40   TX

# Combine conditions with & (and), | (or), ~ (not); wrap each condition in parentheses
result = df[(df['Age'] > 25) & (df['City'] == 'SF')]
# Output:
#   Name  Age City
# 1  Bob   30   SF
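The `|` and `~` operators work the same way as `&`. The following is a minimal sketch that rebuilds the sample data from this section; the variable names `either` and `not_ny` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

# OR: rows where Age is under 30 or City is 'TX'
either = df[(df['Age'] < 30) | (df['City'] == 'TX')]
print(either['Name'].tolist())  # ['Alice', 'David']

# NOT: rows where City is anything other than 'NY'
not_ny = df[~(df['City'] == 'NY')]
print(not_ny['Name'].tolist())  # ['Bob', 'Charlie', 'David']
```

The parentheses matter because `&`, `|`, and `~` bind more tightly than comparison operators in Python, so leaving them out usually raises an error or filters on the wrong expression.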

2. Using `loc`

Explicitly filter rows (and optionally select columns) using loc:

python

# Select rows where City is 'LA' and show the 'Name' column
result = df.loc[df['City'] == 'LA', 'Name']
# Output:
# 2    Charlie
# Name: Name, dtype: object

# Multiple conditions
result = df.loc[(df['Age'] < 40) & (df['Name'].str.startswith('C'))]
# Output:
#      Name  Age City
# 2  Charlie   35   LA

3. `query()` Method

Write SQL-like syntax for readability (especially for complex conditions):

python

result = df.query("Age > 30 and City in ['LA', 'TX']")
# Output:
#      Name  Age City
# 2  Charlie   35   LA
# 3    David   40   TX
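Values do not have to be hard-coded into the query string: `query()` can read local Python variables when they are prefixed with `@`. A small sketch, assuming the same sample data; `min_age` and `wanted_cities` are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

min_age = 30                    # hypothetical threshold
wanted_cities = ['LA', 'TX']    # hypothetical list of cities

# @name refers to a local Python variable rather than a column
result = df.query("Age > @min_age and City in @wanted_cities")
print(result['Name'].tolist())  # ['Charlie', 'David']
```

This keeps the expression readable while letting the threshold and the list come from elsewhere in your code.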

4. Filter with `isin()`

Select rows where a column value is in a list:

python

cities = ['NY', 'LA']
result = df[df['City'].isin(cities)]
# Output:
#      Name  Age City
# 0   Alice   25   NY
# 2  Charlie   35   LA

5. String Operations

Use string methods for text filtering:

python

# Select rows where Name contains a lowercase 'a' (the match is case-sensitive,
# so the capital 'A' in 'Alice' does not count)
result = df[df['Name'].str.contains('a')]
# Output:
#      Name  Age City
# 2  Charlie   35   LA
# 3    David   40   TX
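Two `str.contains()` parameters are often useful here: `case=False` for case-insensitive matching and `na=False` so that missing values count as non-matches instead of producing NaN in the mask. The sketch below uses made-up data with one missing name to show both.

```python
import pandas as pd

# Made-up data: one Name is missing on purpose
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, 30, 35, 40],
})

# case=False matches 'A' as well as 'a'; na=False treats the missing name as no match
mask = df['Name'].str.contains('a', case=False, na=False)
result = df[mask]
print(result['Name'].tolist())  # ['Alice', 'David']
```

Note that `str.contains()` interprets the pattern as a regular expression by default; pass `regex=False` if the search text should be taken literally.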

6. `between()` for Ranges

Filter rows where a value falls within a range:

python

result = df[df['Age'].between(30, 35, inclusive='both')]
# Output:
#      Name  Age City
# 1     Bob   30   SF
# 2  Charlie   35   LA
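For comparison, `between()` is shorthand for a pair of comparisons joined with `&`. The sketch below checks that equivalence on the sample data and also shows an exclusive range; the string value `inclusive='neither'` assumes pandas 1.3 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

# Both endpoints are included by default
with_between = df[df['Age'].between(30, 35)]
with_mask = df[(df['Age'] >= 30) & (df['Age'] <= 35)]
print(with_between.equals(with_mask))  # True

# Exclude both endpoints (assumes pandas >= 1.3)
exclusive = df[df['Age'].between(25, 40, inclusive='neither')]
print(exclusive['Name'].tolist())  # ['Bob', 'Charlie']
```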

Key Tips:

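Avoid Chained Indexing: When assigning to filtered rows, use a single df.loc[df['column'] > x, 'other_column'] = value instead of df[df['column'] > x]['other_column'] = value. The chained form may write to a temporary copy and triggers SettingWithCopyWarning.
Performance: Boolean indexing is vectorized and fast for most workloads. query() can be faster on large DataFrames when the optional numexpr package is installed, but parsing the expression adds overhead on small ones.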
Null Values: Use df[df['column'].notna()] or df[df['column'].isna()] for handling NaN.
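The first and last tips can be sketched together as follows. The data is the sample from this article with one age replaced by NaN for illustration, and the replacement value 'Unknown' is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],   # one age made missing for illustration
    'City': ['NY', 'SF', 'LA', 'TX'],
})

# One .loc call selects the rows and the column, then assigns in place
df.loc[df['Age'] > 30, 'City'] = 'Unknown'
print(df['City'].tolist())  # ['NY', 'SF', 'Unknown', 'Unknown']

# Keep only the rows where Age is present
known_age = df[df['Age'].notna()]
print(known_age['Name'].tolist())  # ['Alice', 'Charlie', 'David']
```

Comparisons against NaN evaluate to False, which is why Bob's row is left unchanged by the assignment.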

Summary

Method	Use Case
Boolean Indexing	Simple to moderate conditions.
`loc`/`iloc`	Explicit row/column selection.
`query()`	Readable syntax for complex conditions.
`isin()`/`between()`	Filtering against lists or ranges.

Example for Multiple Conditions:

python

result = df.loc[
    (df['Age'] > 25) & 
    ~df['City'].isin(['TX']) & 
    df['Name'].str.contains('e')
]
# Output:
#      Name  Age City
# 2  Charlie   35   LA

Choose the method that best fits your code’s readability and performance needs!
