In Pandas, you can select rows from a DataFrame based on column values using Boolean indexing, the query()
method, or other filtering techniques. Here are the most common and efficient approaches with examples:
1. Boolean Indexing
Use logical conditions to create a Boolean mask and filter rows.
Syntax (condition is a placeholder for a comparison such as > 30 or == 'SF'):
python
df[df['column'] condition]
Examples:
python
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX']
}
df = pd.DataFrame(data)
# Select rows where Age > 30
result = df[df['Age'] > 30]
print(result)
# Output:
#       Name  Age City
# 2  Charlie   35   LA
# 3    David   40   TX
# Combine conditions with & (and), | (or), ~ (not)
result = df[(df['Age'] > 25) & (df['City'] == 'SF')]
# Output:
#    Name  Age City
# 1   Bob   30   SF
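The example above only exercises &. As a minimal sketch on the same sample data, the | and ~ operators can be used as follows; the variable names older_or_ny and not_sf are illustrative only. Each comparison is wrapped in parentheses because &, | and ~ bind more tightly than comparison operators in Python.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

# OR: keep rows where at least one condition holds
older_or_ny = df[(df['Age'] > 35) | (df['City'] == 'NY')]
print(older_or_ny['Name'].tolist())  # ['Alice', 'David']

# NOT: invert a Boolean mask with ~
not_sf = df[~(df['City'] == 'SF')]
print(not_sf['Name'].tolist())  # ['Alice', 'Charlie', 'David']
```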
2. Using loc
Use loc to filter rows explicitly and, optionally, select columns in the same step:
python
# Select rows where City is 'LA' and show the 'Name' column
result = df.loc[df['City'] == 'LA', 'Name']
# Output:
# 2    Charlie
# Name: Name, dtype: object
# Multiple conditions (each condition in its own parentheses)
result = df.loc[(df['Age'] < 40) & (df['Name'].str.startswith('C'))]
# Output:
#       Name  Age City
# 2  Charlie   35   LA
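To illustrate the "optionally select columns" part, here is a small sketch on the same sample data. Passing a list of column labels as the second argument to loc returns a DataFrame, while a single label returns a Series. The names subset and names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

mask = df['Age'] >= 30

# A list of labels keeps the result two-dimensional (DataFrame)
subset = df.loc[mask, ['Name', 'City']]
print(list(subset.columns))  # ['Name', 'City']

# A single label returns a one-dimensional Series
names = df.loc[mask, 'Name']
print(names.tolist())  # ['Bob', 'Charlie', 'David']
```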
3. query() Method
Write SQL-like syntax for readability (especially for complex conditions):
python
result = df.query("Age > 30 and City in ['LA', 'TX']")
# Output:
#       Name  Age City
# 2  Charlie   35   LA
# 3    David   40   TX
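When the thresholds live in Python variables rather than in the string itself, query() can reference them with an @ prefix. A minimal sketch on the same sample data, where min_age and cities are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

min_age = 30
cities = ['LA', 'TX']

# @name refers to a variable in the surrounding Python scope
result = df.query("Age > @min_age and City in @cities")
print(result['Name'].tolist())  # ['Charlie', 'David']
```

This avoids building the expression with string formatting, which tends to be error-prone when values contain quotes.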
4. Filter with isin()
Select rows where a column value is in a list:
python
cities = ['NY', 'LA']
result = df[df['City'].isin(cities)]
# Output:
#       Name  Age City
# 0    Alice   25   NY
# 2  Charlie   35   LA
5. String Operations
Use string methods for text filtering:
python
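# Select rows where Name contains a lowercase 'a' (matching is case-sensitive by default,
# so 'Alice' with its capital 'A' is not matched; pass case=False to include it)
result = df[df['Name'].str.contains('a')]
# Output:
#       Name  Age City
# 2  Charlie   35   LA
# 3    David   40   TX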
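As a short sketch of two str.contains() options that often matter in practice: case=False makes the match case-insensitive, and na=False treats missing values as non-matches so the mask stays purely Boolean. The Series with_missing is made-up data for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

# Case-insensitive match: 'Alice' now qualifies through its capital 'A'
any_case = df[df['Name'].str.contains('a', case=False)]
print(any_case['Name'].tolist())  # ['Alice', 'Charlie', 'David']

# Hypothetical column with a missing entry; na=False maps it to False
with_missing = pd.Series(['Alice', None, 'David'], name='Name')
mask = with_missing.str.contains('a', na=False)
print(mask.tolist())  # [False, False, True]
```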
6. between() for Ranges
Filter rows where a value falls within a range:
python
result = df[df['Age'].between(30, 35, inclusive='both')]
# Output:
#       Name  Age City
# 1      Bob   30   SF
# 2  Charlie   35   LA
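The inclusive argument controls which endpoints count as inside the range. The sketch below assumes pandas 1.3 or later, where inclusive accepts the strings 'both', 'neither', 'left' and 'right'; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

# 30 <= Age < 35: only the left endpoint is included
left_closed = df[df['Age'].between(30, 35, inclusive='left')]
print(left_closed['Name'].tolist())  # ['Bob']

# 30 < Age <= 35: only the right endpoint is included
right_closed = df[df['Age'].between(30, 35, inclusive='right')]
print(right_closed['Name'].tolist())  # ['Charlie']

# 25 < Age < 40: both endpoints are excluded
open_range = df[df['Age'].between(25, 40, inclusive='neither')]
print(open_range['Name'].tolist())  # ['Bob', 'Charlie']
```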
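Key Tips:
- Avoid Chained Indexing: When assigning values, use df.loc[df.column > x, 'other_column'] instead of df[df.column > x]['other_column'] to prevent SettingWithCopyWarning.
- Performance: Boolean indexing is vectorized, and query() can evaluate its expression with numexpr when that package is installed, which may help on large DataFrames.
- Null Values: Use df[df['column'].notna()] to keep rows with a value, or df[df['column'].isna()] to isolate rows containing NaN.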
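A minimal sketch of the first and third tips, using a variant of the sample data in which one Age is missing (an assumption made here so the null filters have something to act on):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'City': ['NY', 'SF', 'LA', 'TX'],
})

# One loc call selects rows and column together, so the assignment
# is applied to df itself rather than to a temporary copy
df.loc[df['Age'] > 30, 'City'] = 'Unknown'
print(df['City'].tolist())  # ['NY', 'SF', 'Unknown', 'Unknown']

# Rows with a known Age, and rows where Age is missing
known_age = df[df['Age'].notna()]
missing_age = df[df['Age'].isna()]
print(known_age['Name'].tolist())    # ['Alice', 'Charlie', 'David']
print(missing_age['Name'].tolist())  # ['Bob']
```

Note that comparisons against NaN evaluate to False, which is why Bob's row is left unchanged by the assignment.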
Summary
| Method | Use Case |
|---|---|
| Boolean Indexing | Simple to moderate conditions. |
| loc / iloc | Explicit row/column selection. |
| query() | Readable syntax for complex conditions. |
| isin() / between() | Filtering against lists or ranges. |
Example for Multiple Conditions:
python
result = df.loc[
    (df['Age'] > 25) &
    ~df['City'].isin(['TX']) &
    df['Name'].str.contains('e')
]
# Output:
#       Name  Age City
# 2  Charlie   35   LA
Choose the method that best fits your code’s readability and performance needs!