To select rows from a Pandas DataFrame using a list of values, you can use the following methods depending on your specific use case:
1. Single Column Filtering
Use isin()
to filter rows where a specific column matches values in your list:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd']
})
# List of values to filter
values = [2, 3]
# Filter rows where column 'A' is in the list
filtered_df = df[df['A'].isin(values)]
Output:
A B
1 2 b
2 3 c
2. Multiple Columns Filtering
To filter rows based on multiple columns (e.g., matching pairs of values):
# List of tuples representing (A, B) pairs
values = [(1, 'a'), (3, 'c')]
# Create a DataFrame from the list of tuples
temp_df = pd.DataFrame(values, columns=['A', 'B'])
# Merge to find matching rows
filtered_df = df.merge(temp_df, on=['A', 'B'])
Output:
A B
0 1 a
1 3 c
3. Filter Rows Where Any Column Matches
Check if any column in a row contains a value from the list:
values = [2, 'c']
# Check across all columns
filtered_df = df[df.isin(values).any(axis=1)]
Output:
A B
1 2 b
2 3 c
4. Handling NaN Values
If your list includes NaN
, combine isin
with isna()
:
import numpy as np
# DataFrame with NaN
df = pd.DataFrame({'A': [1, np.nan, 3, 4]})
values = [1, np.nan]
# Filter rows where 'A' is in the list or NaN
mask = df['A'].isin(values) | df['A'].isna()
filtered_df = df[mask]
Output:
A
0 1.0
1 NaN
5. Inverse Filter (Exclude List Values)
Use ~
to exclude rows where a column value is in the list:
values = [2, 3]
filtered_df = df[~df['A'].isin(values)]
Output:
A B
0 1 a
3 4 d
6. Using query()
for Readability
Filter rows using a SQL-like syntax:
values = [2, 3]
filtered_df = df.query("A in @values")
Key Considerations
- Performance:
isin()
is vectorized and efficient for large DataFrames. - Data Types: Ensure the list values match the column dtype (e.g.,
int
vs.string
). - NaN Handling: Use
isna()
alongsideisin()
to include/exclude NaN explicitly.
Summary Table
Use Case | Method | Example |
---|---|---|
Filter single column | df[df['col'].isin(values)] | df[df['A'].isin([2, 3])] |
Filter multiple columns | merge with a DataFrame of tuples | df.merge(temp_df, on=['A', 'B']) |
Check any column | df.isin(values).any(axis=1) | df[df.isin([2, 'c']).any(1)] |
Exclude values | ~df['col'].isin(values) | df[~df['A'].isin([2, 3])] |
Handle NaN | Combine isin with isna() | mask = df['A'].isin([1]) | df['A'].isna() |
Choose the method that best fits your data structure and requirements!