To filter a pandas DataFrame using “IN” and “NOT IN” logic (similar to SQL), you can use the isin()
method combined with boolean indexing. Here’s how:
1. Filter Using IN
(Select Rows Where Column Value is in a List)
Use df[df['column'].isin(list_of_values)]
:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'age': [25, 30, 35, 40, 45]
})
# Values to filter
selected_names = ['Alice', 'Charlie', 'Eve']
# Filter rows where 'name' is in the list
filtered_df = df[df['name'].isin(selected_names)]
Output:
name age
0 Alice 25
2 Charlie 35
4 Eve 45
2. Filter Using NOT IN
(Exclude Rows Where Column Value is in a List)
Use the ~
operator to negate isin()
:
# Exclude rows where 'name' is in the list
excluded_df = df[~df['name'].isin(selected_names)]
Output:
name age
1 Bob 30
3 David 40
3. Filter with Multiple Columns
If you need to check values across multiple columns, combine conditions:
# Values to filter
ages_to_exclude = [25, 40]
# Filter where 'name' is in selected_names AND 'age' not in ages_to_exclude
filtered_df = df[
df['name'].isin(selected_names) &
~df['age'].isin(ages_to_exclude)
]
Output:
name age
2 Charlie 35
4 Eve 45
4. Using query()
for SQL-Like Syntax
For a more SQL-like approach, use df.query()
:
# Filter using "IN"
filtered_df = df.query("name in @selected_names")
# Filter using "NOT IN"
excluded_df = df.query("name not in @selected_names")
Key Notes:
isin()
works with lists, NumPy arrays, or pandas Series.- Use
~
to invert the condition (equivalent toNOT IN
). - For large DataFrames,
isin()
is faster than usingapply
or loops. - Combine multiple conditions with
&
(AND) or|
(OR) and wrap each condition in parentheses.
Example with Multiple Conditions:
# Filter rows where 'name' is in selected_names OR 'age' > 35
filtered_df = df[
df['name'].isin(selected_names) |
(df['age'] > 35)
]
Output:
name age
0 Alice 25
2 Charlie 35
3 David 40
4 Eve 45
This approach mirrors SQL’s IN
and NOT IN
functionality while leveraging pandas’ vectorized operations for efficiency.