How to filter Pandas dataframe using 'in' and 'not in' like in SQL?

To filter a pandas DataFrame using “IN” and “NOT IN” logic (similar to SQL), you can use the isin() method combined with boolean indexing. Here’s how:

1. Filter Using `IN` (Select Rows Where Column Value is in a List)

Use df[df['column'].isin(list_of_values)]:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 45]
})

# Values to filter
selected_names = ['Alice', 'Charlie', 'Eve']

# Filter rows where 'name' is in the list
filtered_df = df[df['name'].isin(selected_names)]

Output:

      name  age
0    Alice   25
2  Charlie   35
4      Eve   45

2. Filter Using `NOT IN` (Exclude Rows Where Column Value is in a List)

Use the ~ operator to negate isin():

# Exclude rows where 'name' is in the list
excluded_df = df[~df['name'].isin(selected_names)]

Output:

    name  age
1    Bob   30
3  David   40

3. Filter with Multiple Columns

If you need to check values across multiple columns, combine conditions:

# Values to filter
ages_to_exclude = [25, 40]

# Filter where 'name' is in selected_names AND 'age' not in ages_to_exclude
filtered_df = df[
    df['name'].isin(selected_names) & 
    ~df['age'].isin(ages_to_exclude)
]

Output:

      name  age
2  Charlie   35
4      Eve   45

4. Using `query()` for SQL-Like Syntax

For a more SQL-like approach, use df.query():

# Filter using "IN"
filtered_df = df.query("name in @selected_names")

# Filter using "NOT IN"
excluded_df = df.query("name not in @selected_names")

Key Notes:

isin() works with lists, NumPy arrays, or pandas Series.
Use ~ to invert the condition (equivalent to NOT IN).
For large DataFrames, isin() is faster than using apply or loops.
Combine multiple conditions with & (AND) or | (OR) and wrap each condition in parentheses.

Example with Multiple Conditions:

# Filter rows where 'name' is in selected_names OR 'age' > 35
filtered_df = df[
    df['name'].isin(selected_names) | 
    (df['age'] > 35)
]

Output:

      name  age
0    Alice   25
2  Charlie   35
3    David   40
4      Eve   45

This approach mirrors SQL’s IN and NOT IN functionality while leveraging pandas’ vectorized operations for efficiency.

How to filter Pandas dataframe using ‘in’ and ‘not in’ like in SQL?

1. Filter Using `IN` (Select Rows Where Column Value is in a List)

2. Filter Using `NOT IN` (Exclude Rows Where Column Value is in a List)

3. Filter with Multiple Columns

4. Using `query()` for SQL-Like Syntax

Key Notes:

Example with Multiple Conditions:

Leave a Reply Cancel reply

1. Filter Using IN (Select Rows Where Column Value is in a List)

2. Filter Using NOT IN (Exclude Rows Where Column Value is in a List)

3. Filter with Multiple Columns

4. Using query() for SQL-Like Syntax

Key Notes:

Example with Multiple Conditions:

Leave a Reply Cancel reply

1. Filter Using `IN` (Select Rows Where Column Value is in a List)

2. Filter Using `NOT IN` (Exclude Rows Where Column Value is in a List)

4. Using `query()` for SQL-Like Syntax