How to use a list of values to select rows from a Pandas dataframe ?

To select rows from a Pandas DataFrame using a list of values, you can use the following methods depending on your specific use case:

1. Single Column Filtering

Use isin() to filter rows where a specific column matches values in your list:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd']
})

# List of values to filter
values = [2, 3]

# Filter rows where column 'A' is in the list
filtered_df = df[df['A'].isin(values)]

Output:

   A  B
1  2  b
2  3  c

2. Multiple Columns Filtering

To filter rows based on multiple columns (e.g., matching pairs of values):

# List of tuples representing (A, B) pairs
values = [(1, 'a'), (3, 'c')]

# Create a DataFrame from the list of tuples
temp_df = pd.DataFrame(values, columns=['A', 'B'])

# Merge to find matching rows
filtered_df = df.merge(temp_df, on=['A', 'B'])

Output:

   A  B
0  1  a
1  3  c

3. Filter Rows Where Any Column Matches

Check if any column in a row contains a value from the list:

values = [2, 'c']

# Check across all columns
filtered_df = df[df.isin(values).any(axis=1)]

Output:

   A  B
1  2  b
2  3  c

4. Handling NaN Values

If your list includes NaN, combine isin with isna():

import numpy as np

# DataFrame with NaN
df = pd.DataFrame({'A': [1, np.nan, 3, 4]})
values = [1, np.nan]

# Filter rows where 'A' is in the list or NaN
mask = df['A'].isin(values) | df['A'].isna()
filtered_df = df[mask]

Output:

     A
0  1.0
1  NaN

5. Inverse Filter (Exclude List Values)

Use ~ to exclude rows where a column value is in the list:

values = [2, 3]
filtered_df = df[~df['A'].isin(values)]

Output:

   A  B
0  1  a
3  4  d

6. Using query() for Readability

Filter rows using a SQL-like syntax:

values = [2, 3]
filtered_df = df.query("A in @values")

Key Considerations

  • Performance: isin() is vectorized and efficient for large DataFrames.
  • Data Types: Ensure the list values match the column dtype (e.g., int vs. string).
  • NaN Handling: Use isna() alongside isin() to include/exclude NaN explicitly.

Summary Table

Use CaseMethodExample
Filter single columndf[df['col'].isin(values)]df[df['A'].isin([2, 3])]
Filter multiple columnsmerge with a DataFrame of tuplesdf.merge(temp_df, on=['A', 'B'])
Check any columndf.isin(values).any(axis=1)df[df.isin([2, 'c']).any(1)]
Exclude values~df['col'].isin(values)df[~df['A'].isin([2, 3])]
Handle NaNCombine isin with isna()mask = df['A'].isin([1]) | df['A'].isna()

Choose the method that best fits your data structure and requirements!

Leave a Reply

Your email address will not be published. Required fields are marked *