How to delete rows from a pandas DataFrame based on a conditional expression?

To delete rows from a pandas DataFrame based on a conditional expression, you primarily use boolean indexing or the drop() method. Below is a detailed explanation with examples:

Core Concept: Boolean Indexing

The most common approach is to select the rows that do not meet the deletion condition and assign the result back to the same variable:

df = df[~condition]  # Keep rows where condition is False
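To make the one-liner above concrete, here is a minimal sketch. The sample data and the variable names (condition, kept) are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Score": [75, 50, 65]})

# The condition is a boolean Series aligned with the DataFrame's index:
# True marks a row that should be deleted.
condition = df["Score"] < 60

# ~ flips every True/False, so only rows that fail the condition are kept.
kept = df[~condition]

print(condition.tolist())     # [False, True, False]
print(kept["Name"].tolist())  # ['Alice', 'Charlie']
```

Note that df itself is unchanged here; the deletion only "sticks" once the result is assigned back, as in df = df[~condition].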

Step-by-Step Methods & Examples

1. Basic Conditional Deletion

Delete rows where a column meets a specific criterion.

Example 1: Delete rows where Score < 60.

import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Score': [75, 50, 65, 42]}
df = pd.DataFrame(data)

# Delete rows with Score < 60
df = df[df['Score'] >= 60]  # Keep rows where Score >= 60

Result:

      Name  Score
0    Alice     75
2  Charlie     65

2. Delete Rows Using drop()

Get indices of rows matching the condition and remove them:

indices = df[df['Score'] < 60].index
df = df.drop(indices)
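The sketch below compares drop() with boolean indexing on the original four-row data, and then on a small frame with duplicate index labels. The duplicate-label frame is a made-up case, included because drop() works by label and removes every row that carries a matching label.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Score": [75, 50, 65, 42]})

# With a unique index, both approaches remove the same rows.
via_drop = df.drop(df[df["Score"] < 60].index)
via_mask = df[~(df["Score"] < 60)]
print(via_drop["Name"].tolist())  # ['Alice', 'Charlie']
print(via_mask["Name"].tolist())  # ['Alice', 'Charlie']

# Hypothetical frame where two rows share the label "a".
dup = pd.DataFrame({"Score": [75, 50]}, index=["a", "a"])

# drop() removes all rows labelled "a", including the one with Score 75.
dropped = dup.drop(dup[dup["Score"] < 60].index)
# Boolean indexing works row by row, so only the Score 50 row is removed.
masked = dup[~(dup["Score"] < 60)]

print(len(dropped))  # 0
print(len(masked))   # 1
```

If the index may contain duplicates, boolean indexing is the safer choice.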

3. Complex Conditions (AND/OR)

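Combine conditions using & (AND), | (OR), and parentheses. Wrap each comparison in parentheses, because & and | bind more tightly than comparison operators such as < and ==.

Example 2: Delete rows where Score < 60 OR Name == 'David'. Keeping the complement means both opposite conditions must hold, so the filter uses & rather than |.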

df = df[(df['Score'] >= 60) & (df['Name'] != 'David')]
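As a quick check, the sketch below writes the deletion condition from Example 2 directly and negates it with ~, then compares the result with the rewritten keep-condition. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Score": [75, 50, 65, 42]})

# Form 1: state what to delete, then negate the whole mask.
delete_if = (df["Score"] < 60) | (df["Name"] == "David")
negated = df[~delete_if]

# Form 2: state what to keep (each comparison flipped, | replaced by &).
rewritten = df[(df["Score"] >= 60) & (df["Name"] != "David")]

print(negated["Name"].tolist())    # ['Alice', 'Charlie']
print(rewritten["Name"].tolist())  # ['Alice', 'Charlie']
```

Form 1 is often easier to read when the deletion rule is given in words, since the code mirrors the rule and the single ~ does the inversion.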

4. Handle Missing Values (NaN)

Delete rows with NaN in a specific column using dropna():

# Delete rows where 'Score' is NaN
df = df.dropna(subset=['Score'])
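One related pitfall: any comparison against NaN evaluates to False, so the two ways of writing a filter can treat missing values differently. The sketch below uses made-up data with one missing score to show this.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Eve"],
                   "Score": [75.0, 50.0, np.nan]})

# NaN >= 60 is False, so Eve is silently removed along with Bob.
keep_passing = df[df["Score"] >= 60]

# NaN < 60 is also False, so ~ turns it into True and Eve is kept.
drop_failing = df[~(df["Score"] < 60)]

print(keep_passing["Name"].tolist())  # ['Alice']
print(drop_failing["Name"].tolist())  # ['Alice', 'Eve']
```

Calling dropna(subset=[...]) first, or adding an explicit isna() / notna() term to the condition, makes the intended handling of missing values unambiguous.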

5. Invert Condition with ~

Use ~ to negate a condition (keep rows where the condition is False).

Example 3: Delete rows where Name contains “li”.

df = df[~df['Name'].str.contains('li')]

Result (from original data):

    Name  Score
1    Bob     50
3  David     42
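If the text column can contain missing values, str.contains() returns a missing value for those rows, and the resulting mask usually cannot be negated or used for indexing. A minimal sketch, assuming a made-up frame with one missing name, passes na=False so that missing names count as "no match":

```python
import pandas as pd

# Hypothetical data with one missing name.
df = pd.DataFrame({"Name": ["Alice", "Bob", None, "David"],
                   "Score": [75, 50, 65, 42]})

# na=False treats missing names as not containing the pattern,
# which yields a clean boolean mask.
mask = df["Name"].str.contains("li", na=False)
kept = df[~mask]

print(mask.tolist())           # [True, False, False, False]
print(kept["Score"].tolist())  # [50, 65, 42]
```

str.contains() interprets the pattern as a regular expression by default; pass regex=False when the pattern should be matched literally, and case=False for case-insensitive matching.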

Key Notes

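  1. Modify vs. Create New DataFrame:
  • Filtering and drop() return a new DataFrame by default. Assign back to df (e.g., df = df[~condition]) to persist changes.
  • Methods like drop() accept inplace=True to modify in place, but reassignment is generally preferred: in-place calls return None, so they cannot be chained.
  2. Reset Index:
    After deletion, the index keeps its original labels, which leaves gaps. Reset it if you need consecutive labels:
   df = df.reset_index(drop=True)  # Drop old index
  3. Performance: Boolean indexing is usually at least as fast as drop(), because the drop() approach still needs a boolean selection to collect the index labels first. Measure on your own data if it matters.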
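The first two notes can be seen in a short sketch. The names original, filtered, and renumbered are illustrative only.

```python
import pandas as pd

original = pd.DataFrame({"Score": [75, 50, 65, 42]})

# Filtering returns a new DataFrame; the original keeps all four rows.
filtered = original[original["Score"] >= 60]
print(len(original))            # 4
print(filtered.index.tolist())  # [0, 2]  (gap where row 1 was removed)

# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```

Keep the original labels if later steps need to map rows back to the source data; reset them when positions and labels should match.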

Complete Example Workflow

import numpy as np
import pandas as pd

# Sample DataFrame
data = {'Product': ['Apple', 'Banana', 'Cherry', 'Date'],
        'Price': [1.2, 0.5, 2.5, np.nan],
        'Stock': [10, 0, 25, 5]}
df = pd.DataFrame(data)

# Delete rows where: Price is NaN OR Stock is 0
condition = (df['Price'].isna()) | (df['Stock'] == 0)
df = df[~condition]  # Keep rows that don't match condition
df = df.reset_index(drop=True)  # Renumber the remaining rows from 0

Output:

  Product  Price  Stock
0   Apple    1.2     10
1  Cherry    2.5     25
Summary

  • Basic Deletion: df = df[~condition], or write the keep-condition directly, e.g. df = df[df['column'] >= value]
  • Complex Conditions: Use &, |, and ~ with parentheses.
  • Handling NaNs: dropna(subset=['column'])
  • Reset Index: reset_index(drop=True)

Together, these patterns cover most cases of conditional row deletion in pandas.