How to Delete DataFrame Rows in Pandas Based on a Column Value

To delete rows in a Pandas DataFrame based on a column value, you can use boolean indexing, the drop() method, or helpers such as query(), isin(), and dropna(). Each method is shown below with an example:

1. Using Boolean Indexing

Filter rows by excluding those that match the condition.

Example DataFrame:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age Gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   17      M
3    David   22      M

Delete Rows Where Age < 18

df_filtered = df[df['Age'] >= 18]  # Keep rows where Age >= 18
print(df_filtered)

Output:

    Name  Age Gender
0  Alice   25      F
1    Bob   30      M
3  David   22      M
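
Boolean indexing keeps the original index labels, which is why the output above jumps from 1 to 3. The following is a minimal sketch, assuming the same example data, that deletes rows on a compound condition and then renumbers the index with reset_index(drop=True). The name to_delete is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# Rows to delete: male AND younger than 25 (Charlie and David)
to_delete = (df['Gender'] == 'M') & (df['Age'] < 25)

# Keep everything else, then renumber the index from 0
df_filtered = df[~to_delete].reset_index(drop=True)
print(df_filtered)
```

The original df is not modified; only df_filtered has the new, gap-free index.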

2. Using drop() with Index

Collect the index labels of the rows that meet the condition, then pass them to drop(). Note that drop() matches index labels, not integer positions.

Example:

# Get indices of rows to delete
indices_to_drop = df[df['Gender'] == 'M'].index

# Drop rows by index
df_filtered = df.drop(indices_to_drop)
print(df_filtered)

Output:

    Name  Age Gender
0  Alice   25      F
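
Because drop() works on labels, it can remove more rows than intended when the index contains duplicates. The sketch below uses a small hypothetical DataFrame with a repeated label 'x' to show the difference between drop() and a boolean mask.

```python
import pandas as pd

# Hypothetical data with a repeated index label 'x'
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Gender': ['F', 'M', 'F']},
    index=['x', 'y', 'x'],
)

# Only Charlie matches, but Charlie's label 'x' is shared with Alice
labels = df[df['Name'] == 'Charlie'].index
dropped = df.drop(labels)  # removes BOTH rows labelled 'x'
print(dropped['Name'].tolist())

# A boolean mask is evaluated row by row, so duplicate labels do not matter
masked = df[df['Name'] != 'Charlie']
print(masked['Name'].tolist())
```

If the index may not be unique, a boolean mask (or a reset_index() call first) is likely the safer choice.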

3. Using query() Method

query() keeps the rows that match a query string, so write the condition for the rows you want to keep. It is useful for complex conditions.

Example:

df_filtered = df.query("Age >= 18 and Gender == 'M'")
print(df_filtered)

Output:

    Name  Age Gender
1    Bob   30      M
3  David   22      M
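
To phrase a query as a deletion, the condition can be negated with not, and Python variables can be referenced with the @ prefix. A minimal sketch, assuming the same example data and a hypothetical threshold variable min_age:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

min_age = 18  # hypothetical threshold, referenced in the query via @

# "Delete rows where Age < min_age" written as a negated condition
df_filtered = df.query("not (Age < @min_age)")
print(df_filtered)
```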

4. Delete Rows with Specific Values

Use isin() to target multiple values.

Example:

# Delete rows where Name is 'Bob' or 'David'
df_filtered = df[~df['Name'].isin(['Bob', 'David'])]
print(df_filtered)

Output:

      Name  Age Gender
0    Alice   25      F
2  Charlie   17      M
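
The values passed to isin() do not have to be a literal list. As a sketch, assuming the names to remove live in a second, hypothetical DataFrame called blocked, its column can be passed directly:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# Hypothetical list of names to remove; 'Zed' is not in df and is simply ignored
blocked = pd.DataFrame({'Name': ['Bob', 'David', 'Zed']})

# Series.isin() compares values, so the index of blocked does not matter
df_filtered = df[~df['Name'].isin(blocked['Name'])]
print(df_filtered)
```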

5. Handle Missing Values

Delete rows where a column has NaN values.

Example:

import numpy as np

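# Work on a copy so the original df stays unchanged for the later examples
df_nan = df.copy()

# Add a row with NaN (this converts the Age column to float)
df_nan.loc[4] = ['Eva', np.nan, 'F']

# Drop rows where 'Age' is NaN
df_filtered = df_nan.dropna(subset=['Age'])
print(df_filtered)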

Output:

      Name   Age Gender
0    Alice  25.0      F
1      Bob  30.0      M
2  Charlie  17.0      M
3    David  22.0      M
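
dropna(subset=...) is equivalent to a boolean mask built with notna(), and the how parameter controls what happens when several columns are listed. The sketch below uses a small hypothetical DataFrame with missing values in two columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in two columns
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Eva'],
    'Age': [25, np.nan, np.nan],
    'Gender': ['F', 'M', None],
})

# These two lines select the same rows
by_dropna = df.dropna(subset=['Age'])
by_mask = df[df['Age'].notna()]
print(by_dropna.equals(by_mask))

# how='all': drop a row only when BOTH listed columns are missing (Eva)
both_missing = df.dropna(subset=['Age', 'Gender'], how='all')
print(both_missing['Name'].tolist())
```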

6. Invert Conditions with ~

Use the tilde operator (~) to negate a boolean condition. Wrap the condition in parentheses, because ~ binds more tightly than comparison operators.

Example:

# Delete rows where Age is even
df_filtered = df[~(df['Age'] % 2 == 0)]
print(df_filtered)

Output:

      Name  Age Gender
0    Alice   25      F
2  Charlie   17      M
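
One subtlety: a negated condition and its "opposite" comparison are not interchangeable when the column contains NaN, because every comparison with NaN evaluates to False. A minimal sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Eva's age is missing
df = pd.DataFrame({
    'Name': ['Alice', 'Charlie', 'Eva'],
    'Age': [25, 17, np.nan],
})

# NaN >= 18 is False, so Eva is dropped
keep_positive = df[df['Age'] >= 18]
print(keep_positive['Name'].tolist())

# NaN < 18 is also False, so ~False keeps Eva
keep_negated = df[~(df['Age'] < 18)]
print(keep_negated['Name'].tolist())
```

Which behaviour is right depends on whether rows with missing values should survive the deletion.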

7. Modify DataFrame In-Place

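Pass inplace=True to drop() to modify the original DataFrame instead of returning a new one. Use it cautiously: the dropped rows are gone from df, and the call itself returns None.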

df.drop(df[df['Age'] < 18].index, inplace=True)
print(df)

Output:

    Name  Age Gender
0  Alice   25      F
1    Bob   30      M
3  David   22      M
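
Since an in-place drop() returns None, assigning its result to a variable is a common mistake. The sketch below, assuming the same example data, shows that return value and the reassignment style that gives the same rows. The names original, result, and df2 are only illustrative.

```python
import pandas as pd

original = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# In-place: df itself changes and the call returns None
df = original.copy()
result = df.drop(df[df['Age'] < 18].index, inplace=True)
print(result is None)

# Reassignment: same rows, without relying on inplace
df2 = original.copy()
df2 = df2[df2['Age'] >= 18]
print(df.equals(df2))
```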

Key Takeaways

Method            Use Case                               Example
Boolean Indexing  Simple row exclusion.                  df[df['Age'] >= 18]
drop()            Delete by index.                       df.drop(indices_to_drop)
query()           Complex conditions (SQL-like syntax).  df.query("Age > 20")
isin()            Filter multiple values.                ~df['Name'].isin(['Bob'])
dropna()          Remove rows with missing values.       df.dropna(subset=['Age'])

Common Pitfalls

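  1. Chained Indexing: Avoid df[df['Age'] > 18]['Name'], especially when assigning values; use df.loc[df['Age'] > 18, 'Name'] instead.
  2. In-Place Modification: inplace=True overwrites the original DataFrame and returns None, so keep a copy if you may need the dropped rows.
  3. Operator Precedence: & and | bind more tightly than comparison operators, so wrap each condition in parentheses:
   df[(df['Age'] > 18) & (df['Gender'] == 'M')]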
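
The first and third pitfalls can be avoided together by building a parenthesized mask once and passing it to .loc. A minimal sketch, assuming the same example data; the name adult_men is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# Each comparison is wrapped in parentheses before combining with &
adult_men = (df['Age'] > 18) & (df['Gender'] == 'M')

# Select rows and a column in a single .loc call (no chained indexing)
names = df.loc[adult_men, 'Name']
print(names.tolist())

# The same pattern is safe for assignment
df.loc[adult_men, 'Name'] = df.loc[adult_men, 'Name'].str.upper()
print(df['Name'].tolist())
```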

By using these methods, you can efficiently filter or delete rows in a Pandas DataFrame based on column values.
