To delete rows in a Pandas DataFrame based on a column value, you can use boolean indexing, the drop() method, or related helpers such as query(), isin(), and dropna(). Each method is shown below with an example:
1. Using Boolean Indexing
Filter rows by excluding those that match the condition.
Example DataFrame:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 17, 22],
'Gender': ['F', 'M', 'M', 'M']
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Gender
0 Alice 25 F
1 Bob 30 M
2 Charlie 17 M
3 David 22 M
Delete Rows Where Age < 18
df_filtered = df[df['Age'] >= 18] # Keep rows where Age >= 18
print(df_filtered)
Output:
Name Age Gender
0 Alice 25 F
1 Bob 30 M
3 David 22 M
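The filter above works because the comparison produces a boolean Series, one True/False per row, which pandas then uses to pick rows. A minimal sketch of that mechanism, reusing the example data (the names `mask` and `adults` are illustrative and not part of the original example):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# The comparison yields a boolean Series aligned with df's index
mask = df['Age'] >= 18
print(mask.tolist())

# Selecting with the mask returns a new DataFrame; df itself is unchanged
adults = df[mask]
print(len(df), len(adults))

# The surviving rows keep their original labels (0, 1, 3);
# reset_index(drop=True) renumbers them if a gap-free index is wanted
adults = adults.reset_index(drop=True)
print(adults.index.tolist())
```

Whether to reset the index is a matter of taste: keeping the original labels makes it easy to trace rows back to the source data.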
2. Using drop() with Index
Delete rows by passing drop() the index labels of the rows that meet the condition.
Example:
# Get indices of rows to delete
indices_to_drop = df[df['Gender'] == 'M'].index
# Drop rows by index
df_filtered = df.drop(indices_to_drop)
print(df_filtered)
Output:
Name Age Gender
0 Alice 25 F
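With the default 0, 1, 2, ... index, labels and positions coincide, which can hide the fact that drop() works on labels. The sketch below assumes a hypothetical string index ('a' to 'd') to make the difference visible:

```python
import pandas as pd

# Same names and ages as above, but with string labels instead of 0..3
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
     'Age': [25, 30, 17, 22]},
    index=['a', 'b', 'c', 'd'],
)

# drop() matches on labels: this removes the row labelled 'c' (Charlie)
by_label = df.drop(['c'])

# To drop by position, translate positions to labels first
# (positions 0 and 2 correspond to labels 'a' and 'c')
by_position = df.drop(df.index[[0, 2]])

print(by_label['Name'].tolist())
print(by_position['Name'].tolist())
```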
3. Using query() Method
Filter rows using a query string (useful for complex conditions). query() keeps the rows that match, so write the condition for the rows you want to keep.
Example:
df_filtered = df.query("Age >= 18 and Gender == 'M'")
print(df_filtered)
Output:
Name Age Gender
1 Bob 30 M
3 David 22 M
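Query strings can also refer to Python variables with the @ prefix, and a "delete" condition can be expressed by negating it with not. A small sketch under those assumptions (`min_age` is a hypothetical variable introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

min_age = 18  # hypothetical threshold held in a Python variable

# Keep rows at or above the threshold; @ refers to the local variable
kept = df.query("Age >= @min_age")

# Equivalent "delete" phrasing: state the rows to remove, then negate
kept_negated = df.query("not (Age < @min_age)")

print(kept['Name'].tolist())
print(kept.equals(kept_negated))
```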
4. Delete Rows with Specific Values
Use isin() to target multiple values.
Example:
# Delete rows where Name is 'Bob' or 'David'
df_filtered = df[~df['Name'].isin(['Bob', 'David'])]
print(df_filtered)
Output:
Name Age Gender
0 Alice 25 F
2 Charlie 17 M
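One practical difference from drop() is how values that do not exist are handled. The sketch below uses a hypothetical `blocklist` containing a name that is not in the data: isin() simply never matches it, whereas drop() raises KeyError for a missing label unless errors='ignore' is passed.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# Hypothetical blocklist; 'Zoe' does not appear in the data
blocklist = ['Bob', 'David', 'Zoe']

# isin() marks rows whose Name is in the blocklist; absent values never match
remaining = df[~df['Name'].isin(blocklist)]
print(remaining['Name'].tolist())

# drop() by label, in contrast, raises KeyError for the missing label 99
try:
    df.drop([0, 99])
    raised = False
except KeyError:
    raised = True
print(raised)

# errors='ignore' drops the labels that exist and skips the rest
tolerant = df.drop([0, 99], errors='ignore')
print(tolerant['Name'].tolist())
```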
5. Handle Missing Values
Delete rows where a column has NaN values.
Example:
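import numpy as np
# Work on a copy so the original df stays unchanged for the later examples
df_nan = df.copy()
# Add a row with NaN (this upcasts 'Age' from int to float)
df_nan.loc[4] = ['Eva', np.nan, 'F']
# Drop rows where 'Age' is NaN
df_filtered = df_nan.dropna(subset=['Age'])
print(df_filtered)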
Output:
Name Age Gender
0 Alice 25.0 F
1 Bob 30.0 M
2 Charlie 17.0 M
3 David 22.0 M
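The same result can be written as a boolean mask with notna(), which ties this case back to the boolean indexing approach. A minimal sketch, building the DataFrame with the NaN row directly rather than appending it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 17, 22, np.nan],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
})

# notna() is True wherever Age has a value, so the mask keeps those rows
via_mask = df[df['Age'].notna()]
via_dropna = df.dropna(subset=['Age'])
print(via_mask.equals(via_dropna))

# NaN never compares equal to anything, so == cannot be used to find it
found_with_eq = bool((df['Age'] == np.nan).any())
print(found_with_eq)
```

The mask form is handy when the NaN check has to be combined with other conditions using & or |.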
6. Invert Conditions with ~
Use the tilde operator (~) to negate a condition.
Example:
# Delete rows where Age is even
df_filtered = df[~(df['Age'] % 2 == 0)]
print(df_filtered)
Output:
Name Age Gender
0 Alice 25 F
2 Charlie 17 M
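Negation is most useful when the rows to delete are described by a compound condition. The sketch below uses an illustrative condition (male and under 25, chosen here only as an example) and shows that negating the whole mask matches the De Morgan rewrite of it:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# Rows to delete: male AND under 25
to_delete = (df['Gender'] == 'M') & (df['Age'] < 25)

# Negate the whole mask to keep everything else
kept = df[~to_delete]

# De Morgan's law gives the same rows without the outer negation
kept_demorgan = df[(df['Gender'] != 'M') | (df['Age'] >= 25)]

print(kept['Name'].tolist())
print(kept.equals(kept_demorgan))
```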
7. Modify DataFrame In-Place
Pass inplace=True to drop() to modify the original DataFrame instead of returning a new one (use cautiously).
df.drop(df[df['Age'] < 18].index, inplace=True)
print(df)
Output:
Name Age Gender
0 Alice 25 F
1 Bob 30 M
3 David 22 M
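A common slip with inplace=True is assigning the return value, which is None. The sketch below shows that behaviour next to the reassignment style, which reaches the same end state; the `backup` copy is an illustrative safeguard, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})
backup = df.copy()  # keep a copy, since the in-place drop cannot be undone

# With inplace=True, drop() returns None and mutates df
result = df.drop(df[df['Age'] < 18].index, inplace=True)
print(result is None, len(df))

# Dropping without inplace returns a new DataFrame and leaves backup intact
df2 = backup.drop(backup[backup['Age'] < 18].index)
print(df2.equals(df), len(backup))
```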
Key Takeaways
Method | Use Case | Example |
---|---|---|
Boolean Indexing | Simple row exclusion. | df[df['Age'] >= 18] |
drop() | Delete by index label. | df.drop(indices_to_drop) |
query() | Complex conditions (SQL-like syntax). | df.query("Age > 20") |
isin() | Filter multiple values. | df[~df['Name'].isin(['Bob'])] |
dropna() | Remove rows with missing values. | df.dropna(subset=['Age']) |
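For a simple condition, the first three approaches in the table are interchangeable. A quick check, using the same example data and the Age < 18 condition from earlier:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# Three ways to remove the rows where Age < 18
via_mask = df[df['Age'] >= 18]
via_drop = df.drop(df[df['Age'] < 18].index)
via_query = df.query("Age >= 18")

# equals() compares values, dtypes, and index labels
print(via_mask.equals(via_drop))
print(via_mask.equals(via_query))
```

Which one to use is mostly a readability choice: a mask for short conditions, query() for longer ones, and drop() when the labels to remove are already known.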
Common Pitfalls
- Chained Indexing: Avoid df[df['Age'] > 18]['Name'], particularly when assigning values (use .loc instead, e.g. df.loc[df['Age'] > 18, 'Name']).
- In-Place Modification: Overwrites the original DataFrame (use with caution).
- Operator Precedence: Use parentheses for compound conditions:
df[(df['Age'] > 18) & (df['Gender'] == 'M')]
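The first and last pitfalls can be sketched as follows. The upper-casing of names is an arbitrary illustrative update, and the exact exception raised by the unparenthesized condition may vary between pandas versions, so the example catches both TypeError and ValueError.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 17, 22],
    'Gender': ['F', 'M', 'M', 'M'],
})

# Chained assignment such as df[df['Age'] > 18]['Name'] = ... may write to a
# temporary copy. A single .loc call selects rows and column in one step.
mask = df['Age'] > 18
df.loc[mask, 'Name'] = df.loc[mask, 'Name'].str.upper()
print(df['Name'].tolist())

# Without parentheses, & binds tighter than > and ==, so Python evaluates
# 18 & df['Gender'] first and the expression fails instead of filtering
try:
    df[df['Age'] > 18 & df['Gender'] == 'M']
    precedence_error = False
except (TypeError, ValueError):
    precedence_error = True
print(precedence_error)
```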
By using these methods, you can efficiently filter or delete rows in a Pandas DataFrame based on column values.