To delete rows from a pandas DataFrame based on a conditional expression, you primarily use boolean indexing or the drop()
method. Below is a detailed explanation with examples:
Core Concept: Boolean Indexing
The most direct approach is to select the rows that do not meet the deletion condition and assign the result back to the DataFrame:
df = df[~condition] # Keep rows where condition is False
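As a minimal sketch of this idea, assuming a small made-up DataFrame with a Score column, the deletion condition can be built once as a boolean Series and then negated with ~:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [75, 50]})

# Boolean Series: True marks the rows that should be deleted
condition = df['Score'] < 60

# ~ flips the mask, so only rows where condition is False are kept
kept = df[~condition]

print(condition.tolist())     # [False, True]
print(kept['Name'].tolist())  # ['Alice']
```

Keeping the mask in its own variable makes it easy to inspect how many rows will be removed (for example with condition.sum()) before overwriting df.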
Step-by-Step Methods & Examples
1. Basic Conditional Deletion
Delete rows where a column meets a specific criterion.
Example 1: Delete rows where Score < 60.
import pandas as pd
import numpy as np
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Score': [75, 50, 65, 42]}
df = pd.DataFrame(data)
# Delete rows with Score < 60
df = df[df['Score'] >= 60] # Keep rows where Score >= 60
Result:
Name Score
0 Alice 75
2 Charlie 65
2. Delete Rows Using drop()
Get indices of rows matching the condition and remove them:
indices = df[df['Score'] < 60].index
df = df.drop(indices)
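The sketch below rebuilds the sample data from Example 1 and checks that the drop() route and the boolean-indexing route produce the same result. The variable names via_drop and via_mask are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Score': [75, 50, 65, 42]})

# Approach A: collect the index labels of matching rows, then drop them
indices = df[df['Score'] < 60].index
via_drop = df.drop(indices)

# Approach B: keep the complement of the condition directly
via_mask = df[~(df['Score'] < 60)]

print(via_drop.equals(via_mask))     # True
print(via_drop['Name'].tolist())     # ['Alice', 'Charlie']
```

Note that drop() removes rows by index label. If the index contains duplicate labels, every row sharing a dropped label is removed, so the two approaches are only interchangeable when the index is unique.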
3. Complex Conditions (AND/OR)
Combine conditions using & (AND), | (OR), and parentheses.
Example 2: Delete rows where Score < 60 OR Name == 'David'. The code keeps the complement of that condition: Score >= 60 AND Name != 'David'.
df = df[(df['Score'] >= 60) & (df['Name'] != 'David')]
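To make the relationship between the "delete" condition and the "keep" condition explicit, the following sketch (using the sample data from Example 1) writes the filter both ways and confirms they agree. This is De Morgan's law applied to boolean masks.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Score': [75, 50, 65, 42]})

# Deletion condition: Score < 60 OR Name == 'David'
delete_mask = (df['Score'] < 60) | (df['Name'] == 'David')

# Form 1: negate the whole deletion condition
kept_negated = df[~delete_mask]

# Form 2: the same filter rewritten as a "keep" condition
kept_rewritten = df[(df['Score'] >= 60) & (df['Name'] != 'David')]

print(kept_negated.equals(kept_rewritten))  # True
print(kept_negated['Name'].tolist())        # ['Alice', 'Charlie']
```

The two forms can diverge when the column contains NaN, because comparisons against NaN evaluate to False in both directions. Negating the deletion mask is the safer choice if missing values may be present.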
4. Handle Missing Values (NaN)
Delete rows with NaN in a specific column using dropna():
# Delete rows where 'Score' is NaN
df = df.dropna(subset=['Score'])
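A self-contained sketch of the subset behaviour, using made-up data with a hypothetical Bonus column: only missing values in the listed column cause a row to be deleted, and the same result can be obtained with a notna() mask.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN appears in both 'Score' and 'Bonus'
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Score': [75.0, np.nan, 65.0],
                   'Bonus': [np.nan, 5.0, 3.0]})

# Only NaN in 'Score' triggers deletion; the NaN in 'Bonus' (row 0) is ignored
by_dropna = df.dropna(subset=['Score'])

# Equivalent boolean-indexing form
by_mask = df[df['Score'].notna()]

print(by_dropna.index.tolist())   # [0, 2]
print(by_dropna.equals(by_mask))  # True
```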
5. Invert Condition with ~
Use ~ to negate a condition (keep rows where the condition is False).
Example 3: Delete rows where Name contains "li".
df = df[~df['Name'].str.contains('li')]
Result (from original data):
Name Score
1 Bob 50
3 David 42
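One caveat with string matching is missing data. The sketch below assumes a variant of the sample data in which one Name is missing. It passes na=False so that missing names count as "no match" and regex=False so that the pattern is treated as a literal substring.

```python
import pandas as pd

# Hypothetical data with a missing Name in row 2
df = pd.DataFrame({'Name': ['Alice', 'Bob', None, 'Charlie'],
                   'Score': [75, 50, 65, 42]})

# na=False: a missing Name yields False instead of a missing value in the mask
# regex=False: 'li' is matched as plain text, not as a regular expression
has_li = df['Name'].str.contains('li', na=False, regex=False)

kept = df[~has_li]
print(has_li.tolist())      # [True, False, False, True]
print(kept.index.tolist())  # [1, 2]
```

Without na=False, the mask can contain missing values, and negating or indexing with it typically fails. The match is case-sensitive by default.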
Key Notes
- Modify vs. Create New DataFrame:
  - Operations return a new DataFrame by default. Assign back to df (e.g., df = df[condition]) to persist changes.
  - Use inplace=True in methods like drop() to modify in place (not recommended: such calls return None, so they cannot be chained).
- Reset Index:
After deletion, reset the index to avoid gaps:
df = df.reset_index(drop=True) # Drop old index
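- Performance: Boolean indexing is generally faster than drop() for large datasets, because the drop() approach first builds the same boolean selection and then performs an additional label-based removal.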
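The first two notes can be checked with a short sketch, using the sample data from Example 1. The variable name filtered is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Score': [75, 50, 65, 42]})

# Filtering returns a new DataFrame; df is unchanged until it is reassigned
filtered = df[df['Score'] >= 60]
print(len(df), len(filtered))   # 4 2

# Surviving rows keep their original labels, which leaves gaps
print(filtered.index.tolist())  # [0, 2]

# reset_index(drop=True) renumbers from 0 and discards the old labels
filtered = filtered.reset_index(drop=True)
print(filtered.index.tolist())  # [0, 1]
```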
Complete Example Workflow
# Sample DataFrame
data = {'Product': ['Apple', 'Banana', 'Cherry', 'Date'],
'Price': [1.2, 0.5, 2.5, np.nan],
'Stock': [10, 0, 25, 5]}
df = pd.DataFrame(data)
# Delete rows where: Price is NaN OR Stock is 0
condition = (df['Price'].isna()) | (df['Stock'] == 0)
df = df[~condition] # Keep rows that don't match condition
df = df.reset_index(drop=True) # Renumber rows from 0
Output:
Product Price Stock
0 Apple 1.2 10
1 Cherry 2.5 25
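If this pattern is needed in several places, it can be wrapped in a small function. The sketch below is one possible shape. The name delete_rows_where is hypothetical and is not part of pandas.

```python
import numpy as np
import pandas as pd


def delete_rows_where(df: pd.DataFrame, condition: pd.Series) -> pd.DataFrame:
    """Return a new DataFrame without the rows where condition is True.

    Hypothetical helper for illustration; not a pandas API.
    """
    return df[~condition].reset_index(drop=True)


df = pd.DataFrame({'Product': ['Apple', 'Banana', 'Cherry', 'Date'],
                   'Price': [1.2, 0.5, 2.5, np.nan],
                   'Stock': [10, 0, 25, 5]})

# Delete rows where Price is NaN OR Stock is 0
cleaned = delete_rows_where(df, df['Price'].isna() | (df['Stock'] == 0))

print(cleaned['Product'].tolist())  # ['Apple', 'Cherry']
print(len(df))                      # 4 (the input is left unchanged)
```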
Summary
- Basic Deletion: df = df[df['column'] > value]
- Complex Conditions: Use &, |, and ~ with parentheses.
- Handling NaNs: dropna(subset=['column'])
- Reset Index: reset_index(drop=True)
This approach ensures efficient and readable row deletion in pandas.