How to drop rows of a Pandas DataFrame whose value in a certain column is NaN?

To drop rows in a Pandas DataFrame where a specific column contains NaN (missing values), you can use the dropna() method with the subset parameter. Below are detailed examples and explanations:

1. Basic Example: Drop Rows with `NaN` in a Single Column

Sample DataFrame

import pandas as pd
import numpy as np

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  60000.0
2  Charlie  35.0      NaN
3    David  45.0  80000.0
4      Eva   NaN  90000.0

Drop Rows Where “Age” is `NaN`

df_clean = df.dropna(subset=["Age"])
print("\nDataFrame After Dropping NaN in 'Age':")
print(df_clean)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
2  Charlie  35.0      NaN
3    David  45.0  80000.0
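Note that the surviving rows keep their original index labels (0, 2, 3). If a fresh 0-based index is preferred, one option is to chain reset_index(drop=True) after dropna(). The sketch below rebuilds the sample DataFrame so it runs on its own; the name df_reset is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# Drop rows with NaN in "Age", then renumber the index from 0.
# drop=True discards the old index instead of keeping it as a column.
df_reset = df.dropna(subset=["Age"]).reset_index(drop=True)
print(df_reset)
```

Whether to reset the index depends on the use case: keeping the original labels makes it easier to trace rows back to the source data.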

2. Drop Rows with `NaN` in Multiple Columns

Use subset with a list of columns to drop rows where any of the specified columns have NaN:

df_clean = df.dropna(subset=["Age", "Salary"])
print(df_clean)

Output:

    Name   Age   Salary
0  Alice  25.0  50000.0
3  David  45.0  80000.0

3. Modify the DataFrame In-Place

Use inplace=True to modify the original DataFrame instead of creating a new one (the call then returns None):

df.dropna(subset=["Salary"], inplace=True)
print(df)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  60000.0
3    David  45.0  80000.0
4      Eva   NaN  90000.0

4. Drop Rows Based on Threshold (`thresh`)

Keep rows with at least N non-NaN values in the specified subset:

# Keep rows with at least 2 non-NaN values in the subset ["Age", "Salary"]
df_clean = df.dropna(subset=["Age", "Salary"], thresh=2)
print(df_clean)

Output:

    Name   Age   Salary
0  Alice  25.0  50000.0
3  David  45.0  80000.0
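For comparison, when subset is omitted, thresh counts non-NaN values across every column of the row, including "Name". A minimal sketch on the same sample data (rebuilt here so the snippet is self-contained; the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# No subset: every column counts toward the threshold.
# Each row has a Name plus at least one of Age/Salary, so all rows pass thresh=2.
at_least_two = df.dropna(thresh=2)
print(len(at_least_two))  # 5

# thresh=3 requires all three columns to be present.
all_three = df.dropna(thresh=3)
print(all_three["Name"].tolist())  # ['Alice', 'David']
```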

5. Alternative: Boolean Indexing

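Filter rows using notna(). The output below assumes df is the original sample DataFrame, not the one modified in place in section 3: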

df_clean = df[df["Age"].notna()]
print(df_clean)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
2  Charlie  35.0      NaN
3    David  45.0  80000.0
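Boolean masks are handy when the NaN check has to be combined with other conditions. As a rough sketch on the same sample data (the variable names are illustrative), combining two notna() masks with & gives the same result as dropna() with a two-column subset, and the mask can be mixed with an ordinary comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# Keep rows where both columns are present.
both_present = df[df["Age"].notna() & df["Salary"].notna()]
same_as_dropna = df.dropna(subset=["Age", "Salary"])
print(both_present.equals(same_as_dropna))  # True

# Combine the NaN check with another filter in a single mask.
known_age_high_salary = df[df["Age"].notna() & (df["Salary"] > 55000)]
print(known_age_high_salary["Name"].tolist())  # ['David']
```

Each condition needs its own parentheses when combined with &, because & binds more tightly than comparison operators in Python.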

Key Parameters of `dropna()`

Parameter	Description
`subset`	Columns to check for `NaN` (e.g., `subset=["Age", "Salary"]`).
`how`	`'any'` (default): drop a row if any subset column has `NaN`. `'all'`: drop a row only if all subset columns have `NaN`.
`thresh`	Keep rows with at least `thresh` non-`NaN` values in the subset.
`inplace`	Modify the DataFrame in-place instead of returning a new DataFrame.
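The difference between how='any' and how='all' only shows up when a row is missing some, but not all, of the subset columns. The sketch below uses a smaller, hypothetical DataFrame (not the sample above) that contains one row with both values missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob is missing Age only, Charlie is missing both.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, np.nan, np.nan],
    "Salary": [50000, 60000, np.nan],
})

# how="any": drop a row if Age OR Salary is NaN.
any_missing = df.dropna(subset=["Age", "Salary"], how="any")
print(any_missing["Name"].tolist())  # ['Alice']

# how="all": drop a row only if Age AND Salary are both NaN.
all_missing = df.dropna(subset=["Age", "Salary"], how="all")
print(all_missing["Name"].tolist())  # ['Alice', 'Bob']
```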

Common Mistakes

Forgetting subset:

   # This drops rows with NaN in ANY column (not just "Age"):
   df.dropna()  # Incorrect if you only want to target "Age"

Ignoring inplace:

   # This does NOT modify the original DataFrame:
   df.dropna(subset=["Age"])
   # Correct approach:
   df = df.dropna(subset=["Age"])  # Or use inplace=True
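A related pitfall is mixing the two styles. With inplace=True, dropna() returns None, so assigning its result back to df would replace the DataFrame with None. A short sketch on the sample data (rebuilt here so it runs on its own) shows the return value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# inplace=True mutates df and returns None.
result = df.dropna(subset=["Age"], inplace=True)
print(result)   # None
print(len(df))  # 3

# So avoid writing: df = df.dropna(subset=["Age"], inplace=True)
# which would leave df set to None.
```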

Summary

Use df.dropna(subset=["column"]) to drop rows where "column" has NaN.
Combine subset with thresh to enforce a minimum number of valid values.
Reassign the result (df = df.dropna(...)) or pass inplace=True so the change is not lost; inplace=True returns None, so do not combine the two.

By mastering these methods, you can efficiently clean your DataFrames!
