How to drop rows of a Pandas DataFrame whose value in a certain column is NaN?

To drop rows in a Pandas DataFrame where a specific column contains NaN (missing values), you can use the dropna() method with the subset parameter. Below are detailed examples and explanations:

1. Basic Example: Drop Rows with NaN in a Single Column

Sample DataFrame

import pandas as pd
import numpy as np

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  60000.0
2  Charlie  35.0      NaN
3    David  45.0  80000.0
4      Eva   NaN  90000.0

Drop Rows Where “Age” is NaN

df_clean = df.dropna(subset=["Age"])
print("\nDataFrame After Dropping NaN in 'Age':")
print(df_clean)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
2  Charlie  35.0      NaN
3    David  45.0  80000.0
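
Before dropping, it can help to check how many rows will be affected. The following is a minimal sketch that rebuilds the sample data from above; the names n_missing and rows_removed are only illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# Count the missing values in "Age" before dropping anything
n_missing = int(df["Age"].isna().sum())

df_clean = df.dropna(subset=["Age"])

# The number of removed rows should match the number of NaN values in "Age"
rows_removed = len(df) - len(df_clean)
print(n_missing, rows_removed)
```

Comparing the two numbers is a quick sanity check that only the intended rows were removed.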

2. Drop Rows with NaN in Multiple Columns

Use subset with a list of columns to drop rows where any of the specified columns have NaN:

df_clean = df.dropna(subset=["Age", "Salary"])
print(df_clean)

Output:

    Name   Age   Salary
0  Alice  25.0  50000.0
3  David  45.0  80000.0
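
To drop a row only when all of the listed columns are missing, how="all" can be combined with subset. The sketch below uses a small hypothetical DataFrame (the "Frank" row is invented for illustration) because the sample data above has no row where both "Age" and "Salary" are NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Frank" is missing both Age and Salary
df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Frank"],
    "Age": [25, np.nan, 35, np.nan],
    "Salary": [50000, 60000, np.nan, np.nan],
})

# how="any" (default): drop a row if ANY subset column is NaN
any_missing = df_demo.dropna(subset=["Age", "Salary"], how="any")

# how="all": drop a row only if ALL subset columns are NaN
all_missing = df_demo.dropna(subset=["Age", "Salary"], how="all")

print(any_missing["Name"].tolist())
print(all_missing["Name"].tolist())
```

With how="any" only Alice remains, while how="all" removes just the row in which both values are missing.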

3. Modify the DataFrame In-Place

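Use inplace=True to modify the DataFrame itself instead of returning a new one. The example works on a copy so that the original df stays intact for the following sections:

df_inplace = df.copy()
df_inplace.dropna(subset=["Salary"], inplace=True)
print(df_inplace)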

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  60000.0
3    David  45.0  80000.0
4      Eva   NaN  90000.0
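
The in-place call and plain reassignment should produce the same rows. The sketch below compares the two styles on the sample data; df_inplace and df_assigned are illustrative names. When using inplace=True, avoid writing df = df.dropna(..., inplace=True), since the call is meant to change the object rather than hand back a new DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# Style 1: modify a copy in place (the original df is left untouched)
df_inplace = df.copy()
df_inplace.dropna(subset=["Salary"], inplace=True)

# Style 2: keep the returned DataFrame
df_assigned = df.dropna(subset=["Salary"])

# Both styles keep the same rows
print(df_inplace.equals(df_assigned))
print(len(df), len(df_inplace))
```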

4. Drop Rows Based on Threshold (thresh)

Keep rows with at least N non-NaN values in the specified subset:

# Keep rows with at least 2 non-NaN values in the subset ["Age", "Salary"]
df_clean = df.dropna(subset=["Age", "Salary"], thresh=2)
print(df_clean)

Output:

    Name   Age   Salary
0  Alice  25.0  50000.0
3  David  45.0  80000.0
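
To see how the threshold changes the result, the sketch below runs the same subset with thresh=1 and thresh=2 on the sample data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# thresh=1: keep rows with at least one non-NaN value among Age and Salary
at_least_one = df.dropna(subset=["Age", "Salary"], thresh=1)

# thresh=2: keep rows where both Age and Salary are present
at_least_two = df.dropna(subset=["Age", "Salary"], thresh=2)

print(len(at_least_one), len(at_least_two))
```

Every row in the sample has at least one of the two values, so thresh=1 keeps all five rows, while thresh=2 keeps only Alice and David.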

5. Alternative: Boolean Indexing

Filter rows using notna():

df_clean = df[df["Age"].notna()]
print(df_clean)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
2  Charlie  35.0      NaN
3    David  45.0  80000.0
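
Boolean indexing also extends to several columns. A minimal sketch, assuming the same sample data: build one mask with notna() over the columns of interest and reduce it with all(axis=1). The names mask, df_bool and df_drop are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, np.nan, 35, 45, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000],
})

# True for rows where BOTH Age and Salary are present
mask = df[["Age", "Salary"]].notna().all(axis=1)
df_bool = df[mask]

# Equivalent result using dropna with subset
df_drop = df.dropna(subset=["Age", "Salary"])

print(df_bool.equals(df_drop))
```

The mask form is handy when the NaN check has to be combined with other conditions using & or |.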

Key Parameters of dropna()

Parameter   Description
subset      Columns to check for NaN (e.g., subset=["Age", "Salary"]).
how         how='any' (default): Drop rows if any subset column has NaN.
            how='all': Drop rows if all subset columns have NaN.
thresh      Keep rows with at least thresh non-NaN values in the subset.
inplace     Modify the DataFrame in-place instead of returning a new DataFrame.

Common Mistakes

  1. Forgetting subset:
   # This drops rows with NaN in ANY column (not just "Age"):
   df.dropna()  # Incorrect if you only want to target "Age"
  2. Forgetting to assign the result:
   # This does NOT modify the original DataFrame:
   df.dropna(subset=["Age"])
   # Correct approach:
   df = df.dropna(subset=["Age"])  # Or use inplace=True

Summary

  • Use df.dropna(subset=["column"]) to drop rows where "column" has NaN.
  • Combine subset with thresh to enforce a minimum number of valid values.
  • Assign the result back (df = df.dropna(...)) or pass inplace=True; otherwise the original DataFrame stays unchanged.

By mastering these methods, you can efficiently clean your DataFrames!
