To create a new column in a pandas DataFrame where values are conditionally derived from an existing column, you can use methods like np.where(), pandas.Series.apply(), or np.select(), depending on the complexity of your conditions. Here are common approaches:
1. Simple Binary Conditions: np.where()
Use numpy.where(condition, value_if_true, value_if_false) for straightforward if-else logic.
import pandas as pd
import numpy as np
# Sample DataFrame
df = pd.DataFrame({'score': [85, 45, 72, 90, 60]})
# Create a new column: 'pass' if score >= 60, else 'fail'
df['result'] = np.where(df['score'] >= 60, 'pass', 'fail')
Output:
   score  result
0     85    pass
1     45    fail
2     72    pass
3     90    pass
4     60    pass
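When the yes/no test itself combines several comparisons, np.where() still applies. The following is a minimal sketch, assuming a hypothetical 'band' column and an illustrative 60 to 89 range that are not part of the example above. Each comparison needs its own parentheses, and the element-wise operator & is used instead of the Python keyword and.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, 45, 72, 90, 60]})

# Combine two element-wise comparisons; parentheses are required
# because & binds more tightly than >= and <.
in_band = (df['score'] >= 60) & (df['score'] < 90)

# 'band' is a hypothetical column name used only for this illustration.
df['band'] = np.where(in_band, 'mid', 'other')
```

Use | for "or" and ~ for "not" in the same way.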
2. Complex Logic: apply() with a Custom Function
Use apply() for multi-condition or complex operations.
# Categorize scores into grades
def assign_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    else:
        return 'F'
df['grade'] = df['score'].apply(assign_grade)
Output:
   score  result  grade
0     85    pass      B
1     45    fail      F
2     72    pass      C
3     90    pass      A
4     60    pass      F
3. Multiple Conditions: np.select()
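Use numpy.select(condlist, choicelist, default) for multiple conditions and corresponding outputs. Conditions are evaluated in order, and each row takes the choice for the first condition it satisfies.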
# Define conditions and choices
conditions = [
    df['score'] >= 90,
    df['score'] >= 80,
    df['score'] >= 60,
    df['score'] < 60
]
choices = ['Excellent', 'Good', 'Pass', 'Fail']
df['category'] = np.select(conditions, choices, default='Unknown')
Output:
   score  result  grade   category
0     85    pass      B       Good
1     45    fail      F       Fail
2     72    pass      C       Pass
3     90    pass      A  Excellent
4     60    pass      F       Pass
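Since np.select() assigns the first matching choice, the order of the condition list affects the result. The sketch below is only an illustration on a standalone Series with the same scores. The variable names ordered and shadowed are made up for this comparison.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 45, 72, 90, 60])  # same values as the sample 'score' column

# Most restrictive condition first: each row gets its first match.
ordered = np.select(
    [scores >= 90, scores >= 80, scores >= 60],
    ['Excellent', 'Good', 'Pass'],
    default='Fail',
)

# Broadest condition first: it matches every score >= 60,
# so the later, narrower conditions never take effect.
shadowed = np.select(
    [scores >= 60, scores >= 80, scores >= 90],
    ['Pass', 'Good', 'Excellent'],
    default='Fail',
)

print(ordered.tolist())
print(shadowed.tolist())
```

Listing conditions from narrowest to broadest, as in the example above, avoids this shadowing.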
4. Boolean Indexing with .loc
Directly assign values using boolean masks.
# Initialize a new column
df['status'] = 'Neutral'
# Update values conditionally
df.loc[df['score'] >= 80, 'status'] = 'High'
df.loc[df['score'] < 60, 'status'] = 'Low'
Output:
   score  result  grade   category   status
0     85    pass      B       Good     High
1     45    fail      F       Fail      Low
2     72    pass      C       Pass  Neutral
3     90    pass      A  Excellent     High
4     60    pass      F       Pass  Neutral
5. Mapping Values: map() with a Dictionary
Use a dictionary to map existing values to new ones.
# Map grades to remarks
grade_to_remark = {
    'A': 'Outstanding',
    'B': 'Very Good',
    'C': 'Average',
    'F': 'Needs Improvement'
}
df['remark'] = df['grade'].map(grade_to_remark)
Output:
   score  result  grade   category   status             remark
0     85    pass      B       Good     High          Very Good
1     45    fail      F       Fail      Low  Needs Improvement
2     72    pass      C       Pass  Neutral            Average
3     90    pass      A  Excellent     High        Outstanding
4     60    pass      F       Pass  Neutral  Needs Improvement
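One thing to watch with dictionary mapping: values that have no key in the dictionary come back as NaN. Here is a small sketch of that behavior, assuming a deliberately incomplete mapping and a hypothetical fallback label 'No remark' that is not part of the example above.

```python
import pandas as pd

grades = pd.Series(['A', 'B', 'D'])  # 'D' is deliberately missing from the mapping
grade_to_remark = {'A': 'Outstanding', 'B': 'Very Good'}

# Values without a matching key become NaN.
remarks = grades.map(grade_to_remark)

# 'No remark' is a hypothetical fallback chosen for this illustration.
remarks_filled = remarks.fillna('No remark')
```

Chaining fillna() after map() is one way to supply a default for unmapped values.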
Key Notes:
np.where(): Best for simple binary conditions.
apply(): Flexible for complex logic but slower for large datasets.
np.select(): Efficient for multiple conditions.
.loc: Useful for direct assignment to subsets of the DataFrame.
map(): Ideal for direct value replacement using a dictionary.
For large datasets, prioritize vectorized operations (np.where, np.select) over apply() for better performance.
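As an illustration of that advice, the grade logic from approach 2 can be rewritten with np.select(). This is a sketch rather than a benchmark: it only checks that both versions agree on the sample data, and the names grade_apply and grade_vectorized are introduced here for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, 45, 72, 90, 60]})

def assign_grade(score):
    # Row-by-row version, as in approach 2.
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    else:
        return 'F'

grade_apply = df['score'].apply(assign_grade)

# Vectorized version: conditions are checked in order, first match wins.
grade_vectorized = pd.Series(
    np.select(
        [df['score'] >= 90, df['score'] >= 80, df['score'] >= 70],
        ['A', 'B', 'C'],
        default='F',
    ),
    index=df.index,
)

# Both approaches should produce identical grades.
same = grade_apply.tolist() == grade_vectorized.tolist()
print(same)
```

The vectorized form evaluates each condition on the whole column at once instead of calling a Python function per row, which is where the speed difference on large data typically comes from.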