To create a new column in a pandas DataFrame whose values are derived conditionally from an existing column, you can use np.where(), Series.apply(), np.select(), boolean indexing with .loc, or map(), depending on how complex the conditions are. Here are the common approaches:
1. Simple Binary Conditions: np.where()
Use numpy.where(condition, value_if_true, value_if_false) for straightforward if-else logic.
import pandas as pd
import numpy as np
# Sample DataFrame
df = pd.DataFrame({'score': [85, 45, 72, 90, 60]})
# Create a new column: 'pass' if score >= 60, else 'fail'
df['result'] = np.where(df['score'] >= 60, 'pass', 'fail')
Output:
   score result
0     85   pass
1     45   fail
2     72   pass
3     90   pass
4     60   pass
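When the binary test depends on more than one comparison, the boolean masks can be combined with & (and) or | (or) before they are passed to np.where(). The following is a minimal sketch; the 'attendance' column and the 0.7 threshold are hypothetical and are not part of the example above.

```python
import numpy as np
import pandas as pd

# 'attendance' is a hypothetical column added for illustration only.
df_demo = pd.DataFrame({
    'score': [85, 45, 72, 90, 60],
    'attendance': [0.95, 0.80, 0.50, 0.99, 0.75],
})

# Each comparison needs its own parentheses because & binds tighter than >=.
eligible = (df_demo['score'] >= 60) & (df_demo['attendance'] >= 0.7)
df_demo['result'] = np.where(eligible, 'pass', 'fail')
```

With this data, the row with score 72 is marked 'fail' because its attendance is below the assumed threshold.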
2. Complex Logic: apply() with a Custom Function
Use apply() when the logic is too involved for a single vectorized expression. It calls a Python function once for each value in the column.
# Categorize scores into grades
def assign_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    else:
        return 'F'

# Call assign_grade once per value in the 'score' column
df['grade'] = df['score'].apply(assign_grade)
Output:
   score result grade
0     85   pass     B
1     45   fail     F
2     72   pass     C
3     90   pass     A
4     60   pass     F
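If the condition depends on several columns at once, DataFrame.apply() with axis=1 passes each row to the function instead of a single value. A rough sketch follows; the 'bonus' column and the function name assign_grade_with_bonus are hypothetical and only serve to illustrate the pattern.

```python
import pandas as pd

# 'bonus' is a hypothetical column used only for this illustration.
df_demo = pd.DataFrame({
    'score': [85, 45, 72, 90, 60],
    'bonus': [0, 20, 10, 0, 5],
})

def assign_grade_with_bonus(row):
    """Grade a row on score plus bonus, using the same cut-offs as assign_grade."""
    total = row['score'] + row['bonus']
    if total >= 90:
        return 'A'
    elif total >= 80:
        return 'B'
    elif total >= 70:
        return 'C'
    else:
        return 'F'

# axis=1 passes each row to the function as a Series.
df_demo['grade'] = df_demo.apply(assign_grade_with_bonus, axis=1)
```

Row-wise apply() is typically the slowest of the options shown here, so it is best kept for logic that cannot be expressed with masks.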
3. Multiple Conditions: np.select()
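Use numpy.select(condlist, choicelist, default) to pair several conditions with their outputs. Conditions are checked in order and the first one that is True determines the value; default is used where none match.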
# Define conditions and choices
conditions = [
    df['score'] >= 90,
    df['score'] >= 80,
    df['score'] >= 60,
    df['score'] < 60
]
choices = ['Excellent', 'Good', 'Pass', 'Fail']
df['category'] = np.select(conditions, choices, default='Unknown')
Output:
   score result grade   category
0     85   pass     B       Good
1     45   fail     F       Fail
2     72   pass     C       Pass
3     90   pass     A  Excellent
4     60   pass     F       Pass
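The order of the conditions matters because np.select() uses the first condition that is True for each element. The sketch below uses the same scores with the conditions listed in two different orders to show the effect; the variable names wrong_order and right_order are chosen for illustration.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 45, 72, 90, 60])

# Broadest condition first: every score >= 60 matches it, so the
# narrower conditions that follow never get a chance to apply.
wrong_order = np.select(
    [scores >= 60, scores >= 80, scores >= 90],
    ['Pass', 'Good', 'Excellent'],
    default='Fail',
)

# Narrowest condition first: each score takes the first label it qualifies for.
right_order = np.select(
    [scores >= 90, scores >= 80, scores >= 60],
    ['Excellent', 'Good', 'Pass'],
    default='Fail',
)
```

In wrong_order, the scores 85 and 90 are both labelled 'Pass'; in right_order they are labelled 'Good' and 'Excellent'.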
4. Boolean Indexing with .loc
Initialize the column with a default value, then overwrite the rows selected by each boolean mask.
# Initialize a new column
df['status'] = 'Neutral'
# Update values conditionally
df.loc[df['score'] >= 80, 'status'] = 'High'
df.loc[df['score'] < 60, 'status'] = 'Low'
Output:
   score result grade   category   status
0     85   pass     B       Good     High
1     45   fail     F       Fail      Low
2     72   pass     C       Pass  Neutral
3     90   pass     A  Excellent     High
4     60   pass     F       Pass  Neutral
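With .loc, overlapping masks behave the opposite way from np.select(): each assignment overwrites whatever was there before, so the last matching assignment wins. A small sketch of this, where the 'Top' label is a hypothetical addition:

```python
import pandas as pd

df_demo = pd.DataFrame({'score': [85, 45, 72, 90, 60]})
df_demo['status'] = 'Neutral'

# These two masks overlap for scores >= 90.
df_demo.loc[df_demo['score'] >= 80, 'status'] = 'High'
# 'Top' is a hypothetical label; it overwrites 'High' for the score of 90.
df_demo.loc[df_demo['score'] >= 90, 'status'] = 'Top'
```

When masks can overlap, it helps to order the assignments from the broadest condition to the narrowest.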
5. Mapping Values: map() with a Dictionary
Use a dictionary to map existing values to new ones.
# Map grades to remarks
grade_to_remark = {
    'A': 'Outstanding',
    'B': 'Very Good',
    'C': 'Average',
    'F': 'Needs Improvement'
}
df['remark'] = df['grade'].map(grade_to_remark)
Output:
   score result grade   category   status             remark
0     85   pass     B       Good     High          Very Good
1     45   fail     F       Fail      Low  Needs Improvement
2     72   pass     C       Pass  Neutral            Average
3     90   pass     A  Excellent     High        Outstanding
4     60   pass     F       Pass  Neutral  Needs Improvement
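One thing to watch with map(): any value that has no key in the dictionary becomes NaN. The sketch below shows this with a deliberately incomplete mapping and fills the gap with fillna(); the 'D' grade and the 'No remark' fallback are hypothetical.

```python
import pandas as pd

# 'D' is deliberately absent from the mapping below.
grades = pd.Series(['A', 'B', 'D'])
grade_to_remark = {'A': 'Outstanding', 'B': 'Very Good'}

# Values without a matching key come back as NaN.
mapped = grades.map(grade_to_remark)

# 'No remark' is a hypothetical fallback label for unmapped grades.
filled = mapped.fillna('No remark')
```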
Key Notes:
np.where(): Best for simple binary conditions.
apply(): Flexible for complex logic but slower for large datasets.
np.select(): Efficient for multiple conditions.
.loc: Useful for direct assignment to subsets of the DataFrame.
map(): Ideal for direct value replacement using a dictionary.
For large datasets, prioritize vectorized operations (np.where, np.select) over apply() for better performance.
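To check the performance difference on your own machine, you could time both approaches on synthetic data. This is only a rough sketch: the row count, the seed, and the helper names with_apply and with_select are arbitrary choices, and the timings will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data for illustration; the size and seed are arbitrary choices.
rng = np.random.default_rng(0)
big = pd.DataFrame({'score': rng.integers(0, 101, size=100_000)})

def assign_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    return 'F'

def with_apply():
    # One Python function call per row.
    return big['score'].apply(assign_grade)

def with_select():
    # Same cut-offs, evaluated as whole-column comparisons.
    conditions = [big['score'] >= 90, big['score'] >= 80, big['score'] >= 70]
    return np.select(conditions, ['A', 'B', 'C'], default='F')

# Both approaches should produce identical labels.
same = with_apply().tolist() == with_select().tolist()

apply_seconds = timeit.timeit(with_apply, number=3)
select_seconds = timeit.timeit(with_select, number=3)
print(f"identical: {same}, apply: {apply_seconds:.3f}s, select: {select_seconds:.3f}s")
```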