To add a new column to an existing pandas DataFrame, you can use several methods depending on your use case. Below are the most common techniques with explanations and examples.
1. Direct Assignment (Simplest Method)
Add a new column by assigning values directly to a new column name.
Example 1: Add a column with a constant value
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})
# Add a new column with a constant value
df['Country'] = 'USA'
print(df)
Output:
      Name  Age Country
0    Alice   25     USA
1      Bob   30     USA
2  Charlie   35     USA
Example 2: Add a column with calculated values
# Add a column based on an existing column (assumes the current year is 2023)
df['Birth Year'] = 2023 - df['Age']
print(df)
Output:
      Name  Age Country  Birth Year
0    Alice   25     USA        1998
1      Bob   30     USA        1993
2  Charlie   35     USA        1988
2. Using assign() (Method Chaining)
The assign() method returns a new DataFrame with the added column. The original DataFrame is not modified unless you reassign the result to the same variable.
Example:
# Create a new column without modifying the original DataFrame
df_new = df.assign(Salary=[70000, 80000, 90000])
print(df_new)
Output:
      Name  Age Country  Birth Year  Salary
0    Alice   25     USA        1998   70000
1      Bob   30     USA        1993   80000
2  Charlie   35     USA        1988   90000
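The example above adds one column in a single call. As a minimal sketch of the chaining this section's title refers to, the block below strings two assign() calls together. The Bonus column and its 10% rule are hypothetical, chosen only to show that a callable passed to assign() receives the intermediate DataFrame and can therefore use a column created earlier in the chain.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Each assign() call returns a new DataFrame, so calls can be chained.
# The lambda receives the intermediate DataFrame, which already has 'Salary'.
result = (
    df
    .assign(Salary=[70000, 80000, 90000])
    .assign(Bonus=lambda d: d['Salary'] // 10)  # hypothetical rule: 10% of salary
)

print(result)
print(list(df.columns))  # the original df still has only 'Name' and 'Age'
```

Because nothing is modified in place, the intermediate steps never leak into df, which is the main reason to prefer assign() inside a longer pipeline.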
3. Insert a Column at a Specific Position
Use df.insert(loc, column, value) to add a column at a specific position. Unlike assign(), insert() modifies the DataFrame in place and returns None.
Example:
# Insert 'Gender' as the second column (index=1)
df.insert(1, 'Gender', ['F', 'M', 'M'])
print(df)
Output:
      Name Gender  Age Country  Birth Year
0    Alice      F   25     USA        1998
1      Bob      M   30     USA        1993
2  Charlie      M   35     USA        1988
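Two details of insert() are easy to trip over: the position is often easier to derive from an existing column than to hard-code, and inserting a column name that already exists raises an error by default. The sketch below illustrates both on a fresh copy of the sample data; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Look up the position of an existing column instead of hard-coding it.
age_position = df.columns.get_loc('Age')
result = df.insert(age_position, 'Gender', ['F', 'M', 'M'])

print(result)            # None: insert() works in place
print(list(df.columns))  # ['Name', 'Gender', 'Age']

# Inserting the same column name a second time raises ValueError by default.
try:
    df.insert(0, 'Gender', ['F', 'M', 'M'])
    duplicate_rejected = False
except ValueError as exc:
    duplicate_rejected = True
    print('ValueError:', exc)
```

This also means that re-running a cell containing df.insert(...) in a notebook fails the second time, whereas direct assignment would simply overwrite the column.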
4. Add a Column Conditionally
Use np.where() or boolean logic to create conditional columns.
Example:
import numpy as np
# Add a column based on a condition
df['Is Senior'] = np.where(df['Age'] > 30, 'Yes', 'No')
print(df)
Output:
      Name Gender  Age Country  Birth Year Is Senior
0    Alice      F   25     USA        1998        No
1      Bob      M   30     USA        1993        No
2  Charlie      M   35     USA        1988       Yes
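The np.where() example covers the label-producing case. The "boolean logic" alternative mentioned above can be sketched as follows: a comparison on a column already yields a boolean Series, which can be stored as a column directly. The column names 'Over 30' and 'In Range' are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# A comparison returns a boolean Series that can be assigned as-is.
df['Over 30'] = df['Age'] > 30

# Combine conditions with & (and), | (or), ~ (not).
# Each comparison needs its own parentheses.
df['In Range'] = (df['Age'] >= 30) & (df['Age'] < 35)

print(df)
```

A True/False column is usually easier to filter on later (for example df[df['Over 30']]) than a 'Yes'/'No' string column.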
5. Add a Column from a List/Array
Ensure the list/array length matches the DataFrame’s row count.
Example:
# Add a column from a list
df['Department'] = ['HR', 'Engineering', 'Finance']
print(df)
Output:
      Name Gender  Age Country  Birth Year Is Senior   Department
0    Alice      F   25     USA        1998        No           HR
1      Bob      M   30     USA        1993        No  Engineering
2  Charlie      M   35     USA        1988       Yes      Finance
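The example above uses a plain list. As a small sketch of the "array" half of this section, the block below assigns a NumPy array and a range; the Score values and column names are hypothetical. The same length rule applies to every list-like input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# A NumPy array is assigned by position, one element per row.
scores = np.array([88.5, 92.0, 79.5])  # hypothetical values
df['Score'] = scores

# Other list-likes work too, as long as the length equals len(df).
df['Row Number'] = range(1, len(df) + 1)

print(df)
```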
6. Add a Column Using apply()
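Use a function to compute each value of the new column. Series.apply() calls the function once per element; for logic that needs several columns, use DataFrame.apply() with axis=1.
Example:
# Add a column by applying a function to each value in 'Name'
df['Name Length'] = df['Name'].apply(len)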
print(df)
Output:
      Name Gender  Age  ...   Department  Name Length
0    Alice      F   25  ...           HR            5
1      Bob      M   30  ...  Engineering            3
2  Charlie      M   35  ...      Finance            7
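The example above applies a function to a single column, one element at a time. For a function that needs more than one column, a minimal sketch using DataFrame.apply() with axis=1 looks like this; the Summary column is hypothetical. The block also shows a vectorized alternative for the name-length case, which avoids calling a Python function per value.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# axis=1 passes each row to the function as a Series indexed by column name.
df['Summary'] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

# Vectorized alternative to df['Name'].apply(len): the .str accessor.
df['Name Length'] = df['Name'].str.len()

print(df)
```

Row-wise apply() is convenient but runs a Python function per row, so for large DataFrames a vectorized expression is usually the better choice when one exists.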
Key Notes
- Overwriting Columns: If the column name already exists, direct assignment overwrites it.
df['Age'] = df['Age'] + 1  # Increments all values in the 'Age' column
- Alignment by Index: When adding a Series, values align by index label, not by position:
bonus = pd.Series([1000, 2000], index=[0, 2])
df['Bonus'] = bonus  # Row 1 (index=1) gets NaN
- Performance: Use vectorized operations (e.g., df['col1'] + df['col2']) instead of loops for efficiency.
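The index-alignment note above is the one most likely to cause silent surprises, so here is a small self-contained sketch of it. The second DataFrame, with string labels, is a hypothetical case added to show what happens when no labels match, and how converting the Series to a NumPy array assigns by position instead.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})

# Only labels 0 and 2 exist in the Series, so row 1 has nothing to align with.
bonus = pd.Series([1000, 2000], index=[0, 2])
df['Bonus'] = bonus
print(df)  # row 1 holds NaN, and the column becomes float

# Hypothetical case: the DataFrame uses string labels, the Series uses 0 and 1.
df2 = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=['a', 'b'])
values = pd.Series([10, 20])

df2['By Label'] = values                # no labels match: every row is NaN
df2['By Position'] = values.to_numpy()  # array has no index: assigned in order
print(df2)
```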
Best Practices
- Use direct assignment (df['new_col'] = ...) for simple column additions.
- Use assign() when method chaining is preferred (e.g., in a pipeline).
- Use insert() to control the column position.
Common Errors
ValueError: Length Mismatch
Ensure the new column has the same number of rows as the DataFrame.
# This will fail if len(ages) != len(df)
df['Age'] = ages
KeyError
Check for typos in column names when referencing existing columns.
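Both errors above can be reproduced in a few lines. The sketch below triggers each one deliberately and catches it, so the script runs to completion; the too-short ages list and the misspelled 'age' column are intentional mistakes made up for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# ValueError: two values for a three-row DataFrame.
ages = [26, 31]
caught_length_mismatch = False
try:
    df['Age'] = ages
except ValueError as exc:
    caught_length_mismatch = True
    print('ValueError:', exc)

# KeyError: the column is 'Age', not 'age' (column names are case-sensitive).
caught_missing_column = False
try:
    df['Birth Year'] = 2023 - df['age']
except KeyError as exc:
    caught_missing_column = True
    print('KeyError:', exc)

# A cheap guard before assigning data that comes from elsewhere.
if len(ages) != len(df):
    print(f'Expected {len(df)} values, got {len(ages)}')
```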
Complete Example
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [1200, 800, 300]
})
# Add columns using different methods
df['Category'] = 'Electronics' # Direct assignment
df = df.assign(
    Discount=lambda x: x['Price'] * 0.1  # 10% discount
)
df.insert(2, 'In Stock', [True, False, True]) # Insert at position 2
print(df)
Output:
  Product  Price  In Stock     Category  Discount
0  Laptop   1200      True  Electronics     120.0
1   Phone    800     False  Electronics      80.0
2  Tablet    300      True  Electronics      30.0
These methods cover the most common ways to add columns to a DataFrame. Let me know if you need further clarification.