How to add a new column to an existing DataFrame?

To add a new column to an existing pandas DataFrame, you can use several methods depending on your specific use case. Below are the most common techniques with detailed explanations and examples.

1. Direct Assignment (Simplest Method)

Add a new column by assigning values directly to a new column name.

Example 1: Add a column with a constant value

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Add a new column with a constant value
df['Country'] = 'USA'

print(df)

Output:

      Name  Age Country
0    Alice   25     USA
1      Bob   30     USA
2  Charlie   35     USA

Example 2: Add a column with calculated values

# Add a column based on existing columns
df['Birth Year'] = 2023 - df['Age']

print(df)

Output:

      Name  Age Country  Birth Year
0    Alice   25     USA        1998
1      Bob   30     USA        1993
2  Charlie   35     USA        1988

2. Using assign() (Method Chaining)

The assign() method returns a new DataFrame with the added column. The original DataFrame is left unchanged unless you assign the result back to it (df = df.assign(...)).

Example:

# Create a new column without modifying the original DataFrame
df_new = df.assign(Salary = [70000, 80000, 90000])

print(df_new)

Output:

      Name  Age Country  Birth Year  Salary
0    Alice   25     USA        1998   70000
1      Bob   30     USA        1993   80000
2  Charlie   35     USA        1988   90000
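Because assign() returns a new object, several columns can be added in one chained call. The following is a minimal sketch, assuming a reasonably recent pandas version in which later keyword arguments may refer to columns created by earlier ones; the column names Birth_Year and Decade are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# A callable passed to assign() receives the DataFrame being built,
# so 'Decade' can be computed from the 'Birth_Year' column defined just before it.
df_new = df.assign(
    Birth_Year=lambda d: 2023 - d['Age'],
    Decade=lambda d: d['Birth_Year'] // 10 * 10,
)

print(df.columns.tolist())      # the original DataFrame keeps only its two columns
print(df_new.columns.tolist())  # the copy carries the two extra columns
```

Keyword arguments must be valid Python identifiers, which is why the sketch uses underscores instead of spaces in the new column names.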

3. Insert a Column at a Specific Position

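Use df.insert(loc, column, value) to add a column at a specific position, where loc is the zero-based index of the new column. Unlike assign(), insert() modifies the DataFrame in place and returns None.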

Example:

# Insert 'Gender' as the second column (index=1)
df.insert(1, 'Gender', ['F', 'M', 'M'])

print(df)

Output:

      Name Gender  Age Country  Birth Year
0    Alice      F   25     USA        1998
1      Bob      M   30     USA        1993
2  Charlie      M   35     USA        1988
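One detail worth knowing is that insert() refuses to add a column whose name already exists, unless allow_duplicates=True is passed. A small sketch of that behavior, using a fresh DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# insert() works in place, so there is nothing useful to capture from it.
result = df.insert(1, 'Gender', ['F', 'M', 'M'])
print(result)               # None
print(df.columns.tolist())  # ['Name', 'Gender', 'Age']

# Inserting the same column name a second time raises ValueError.
try:
    df.insert(0, 'Gender', ['F', 'M', 'M'])
    duplicate_rejected = False
except ValueError as exc:
    duplicate_rejected = True
    print(f"insert failed: {exc}")
```

For this reason, avoid writing df = df.insert(...), which would replace the DataFrame with None.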

4. Add a Column Conditionally

Use np.where() or boolean logic to create conditional columns.

Example:

import numpy as np

# Add a column based on a condition
df['Is Senior'] = np.where(df['Age'] > 30, 'Yes', 'No')

print(df)

Output:

      Name Gender  Age Country  Birth Year Is Senior
0    Alice      F   25     USA        1998        No
1      Bob      M   30     USA        1993        No
2  Charlie      M   35     USA        1988       Yes
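The "boolean logic" route mentioned above can be as simple as assigning the comparison itself, which yields a True/False column. For more than two outcomes, np.select() is one option. The sketch below shows both; the column names 'Over 30' and 'Level' and the age thresholds are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# A comparison on a column already returns a boolean Series.
df['Over 30'] = df['Age'] > 30

# np.select() checks the conditions in order and uses the first one that matches;
# rows matching none of them get the default value.
conditions = [df['Age'] < 30, df['Age'] < 35]
labels = ['Junior', 'Mid']
df['Level'] = np.select(conditions, labels, default='Senior')

print(df)
```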

5. Add a Column from a List/Array

Ensure the list/array length matches the DataFrame’s row count.

Example:

# Add a column from a list
df['Department'] = ['HR', 'Engineering', 'Finance']

print(df)

Output:

      Name Gender  Age Country  Birth Year Is Senior   Department
0    Alice      F   25     USA        1998        No           HR
1      Bob      M   30     USA        1993        No  Engineering
2  Charlie      M   35     USA        1988       Yes      Finance

6. Add a Column Using apply()

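Use a function to compute column values. Calling apply() on a single column (a Series) runs the function once per element; calling df.apply(..., axis=1) runs it once per row.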

Example:

# Add a column using a custom function
df['Name Length'] = df['Name'].apply(lambda x: len(x))

print(df)

Output:

      Name Gender  Age  ...   Department  Name Length
0    Alice      F   25  ...           HR            5
1      Bob      M   30  ...  Engineering            3
2  Charlie      M   35  ...      Finance            7
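For comparison, here is a sketch of the two other forms of the same idea: a vectorized string method that replaces the lambda, and a row-wise apply() that needs values from more than one column. The 'Label' column is a hypothetical example, not part of the walkthrough above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Vectorized alternative to df['Name'].apply(lambda x: len(x))
df['Name Length'] = df['Name'].str.len()

# Row-wise apply: with axis=1 the function receives one row at a time,
# so it can combine several columns.
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

print(df)
```

Row-wise apply() is convenient but runs Python code for every row, so a vectorized expression is usually the better choice when one exists.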

Key Notes

  1. Overwriting Columns:
     If the column name already exists, it will be overwritten.
     df['Age'] = df['Age'] + 1  # Increments all values in the 'Age' column
  2. Alignment by Index:
     When adding a Series, values align by index:
     bonus = pd.Series([1000, 2000], index=[0, 2])
     df['Bonus'] = bonus  # Row 1 (index=1) gets NaN
  3. Performance:
     Use vectorized operations (e.g., df['col1'] + df['col2']) instead of loops for efficiency.
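Index alignment is the note most likely to cause surprises, so here is a short sketch of it. It assumes a Series named scores whose index labels do not match the DataFrame at all, and shows one way to assign by position instead, by converting the Series to a NumPy array first.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})

# Only index labels 0 and 2 exist in the Series, so row 1 receives NaN.
bonus = pd.Series([1000, 2000], index=[0, 2])
df['Bonus'] = bonus

# A Series with unrelated labels aligns to nothing: every row becomes NaN.
scores = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
df['Aligned'] = scores

# Dropping the index with to_numpy() assigns the values by position.
df['Positional'] = scores.to_numpy()

print(df)
```

Note that the missing value turns the 'Bonus' column into floats, since NaN cannot be stored in a plain integer column.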

Best Practices

  • Use direct assignment (df['new_col'] = ...) for simple column additions.
  • Use assign() when method chaining is preferred (e.g., in a pipeline).
  • Use insert() to control the column position.

Common Errors

  • ValueError (length mismatch)
    A list or array assigned as a column must have exactly as many elements as the DataFrame has rows.
    # This will fail if len(ages) != len(df)
    df['Age'] = ages
  • KeyError
    Raised when you reference a column that does not exist. Check for typos and letter case in column names.
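Both errors are easy to reproduce. The sketch below triggers each one on purpose and catches it; the two-element ages list and the misspelled 'age' column are deliberate mistakes for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

errors = []

# Length mismatch: two values for a three-row DataFrame.
ages = [26, 31]
try:
    df['Age'] = ages
except ValueError as exc:
    errors.append('ValueError')
    print(f"ValueError: {exc}")

# KeyError: the column is named 'Age', not 'age'.
try:
    df['Birth Year'] = 2023 - df['age']
except KeyError as exc:
    errors.append('KeyError')
    print(f"KeyError: {exc}")

print(df)  # unchanged, since both assignments failed
```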

Complete Example

import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [1200, 800, 300]
})

# Add columns using different methods
df['Category'] = 'Electronics'  # Direct assignment
df = df.assign(
    Discount = lambda x: x['Price'] * 0.1  # 10% discount
)
df.insert(2, 'In Stock', [True, False, True])  # Insert at position 2

print(df)

Output:

  Product  Price  In Stock     Category  Discount
0  Laptop   1200      True  Electronics     120.0
1   Phone    800     False  Electronics      80.0
2  Tablet    300      True  Electronics      30.0

These methods cover the most common scenarios for adding columns to a DataFrame.
