How to change a column's type in a pandas DataFrame?

To change the data type of a column in a pandas DataFrame, you can use methods like astype(), pd.to_numeric(), or pd.to_datetime(), depending on the target type and data consistency. Below are common scenarios and solutions:

1. Basic Conversion with astype()

Convert a column to a specific type (e.g., int, float, str):

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Age': ['25', '30', '35'], 'Price': [100.5, 200.0, 300.9]})

# Convert 'Age' from string to integer
df['Age'] = df['Age'].astype(int)

# Convert 'Price' from float to integer (truncates decimals)
df['Price'] = df['Price'].astype(int)
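
Because astype(int) truncates rather than rounds, it can help to compare the two side by side. The following is a minimal sketch using the same sample prices; the variable names are illustrative only.

```python
import pandas as pd

prices = pd.Series([100.5, 200.0, 300.9])

# astype(int) drops the fractional part (truncates toward zero)
truncated = prices.astype(int)

# round() first if you want the nearest integer instead;
# note that exact .5 values round to the nearest even number
rounded = prices.round().astype(int)

print(truncated.tolist())  # [100, 200, 300]
print(rounded.tolist())    # [100, 200, 301]
```

Which one is appropriate depends on what the numbers mean; for money, truncation is rarely what you want.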

2. Handle Errors with pd.to_numeric()

Convert to numeric type (e.g., int, float) and coerce invalid values to NaN:

# Sample DataFrame with problematic data
df = pd.DataFrame({'Values': ['10', '20', 'NaN', 'thirty']})

# Convert to numeric, coercing errors to NaN
df['Values'] = pd.to_numeric(df['Values'], errors='coerce')

# Result: [10.0, 20.0, NaN, NaN]
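
Coercion silently hides bad input, so it is often worth checking which raw values failed to convert. One possible approach, sketched here with hypothetical variable names, is to keep the raw column and mask it with the NaN positions of the converted one:

```python
import pandas as pd

raw = pd.Series(['10', '20', 'NaN', 'thirty'])
converted = pd.to_numeric(raw, errors='coerce')

# Raw entries that ended up as NaN after conversion
failed = raw[converted.isna()]

print(failed.tolist())  # ['NaN', 'thirty']
```

Note that the literal string 'NaN' is parsed as a missing value, so it shows up here alongside the genuinely invalid 'thirty'.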

3. Convert to Datetime

Use pd.to_datetime() for date/time conversions:

# Sample DataFrame with date strings
df = pd.DataFrame({'Date': ['2023-01-01', '2023-02-15', '2023-03-30']})

# Convert 'Date' to datetime
df['Date'] = pd.to_datetime(df['Date'])
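
When the dates are not in ISO format, passing an explicit format avoids ambiguous parsing (day-first versus month-first). A small sketch, assuming day/month/year strings that are not part of the original example:

```python
import pandas as pd

dates = pd.Series(['01/02/2023', '15/02/2023', 'not a date'])

# format= tells pandas exactly how to read each string;
# errors='coerce' turns unparseable entries into NaT
parsed = pd.to_datetime(dates, format='%d/%m/%Y', errors='coerce')

print(parsed.iloc[0])          # 2023-02-01 00:00:00
print(parsed.isna().tolist())  # [False, False, True]
```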

4. Convert to Categorical Data

Save memory for columns with repetitive strings (this assumes df has a 'Category' column):

df['Category'] = df['Category'].astype('category')
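
To see whether the conversion pays off, you can compare memory usage before and after. This sketch uses made-up city names; the exact byte counts depend on your pandas version, so none are shown.

```python
import pandas as pd

# 100,000 rows but only three distinct values
cities = pd.Series(['Paris', 'London', 'Paris', 'Tokyo'] * 25_000)
as_category = cities.astype('category')

string_bytes = cities.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(string_bytes, category_bytes)
print(as_category.cat.categories.tolist())  # ['London', 'Paris', 'Tokyo']
```

The saving comes from storing each distinct string once and keeping small integer codes per row, so it shrinks as the number of distinct values grows.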

5. Convert Multiple Columns

Use a dictionary with astype() to change multiple columns:

df = df.astype({
    'Age': 'int32',
    'Price': 'float64',
    'Active': 'bool'
})
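
The snippet above assumes df already has 'Age', 'Price', and 'Active' columns. A self-contained version, with sample data invented for illustration, might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'Age': ['25', '30'],         # numeric strings
    'Price': ['100.5', '200'],   # numeric strings
    'Active': [1, 0],            # integers used as flags
})

# One astype() call converts every column listed in the mapping
converted = df.astype({
    'Age': 'int32',
    'Price': 'float64',
    'Active': 'bool',
})

print(converted.dtypes)
```

Columns not listed in the dictionary keep their existing type.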

6. Handle Missing Values

Clean data before conversion (e.g., replace NaN or placeholder values):

df['Weight'] = df['Weight'].fillna(0).astype(int)  # Replace NaN with 0 first
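
Filling with 0 changes the data, which is not always acceptable. As an alternative sketch, pandas' nullable 'Int64' dtype (capital I) keeps missing values as missing while still storing integers; the sample weights below are invented.

```python
import pandas as pd

weights = pd.Series([70.0, None, 82.0])

# Option A: replace missing values, then use a plain integer dtype
filled = weights.fillna(0).astype(int)

# Option B: keep the gap by using the nullable integer dtype
nullable = weights.astype('Int64')

print(filled.tolist())           # [70, 0, 82]
print(nullable.isna().tolist())  # [False, True, False]
```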

Key Notes

  • Assign Results Back: astype(), pd.to_numeric(), and pd.to_datetime() return a new object; the original DataFrame is unchanged unless you assign the result back to the column.
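  • Error Handling (the errors parameter of pd.to_numeric() and pd.to_datetime()):
      • errors='raise' (the default; raise an exception on invalid data).
      • errors='coerce' (convert invalid data to NaN, or NaT for datetimes).
      • errors='ignore' (return the input unchanged if any value fails to convert; deprecated since pandas 2.2, so catch the exception explicitly instead).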
  • Memory Optimization: Use category types for low-cardinality string columns.
  • Check Data Types:
      df.dtypes  # View current column types
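
The point about assigning results back is easy to miss, so here is a short sketch of what happens with and without the assignment. The flag variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Age': ['25', '30', '35']})

# The converted Series is discarded here, so df is unchanged
df['Age'].astype(int)
still_strings = not pd.api.types.is_integer_dtype(df['Age'])

# Assigning back stores the converted column
df['Age'] = df['Age'].astype(int)
now_integers = pd.api.types.is_integer_dtype(df['Age'])

print(still_strings, now_integers)  # True True
```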

Example Workflow

import pandas as pd

# Create DataFrame
data = {
    'ID': ['1', '2', '3'],
    'Amount': ['$100', '$200', '$300'],
    'Date': ['2023-01-01', '2023-02-15', 'Invalid']
}

df = pd.DataFrame(data)

# Clean and convert columns
df['ID'] = df['ID'].astype(int)
# regex=False treats '$' as a literal character, not the regex end-of-string anchor
df['Amount'] = df['Amount'].str.replace('$', '', regex=False).astype(float)
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

print(df.dtypes)
# Output (ID may show int32 on Windows with older NumPy versions):
# ID                int64
# Amount          float64
# Date     datetime64[ns]
# dtype: object

Choose the method that aligns with your data type and error-handling requirements!
