To change the data type of a column in a pandas DataFrame, you can use methods like astype(), pd.to_numeric(), or pd.to_datetime(), depending on the target type and data consistency. Below are common scenarios and solutions:
1. Basic Conversion with astype()
Convert a column to a specific type (e.g., int, float, str):
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({'Age': ['25', '30', '35'], 'Price': [100.5, 200.0, 300.9]})
# Convert 'Age' from string to integer
df['Age'] = df['Age'].astype(int)
# Convert 'Price' from float to integer (truncates decimals)
df['Price'] = df['Price'].astype(int)
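Two behaviours of astype() are easy to miss: float-to-int conversion truncates rather than rounds, and any value that cannot be parsed makes the whole call fail. The sketch below illustrates both on small made-up Series (the names prices and ages are only for illustration):

```python
import pandas as pd

prices = pd.Series([100.5, 200.0, 300.9])

# astype(int) drops the fractional part (300.9 becomes 300)
truncated = prices.astype(int)

# Round first if the nearest integer is wanted (300.9 becomes 301)
rounded = prices.round().astype(int)

# A single unparseable value makes astype(int) raise ValueError
ages = pd.Series(['25', '30', 'unknown'])
try:
    ages.astype(int)
    failed = False
except ValueError:
    failed = True

print(truncated.tolist(), rounded.tolist(), failed)
```

When a column may contain values like 'unknown', pd.to_numeric() with errors='coerce' is usually the safer starting point.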
2. Handle Errors with pd.to_numeric()
Convert to a numeric type (int or float) and coerce invalid values to NaN:
# Sample DataFrame with problematic data
df = pd.DataFrame({'Values': ['10', '20', 'NaN', 'thirty']})
# Convert to numeric, coercing errors to NaN
df['Values'] = pd.to_numeric(df['Values'], errors='coerce')
# Result: [10.0, 20.0, NaN, NaN]
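Because coercion silently replaces bad values, it can help to check which rows were affected before moving on. The following is a minimal sketch, assuming the same sample values as above; converting to the nullable 'Int64' dtype afterwards is an optional extra step, not something the conversion requires:

```python
import pandas as pd

raw = pd.Series(['10', '20', 'NaN', 'thirty'])
values = pd.to_numeric(raw, errors='coerce')  # float64, with NaN for bad rows

# Original entries that ended up as NaN after conversion
bad_rows = raw[values.isna()]
print(bad_rows.tolist())

# Optional: nullable integer dtype stores missing values as <NA>
as_int = values.astype('Int64')
print(as_int.dtype)
```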
3. Convert to Datetime
Use pd.to_datetime() for date/time conversions:
# Sample DataFrame with date strings
df = pd.DataFrame({'Date': ['2023-01-01', '2023-02-15', '2023-03-30']})
# Convert 'Date' to datetime
df['Date'] = pd.to_datetime(df['Date'])
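When the date strings are not in ISO format, passing an explicit format avoids ambiguity between day-first and month-first layouts. A small sketch, assuming day/month/year input (the sample strings are made up):

```python
import pandas as pd

dates = pd.Series(['01/02/2023', '15/02/2023', 'not a date'])

# format tells pandas how to read each string; errors='coerce' turns
# unparseable entries into NaT instead of raising
parsed = pd.to_datetime(dates, format='%d/%m/%Y', errors='coerce')

# Once the column is datetime, the .dt accessor exposes date parts
months = parsed.dt.month
print(parsed)
```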
4. Convert to Categorical Data
Save memory for columns with repetitive strings:
df['Category'] = df['Category'].astype('category')
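To see whether the conversion actually saves memory, compare memory_usage(deep=True) before and after. This is a rough sketch on a made-up column with three distinct values; the exact byte counts depend on the pandas version and string storage:

```python
import pandas as pd

# 4000 rows but only three distinct strings
cities = pd.Series(['Paris', 'London', 'Paris', 'Tokyo'] * 1000)
as_cat = cities.astype('category')

plain_bytes = cities.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)
print(plain_bytes, cat_bytes)

# The distinct values are stored once, as categories
print(list(as_cat.cat.categories))
```

The saving shrinks as the number of distinct values approaches the number of rows, so this is mainly worthwhile for low-cardinality columns.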
5. Convert Multiple Columns
Use a dictionary with astype() to change multiple columns:
df = df.astype({
'Age': 'int32',
'Price': 'float64',
'Active': 'bool'
})
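The snippet above assumes a DataFrame that already has Age, Price, and Active columns. A self-contained version, with sample data invented for illustration, might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'Age': ['25', '30', '35'],
    'Price': ['100.5', '200.0', '300.9'],
    'Active': [1, 0, 1],
    'Name': ['Ann', 'Bob', 'Cy'],
})

# Columns not listed in the dictionary (here 'Name') keep their dtype
converted = df.astype({'Age': 'int32', 'Price': 'float64', 'Active': 'bool'})
print(converted.dtypes)
```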
6. Handle Missing Values
Clean data before conversion (e.g., replace NaN or placeholder values), since astype(int) raises an error if the column still contains NaN:
df['Weight'] = df['Weight'].fillna(0).astype(int) # Replace NaN with 0 first
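Filling with 0 changes the data, which is not always acceptable. One alternative, sketched here on a made-up Series, is the nullable 'Int64' dtype (capital I), which stores integers while keeping missing entries as <NA>:

```python
import numpy as np
import pandas as pd

weights = pd.Series([70.0, np.nan, 82.0])

# Option 1: replace missing values, then use a plain integer dtype
filled = weights.fillna(0).astype(int)

# Option 2: keep missing values by using the nullable integer dtype
nullable = weights.astype('Int64')

print(filled.tolist())
print(nullable)
```

Which option fits depends on whether 0 is a meaningful value for the column.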
Key Notes
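- Assign Results Back: astype(), pd.to_numeric(), and pd.to_datetime() return a new object and do not modify the original DataFrame, so assign the result back to the column.
- Error Handling: pd.to_numeric() and pd.to_datetime() accept errors='raise' (the default; stop on invalid data) and errors='coerce' (convert invalid data to NaN or NaT). errors='ignore' (return the input unchanged when conversion fails) is deprecated in recent pandas versions.
- Memory Optimization: Use the category dtype for low-cardinality string columns.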
- Check Data Types:
df.dtypes # View current column types
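The first note is a common source of confusion, so here is a short sketch showing that a conversion has no effect until its result is assigned (sample data is made up):

```python
import pandas as pd

df = pd.DataFrame({'Age': ['25', '30']})

df['Age'].astype(int)          # result is discarded; df is unchanged
before = df['Age'].dtype

df['Age'] = df['Age'].astype(int)  # assigning back applies the change
after = df['Age'].dtype

print(before, after)
```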
Example Workflow
import pandas as pd
# Create DataFrame
data = {
'ID': ['1', '2', '3'],
'Amount': ['$100', '$200', '$300'],
'Date': ['2023-01-01', '2023-02-15', 'Invalid']
}
df = pd.DataFrame(data)
# Clean and convert columns
df['ID'] = df['ID'].astype(int)
# regex=False treats '$' as a literal character, not the regex end-of-string anchor
df['Amount'] = df['Amount'].str.replace('$', '', regex=False).astype(float)
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
print(df.dtypes)
# Output:
# ID int64 (may be int32 on some Windows setups)
# Amount float64
# Date datetime64[ns]
Choose the method that aligns with your data type and error-handling requirements!