To write a pandas DataFrame to a CSV file, use the to_csv()
method. This method provides extensive customization for formatting, handling missing values, encoding, and more. Below are detailed explanations and examples:
Basic Syntax
df.to_csv(
path_or_buf, # File path or buffer (e.g., 'data.csv')
sep=',', # Delimiter (default: comma)
index=False, # Exclude index column (default: True)
header=True, # Include column names (default: True)
mode='w', # Write mode ('w' for write, 'a' for append)
encoding='utf-8', # File encoding (e.g., 'utf-8', 'latin-1')
na_rep='', # Representation for missing values (default: '')
columns=None, # Columns to write (default: all)
date_format=None, # Format for datetime columns (e.g., '%Y-%m-%d')
compression=None # Compression (e.g., 'gzip', 'zip')
)
Examples
1. Basic Save to CSV
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]}
df = pd.DataFrame(data)
# Save to CSV (index included by default)
df.to_csv('basic_data.csv')
2. Exclude Index Column
df.to_csv('no_index_data.csv', index=False)
3. Custom Delimiter (e.g., Tab-Separated)
df.to_csv('tab_data.tsv', sep='\t', index=False)
4. Handle Missing Values
Replace NaN
/None
with a custom string (e.g., 'NA'
):
import numpy as np
df_with_nulls = pd.DataFrame({'A': [1, np.nan, 3], 'B': [np.nan, 5, 6]})
df_with_nulls.to_csv('nulls_data.csv', na_rep='NA', index=False)
5. Write Specific Columns
df.to_csv('selected_columns.csv', columns=['Name'], index=False)
6. Append to Existing CSV
# Append new data to an existing file (no headers)
new_data = pd.DataFrame({'Name': ['Dave'], 'Age': [40]})
new_data.to_csv('basic_data.csv', mode='a', header=False, index=False)
7. Format Dates in CSV
from datetime import datetime
# DataFrame with dates
dates = pd.DataFrame({'Date': [datetime(2023, 1, 1), datetime(2023, 1, 2)]})
dates.to_csv('dates.csv', index=False, date_format='%Y-%m-%d')
8. Compress CSV (e.g., GZIP)
df.to_csv('compressed_data.csv.gz', index=False, compression='gzip')
9. Custom Encoding (e.g., Latin-1)
df.to_csv('latin_data.csv', encoding='latin-1', index=False)
Key Parameters Explained
Parameter | Purpose |
---|---|
path_or_buf | File path (e.g., 'data.csv' ) or a file-like object (e.g., StringIO ). |
sep | Delimiter (default ',' ). Use '\t' for TSV files. |
index | Omit DataFrame index if False (default True ). |
header | Include column names if True (default). Set False to exclude. |
mode | 'w' (overwrite) or 'a' (append). |
encoding | Specify encoding (e.g., 'utf-8' , 'latin-1' ). |
na_rep | Replace missing values with a string (e.g., 'NA' , 'Missing' ). |
columns | List of columns to export (e.g., ['Name', 'Age'] ). |
date_format | Format datetime columns (e.g., '%Y-%m-%d %H:%M:%S' ). |
compression | Compress output (e.g., 'gzip' , 'zip' ). |
Common Pitfalls & Solutions
- Unwanted Index Column
Always useindex=False
unless you need the index. - Encoding Errors
Useencoding='utf-8'
(default) orencoding='latin-1'
for compatibility. - Overwriting Files
Usemode='a'
to append data instead of overwriting. - Missing Data Ambiguity
Explicitly definena_rep
to avoid confusion with empty strings.
Advanced Example: Export with Quoting
Handle data containing delimiters (e.g., commas in strings) using quoting:
import csv
data = {'Text': ['Hello, World', 'Goodbye, Earth']}
df = pd.DataFrame(data)
# Quote all non-numeric fields to escape commas
df.to_csv('quoted_data.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
Summary
- Use
df.to_csv('filename.csv', index=False)
for most cases. - Customize separators, encoding, missing values, and compression as needed.
- Avoid including the index unless necessary for your use case.