How to write a pandas DataFrame to CSV file ?

To write a pandas DataFrame to a CSV file, use the to_csv() method. This method provides extensive customization for formatting, handling missing values, encoding, and more. Below are detailed explanations and examples:

Basic Syntax

df.to_csv(
    path_or_buf,      # File path or buffer (e.g., 'data.csv')
    sep=',',          # Delimiter (default: comma)
    index=False,      # Exclude index column (default: True)
    header=True,      # Include column names (default: True)
    mode='w',         # Write mode ('w' for write, 'a' for append)
    encoding='utf-8', # File encoding (e.g., 'utf-8', 'latin-1')
    na_rep='',        # Representation for missing values (default: '')
    columns=None,     # Columns to write (default: all)
    date_format=None, # Format for datetime columns (e.g., '%Y-%m-%d')
    compression=None # Compression (e.g., 'gzip', 'zip')
)

Examples

1. Basic Save to CSV

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Save to CSV (index included by default)
df.to_csv('basic_data.csv')

2. Exclude Index Column

df.to_csv('no_index_data.csv', index=False)

3. Custom Delimiter (e.g., Tab-Separated)

df.to_csv('tab_data.tsv', sep='\t', index=False)

4. Handle Missing Values

Replace NaN/None with a custom string (e.g., 'NA'):

import numpy as np

df_with_nulls = pd.DataFrame({'A': [1, np.nan, 3], 'B': [np.nan, 5, 6]})
df_with_nulls.to_csv('nulls_data.csv', na_rep='NA', index=False)

5. Write Specific Columns

df.to_csv('selected_columns.csv', columns=['Name'], index=False)

6. Append to Existing CSV

# Append new data to an existing file (no headers)
new_data = pd.DataFrame({'Name': ['Dave'], 'Age': [40]})
new_data.to_csv('basic_data.csv', mode='a', header=False, index=False)

7. Format Dates in CSV

from datetime import datetime

# DataFrame with dates
dates = pd.DataFrame({'Date': [datetime(2023, 1, 1), datetime(2023, 1, 2)]})
dates.to_csv('dates.csv', index=False, date_format='%Y-%m-%d')

8. Compress CSV (e.g., GZIP)

df.to_csv('compressed_data.csv.gz', index=False, compression='gzip')

9. Custom Encoding (e.g., Latin-1)

df.to_csv('latin_data.csv', encoding='latin-1', index=False)

Key Parameters Explained

ParameterPurpose
path_or_bufFile path (e.g., 'data.csv') or a file-like object (e.g., StringIO).
sepDelimiter (default ','). Use '\t' for TSV files.
indexOmit DataFrame index if False (default True).
headerInclude column names if True (default). Set False to exclude.
mode'w' (overwrite) or 'a' (append).
encodingSpecify encoding (e.g., 'utf-8', 'latin-1').
na_repReplace missing values with a string (e.g., 'NA', 'Missing').
columnsList of columns to export (e.g., ['Name', 'Age']).
date_formatFormat datetime columns (e.g., '%Y-%m-%d %H:%M:%S').
compressionCompress output (e.g., 'gzip', 'zip').

Common Pitfalls & Solutions

  1. Unwanted Index Column
    Always use index=False unless you need the index.
  2. Encoding Errors
    Use encoding='utf-8' (default) or encoding='latin-1' for compatibility.
  3. Overwriting Files
    Use mode='a' to append data instead of overwriting.
  4. Missing Data Ambiguity
    Explicitly define na_rep to avoid confusion with empty strings.

Advanced Example: Export with Quoting

Handle data containing delimiters (e.g., commas in strings) using quoting:

import csv

data = {'Text': ['Hello, World', 'Goodbye, Earth']}
df = pd.DataFrame(data)

# Quote all non-numeric fields to escape commas
df.to_csv('quoted_data.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)

Summary

  • Use df.to_csv('filename.csv', index=False) for most cases.
  • Customize separators, encoding, missing values, and compression as needed.
  • Avoid including the index unless necessary for your use case.

Leave a Reply

Your email address will not be published. Required fields are marked *