How can I parse a YAML file in Python?

To parse a YAML file in Python, follow these steps using the PyYAML library:

Step 1: Install PyYAML

pip install pyyaml

Step 2: Import the yaml Module

import yaml
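You can confirm that the install worked with a quick check like the one below. It prints the installed PyYAML version and whether the optional LibYAML C bindings were detected, which PyYAML exposes as __with_libyaml__. Both values depend on your environment.

```python
import yaml

# Installed PyYAML version string (the exact value varies by environment)
print(yaml.__version__)

# True when PyYAML was built against LibYAML, which enables the faster C-based loaders
print(yaml.__with_libyaml__)
```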

Step 3: Load and Parse the YAML File

Use yaml.safe_load() to safely parse YAML content into Python objects (e.g., dictionaries, lists).
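yaml.safe_load() also accepts a plain string, which is handy for quick experiments and tests. Note that an empty document loads as None rather than an empty dict. A fallback such as "or {}" is a common guard for config files that may be empty.

```python
import yaml

# safe_load works on strings as well as open file objects
data = yaml.safe_load("name: demo\ntags: [a, b]\ncount: 3\n")
print(data)  # {'name': 'demo', 'tags': ['a', 'b'], 'count': 3}

# An empty document loads as None, not {}
empty = yaml.safe_load("")
print(empty)  # None

# Common guard so later code can always treat the result as a dict
config = yaml.safe_load("") or {}
print(config)  # {}
```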

Example 1: Basic Parsing

# config.yaml
database:
  host: localhost
  port: 3306
  credentials:
    username: admin
    password: secret

with open("config.yaml", "r") as file:
    data = yaml.safe_load(file)

print(data["database"]["host"])  # Output: localhost
print(data["database"]["credentials"]["username"])  # Output: admin
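Chained indexing such as data["database"]["host"] raises KeyError as soon as one level is missing. A small helper can instead walk a dotted path and return a default. get_nested below is a hypothetical helper written for illustration, not part of PyYAML. The YAML is inlined as a string so the sketch runs without config.yaml.

```python
import yaml

def get_nested(data, dotted_key, default=None):
    """Hypothetical helper: walk nested dicts using a dotted path like 'a.b.c'."""
    current = data
    for part in dotted_key.split("."):
        # Stop early if we hit a non-dict value or a missing key
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

config = yaml.safe_load("""
database:
  host: localhost
  port: 3306
  credentials:
    username: admin
    password: secret
""")

print(get_nested(config, "database.credentials.username"))  # admin
print(get_nested(config, "database.timeout", default=30))   # 30
```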

Example 2: Handle Multiple YAML Documents

For YAML files that contain multiple documents separated by ---:

# multi_doc.yaml
---
server: web01
ip: 192.168.1.1
---
server: web02
ip: 192.168.1.2

with open("multi_doc.yaml", "r") as file:
    documents = list(yaml.safe_load_all(file))

for doc in documents:
    print(doc["server"], doc["ip"])

# Output:
# web01 192.168.1.1
# web02 192.168.1.2
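A multi-document stream can contain empty documents, for example two --- markers in a row or a trailing ---. Each of those loads as None, so doc["server"] would fail with a TypeError. The minimal sketch below filters them out, with the YAML inlined as a string for illustration.

```python
import yaml

text = """---
server: web01
ip: 192.168.1.1
---
---
server: web02
ip: 192.168.1.2
"""

# safe_load_all yields one Python object per document; the empty one comes back as None
all_docs = list(yaml.safe_load_all(text))
servers = [doc for doc in all_docs if doc is not None]

for doc in servers:
    print(doc["server"], doc["ip"])
```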

Example 3: Handle Data Types (Booleans, Nulls, Dates)

# data_types.yaml
settings:
  enabled: true
  max_users: 100
  last_updated: 2023-10-05
  comment: null

with open("data_types.yaml", "r") as file:
    data = yaml.safe_load(file)

print(data["settings"]["enabled"])  # Output: True (bool)
print(data["settings"]["last_updated"])  # Output: 2023-10-05 (a datetime.date object)
print(data["settings"]["comment"])  # Output: None
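PyYAML follows YAML 1.1, so some unquoted values convert in surprising ways. For example, yes/no and on/off become booleans, and a version-like 1.10 becomes the float 1.1. Quoting a value keeps it a string. The settings below are illustrative.

```python
import yaml

text = """
settings:
  enabled: true
  legacy_flag: yes      # YAML 1.1: parsed as a boolean by PyYAML
  country: "NO"         # quoted, so it stays a string
  version: 1.10         # unquoted, so it becomes the float 1.1
  last_updated: 2023-10-05
"""
settings = yaml.safe_load(text)["settings"]

# Show each value's repr and its resulting Python type
for key, value in settings.items():
    print(f"{key}: {value!r} ({type(value).__name__})")

# enabled: True (bool)
# legacy_flag: True (bool)
# country: 'NO' (str)
# version: 1.1 (float)
# last_updated: datetime.date(2023, 10, 5) (date)
```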

Step 4: Error Handling

Catch common errors like missing files or invalid YAML syntax:

import yaml
from yaml import YAMLError

try:
    with open("config.yaml", "r") as file:
        data = yaml.safe_load(file)
except FileNotFoundError:
    print("Error: File not found.")
except YAMLError as e:
    print(f"YAML Syntax Error: {e}")
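Errors raised while scanning or parsing usually carry a problem_mark attribute with 0-based line and column numbers, so you can report where the problem is. This follows a pattern similar to the one in the PyYAML documentation. The describe_yaml_error function is a hypothetical wrapper for illustration.

```python
import yaml

def describe_yaml_error(text):
    """Hypothetical helper: return None if text parses, else a short error description."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Scanner/parser errors (MarkedYAMLError subclasses) carry a problem_mark
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            # Mark.line and Mark.column are 0-based; add 1 for human-readable positions
            return f"{exc.problem} at line {mark.line + 1}, column {mark.column + 1}"
        return str(exc)
    return None

print(describe_yaml_error("a: 1\nb: 2\n"))  # None
print(describe_yaml_error("a: b: c\n"))     # mapping values are not allowed here at line 1, column 5
```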

Step 5: Advanced Features

Parse Anchors and Aliases

# anchors.yaml
defaults: &defaults
  adapter: postgres
  port: 5432

development:
  <<: *defaults
  database: dev_db

production:
  <<: *defaults
  database: prod_db

with open("anchors.yaml", "r") as file:
    data = yaml.safe_load(file)

print(data["production"]["adapter"])  # Output: postgres
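Anchors and aliases have two behaviors worth knowing. First, keys written explicitly alongside a <<: merge override the merged ones. Second, an alias refers to the same Python object as its anchor, so mutating one also changes the other. The test section and host lists below are illustrative.

```python
import yaml

text = """
defaults: &defaults
  adapter: postgres
  port: 5432

test:
  <<: *defaults
  port: 5433        # explicit keys override merged ones
  database: test_db

shared_list: &hosts [db1, db2]
replica_hosts: *hosts
"""
data = yaml.safe_load(text)

print(data["test"])  # {'adapter': 'postgres', 'port': 5433, 'database': 'test_db'}

# The alias and its anchor are the same list object, so a change through one shows in both
print(data["replica_hosts"] is data["shared_list"])  # True
data["replica_hosts"].append("db3")
print(data["shared_list"])  # ['db1', 'db2', 'db3']
```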

Parse Custom Tags (Use with Caution!)

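⚠️ Warning: Never use yaml.load() with UnsafeLoader on untrusted data. Python-specific tags (such as !!python/object/apply) can construct arbitrary objects and call arbitrary functions.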

# custom_tag.yaml
value: !!python/object:__main__.MyClass {}

class MyClass:
    pass

# Unsafe method (only for trusted YAML):
with open("custom_tag.yaml", "r") as file:
    data = yaml.load(file, Loader=yaml.UnsafeLoader)

print(type(data["value"]))  # Output: <class '__main__.MyClass'>
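If you only need your own application-specific tags, a safer route than UnsafeLoader is to register a constructor on a subclass of yaml.SafeLoader. Only the tags you define are handled, and the python/* tags remain rejected. The minimal sketch below assumes a hypothetical !point tag and Point class.

```python
import yaml

class Point:
    """Hypothetical application type used only for this illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def point_constructor(loader, node):
    # "!point [3, 4]" is a sequence node; build its items, then the Point
    x, y = loader.construct_sequence(node)
    return Point(x, y)

class PointLoader(yaml.SafeLoader):
    """Subclass so the global SafeLoader is left unchanged."""

PointLoader.add_constructor("!point", point_constructor)

data = yaml.load("origin: !point [0, 0]\ntarget: !point [3, 4]\n", Loader=PointLoader)
print(data["target"].x, data["target"].y)  # 3 4

# The stock SafeLoader still rejects the tag, because only PointLoader knows about it
try:
    yaml.safe_load("p: !point [1, 2]\n")
except yaml.YAMLError as exc:
    print("safe_load rejected the tag:", type(exc).__name__)
```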

Key Considerations

  • Security: Always use yaml.safe_load() for untrusted YAML files.
  • Data Types: YAML automatically converts values to Python types (e.g., true → True, null → None).
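  • Performance: For large multi-document files, iterate over yaml.safe_load_all() directly instead of wrapping it in list(). It returns a generator that yields one document at a time, although each document is still loaded fully into memory.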
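The performance note can be sketched as follows. Iterate over the loader's generator directly, and prefer the LibYAML-based CSafeLoader when PyYAML was built with it, falling back to the pure-Python SafeLoader otherwise. The inline stream stands in for a large file.

```python
import io
import yaml

# Prefer the C-based loader (requires PyYAML built with LibYAML); otherwise use the pure-Python one
try:
    from yaml import CSafeLoader as FastSafeLoader
except ImportError:
    from yaml import SafeLoader as FastSafeLoader

# Stand-in for a large multi-document file
stream = io.StringIO("---\nserver: web01\n---\nserver: web02\n")

seen = []
# yaml.load_all returns a generator, so documents are parsed one at a time as the loop advances
for doc in yaml.load_all(stream, Loader=FastSafeLoader):
    seen.append(doc["server"])
    print(doc["server"])
```

When reading a real file, keep the loop inside the with open(...) block. The generator reads lazily, so the file must still be open while you iterate.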

Summary

  1. Use yaml.safe_load() for safe parsing.
  2. Handle multiple documents with yaml.safe_load_all().
  3. Check the types of loaded values (e.g., booleans, dates) before relying on them.
  4. Avoid yaml.load() with UnsafeLoader unless absolutely necessary and the source is trusted.
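For item 3, one lightweight approach is to compare each loaded value against the type you expect before using it. The EXPECTED_TYPES mapping and validate_settings function are hypothetical. Because bool is a subclass of int in Python, a stray yes or true would otherwise pass as an integer, so the sketch rejects booleans explicitly.

```python
import datetime
import yaml

# Hypothetical schema: expected Python type for each key under "settings"
EXPECTED_TYPES = {
    "enabled": bool,
    "max_users": int,
    "last_updated": datetime.date,
}

def validate_settings(settings):
    """Return a list of problems; an empty list means every expected key has the expected type."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in settings:
            problems.append(f"missing key: {key}")
            continue
        value = settings[key]
        # bool subclasses int, so reject booleans wherever a non-bool type is expected
        if isinstance(value, bool) and expected is not bool:
            ok = False
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
    return problems

good = yaml.safe_load("settings:\n  enabled: true\n  max_users: 100\n  last_updated: 2023-10-05\n")
print(validate_settings(good["settings"]))  # []

bad = yaml.safe_load('settings:\n  enabled: "true"\n  max_users: yes\n')
print(validate_settings(bad["settings"]))
# ['enabled: expected bool, got str', 'max_users: expected int, got bool', 'missing key: last_updated']
```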

This approach ensures secure and efficient YAML parsing in Python.
