To parse a YAML file in Python, follow these steps using the PyYAML library:
Step 1: Install PyYAML
pip install pyyaml
Step 2: Import the yaml Module
import yaml
Step 3: Load and Parse the YAML File
Use yaml.safe_load() to safely parse YAML content into Python objects (e.g., dictionaries, lists).
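Because yaml.safe_load() accepts a string as well as an open file, the mapping from YAML to Python objects can be tried without creating a file first. The following is a minimal sketch; the inline document and the names text and parsed are illustrative assumptions.

```python
import yaml

# Illustrative YAML held in a string (hypothetical content).
text = """
name: demo
tags:
  - alpha
  - beta
limits:
  retries: 3
"""

parsed = yaml.safe_load(text)

print(type(parsed).__name__)         # dict: a YAML mapping becomes a dict
print(parsed["tags"])                # ['alpha', 'beta']: a sequence becomes a list
print(parsed["limits"]["retries"])   # 3: an unquoted integer becomes an int
```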
Example 1: Basic Parsing
# config.yaml
database:
  host: localhost
  port: 3306
  credentials:
    username: admin
    password: secret

# Load the file, then read nested values from the resulting dictionary
with open("config.yaml", "r") as file:
    data = yaml.safe_load(file)

print(data["database"]["host"])  # Output: localhost
print(data["database"]["credentials"]["username"])  # Output: admin
Example 2: Handle Multiple YAML Documents
For YAML files containing several documents separated by ---:
# multi_doc.yaml
---
server: web01
ip: 192.168.1.1
---
server: web02
ip: 192.168.1.2

# safe_load_all() yields one Python object per document
with open("multi_doc.yaml", "r") as file:
    documents = list(yaml.safe_load_all(file))

for doc in documents:
    print(doc["server"], doc["ip"])

# Output:
# web01 192.168.1.1
# web02 192.168.1.2
Example 3: Handle Data Types (Booleans, Nulls, Dates)
# data_types.yaml
settings:
  enabled: true
  max_users: 100
  last_updated: 2023-10-05
  comment: null

with open("data_types.yaml", "r") as file:
    data = yaml.safe_load(file)

print(data["settings"]["enabled"])       # Output: True (bool)
print(data["settings"]["last_updated"])  # Output: 2023-10-05 (a datetime.date object)
print(data["settings"]["comment"])       # Output: None
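PyYAML resolves plain scalars using YAML 1.1 rules, so some unquoted values convert in ways that are easy to miss: yes/no load as booleans and 1.10 loads as a float. Quoting a value keeps it a string. A small sketch of this behavior, using hypothetical keys and an inline string instead of a file:

```python
import datetime
import yaml

# Hypothetical keys chosen only to show how quoting affects the loaded type.
text = """
enabled: yes
country: "no"
released: 2023-10-05
version: "1.10"
ratio: 1.10
"""

values = yaml.safe_load(text)

print(values["enabled"])         # True (unquoted yes is a boolean)
print(repr(values["country"]))   # 'no' (quoted, so it stays a string)
print(values["released"])        # 2023-10-05 (a datetime.date)
print(repr(values["version"]))   # '1.10' (quoted, so it stays a string)
print(values["ratio"])           # 1.1 (unquoted, so it becomes a float)

# A simple type check before using a value.
if not isinstance(values["released"], datetime.date):
    raise TypeError("released must be a date")
```

Checking types with isinstance() right after loading is one way to catch a value that was quoted, or left unquoted, by mistake.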
Step 4: Error Handling
Catch common errors like missing files or invalid YAML syntax:
import yaml
from yaml import YAMLError

try:
    with open("config.yaml", "r") as file:
        data = yaml.safe_load(file)
except FileNotFoundError:
    print("Error: File not found.")
except YAMLError as e:
    print(f"YAML Syntax Error: {e}")
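Syntax errors raised by PyYAML's scanner and parser are subclasses of yaml.MarkedYAMLError, which carries a problem_mark with the position of the fault. The sketch below uses it to report a line and column; load_yaml_text is a hypothetical helper written for this illustration, not part of PyYAML.

```python
import yaml


def load_yaml_text(text):
    """Hypothetical helper: return (data, None) on success or (None, message) on failure."""
    try:
        return yaml.safe_load(text), None
    except yaml.MarkedYAMLError as exc:
        # problem_mark holds 0-based line and column numbers; add 1 for display.
        mark = exc.problem_mark
        if mark is not None:
            return None, f"line {mark.line + 1}, column {mark.column + 1}: {exc.problem}"
        return None, str(exc)
    except yaml.YAMLError as exc:
        # Any other PyYAML error without position information.
        return None, str(exc)


data, error = load_yaml_text("key: [unclosed")
print(data)    # None, because the flow sequence is never closed
print(error)   # a message that starts with the line and column of the problem
```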
Step 5: Advanced Features
Parse Anchors and Aliases
# anchors.yaml
defaults: &defaults
  adapter: postgres
  port: 5432

development:
  <<: *defaults
  database: dev_db

production:
  <<: *defaults
  database: prod_db

with open("anchors.yaml", "r") as file:
    data = yaml.safe_load(file)

print(data["production"]["adapter"])  # Output: postgres
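Keys written next to a merge key (<<) take precedence over the merged values, which makes it possible to override a single default. A brief sketch using an inline string; the staging section and its values are hypothetical:

```python
import yaml

# "staging" is a hypothetical section that overrides one merged default.
text = """
defaults: &defaults
  adapter: postgres
  port: 5432

staging:
  <<: *defaults
  port: 6543
  database: staging_db
"""

config = yaml.safe_load(text)

print(config["staging"]["adapter"])   # postgres (inherited from defaults)
print(config["staging"]["port"])      # 6543 (the local key wins over the merged one)
print(config["defaults"]["port"])     # 5432 (the anchored mapping is unchanged)
```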
Parse Custom Tags (Use with Caution!)
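⚠️ Warning: Never pass untrusted data to yaml.load() with an unsafe loader such as yaml.UnsafeLoader. Python-specific tags can instantiate arbitrary objects and execute code.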
# custom_tag.yaml
value: !!python/object:__main__.MyClass {}

# The class must be defined before the file is loaded
class MyClass:
    pass

# Unsafe method (only for trusted YAML):
with open("custom_tag.yaml", "r") as file:
    data = yaml.load(file, Loader=yaml.UnsafeLoader)

print(type(data["value"]))  # Output: <class '__main__.MyClass'>
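When custom objects are needed but the source is not fully trusted, one alternative to the unsafe loader is to register a constructor for an application-defined tag on a yaml.SafeLoader subclass. This is a different technique from the python/object tag shown above. In this sketch the Endpoint class, the !endpoint tag, and the ConfigLoader name are all hypothetical.

```python
import yaml


class Endpoint:
    """Hypothetical application class used only for this illustration."""

    def __init__(self, host, port):
        self.host = host
        self.port = port


class ConfigLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the registration below stays local to it."""


def construct_endpoint(loader, node):
    # Build the object from the tagged mapping's plain key/value pairs.
    fields = loader.construct_mapping(node, deep=True)
    return Endpoint(host=fields["host"], port=fields["port"])


# "!endpoint" is an application-defined tag chosen for this sketch.
ConfigLoader.add_constructor("!endpoint", construct_endpoint)

text = """
api: !endpoint
  host: example.internal
  port: 8443
"""

data = yaml.load(text, Loader=ConfigLoader)
print(type(data["api"]).__name__, data["api"].host, data["api"].port)
# Output: Endpoint example.internal 8443
```

Only the tags registered this way are honored. Python-specific tags such as !!python/object are still rejected because the loader derives from SafeLoader.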
Key Considerations
- Security: Always use yaml.safe_load() for untrusted YAML files.
- Data Types: YAML automatically converts values to Python types (e.g., true → True, null → None).
- Performance: For files containing many documents, iterate over the generator returned by yaml.safe_load_all() so that documents are processed one at a time instead of being collected into a list first.
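The incremental approach from the performance note can be sketched as follows. An io.StringIO object stands in for an open file here, which is an assumption made to keep the example self-contained; with a real file, the loop has to run inside the with block because the generator reads from the stream lazily.

```python
import io
import yaml

# StringIO stands in for an open multi-document file (illustrative data).
stream = io.StringIO("""\
---
server: web01
ip: 192.168.1.1
---
server: web02
ip: 192.168.1.2
""")

servers = []
# Each document is constructed only when the loop asks for the next one.
for doc in yaml.safe_load_all(stream):
    servers.append(doc["server"])

print(servers)  # ['web01', 'web02']
```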
Summary
- Use yaml.safe_load() for safe parsing.
- Handle multiple documents with yaml.safe_load_all().
- Validate data types (e.g., booleans, dates).
- Avoid yaml.load() unless absolutely necessary and the source is trusted.
This approach ensures secure and efficient YAML parsing in Python.