Introduction
YAML (YAML Ain't Markup Language) is a human-readable data serialization standard that has become increasingly popular for configuration files in modern programming applications. YAML combines the best features from Perl, C, XML, HTML, and JSON, creating a reliable and easily readable format.
Because YAML 1.2 is designed as a superset of JSON, a JSON file can be read by a YAML parser, and YAML data built only from mappings, sequences, and scalars converts readily to JSON. This guide focuses on fundamental concepts and commonly used patterns for those beginning to learn YAML.
YAML Basic Structure
File Extensions and Rules
YAML files use the .yaml extension. Some systems that don't support extensions longer than three characters also use .yml.
Important Rules:
- YAML is case sensitive
- Use spaces instead of tabs for indentation
- Tabs are prohibited due to varying tab settings across different editors
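The tab rule is easy to check mechanically. The following is a minimal sketch using only the Python standard library; the helper name `find_tab_indents` is made up for this illustration and is not part of any YAML tool.

```python
def find_tab_indents(yaml_text: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    offending = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            offending.append(number)
    return offending


# Line 2 is indented with a tab, line 3 with spaces.
sample = "config:\n\thost: localhost\n  port: 8080\n"
print(find_tab_indents(sample))
```

A linter such as yamllint performs this check (and many others) for you; the sketch only shows what the rule means in practice.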
Document Structure
The text that a YAML processor reads is called a "stream." A stream can contain multiple documents.
Document Markers:
- --- (Triple dashes): Indicates the start of a document
- ... (Triple dots): Indicates the end of a document
While markers are optional for single documents, they are required for streams containing multiple documents.
Example - Multiple Documents:
# Baseball rankings 1998
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey
# Team rankings
---
- Chicago Cubs
- St Louis Cardinals
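To see how a multi-document stream is consumed, here is a short sketch that assumes the PyYAML package is installed (`pip install pyyaml`). It feeds the example above to `yaml.safe_load_all`, which yields one Python object per document.

```python
import yaml  # PyYAML, assumed to be installed

stream = """\
# Baseball rankings 1998
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey
# Team rankings
---
- Chicago Cubs
- St Louis Cardinals
"""

# safe_load_all returns a generator; list() forces it to parse every document.
documents = list(yaml.safe_load_all(stream))
for index, document in enumerate(documents, start=1):
    print(index, document)
```

Using `yaml.safe_load` on the same text would raise an error, because that function expects exactly one document.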
Comments
Comments in YAML start with the #
symbol.
# This is a comment
name: John Doe # This is also a comment
YAML Writing Styles
YAML can be written in two formats:
1. Block Style
A human-friendly format that uses indentation to represent structure.
Sequence (List) Syntax:
fruits:
  - apple
  - banana
  - orange
Mapping (Key-Value) Syntax:
person:
  name: John Doe
  age: 30
  city: New York
2. Flow Style
A compact format similar to JSON.
Flow Sequence:
fruits: [apple, banana, orange]
Flow Mapping:
person: {name: "John Doe", age: 30, city: "New York"}
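Block style and flow style are two spellings of the same data. The sketch below, which assumes PyYAML is installed, parses the `person` mapping in both styles and compares the results.

```python
import yaml  # PyYAML, assumed to be installed

block_style = """\
person:
  name: John Doe
  age: 30
  city: New York
"""
flow_style = 'person: {name: "John Doe", age: 30, city: "New York"}\n'

block_data = yaml.safe_load(block_style)
flow_data = yaml.safe_load(flow_style)

# Both styles produce the same nested dictionary.
print(block_data == flow_data)
print(block_data)
```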
Data Types
YAML has three main data types:
1. Scalar (Basic Data Types)
Numeric Types
Integer:
decimal: 123
octal: 0o123 # YAML 1.2 notation; YAML 1.1 parsers expect 0123 instead
hexadecimal: 0x123
Floating Point:
fixed: 123.45
exponential: 1.23e+3
Boolean:
enabled: true
disabled: false
# yes/no/on/off are booleans only in YAML 1.1; YAML 1.2 parsers read them as strings
yes_value: yes
no_value: no
on_value: on
off_value: off
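How a plain scalar is typed depends on the parser and the YAML version it implements. As a rough illustration, the sketch below loads a few of the values above with PyYAML (assumed installed), which follows YAML 1.1 rules and therefore treats `yes` and `off` as booleans.

```python
import yaml  # PyYAML, assumed to be installed; it implements YAML 1.1 typing rules

scalars = yaml.safe_load("""\
decimal: 123
hexadecimal: 0x123
fixed: 123.45
enabled: true
yes_value: yes
off_value: off
""")

# Print each value with the Python type the parser chose for it.
for key, value in scalars.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

A YAML 1.2 parser would return the strings "yes" and "off" for the last two keys, so quoting such values is the safest way to get the same result everywhere.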
String Types
Block Style Strings:
Literal Style (|) - Preserves line breaks:
description: |
  This is line one
  This is line two
  This is line three
Folded Style (>) - Converts line breaks to spaces:
description: >
  This long sentence will be
  folded into a single line
  when processed
Flow Style Strings:
# Plain style
plain_string: Hello World
# Quoted styles
single_quoted: 'Hello "World"!'
double_quoted: "Hello\nWorld" # Can contain escape sequences
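The differences between these string styles are easiest to see in the parsed values. The following sketch assumes PyYAML is installed; `repr` is used so that newlines show up as `\n`.

```python
import yaml  # PyYAML, assumed to be installed

# A raw string keeps the backslash in "Hello\nWorld" for the YAML parser to interpret.
text = r"""
literal: |
  This is line one
  This is line two
folded: >
  This long sentence will be
  folded into a single line
single_quoted: 'Hello "World"!'
double_quoted: "Hello\nWorld"
"""

strings = yaml.safe_load(text)
for key, value in strings.items():
    print(f"{key}: {value!r}")
```

The literal scalar keeps its inner line break, the folded scalar joins its lines with a space, and only the double-quoted scalar turns `\n` into a real newline.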
2. Sequences (Lists/Arrays)
Represents ordered collections.
Simple List:
shopping_list:
  - milk
  - bread
  - eggs
  - "" # Empty string
Complex Sequences:
employees:
  - name: John Doe
    position: Developer
    salary: 50000
  - name: Jane Smith
    position: Designer
    salary: 45000
Nested Sequences:
matrix:
  - [1, 2, 3]
  - [4, 5, 6]
  - [7, 8, 9]
3. Mappings (Key-Value Pairs)
Represents unordered associations.
Simple Mapping:
server:
  host: localhost
  port: 8080
  ssl: true
Nested Mapping:
database:
  primary:
    host: db1.example.com
    port: 5432
  replica:
    host: db2.example.com
    port: 5432
Mixed Collections:
application:
  name: MyApp
  version: 1.0.0
  features:
    - authentication
    - logging
    - caching
  database:
    type: postgresql
    settings:
      pool_size: 10
      timeout: 30
Advanced Features
YAML Anchors and Aliases
Anchors and aliases are used to reduce code duplication and improve reusability.
Defining Anchors: &anchor_name
Using Aliases: *anchor_name
# Define anchor
default_settings: &default
  timeout: 30
  retries: 3
  ssl: true
# Use aliases
api_server:
  host: api.example.com
  <<: *default # Merge anchor content
web_server:
  host: web.example.com
  <<: *default
  timeout: 60 # Override specific value
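To check what the merge key actually produces, the sketch below loads the same structure with PyYAML (assumed installed; the `<<` merge key is a YAML 1.1 extension that PyYAML supports, but not every parser does).

```python
import yaml  # PyYAML, assumed to be installed

config = yaml.safe_load("""\
default_settings: &default
  timeout: 30
  retries: 3
  ssl: true
api_server:
  host: api.example.com
  <<: *default
web_server:
  host: web.example.com
  <<: *default
  timeout: 60
""")

# api_server receives every default; web_server keeps its own timeout.
print(config["api_server"])
print(config["web_server"])
```

Keys written explicitly in a mapping take precedence over keys brought in through `<<`, which is why `web_server` ends up with a timeout of 60.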
YAML Tags
Tags are used to specify data types or provide custom processing instructions.
Data Type Tags:
# Force string type
port: !!str 8080
version: !!str 1.0
# Explicit types
count: !!int 42
percentage: !!float 98.6
enabled: !!bool true
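The effect of a type tag shows up in the Python type of the loaded value. This is a small sketch assuming PyYAML is installed; the `untagged_version` key is added here only for comparison.

```python
import yaml  # PyYAML, assumed to be installed

typed = yaml.safe_load("""\
port: !!str 8080
version: !!str 1.0
untagged_version: 1.0
count: !!int 42
percentage: !!float 98.6
""")

# Tagged values keep the requested type; the untagged 1.0 becomes a float.
for key, value in typed.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Quoting a value, as in `version: "1.0"`, has the same effect as `!!str` and is the more common spelling.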
Custom Tags:
%TAG ! tag:example.com,2024:
---
# !server expands to tag:example.com,2024:server.
# The loading application must know how to construct this type.
server: !server
  name: web-01
Practical Examples
Configuration File Example
# Application Configuration
application:
  name: "E-Commerce API"
  version: "2.1.0"
  environment: production
server:
  host: 0.0.0.0
  port: 8080
  ssl:
    enabled: true
    certificate: /path/to/cert.pem
    private_key: /path/to/key.pem
database:
  primary:
    driver: postgresql
    host: db-primary.example.com
    port: 5432
    name: ecommerce
    credentials:
      username: app_user
      password: ${DB_PASSWORD}
redis:
  host: cache.example.com
  port: 6379
  database: 0
features:
  - user_authentication
  - order_processing
  - inventory_management
  - payment_gateway
logging:
  level: info
  format: json
  outputs:
    - console
    - file: /var/log/app.log
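YAML itself does not expand `${DB_PASSWORD}`; a parser returns that text as a plain string, and the application or deployment tool has to substitute the value. One possible approach, sketched with only the standard library, is shown below. The helper `expand_env` and the secret value are hypothetical, and the dictionary stands in for a config that has already been parsed.

```python
import os
from string import Template


def expand_env(value, environ=None):
    """Recursively replace ${VAR} placeholders in strings inside dicts and lists."""
    environ = os.environ if environ is None else environ
    if isinstance(value, str):
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        return Template(value).safe_substitute(environ)
    if isinstance(value, dict):
        return {key: expand_env(item, environ) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, environ) for item in value]
    return value


# Hypothetical already-parsed config containing a placeholder.
parsed = {"database": {"credentials": {"username": "app_user", "password": "${DB_PASSWORD}"}}}

# A plain dict is passed here for the demo; omit it to read from os.environ.
expanded = expand_env(parsed, {"DB_PASSWORD": "s3cret"})
print(expanded["database"]["credentials"]["password"])
```

Many tools ship their own substitution rules, so check what your framework already does before adding a step like this.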
Docker Compose Example
version: '3.8'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - app
    networks:
      - frontend
  app:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - NODE_ENV=production
      - DB_HOST=database
    depends_on:
      - database
    networks:
      - frontend
      - backend
  database:
    image: postgres:13
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - db_data:/var/lib/postgresql/data
    networks:
      - backend
networks:
  frontend:
  backend:
volumes:
  db_data:
CI/CD Pipeline Example
# GitHub Actions Workflow
name: Build and Deploy
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  NODE_VERSION: '18'
  DOCKER_REGISTRY: ghcr.io
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Run linting
        run: npm run lint
  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: |
          docker build -t ${{ env.DOCKER_REGISTRY }}/myapp:${{ github.sha }} .
          docker tag ${{ env.DOCKER_REGISTRY }}/myapp:${{ github.sha }} ${{ env.DOCKER_REGISTRY }}/myapp:latest
      - name: Push to registry
        run: |
          echo ${{ secrets.GITHUB_TOKEN }} | docker login ${{ env.DOCKER_REGISTRY }} -u ${{ github.actor }} --password-stdin
          docker push ${{ env.DOCKER_REGISTRY }}/myapp:${{ github.sha }}
          docker push ${{ env.DOCKER_REGISTRY }}/myapp:latest
Common Mistakes and Best Practices
Common Errors
- Incorrect Indentation
# Wrong (child is parsed as a sibling of parent, not as its child)
parent:
child: value
# Correct
parent:
  child: value
- Mixing Tabs and Spaces
# Wrong (mixing tabs and spaces)
config:
	host: localhost # tab used
  port: 8080 # spaces used
# Correct (consistent spacing)
config:
  host: localhost
  port: 8080
- Unnecessary Quotes
# Unnecessary quotes
name: "John"
age: "30" # Quoted, so this is the string "30"
# Better
name: John
age: 30 # Unquoted, so this is the integer 30
- Special Characters Without Quotes
# Wrong - colon in value without quotes
message: Error: Something went wrong
# Correct
message: "Error: Something went wrong"
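The last of these errors is the one most likely to surface as a parser exception. As a quick check, the sketch below (assuming PyYAML is installed) tries both versions of the `message` line and reports what happens.

```python
import yaml  # PyYAML, assumed to be installed

unquoted = "message: Error: Something went wrong\n"
quoted = 'message: "Error: Something went wrong"\n'

try:
    yaml.safe_load(unquoted)
    unquoted_ok = True
except yaml.YAMLError as error:
    # The second ": " makes the parser think a nested mapping starts mid-line.
    unquoted_ok = False
    print("parse failed:", type(error).__name__)

quoted_data = yaml.safe_load(quoted)
print(quoted_data)
```

Catching `yaml.YAMLError` covers both scanner and parser errors, which makes it a reasonable base class for reporting bad config files.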
Best Practices
Consistent Indentation: Use either 2 spaces or 4 spaces consistently throughout the file
Meaningful Comments: Add comments for important sections and complex configurations
Validation: Use YAML syntax checkers and linters to validate your files
Security: Use environment variables for sensitive data instead of hardcoding
Organization: Group related configurations together and use logical ordering
Version Control: Always specify versions for dependencies and tools
Documentation: Include inline documentation for complex configurations
Validation Tools
Online Validators:
- YAML Lint (yamllint.com)
- Online YAML Parser
- JSON to YAML converters
Command Line Tools:
# Using yamllint
pip install yamllint
yamllint config.yaml
# Using yq for parsing and validation
yq eval '.' config.yaml
Advanced Topics
Multi-line Strings
# Literal block scalar (preserves newlines)
literal: |
  Line 1
  Line 2
  Line 3
# Folded block scalar (folds newlines to spaces)
folded: >
  This is a very long line that will be
  folded into a single line, which is
  useful for readability.
# With block chomping indicators
literal_keep: |+
  Line 1
  Line 2
literal_strip: |-
  Line 1
  Line 2
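The chomping indicators only affect the end of the scalar, so their effect is clearest when each block is followed by a blank line. The sketch below assumes PyYAML is installed; the key names `clip`, `keep`, `strip`, and `end` are chosen just for this illustration.

```python
import yaml  # PyYAML, assumed to be installed

blocks = yaml.safe_load("""\
clip: |
  Line 1
  Line 2

keep: |+
  Line 1
  Line 2

strip: |-
  Line 1
  Line 2

end: marker
""")

# repr() makes the trailing newlines visible.
for key in ("clip", "keep", "strip"):
    print(f"{key}: {blocks[key]!r}")
```

The default (clip) keeps exactly one trailing newline, `+` keeps the trailing blank line as well, and `-` removes every trailing newline.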
Complex Anchors and Merging
# Complex anchor definitions
database_config: &db_config
  driver: postgresql
  pool_size: 10
  timeout: 30
  ssl: true
app_defaults: &app_defaults
  restart_policy: always
  memory_limit: 512m
  cpu_limit: 0.5
# Merging multiple anchors
production_api:
  <<: [*app_defaults, *db_config]
  name: production-api
  replicas: 3
  environment: production
staging_api:
  <<: [*app_defaults, *db_config]
  name: staging-api
  replicas: 1
  environment: staging
  memory_limit: 256m # Override default
Schema Validation
# JSON Schema for YAML validation
$schema: "http://json-schema.org/draft-07/schema#"
type: object
properties:
  name:
    type: string
    minLength: 1
  version:
    type: string
    pattern: "^\\d+\\.\\d+\\.\\d+$"
  dependencies:
    type: array
    items:
      type: string
required:
  - name
  - version
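In a real project a JSON Schema validator library would apply this schema to the parsed YAML. To show what the rules mean without depending on one, the sketch below hand-checks the same constraints with the standard library; the function `check_manifest` is hypothetical and covers only the rules listed in the schema above.

```python
import re

# Same pattern as the schema: three dot-separated groups of digits.
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def check_manifest(data: dict) -> list[str]:
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    for key in ("name", "version"):
        if key not in data:
            errors.append(f"missing required property: {key}")
    name = data.get("name")
    if name is not None and (not isinstance(name, str) or len(name) < 1):
        errors.append("name must be a non-empty string")
    version = data.get("version")
    if version is not None and (
        not isinstance(version, str) or not VERSION_PATTERN.search(version)
    ):
        errors.append("version must be a string like 1.2.3")
    deps = data.get("dependencies")
    if deps is not None and (
        not isinstance(deps, list) or not all(isinstance(d, str) for d in deps)
    ):
        errors.append("dependencies must be a list of strings")
    return errors


print(check_manifest({"name": "MyApp", "version": "1.0.0"}))
print(check_manifest({"name": "MyApp", "version": 1.0}))
```

The second call fails because an unquoted `1.0` in YAML is loaded as a float, not a string, which is exactly the kind of mistake schema validation is meant to catch.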
Performance Considerations
Large Files
# For large datasets, consider using references
shared_config: &shared
  common_setting_1: value1
  common_setting_2: value2
  common_setting_3: value3
# Use references instead of repeating
service_1:
  <<: *shared
  specific_setting: value
service_2:
  <<: *shared
  specific_setting: different_value
Memory Usage
# Avoid deeply nested structures when possible
# Instead of:
deeply:
  nested:
    structure:
      that:
        goes:
          very:
            deep: value
# Consider flattening:
config_deep_value: value
Integration with Programming Languages
Python Example
import yaml
# Loading YAML
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)
# Writing YAML
data = {
    'name': 'MyApp',
    'version': '1.0.0',
    'features': ['auth', 'logging']
}
with open('output.yaml', 'w') as file:
    yaml.dump(data, file, default_flow_style=False)
JavaScript/Node.js Example
const yaml = require('js-yaml');
const fs = require('fs');
// Loading YAML
try {
  const config = yaml.load(fs.readFileSync('config.yaml', 'utf8'));
  console.log(config);
} catch (e) {
  console.error('Error loading YAML:', e);
}
// Writing YAML
const data = {
  name: 'MyApp',
  version: '1.0.0',
  features: ['auth', 'logging']
};
try {
  const yamlStr = yaml.dump(data);
  fs.writeFileSync('output.yaml', yamlStr);
} catch (e) {
  console.error('Error writing YAML:', e);
}
Troubleshooting Common Issues
Parsing Errors
# Issue: Unquoted strings with special characters
url: http://example.com:8080/path?param=value # Parses here, but breaks as soon as the value contains ": " or " #"
# Solution: Quote strings with special characters
url: "http://example.com:8080/path?param=value"
Indentation Problems
# Issue: Inconsistent indentation
config:
  database:
    host: localhost
   port: 5432 # Wrong indentation
# Solution: Consistent indentation
config:
  database:
    host: localhost
    port: 5432
Type Coercion Issues
# Issue: Unintended type conversion
version: 1.0 # Becomes the float 1.0
port: 08080 # Leading zero: read as octal, decimal, or a string depending on the parser
# Solution: Explicit string typing
version: "1.0"
port: "8080"
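The exact outcome of these conversions depends on the parser. As one data point, the sketch below shows what PyYAML (assumed installed, YAML 1.1 rules) does with each value; the extra keys `quoted_version`, `quoted_port`, and `octal_like` are added here for comparison.

```python
import yaml  # PyYAML, assumed to be installed; other parsers may type these differently

loaded = yaml.safe_load("""\
version: 1.0
quoted_version: "1.0"
port: 08080
quoted_port: "8080"
octal_like: 0123
""")

# Print each value with the Python type the parser chose for it.
for key, value in loaded.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

With PyYAML, `08080` is not a valid octal number, so it stays a string with its leading zero, while `0123` is read as octal. Quoting removes the guesswork in both cases.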
Conclusion
YAML has become an essential format for configuration management, data serialization, and API documentation. Its human-readable nature and powerful features make it popular among developers and system administrators.
After understanding the basic concepts, you can explore more advanced features such as anchors, aliases, tags, and complex data structures. The key to mastering YAML is practice and understanding the specific requirements of your tools and applications.
For more detailed documentation and examples, visit yaml.org and explore the extensive ecosystem of YAML tools and libraries available for your programming language of choice.
Additional Resources
- Official YAML Specification: yaml.org/spec
- YAML Lint Online: yamllint.com
- JSON to YAML Converter: Various online tools available
- Community Forums: Stack Overflow, Reddit r/yaml
- Books and Tutorials: Platform-specific YAML guides (Docker, Kubernetes, CI/CD)
Remember: The best way to learn YAML is through hands-on practice with real-world configuration files and projects relevant to your field.