DEV Community

wymdev

YAML Learning Guide - Complete Tutorial

Introduction

YAML (YAML Ain't Markup Language) is a human-readable data serialization format that is widely used for configuration files. It draws on ideas from Perl, C, XML, HTML, and JSON, with the goal of being easy for people to read and write.

YAML 1.2 is designed as a superset of JSON, so a valid JSON file can generally be read by a YAML 1.2 parser, and YAML data that sticks to JSON-compatible types can be converted to JSON. This guide focuses on fundamental concepts and commonly used patterns for those beginning to learn YAML.
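
To make the JSON relationship concrete, here is a minimal sketch of one stream that holds the same data twice: first as plain JSON, then as block-style YAML. The keys and values are made up for illustration.

```yaml
# Document 1: plain JSON, which a YAML 1.2 parser also accepts
---
{"name": "example-app", "port": 8080, "tags": ["web", "api"]}
# Document 2: the same data in block style
---
name: example-app
port: 8080
tags:
  - web
  - api
```

Both documents should load to the same mapping. The syntax used here (markers, block style, flow style) is explained in the sections that follow.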

YAML Basic Structure

File Extensions and Rules

YAML files conventionally use the .yaml extension. The shorter .yml is also common, a holdover from systems that limited extensions to three characters.

Important Rules:

  • YAML is case sensitive
  • Use spaces instead of tabs for indentation
  • Tabs are not allowed for indentation because tab width varies across editors

Document Structure

The input that a YAML processor reads is called a "stream." A stream can contain multiple documents.

Document Markers:

  • --- (Triple dashes): Indicates the start of a document
  • ... (Triple dots): Indicates the end of a document

For a single document, both markers are optional. In a stream with multiple documents, the documents must be separated by markers, most commonly a --- at the start of each document.

Example - Multiple Documents:

# Baseball rankings 1998
---
- Mark McGwire
- Sammy Sosa  
- Ken Griffey

# Team rankings
---
- Chicago Cubs
- St Louis Cardinals
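
The example above only uses the --- marker. As a small sketch of the ... end marker, here are two made-up log events, each closed explicitly:

```yaml
---
event: login
user: alice
...
---
event: logout
user: alice
...
```

Explicit end markers are mostly useful when documents are sent over a long-lived connection and the reader needs to know a document is complete before the next one starts.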

Comments

Comments in YAML start with the # symbol.

# This is a comment
name: John Doe  # This is also a comment
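
A # only starts a comment at the beginning of a line or after whitespace. The sketch below, with made-up keys, shows the cases that tend to surprise people:

```yaml
color: "#ff0000"        # quoted: the # is part of the string
channel: general#dev    # no space before #: still part of the value
note: hello # world     # " #" starts a comment, so the value is "hello"
```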

YAML Writing Styles

YAML can be written in two styles:

1. Block Style

A human-friendly format that uses indentation to represent structure.

Sequence (List) Syntax:

fruits:
  - apple
  - banana
  - orange

Mapping (Key-Value) Syntax:

person:
  name: John Doe
  age: 30
  city: New York

2. Flow Style

A compact format similar to JSON.

Flow Sequence:

fruits: [apple, banana, orange]

Flow Mapping:

person: {name: "John Doe", age: 30, city: "New York"}
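
The two styles can be mixed in one document. A common pattern, sketched here with hypothetical keys, is block style for the overall structure and flow style for short leaf collections:

```yaml
server:
  name: web-01
  ports: [80, 443]
  labels: {tier: frontend, env: staging}
```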

Data Types

YAML has three main data types:

1. Scalar (Basic Data Types)

Numeric Types

Integer:

decimal: 123
octal: 0o123        # YAML 1.2 syntax; YAML 1.1 parsers expect 0123 instead
hexadecimal: 0x123

Floating Point:

fixed: 123.45
exponential: 1.23e+3

Boolean:

enabled: true
disabled: false
# The forms below are booleans only in YAML 1.1; YAML 1.2 parsers read them as strings
yes_value: yes
no_value: no
on_value: on
off_value: off
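
Because words like yes, no, on, and off may be read as booleans by YAML 1.1 parsers, values that are meant to be text are safer when quoted. A short sketch with made-up data:

```yaml
country_codes:
  - "NO"     # quoted: always the string "NO", never false
  - SE
answer: "yes"  # quoted: always a string
```

Quoting costs nothing here and makes the result the same regardless of which YAML version the parser implements.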

String Types

Block Style Strings:

Literal Style (|) - Preserves line breaks:

description: |
  This is line one
  This is line two  
  This is line three

Folded Style (>) - Converts line breaks to spaces:

description: >
  This long sentence will be
  folded into a single line
  when processed
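
In folded style, a blank line is kept as a line break, which makes it possible to write paragraphs. A minimal sketch (the key name is arbitrary):

```yaml
message: >
  First paragraph, written
  across two lines.

  Second paragraph.
# Resulting string: "First paragraph, written across two lines.\nSecond paragraph.\n"
```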

Flow Style Strings:

# Plain style
plain_string: Hello World

# Quoted styles  
single_quoted: 'Hello "World"!'
double_quoted: "Hello\nWorld"  # Can contain escape sequences
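
The two quoted styles treat special characters differently. The following sketch, using made-up keys, shows the main differences:

```yaml
apostrophe: 'It''s done'        # '' inside single quotes is one literal '
raw_backslash: 'C:\temp\new'    # no escapes in single quotes: backslashes stay as-is
tab_escape: "col1\tcol2"        # \t becomes a tab in double quotes
```

Single quotes are a reasonable default for paths and regular expressions, since nothing inside them is interpreted except the doubled quote.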

2. Sequences (Lists/Arrays)

Represents ordered collections.

Simple List:

shopping_list:
  - milk
  - bread
  - eggs
  - ""  # Empty string

Complex Sequences:

employees:
  - name: John Doe
    position: Developer
    salary: 50000
  - name: Jane Smith  
    position: Designer
    salary: 45000

Nested Sequences:

matrix:
  - [1, 2, 3]
  - [4, 5, 6] 
  - [7, 8, 9]
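
Nested sequences can also be written entirely in block style. This sketch shows a smaller 2x2 matrix with both accepted layouts for the inner lists:

```yaml
matrix:
  -            # inner list on its own lines
    - 1
    - 2
  - - 3        # compact form: inner list starts on the same line
    - 4
```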

3. Mappings (Key-Value Pairs)

Represents unordered associations.

Simple Mapping:

server:
  host: localhost
  port: 8080
  ssl: true

Nested Mapping:

database:
  primary:
    host: db1.example.com
    port: 5432
  replica:
    host: db2.example.com
    port: 5432

Mixed Collections:

application:
  name: MyApp
  version: 1.0.0
  features:
    - authentication
    - logging
    - caching
  database:
    type: postgresql
    settings:
      pool_size: 10
      timeout: 30

Advanced Features

YAML Anchors and Aliases

Anchors and aliases are used to reduce code duplication and improve reusability.

Defining Anchors: &anchor_name
Using Aliases: *anchor_name

# Define anchor
default_settings: &default
  timeout: 30
  retries: 3
  ssl: true

# Use aliases
api_server:
  host: api.example.com
  <<: *default  # Merge anchor content

web_server:  
  host: web.example.com
  <<: *default
  timeout: 60  # Override specific value
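
The example above combines aliases with the merge key. An alias can also be used on its own, in which case it simply stands for the anchored node. A minimal sketch with hypothetical contact data:

```yaml
primary_contact: &ops
  name: Ops Team
  email: ops@example.com
escalation_contact: *ops   # same mapping as primary_contact
```

Plain aliases are part of core YAML, whereas the merge key is an optional extension that originated with YAML 1.1, so support for it can vary between parsers.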

YAML Tags

Tags are used to specify data types or provide custom processing instructions.

Data Type Tags:

# Force string type
port: !!str 8080
version: !!str 1.0

# Explicit types
count: !!int 42
percentage: !!float 98.6
enabled: !!bool true

Custom Tags:

%TAG ! tag:example.com,2024:
---
# !server expands to tag:example.com,2024:server; the loader needs a handler for it
server: !server
  name: web-01

Practical Examples

Configuration File Example

# Application Configuration
application:
  name: "E-Commerce API"
  version: "2.1.0"
  environment: production

server:
  host: 0.0.0.0
  port: 8080
  ssl:
    enabled: true
    certificate: /path/to/cert.pem
    private_key: /path/to/key.pem

database:
  primary:
    driver: postgresql
    host: db-primary.example.com
    port: 5432
    name: ecommerce
    credentials:
      username: app_user
      password: ${DB_PASSWORD}

  redis:
    host: cache.example.com
    port: 6379
    database: 0

features:
  - user_authentication
  - order_processing  
  - inventory_management
  - payment_gateway

logging:
  level: info
  format: json
  outputs:
    - console
    - file: /var/log/app.log

Docker Compose Example

version: '3.8'

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - app
    networks:
      - frontend

  app:
    build: 
      context: .
      dockerfile: Dockerfile
    environment:
      - NODE_ENV=production
      - DB_HOST=database
    depends_on:
      - database
    networks:
      - frontend
      - backend

  database:
    image: postgres:13
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - db_data:/var/lib/postgresql/data
    networks:
      - backend

networks:
  frontend:
  backend:

volumes:
  db_data:

CI/CD Pipeline Example

# GitHub Actions Workflow
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: '18'
  DOCKER_REGISTRY: ghcr.io

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3

    - name: Setup Node.js
      uses: actions/setup-node@v3
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'

    - name: Install dependencies
      run: npm ci

    - name: Run tests
      run: npm test

    - name: Run linting
      run: npm run lint

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
    - uses: actions/checkout@v3

    - name: Build Docker image
      run: |
        docker build -t ${{ env.DOCKER_REGISTRY }}/myapp:${{ github.sha }} .
        docker tag ${{ env.DOCKER_REGISTRY }}/myapp:${{ github.sha }} ${{ env.DOCKER_REGISTRY }}/myapp:latest

    - name: Push to registry
      run: |
        echo ${{ secrets.GITHUB_TOKEN }} | docker login ${{ env.DOCKER_REGISTRY }} -u ${{ github.actor }} --password-stdin
        docker push ${{ env.DOCKER_REGISTRY }}/myapp:${{ github.sha }}
        docker push ${{ env.DOCKER_REGISTRY }}/myapp:latest

Common Mistakes and Best Practices

Common Errors

  1. Incorrect Indentation
# Wrong: child is not indented, so it becomes a sibling of parent
parent:
child: value

# Correct
parent:
  child: value

  2. Mixing Tabs and Spaces
# Wrong (mixing tabs and spaces)
config:
    host: localhost  # tab used
  port: 8080        # spaces used

# Correct (consistent spacing)
config:
  host: localhost
  port: 8080

  3. Unnecessary Quotes
# Unnecessary quotes (and "30" is now a string, not a number)
name: "John"
age: "30"

# Better
name: John
age: 30

  4. Special Characters Without Quotes
# Wrong - colon followed by a space inside an unquoted value
message: Error: Something went wrong

# Correct
message: "Error: Something went wrong"

Best Practices

  1. Consistent Indentation: Use either 2 spaces or 4 spaces consistently throughout the file

  2. Meaningful Comments: Add comments for important sections and complex configurations

  3. Validation: Use YAML syntax checkers and linters to validate your files

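  4. Security: Keep secrets out of YAML files by referencing environment variables instead of hardcoding values. YAML itself does not expand placeholders such as ${DB_PASSWORD}; the application or tool that loads the file has to do that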

  5. Organization: Group related configurations together and use logical ordering

  6. Version Pinning: Always specify explicit versions for dependencies and tools

  7. Documentation: Include inline documentation for complex configurations

Validation Tools

Online Validators:

  • YAML Lint (yamllint.com)
  • Online YAML Parser
  • JSON to YAML converters

Command Line Tools:

# Using yamllint
pip install yamllint
yamllint config.yaml

# Using yq for parsing and validation
yq eval '.' config.yaml
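
yamllint reads its settings from a YAML file as well. The sketch below is a hypothetical .yamllint file placed in the project root; the rule values are illustrative choices, not recommendations from this guide:

```yaml
# .yamllint (hypothetical project settings)
extends: default
rules:
  indentation:
    spaces: 2
  line-length:
    max: 120
```

With this file in place, running yamllint config.yaml from the same directory should apply the 2-space indentation and 120-character line limit on top of the default rule set.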

Advanced Topics

Multi-line Strings

# Literal block scalar (preserves newlines)
literal: |
  Line 1
  Line 2
  Line 3

# Folded block scalar (folds newlines to spaces)  
folded: >
  This is a very long line that will be
  folded into a single line, which is
  useful for readability.

# With block chomping indicators
literal_keep: |+
  Line 1
  Line 2


literal_strip: |-
  Line 1
  Line 2
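
Chomping indicators work with folded scalars too. A small sketch, with an arbitrary key name, of the folded style combined with the strip indicator:

```yaml
summary: >-
  A folded scalar with the strip
  indicator has no trailing newline.
# Resulting string: "A folded scalar with the strip indicator has no trailing newline."
```

This combination is handy for long single-line values such as descriptions or command arguments, where a trailing newline would be unwanted.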

Complex Anchors and Merging

# Complex anchor definitions
database_config: &db_config
  driver: postgresql
  pool_size: 10
  timeout: 30
  ssl: true

app_defaults: &app_defaults
  restart_policy: always
  memory_limit: 512m
  cpu_limit: 0.5

# Merging multiple anchors
production_api:
  <<: [*app_defaults, *db_config]
  name: production-api
  replicas: 3
  environment: production

staging_api:
  <<: [*app_defaults, *db_config]
  name: staging-api
  replicas: 1
  environment: staging
  memory_limit: 256m  # Override default
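
When several merged mappings define the same key, the order of the list matters: according to the merge key definition, mappings listed earlier take precedence over later ones, and keys written directly in the target mapping override both. A minimal sketch with hypothetical anchors:

```yaml
base: &base
  timeout: 30
  retries: 3
fast: &fast
  timeout: 5
service:
  <<: [*fast, *base]   # "fast" is listed first, so timeout resolves to 5
  name: search         # retries (3) still comes from "base"
```

Since merge key support varies between parsers, it is worth checking the loaded result with the tool that will actually consume the file.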

Schema Validation

# JSON Schema for YAML validation
$schema: "http://json-schema.org/draft-07/schema#"
type: object
properties:
  name:
    type: string
    minLength: 1
  version:
    type: string
    pattern: "^\\d+\\.\\d+\\.\\d+$"
  dependencies:
    type: array
    items:
      type: string
required:
  - name
  - version
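
For reference, here is a sketch of a document that should satisfy the schema above. The package names are placeholders:

```yaml
name: my-service
version: "1.4.2"
dependencies:
  - express
  - lodash
```

The version is quoted to make the string type explicit. A two-part value such as 1.4 would be read as a float and would then fail the type: string check.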

Performance Considerations

Large Files

# For large datasets, consider using references
shared_config: &shared
  common_setting_1: value1
  common_setting_2: value2
  common_setting_3: value3

# Use references instead of repeating
service_1:
  <<: *shared
  specific_setting: value

service_2:
  <<: *shared
  specific_setting: different_value

Memory Usage

# Avoid deeply nested structures when possible
# Instead of:
deeply:
  nested:
    structure:
      that:
        goes:
          very:
            deep: value

# Consider flattening:
config_deep_value: value

Integration with Programming Languages

Python Example

import yaml

# Loading YAML
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Writing YAML
data = {
    'name': 'MyApp',
    'version': '1.0.0',
    'features': ['auth', 'logging']
}

with open('output.yaml', 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

JavaScript/Node.js Example

const yaml = require('js-yaml');
const fs = require('fs');

// Loading YAML
try {
  const config = yaml.load(fs.readFileSync('config.yaml', 'utf8'));
  console.log(config);
} catch (e) {
  console.error('Error loading YAML:', e);
}

// Writing YAML
const data = {
  name: 'MyApp',
  version: '1.0.0',
  features: ['auth', 'logging']
};

try {
  const yamlStr = yaml.dump(data);
  fs.writeFileSync('output.yaml', yamlStr);
} catch (e) {
  console.error('Error writing YAML:', e);
}

Troubleshooting Common Issues

Parsing Errors

# Issue: Unquoted strings with special characters
url: http://example.com:8080/path?param=value  # Valid as written, but a ": " or " #" inside the value would break parsing

# Solution: Quote strings with special characters
url: "http://example.com:8080/path?param=value"
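
A few other values need quotes because of the character they start with or contain. The keys below are made up; the comments name what each value would otherwise be mistaken for:

```yaml
title: "Status: draft"       # contains ": ", which would start a nested mapping
tag: "#release"              # a leading # would start a comment
glob: "*.yaml"               # a leading * would be read as an alias
template: "{{ name }}"       # a leading { would start a flow mapping
```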

Indentation Problems

# Issue: Inconsistent indentation
config:
  database:
    host: localhost
     port: 5432  # Wrong indentation

# Solution: Consistent indentation
config:
  database:
    host: localhost
    port: 5432

Type Coercion Issues

# Issue: Unintended type conversion
version: 1.0     # Becomes float
port: 08080      # Leading zero: string or integer depending on the parser (YAML 1.1 reads 0-prefixed digits 0-7 as octal)

# Solution: Explicit string typing
version: "1.0"
port: "8080"

Conclusion

YAML has become an essential format for configuration management, data serialization, and API documentation. Its human-readable nature and powerful features make it popular among developers and system administrators.

After understanding the basic concepts, you can explore more advanced features such as anchors, aliases, tags, and complex data structures. The key to mastering YAML is practice and understanding the specific requirements of your tools and applications.

For more detailed documentation and examples, visit yaml.org and explore the extensive ecosystem of YAML tools and libraries available for your programming language of choice.

Additional Resources

  • Official YAML Specification: yaml.org/spec
  • YAML Lint Online: yamllint.com
  • JSON to YAML Converter: Various online tools available
  • Community Forums: Stack Overflow, Reddit r/yaml
  • Books and Tutorials: Platform-specific YAML guides (Docker, Kubernetes, CI/CD)

Remember: The best way to learn YAML is through hands-on practice with real-world configuration files and projects relevant to your field.
