DEV Community

Gabriel Koo for AWS Community Builders


🚀 Streamline Secure, Self‑Service Developer Operations with AWS SSM Automation Runbooks 🎉

🎉 Dev Efficiency vs Deployment Freeze

Ever wished you could just let your fellow software engineers make chore changes to AWS workloads like Amazon ECS by themselves — without blowing up production or getting frozen out by business blackout windows? Buckle up, developers, because we’re about to introduce your new invisible DevOps sidekick: AWS SSM Automation Runbook! 🚀

The Great Dev Efficiency Freeze

Imagine running a critical application on Amazon ECS, where developers must redeploy ECS container tasks to:

  • Fix memory issues before a permanent application-layer solution is in place (…because your engineering squad is off saving the world—or at least building the next killer feature.)
  • Update environment variables without causing downtime
  • ...etc

Now add the twist: from 12:00 PM to 3:00 PM HKT, our sales team conducts external demos. Zero downtime is non-negotiable. How can we keep dev efficiency soaring while upholding the deployment freeze requirement? ⏰


Traditional Pitfalls (AKA The Villains)

🚨 DIY Disaster: Direct ECS Access

Granting developers direct ECS permissions feels fast, but:

  • Steep learning curve: AWS Console navigation or CLI setup can introduce errors
  • High risk: One misconfigured parameter can lead to downtime
  • No enforcement: Hard to embed custom rules like blocked demo hours

Imagine a developer getting one of the parameters below wrong and then calling the API, nuking your production site:
ecs API

🛂 Cloud Engineer Bouncer: Gatekeeping Every Request

Calling the cloud team for every deploy ensures safety, but:

  • Bottleneck alert! Scalability roadblock as your squad grows
  • Productivity hit: Cloud Engineers get pulled off strategic missions

🛠️ Portal Overkill: Build-Your-Own Deployment Site

A custom web portal can enforce biz logic, yet:

  • High effort: Development and maintenance drain cloud team resources
  • Duplicate features: Reinventing wheels AWS already made
  • And you will likely just tell yourself (or whoever proposes this as a "solution"): we are just a small team, and we don't have the resources for these FAANG-level, fancy, developer-friendly initiatives!

Meet the Hero: AWS SSM Automation Runbook

Here’s why AWS SSM Automation Runbook is the ultimate ally:

  • Zero Custom UI — Use the built-in, intuitive dropdown in the AWS Console
  • One-Click Deep Links — Skip account/role selection, teleport straight to your runbook
  • Git-Powered Workflows — YAML runbooks in Git for reviews, rollbacks, and CI/CD
  • Approval Gates & Guardrails — aws:approve, aws:branch, aws:assertAwsResourceProperty to enforce rules and time windows
  • Centralized Audit Trail — Push execution logs to CloudWatch for total traceability

Deep Link Magic

https://<identity-center-domain>/start/#/console \
  ?account_id=123456789012 \
  &role_name=DeployRole \
  &destination=https://us-east-1.console.aws.amazon.com/systems-manager/automation/execute/RefreshECSService
# One link to rule them all! 🧙‍♂️
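If you hand out links for several runbooks, it may be easier to generate them than to edit them by hand. The following is a minimal sketch, assuming the query parameters shown in the link above; the function name and the portal domain in the usage note are placeholders, and percent-encoding the destination is a conservative choice rather than something the original link does.

```python
from urllib.parse import quote, urlencode


def build_deep_link(portal_domain, account_id, role_name, region, runbook_name):
    """Build an access-portal link that lands on a runbook's execute page.

    portal_domain is your IAM Identity Center portal host (placeholder value
    in the usage note); the other arguments mirror the link shown above.
    """
    destination = (
        f"https://{region}.console.aws.amazon.com"
        f"/systems-manager/automation/execute/{runbook_name}"
    )
    # quote (not quote_plus) with safe="" percent-encodes ":" and "/" so the
    # destination URL travels as a single query-string value.
    query = urlencode(
        {
            "account_id": account_id,
            "role_name": role_name,
            "destination": destination,
        },
        quote_via=quote,
        safe="",
    )
    return f"https://{portal_domain}/start/#/console?{query}"
```

For example, `build_deep_link("example.awsapps.com", "123456789012", "DeployRole", "us-east-1", "RefreshECSService")` yields one link per runbook that you can paste into a wiki page or a Slack bookmark.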

Seeing this in action:
https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yhwplhy7ztzx29ggsctn.gif

With such a handy 1-click experience - your developers have no excuse not to adopt this new workflow!


Automating Your ECS Deployment with Flair

  1. Validate the Show’s Running Time
    • Check HKT; abort if within 12 PM – 3 PM.
  2. Execute the Relevant AWS API
    • The automation step aws:executeAwsApi handles the ECS update under the hood—no manual CLI needed!
  3. Handle Plot Twists
    • On failure: branch, alert via SNS or Slack, and let the team know someone drop‑kicked prod.

AWS SSM Automation Execution
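The time check in step 1 is a natural fit for an aws:executeScript step, which runs a Python handler and passes its return value on as step outputs. Here is a minimal sketch of that check, assuming the 12 PM to 3 PM HKT window described above; the output name InFreezeWindow and the helper name are hypothetical, and the end of the window is treated as exclusive.

```python
from datetime import datetime, time, timedelta, timezone

# Hong Kong does not observe daylight saving time, so a fixed UTC+8 offset
# is enough and avoids depending on a time zone database in the runtime.
HKT = timezone(timedelta(hours=8))
FREEZE_START = time(12, 0)  # 12:00 PM HKT, inclusive
FREEZE_END = time(15, 0)    # 3:00 PM HKT, exclusive (an assumption)


def in_freeze_window(now_utc):
    """Return True if the timezone-aware datetime falls inside the freeze window."""
    local_time = now_utc.astimezone(HKT).time()
    return FREEZE_START <= local_time < FREEZE_END


def script_handler(events, context):
    """Handler for an aws:executeScript step; the output name is hypothetical."""
    return {"InFreezeWindow": in_freeze_window(datetime.now(timezone.utc))}
```

A following aws:branch step can then compare that Boolean output and route the execution to an abort step or to the deployment step.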


Your New Best Friend in YAML

Here's part of the runbook that I built for this scenario:

schemaVersion: "0.3"
description: "Refresh ECS service with time-based guardrails"
assumeRole: "arn:aws:iam::{{global:ACCOUNT_ID}}:role/DeployRole"

parameters:
  EcsCluster:
    type: String
  EcsService:
    type: String

mainSteps:
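  # Not shown in this excerpt: an earlier aws:executeScript step (called
  # CheckWindow here as a placeholder) that outputs a Boolean InFreezeWindow,
  # and the Abort step that ends the run.
  - name: ValidateWindow
    action: "aws:branch"
    inputs:
      Choices:
        - NextStep: Abort
          Variable: "{{CheckWindow.InFreezeWindow}}"
          BooleanEquals: true
      Default: DeployService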
  - name: DeployService
    action: "aws:executeAwsApi"
    inputs:
      Service: ecs
      Api: UpdateService
      # Request parameters use the ECS UpdateService API's own names
      cluster: "{{EcsCluster}}"
      service: "{{EcsService}}"
      forceNewDeployment: true
  - name: Notify
    action: "aws:invokeWebhook"
    # Configure your webhook integration here (inputs.IntegrationName), or
    # swap this step for an SNS Publish call via aws:executeAwsApi

Pro tip: Use the UI editor for quick tweaks; commit your YAML for CI/CD muscle.

UI editor

Extra Benefit: Extra Security

Grant developers no direct ecs:UpdateService access. Their permissions stay least-privilege, and you avoid surprises:

Least privileged and avoid surprises

The most critical API call in the example, ecs:UpdateService, is called with locked parameters - no risk of manual errors.
Locked parameters
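To make the permission split concrete, here is a rough sketch of a developer-side IAM policy, generated in Python so it can live next to the runbook in Git. Treat it as a starting point under assumptions rather than a complete policy: the function name and Sids are made up for illustration, the explicit Deny is optional belt-and-braces, and developers will likely need a few more permissions that are deliberately left out (for example, viewing executions in the console and iam:PassRole on the runbook's role).

```python
import json


def runbook_only_policy(region, account_id, runbook_name):
    """Return a policy document that allows starting one runbook and nothing on ECS.

    All names here are illustrative; adapt the ARN and statements to your account.
    """
    # ":*" covers every version of the runbook document.
    runbook_arn = (
        f"arn:aws:ssm:{region}:{account_id}:"
        f"automation-definition/{runbook_name}:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartTheRunbook",
                "Effect": "Allow",
                "Action": "ssm:StartAutomationExecution",
                "Resource": runbook_arn,
            },
            {
                # Optional: the runbook itself runs under its own assumeRole,
                # so this Deny only affects the developer's direct API calls.
                "Sid": "NoDirectEcsUpdates",
                "Effect": "Deny",
                "Action": "ecs:UpdateService",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(runbook_only_policy("us-east-1", "123456789012", "RefreshECSService"), indent=2))
```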


Extending to Your Own Workflows

Enhance your toolbox with these quick-start recipes, ready to deploy across your AWS environment:

Safely Bust CDN Caches in CloudFront 🚀

Ensure clients receive the latest assets without exceeding your invalidation limits. Create a runbook that:

  • Validates distribution status: Use aws:assertAwsResourceProperty to confirm the distribution is in the Deployed state (no configuration change still propagating) before invalidating.
  • Invalidates a path pattern: Execute aws:executeAwsApi to call CloudFront’s cloudfront:CreateInvalidation API with specified paths.
mainSteps:
  - name: CheckDist
    action: "aws:assertAwsResourceProperty"
    inputs:
      Service: cloudfront
      Api: GetDistribution
      Id: "{{DistributionId}}"
      PropertySelector: "$.Distribution.Status"
      DesiredValues: ["Deployed"]
  - name: InvalidateCache
    action: "aws:executeAwsApi"
    inputs:
      Service: cloudfront
      Api: CreateInvalidation
      DistributionId: "{{DistributionId}}"
      InvalidationBatch:
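        CallerReference: "{{automation:EXECUTION_ID}}"
        Paths:
          # Assumes an earlier step named Paths (not shown) that exposes
          # Count and Items outputs; quoted so the YAML parses.
          Quantity: "{{Paths.Count}}"
          Items: "{{Paths.Items}}"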
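The runbook above references Paths.Count and Paths.Items, and CloudFront expects Quantity to equal the number of entries in Items. One way to produce both values is an aws:executeScript step named Paths. The sketch below assumes the step receives a list under a hypothetical input key called Paths; the helper name, the de-duplication, and the leading-slash normalization are illustrative choices.

```python
def build_invalidation_paths(paths):
    """Normalize a list of path patterns and return Count plus Items.

    Blank entries are dropped, a leading "/" is added when missing, and
    duplicates are removed while keeping the first-seen order.
    """
    cleaned = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue
        if not path.startswith("/"):
            path = "/" + path
        if path not in cleaned:
            cleaned.append(path)
    return {"Count": len(cleaned), "Items": cleaned}


def script_handler(events, context):
    """Handler for an aws:executeScript step; the Paths input key is hypothetical."""
    return build_invalidation_paths(events.get("Paths", []))
```

Keeping the count and the list in one place means the two values cannot drift apart when someone edits the input.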

Schedule DB Snapshots with Graceful Throttling 🗄️

Allow developers to request automated on-demand DB backups without overloading the production database:

  • Time-window guardrails: Restrict execution to off-peak hours using aws:branch.
  • Performance check: Use aws:executeScript to poll the CloudWatch CPUUtilization metric for the instance (rds:DescribeDBInstances does not return CPU figures) and ensure it is below the threshold.
  • Snapshot API call: Trigger rds:CreateDBSnapshot with aws:executeAwsApi.
mainSteps:
  - name: CheckIsPeakHours
    action: "aws:executeScript"
    inputs: {...}
  - name: CheckWindow
    action: "aws:branch"
    inputs:
      Choices:
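        - NextStep: TakeSnapshot
          BooleanEquals: false
          # Step outputs are referenced as {{StepName.OutputName}}; this assumes
          # CheckIsPeakHours declares a Boolean output named IsPeakHours.
          Variable: "{{CheckIsPeakHours.IsPeakHours}}"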
  - name: WaitForLowCPU
    action: "aws:executeScript"
    inputs: {...}
  - name: OptionalSleep
    action: "aws:sleep"
  - name: TakeSnapshot
    action: "aws:executeAwsApi"
    inputs:
      Service: rds
      Api: CreateDBSnapshot
      DBInstanceIdentifier: "{{DBInstance}}"
      DBSnapshotIdentifier: "snapshot-{{automation:EXECUTION_ID}}"
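The script body for WaitForLowCPU is elided above, so here is one possible shape for the CPU check, not the author's actual script. It reads the CPUUtilization metric from CloudWatch's AWS/RDS namespace. The helper names, the 50 percent default threshold, the 10-minute lookback, and the DBInstance and Threshold input keys are all assumptions. The CloudWatch client is passed in so the logic can be exercised without AWS access.

```python
from datetime import datetime, timedelta, timezone


def latest_average_cpu(cloudwatch, db_instance_id, lookback_minutes=10):
    """Return the most recent 1-minute average CPUUtilization, or None if no data."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        StartTime=end - timedelta(minutes=lookback_minutes),
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to arrive in time order, so sort first.
    datapoints = sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None


def cpu_is_low(cloudwatch, db_instance_id, threshold_percent=50.0):
    """Treat missing data as 'not low' so the snapshot is never taken blind."""
    cpu = latest_average_cpu(cloudwatch, db_instance_id)
    return cpu is not None and cpu < threshold_percent


def script_handler(events, context):
    """Handler for an aws:executeScript step; input and output names are hypothetical."""
    import boto3  # imported here so the helpers above stay testable offline

    threshold = float(events.get("Threshold", 50))
    low = cpu_is_low(boto3.client("cloudwatch"), events["DBInstance"], threshold)
    return {"CpuIsLow": low}
```

A branch or loop around this step can then decide whether to sleep and retry or to move on to TakeSnapshot.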

🎯 Ready to Launch?

  1. Clone my starter repo: git clone https://github.com/gabrielkoo/aws-systems-manager-runbook-for-security
  2. Adjust parameters and IAM roles for your account
  3. Author your own workflow with custom business-logic checks
  4. Distribute the deep link to your team and enjoy secure, self-service common AWS operations!

Embrace secure, frictionless operations with AWS Systems Manager Automation today! 🚀

Extra - Why not AWS Step Functions?

Great question! AWS Step Functions are awesome for complex, long-running workflows, but here’s why AWS SSM Automation Runbooks might be a better fit for self-service developer operations:

  • Simplicity & Speed
    SSM Automation has a focused, built-in UI for operational tasks. No need to define state machines or handle JSON-based state transitions—runbooks come with dropdowns and reduce setup time.

  • Deep AWS Systems Manager Integration
    Automation actions like aws:approve, aws:branch, and aws:assertAwsResourceProperty are first-class citizens in SSM. Step Functions would require Lambda or other services to enforce the same guardrails.

  • Permission Scoping
    Runbooks execute under a scoped IAM role you define. While Step Functions can assume roles too, SSM Runbooks make it explicit that every action is tied to your defined parameter inputs, and custom scripts run inline via aws:executeScript, so you don't need to define a new AWS Lambda function for each one as you would with Step Functions.

  • Audit & Compliance
    Execution history is automatically recorded in Systems Manager. With Step Functions, you’d need CloudWatch Logs or X-Ray to tie everything together.

That said, if you need multi-account orchestration, fan-out/fan-in patterns, or integration with external systems at scale, Step Functions can complement your automation. Choose SSM Runbooks for quick self-service ops, and Step Functions for complex, distributed workflows.
