🎉 Dev Efficiency vs Deployment Freeze
Ever wished you could just let your fellow software engineers make chore changes to AWS workloads like Amazon ECS by themselves — without blowing up production or getting frozen out by business blackout windows? Buckle up, developers, because we’re about to introduce your new invisible DevOps sidekick: AWS SSM Automation Runbook! 🚀
The Great Dev Efficiency Freeze
Imagine running a critical application on Amazon ECS, where developers must redeploy ECS container tasks to:
- Fix memory issues before a permanent application-layer solution is in place (…because your engineering squad is off saving the world—or at least building the next killer feature.)
- Update environment variables without causing downtime
- ...and so on
Now add the twist: from 12:00 PM to 3:00 PM HKT, our sales team runs external demos, and zero downtime is non-negotiable. How can we keep dev efficiency soaring while upholding the deployment freeze? ⏰
Traditional Pitfalls (AKA The Villains)
🚨 DIY Disaster: Direct ECS Access
Granting developers direct ECS permissions feels fast, but:
- Steep learning curve: AWS Console navigation or CLI setup can introduce errors
- High risk: One misconfigured parameter can lead to downtime
- No enforcement: Hard to embed custom rules like blocked demo hours
Imagine a developer getting one of the parameters below wrong and firing the API call that nukes your production site:
🛂 Cloud Engineer Bouncer: Gatekeeping Every Request
Calling the cloud team for every deploy ensures safety, but:
- Bottleneck alert! Scalability roadblock as your squad grows
- Productivity hit: Cloud Engineers get pulled off strategic missions
🛠️ Portal Overkill: Build-Your-Own Deployment Site
A custom web portal can enforce biz logic, yet:
- High effort: Development and maintenance drain cloud team resources
- Duplicate features: Reinventing wheels AWS already made
- And you will likely end up telling yourself (or whoever proposed it as a "solution"): we are just a small team; we don't have the resources for these FAANG-level, fancy, developer-friendly initiatives!
Meet the Hero: AWS SSM Automation Runbook
Here’s why AWS SSM Automation Runbook is the ultimate ally:
- Zero Custom UI — Use the built-in, intuitive dropdown in the AWS Console
- One-Click Deep Links — Skip account/role selection, teleport straight to your runbook
- Git-Powered Workflows — YAML runbooks in Git for reviews, rollbacks, and CI/CD
- Approval Gates & Guardrails: aws:approve, aws:branch, and aws:assertAwsResourceProperty to enforce rules and time windows
- Centralized Audit Trail: Push execution logs to CloudWatch for total traceability
Deep Link Magic
https://<identity-center-domain>/start/#/console \
?account_id=123456789012 \
&role_name=DeployRole \
&destination=https://us-east-1.console.aws.amazon.com/systems-manager/automation/execute/RefreshECSService
# One link to rule them all! 🧙‍♂️
Seeing this in action:
https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yhwplhy7ztzx29ggsctn.gif
With such a handy one-click experience, your developers have no excuse not to adopt this new workflow!
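The `destination` value in the link above is itself a URL, so it is safest to percent-encode it when you generate links for many accounts or runbooks. A minimal Python sketch, assuming the link format shown above; `build_runbook_deep_link`, `example.awsapps.com`, and the account, region, and runbook values are hypothetical placeholders:

```python
from urllib.parse import quote, urlencode


def build_runbook_deep_link(portal_domain: str, account_id: str, role_name: str,
                            region: str, runbook_name: str) -> str:
    """Build an IAM Identity Center shortcut link that opens a runbook's execute page.

    All arguments are placeholders you supply; the console path mirrors the
    example link in this post and is not validated here.
    """
    destination = (
        f"https://{region}.console.aws.amazon.com"
        f"/systems-manager/automation/execute/{runbook_name}"
    )
    # urlencode (with safe='') percent-encodes ':' and '/' in each value, so the
    # destination URL travels as one query parameter instead of being split apart
    query = urlencode(
        {"account_id": account_id, "role_name": role_name, "destination": destination},
        quote_via=quote,
    )
    return f"https://{portal_domain}/start/#/console?{query}"


if __name__ == "__main__":
    print(build_runbook_deep_link("example.awsapps.com", "123456789012", "DeployRole",
                                  "us-east-1", "RefreshECSService"))
```

Dropping one generated link per runbook into your team wiki or chat bookmarks keeps the entry point to a single click.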
Automating Your ECS Deployment with Flair
- Validate the Show’s Running Time
  - Check HKT; abort if within 12 PM – 3 PM.
- Execute the Relevant AWS API
  - The automation step aws:executeAwsApi handles the ECS update under the hood, so no manual CLI is needed!
- Handle Plot Twists
  - On failure: branch, alert via SNS or Slack, and let the team know someone drop-kicked prod.
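The time-window check is where the business rule lives, so it is worth unit-testing. A minimal sketch in Python (the language aws:executeScript runs), assuming the freeze is exactly 12:00 to 15:00 HKT with 15:00 exclusive; the function names and error message are illustrative. Passing the clock in as an argument makes the boundaries testable in CI, and converting from UTC avoids surprises, since the script runtime does not run on Hong Kong time:

```python
from datetime import datetime, time, timedelta, timezone

# HKT is a fixed UTC+8 offset (Hong Kong does not observe daylight saving time)
HKT = timezone(timedelta(hours=8), name="HKT")
FREEZE_START = time(12, 0)  # 12:00 PM HKT, demos begin (inclusive)
FREEZE_END = time(15, 0)    # 3:00 PM HKT, demos end (exclusive)


def in_freeze_window(now: datetime) -> bool:
    """Return True if `now` (a timezone-aware datetime) falls in [12:00, 15:00) HKT."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime, e.g. datetime.now(timezone.utc)")
    local = now.astimezone(HKT).time()
    return FREEZE_START <= local < FREEZE_END


def handler(events, context):
    """aws:executeScript-style entry point: raising an exception fails the step."""
    if in_freeze_window(datetime.now(timezone.utc)):
        raise Exception("Deployment blocked: 12:00-15:00 HKT demo freeze")
    return {"InFreeze": False}
```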
Your New Best Friend in YAML
Here's part of the runbook that I built for this scenario:
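schemaVersion: "0.3"
description: "Refresh ECS service with time-based guardrails"
assumeRole: "arn:aws:iam::{{global:ACCOUNT_ID}}:role/DeployRole"
parameters:
  EcsCluster:
    type: String
  EcsService:
    type: String
mainSteps:
  # aws:branch cannot read the clock, so a script checks the window and fails the step to abort
  - name: ValidateWindow
    action: "aws:executeScript"
    onFailure: Abort
    inputs:
      Runtime: python3.11
      Handler: handler
      Script: |-
        from datetime import datetime, timedelta, timezone
        def handler(events, context):
            hour = datetime.now(timezone(timedelta(hours=8))).hour  # HKT is UTC+8
            if 12 <= hour < 15:
                raise Exception("Blocked: 12:00-15:00 HKT demo freeze")
  - name: DeployService
    action: "aws:executeAwsApi"
    inputs:
      Service: ecs
      Api: UpdateService
      # ECS API parameter names are camelCase; YAML keys are case-sensitive, so "service" does not clash with "Service"
      cluster: "{{EcsCluster}}"
      service: "{{EcsService}}"
      forceNewDeployment: true
  - name: Notify
    action: "aws:invokeWebhook"
    inputs:
      IntegrationName: deploy-notifications  # hypothetical name; set up your Slack or other webhook integration first
      Body: '{"text": "Redeployed {{EcsService}} on {{EcsCluster}}"}'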
Pro tip: Use the UI editor for quick tweaks; commit your YAML for CI/CD muscle.
Extra Benefit: Extra Security
Grant developers no direct ecs:UpdateService access: least privilege, and no surprises.
The most critical API call in the example, ecs:UpdateService, is made by the runbook's own role with locked parameters (developers only choose the cluster and service), so there is no room for manual errors.
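To make "no direct ecs:UpdateService" concrete, here is a sketch of a developer-side IAM policy built as a Python dict, assuming the runbook and role names from the examples above plus a placeholder account and region. Developers may start only this runbook, view executions, and pass the automation role; the ECS permission lives solely on the automation role itself. The Sid names and the `grants_ecs_actions` helper are illustrative, and the exact permissions your console flow needs may differ:

```python
import json

ACCOUNT_ID = "123456789012"     # placeholder account
REGION = "us-east-1"            # placeholder region
RUNBOOK = "RefreshECSService"   # runbook name from the deep link example
AUTOMATION_ROLE = "DeployRole"  # role named in the runbook's assumeRole

developer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Developers may start only this runbook (any version)...
            "Sid": "StartRefreshRunbook",
            "Effect": "Allow",
            "Action": "ssm:StartAutomationExecution",
            "Resource": f"arn:aws:ssm:{REGION}:{ACCOUNT_ID}:automation-definition/{RUNBOOK}:*",
        },
        {
            # ...and follow its progress in the console
            "Sid": "ViewExecutions",
            "Effect": "Allow",
            "Action": ["ssm:GetAutomationExecution", "ssm:DescribeAutomationExecutions"],
            "Resource": "*",
        },
        {
            # Starting a runbook that runs under an assumeRole requires passing that role
            "Sid": "PassAutomationRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": f"arn:aws:iam::{ACCOUNT_ID}:role/{AUTOMATION_ROLE}",
            "Condition": {"StringEquals": {"iam:PassedToService": "ssm.amazonaws.com"}},
        },
    ],
}


def grants_ecs_actions(policy: dict) -> bool:
    """True if any statement in the policy lists an ecs: action (case-insensitive)."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(action.lower().startswith("ecs:") for action in actions):
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(developer_policy, indent=2))
```

A check like `grants_ecs_actions` can run in the same CI pipeline as your runbook YAML, so a well-meaning edit that sneaks ECS access back into the developer policy fails review.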
Extending to Your Own Workflows
Enhance your toolbox with these quick-start recipes, ready to deploy across your AWS environment:
Safely Bust CDN Caches in CloudFront 🚀
Ensure clients receive the latest assets without exceeding your invalidation limits. Create a runbook that:
- Validates distribution status: Use aws:assertAwsResourceProperty to confirm the distribution is in the Deployed state (no configuration change still propagating).
- Invalidates a path pattern: Execute aws:executeAwsApi to call CloudFront’s cloudfront:CreateInvalidation API with the specified paths.
parameters:
  DistributionId:
    type: String
  Paths:
    type: StringList
  PathCount:
    type: Integer  # must equal the number of entries in Paths
mainSteps:
  - name: CheckDist
    action: "aws:assertAwsResourceProperty"
    inputs:
      Service: cloudfront
      Api: GetDistribution
      Id: "{{DistributionId}}"
      PropertySelector: "$.Distribution.Status"
      DesiredValues: ["Deployed"]
  - name: InvalidateCache
    action: "aws:executeAwsApi"
    inputs:
      Service: cloudfront
      Api: CreateInvalidation
      DistributionId: "{{DistributionId}}"
      InvalidationBatch:
        CallerReference: "{{automation:EXECUTION_ID}}"
        Paths:
          Quantity: "{{PathCount}}"
          Items: "{{Paths}}"
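The Deployed check guards against a distribution that is mid-configuration-change, but it does not look at invalidations. CloudFront caps how many invalidation paths can be in progress at once, so if you also want to stop while earlier invalidations are still running, one option is an extra aws:executeScript step. A sketch, assuming the automation role may call cloudfront:ListInvalidations and that DistributionId is passed in through the step's InputPayload; the function names are hypothetical:

```python
def count_in_progress_invalidations(response: dict) -> int:
    """Count invalidations still running in a ListInvalidations-shaped response.

    Only the first page is inspected; a production check would follow
    InvalidationList.NextMarker when IsTruncated is true.
    """
    invalidation_list = response.get("InvalidationList", {})
    # The "Items" key can be absent when the list is empty
    items = invalidation_list.get("Items", [])
    return sum(1 for item in items if item.get("Status") == "InProgress")


def handler(events, context):
    """Hypothetical aws:executeScript handler: fail fast if invalidations are still running."""
    import boto3  # bundled with the aws:executeScript Python runtime

    response = boto3.client("cloudfront").list_invalidations(
        DistributionId=events["DistributionId"]
    )
    running = count_in_progress_invalidations(response)
    if running:
        raise Exception(f"{running} invalidation(s) still in progress; try again later")
    return {"InProgress": running}
```

Keeping the counting logic in a plain function lets you test it without AWS credentials; only `handler` touches boto3.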
Schedule DB Snapshots with Graceful Throttling 🗄️
Allow developers to request automated on-demand DB backups without overloading the production database:
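- Time-window guardrails: Use aws:branch to go straight to the snapshot outside peak hours.
- Performance check: During peak hours, use aws:executeScript to read the instance's CPUUtilization metric from CloudWatch (for example via cloudwatch:GetMetricStatistics, since rds:DescribeDBInstances does not report CPU) and ensure it is below the threshold.
- Snapshot API call: Trigger rds:CreateDBSnapshot with aws:executeAwsApi.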
mainSteps:
  - name: CheckIsPeakHours
    action: "aws:executeScript"
    inputs: {...}
  - name: CheckWindow
    action: "aws:branch"
    inputs:
      Choices:
        - NextStep: TakeSnapshot
          BooleanEquals: false
          Variable: "{{CheckIsPeakHours.IsPeakHours}}"
      Default: WaitForLowCPU
  - name: WaitForLowCPU
    action: "aws:executeScript"
    inputs: {...}
  - name: OptionalSleep
    action: "aws:sleep"
    inputs:
      Duration: PT5M  # example value
  - name: TakeSnapshot
    action: "aws:executeAwsApi"
    inputs:
      Service: rds
      Api: CreateDBSnapshot
      DBInstanceIdentifier: "{{DBInstance}}"
      DBSnapshotIdentifier: "snapshot-{{automation:EXECUTION_ID}}"
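The WaitForLowCPU script is elided above. One possible shape, assuming a 40% threshold, a DBInstance value passed through the step's InputPayload, and CloudWatch's AWS/RDS CPUUtilization metric; all names here are hypothetical. This version checks once and fails the step when CPU is high or there is no data yet; a polling loop, or retries via the step's maxAttempts setting, are alternatives:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CPU_THRESHOLD = 40.0  # hypothetical threshold, in percent


def latest_average_cpu(datapoints: list) -> Optional[float]:
    """Return the Average of the most recent datapoint, or None if there are none.

    `datapoints` is shaped like the "Datapoints" list that CloudWatch
    GetMetricStatistics returns when called with Statistics=["Average"].
    """
    if not datapoints:
        return None
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return newest["Average"]


def is_cpu_low(datapoints: list, threshold: float = CPU_THRESHOLD) -> bool:
    # Treat "no data" as unsafe: better to wait than to snapshot blindly
    value = latest_average_cpu(datapoints)
    return value is not None and value < threshold


def handler(events, context):
    """Hypothetical aws:executeScript handler for the WaitForLowCPU step."""
    import boto3  # bundled with the aws:executeScript Python runtime

    now = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": events["DBInstance"]}],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    if not is_cpu_low(response["Datapoints"]):
        raise Exception("CPU is above threshold (or no data yet); retry later")
    return {"CpuIsLow": True}
```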
🎯 Ready to Launch?
- Clone my starter repo:
git clone https://github.com/gabrielkoo/aws-systems-manager-runbook-for-security
- Adjust parameters and IAM roles for your account
- Author your own workflow with custom business logic checking
- Distribute the deep link to your team and enjoy secure, self-service common AWS operations!
Embrace secure, frictionless operations with AWS Systems Manager Automation today! 🚀
Extra - Why not AWS Step Functions?
Great question! AWS Step Functions are awesome for complex, long-running workflows, but here’s why AWS SSM Automation Runbooks might be a better fit for self-service developer operations:
Simplicity & Speed
SSM Automation has a focused, built-in UI for operational tasks. There is no need to define state machines or handle JSON-based state transitions: runbooks come with dropdowns and reduce setup time.
Deep AWS Systems Manager Integration
Automation actions like aws:approve, aws:branch, and aws:assertAwsResourceProperty are first-class citizens in SSM. Step Functions would require Lambda or other services to enforce the same guardrails.
Permission Scoping
Runbooks execute under a scoped IAM role you define. While Step Functions can assume roles too, SSM runbooks make it explicit that every action is tied to your defined parameter inputs, and, unlike with Step Functions, you don't need to define a new AWS Lambda function for every custom script.
Audit & Compliance
Execution history is automatically recorded in Systems Manager. With Step Functions, you’d need CloudWatch Logs or X-Ray to tie everything together.
That said, if you need multi-account orchestration, fan-out/fan-in patterns, or integration with external systems at scale, Step Functions can complement your automation. Choose SSM Runbooks for quick self-service ops, and Step Functions for complex, distributed workflows.