We had auto scaling set up. Alarms configured. Metrics wired. And yet—502s.
That was the story every month-end in our GIS image processing app. A spike in usage from ops teams. Annotation tools slowing down. And the infamous error that no one wants to debug under pressure.
The ECS setup wasn’t new—built by the previous team—but now it was on us: developers and DevOps engineers trying to make sense of why scaling wasn’t saving us.
We did what most teams would do. We scaled the ECS service. Added more tasks. And for a while, it worked. Until it didn’t.
This blog isn’t just about what CAS is—there are plenty of docs for that. This is about why you might miss it, how we almost did, and what real-world capacity alignment actually looks like.
Most Builders Miss This: Task Scaling Isn’t Enough
When you configure ECS Service Auto Scaling (like scaling from 2 to 10 tasks based on CPU > 50%), ECS will try to place new tasks.
But here’s the catch:
If you’re using EC2 launch type, ECS needs available capacity on the cluster to actually place those tasks.
No CPU or memory available? The tasks stay stuck in `PENDING`. And it's silent unless you're watching.
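Nothing alarms on this out of the box, so it helps to watch `pendingCount` explicitly. A minimal sketch, assuming boto3 and a hypothetical cluster name, that a cron job or Lambda could run:

```python
import boto3

ecs = boto3.client("ecs")

def pending_tasks(cluster: str) -> dict:
    """Return pendingCount per service so something can alert on it."""
    counts = {}
    # list_services returns 10 ARNs per page by default, matching describe_services' limit
    for page in ecs.get_paginator("list_services").paginate(cluster=cluster):
        if not page["serviceArns"]:
            continue
        resp = ecs.describe_services(cluster=cluster, services=page["serviceArns"])
        for svc in resp["services"]:
            if svc["pendingCount"] > 0:
                counts[svc["serviceName"]] = svc["pendingCount"]
    return counts

print(pending_tasks("gis-image-processing"))  # hypothetical cluster name
```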
Here’s where ECS Cluster Auto Scaling (CAS) enters the story.
A Past Pain: Month-End GIS Workloads That Failed to Scale
In a previous role, we managed an internal image processing tool that rendered GIS data and allowed operations teams to annotate high-resolution maps. It wasn’t a real-time app — but it was heavy. And during critical windows like month-end or year-end closures, load would spike massively.
The app:
Generated map tiles on the fly
Handled concurrent uploads and annotations
Involved CPU-heavy image processing
We assumed ECS auto scaling would “just work.” But then came 502s.
Naturally, we began by debugging the app:
Checked RDS performance
Tuned Apache settings
Reproduced failures with same payloads
Nothing helped. The mystery deepened.
Until we noticed this: tasks were stuck in `PENDING`, but CPU and memory metrics looked fine.
That’s when we connected the dots. We had scaling at the task level, but the infrastructure wasn’t scaling with it.
It was like hiring more workers without giving them desks. We were adding more containers, but the underlying compute had no room to host them.
How to Estimate Capacity Like a Developer
Let’s say you're running a Flask or FastAPI app on ECS. The app handles:
10–12 API calls per user action
Each API call does a DB lookup + image transform
Spikes happen during end-of-day or batch usage
How do you estimate how many ECS tasks you need?
Here’s a developer-first method:
Step 1: Understand the API behaviour
What is the average latency of a single API call? (e.g. 500ms)
Are the calls CPU or memory bound? (CloudWatch / APM tools)
What’s the max concurrency? (e.g. 100 users x 10 calls = 1,000)
If each ECS task can handle ~10 concurrent API calls → you need ~100 tasks
Step 2: Know Your Task Size
If task = 0.25 vCPU, 512 MB and EC2 = 2 vCPU, 8 GB → host ~8 tasks per EC2
100 tasks → ~13 EC2s
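As a quick sanity check, steps 1 and 2 are just arithmetic. A rough sketch using the illustrative numbers above (measure your own latencies and concurrency before trusting it):

```python
import math

# Back-of-the-envelope capacity estimate with the example numbers from steps 1 and 2.
users = 100                    # concurrent users at peak
calls_per_user = 10            # API calls per user action
peak_calls = users * calls_per_user                      # ~1,000 in-flight calls

calls_per_task = 10            # what one task handles comfortably
tasks_needed = math.ceil(peak_calls / calls_per_task)    # -> 100 tasks

# Task size vs. instance size decides how many tasks fit on one EC2 host.
task_cpu, task_mem_mb = 0.25, 512
inst_cpu, inst_mem_mb = 2, 8192                          # 2 vCPU, 8 GB host
tasks_per_instance = int(min(inst_cpu / task_cpu, inst_mem_mb / task_mem_mb))  # -> 8

instances_needed = math.ceil(tasks_needed / tasks_per_instance)                # -> 13
print(tasks_needed, tasks_per_instance, instances_needed)
```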
Step 3: Monitor Key Metrics
`CPUReservation` and `MemoryReservation`
`PendingTaskCount` (cluster)
ECS ManagedScaling logs
App logs for 502s, slow endpoints, queuing behavior
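If you want these programmatically rather than in the console, here is a hedged sketch (assuming boto3; the reservation metrics live in the `AWS/ECS` namespace, while `PendingTaskCount` comes from Container Insights and needs it enabled on the cluster):

```python
import boto3
from datetime import datetime, timedelta

cw = boto3.client("cloudwatch")

def cluster_metric(cluster: str, metric: str, namespace: str = "AWS/ECS") -> list:
    """Pull the last hour of a cluster-level metric, e.g. CPUReservation."""
    resp = cw.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric,
        Dimensions=[{"Name": "ClusterName", "Value": cluster}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average", "Maximum"],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

print(cluster_metric("gis-image-processing", "CPUReservation"))
print(cluster_metric("gis-image-processing", "PendingTaskCount", "ECS/ContainerInsights"))
```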
Step 4: Set Scaling Policies
Task scaling: CPU > 50%
CAS scaling: set `targetCapacity = 80%` for buffer
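The CAS half of step 4 lives on the capacity provider, not the service. A hedged boto3 sketch with placeholder names and ARNs:

```python
import boto3

ecs = boto3.client("ecs")

# Create an EC2 capacity provider with managed scaling at 80% target capacity.
ecs.create_capacity_provider(
    name="gis-ec2-cp",  # hypothetical name
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:region:account:autoScalingGroup:...",  # your ASG
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 80,          # 80% => keep roughly 20% headroom
            "minimumScalingStepSize": 1,
            "maximumScalingStepSize": 4,
        },
        "managedTerminationProtection": "DISABLED",  # ENABLED needs ASG instance protection
    },
)

# Attach it to the cluster and make it the default strategy.
ecs.put_cluster_capacity_providers(
    cluster="gis-image-processing",        # hypothetical cluster
    capacityProviders=["gis-ec2-cp"],
    defaultCapacityProviderStrategy=[{"capacityProvider": "gis-ec2-cp", "weight": 1}],
)
```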
How ECS Cluster Auto Scaling Actually Works
It’s Not Magic — It’s Math
When ECS needs to launch new tasks but can't due to resource shortage, it uses a Capacity Provider with a formula like this:
desired EC2s = ceil(needed capacity / per-instance capacity) / (targetCapacity / 100)
Let’s say you have:
Pending Tasks: 4 tasks
Each Task Needs: 0.5 vCPU and 1 GB RAM
EC2 Type: `t4g.medium` (2 vCPU, 4 GB RAM)
Target Capacity: 100% (binpack strategy)
Step-by-step:
- Total Needed Capacity:
* 2 vCPU (0.5 x 4)
* 4 GB RAM (1 x 4)
- Per Instance Capacity:
* 2 vCPU and 4 GB RAM per `t4g.medium`
- Divide & Ceil:
* CPU: 2 / 2 = 1
* Memory: 4 / 4 = 1
* Take the **max of the two** = 1
- Apply Target Capacity %:
* At 100% target, no buffer → `desired = 1` EC2 instance
So CAS would scale out by one `t4g.medium` to place those four tasks.
Target capacity lets you control buffer: set to 100% for binpack-style efficiency, or 80% for headroom.
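The same math in a few lines of Python, as an illustration of the formula above rather than the actual AWS implementation:

```python
import math

def desired_instances(pending_tasks: int, task_cpu: float, task_mem_gb: float,
                      inst_cpu: float, inst_mem_gb: float,
                      target_capacity_pct: int = 100) -> int:
    """Estimate how many EC2 instances CAS would ask for."""
    by_cpu = math.ceil(pending_tasks * task_cpu / inst_cpu)
    by_mem = math.ceil(pending_tasks * task_mem_gb / inst_mem_gb)
    needed = max(by_cpu, by_mem)                             # the binding dimension wins
    return math.ceil(needed / (target_capacity_pct / 100))   # <100% target adds headroom

# Worked example from above: 4 tasks x (0.5 vCPU, 1 GB) on t4g.medium (2 vCPU, 4 GB)
print(desired_instances(4, 0.5, 1, 2, 4, 100))  # -> 1
print(desired_instances(4, 0.5, 1, 2, 4, 80))   # -> 2 (one instance of buffer)
```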
Key Concepts from ECS CAS Internals
ECS checks task placement every 15 seconds
If it can’t place tasks, they go into the provisioning state (not failed)
CAS calculates how many EC2s are needed based on task resource demand
Up to 100 tasks can be in provisioning per cluster
Provisioning timeout is 10–30 minutes before task is stopped
Daemon vs Non-Daemon Tasks: What Matters for Scaling
Daemon Task
Scheduled to run on every EC2 instance
Used for agents, log forwarders, metrics collectors
ECS ignores these when calculating scale-out/scale-in
Non-Daemon Task
Your real app workloads (Flask, Socket, Workers)
These determine whether EC2s are needed or idle
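The scheduling strategy is just a flag on the service. A hedged sketch with placeholder cluster, service, and task definition names:

```python
import boto3

ecs = boto3.client("ecs")

# Daemon service: one copy per EC2 instance (log forwarder, metrics agent, ...).
# CAS treats instances running only daemon tasks as idle and eligible for scale-in.
ecs.create_service(
    cluster="gis-image-processing",          # hypothetical names throughout
    serviceName="log-forwarder",
    taskDefinition="fluent-bit:1",
    schedulingStrategy="DAEMON",
)

# Replica service: the real workload; its pending/running tasks drive scaling decisions.
ecs.create_service(
    cluster="gis-image-processing",
    serviceName="tile-api",
    taskDefinition="tile-api:1",
    desiredCount=4,
    schedulingStrategy="REPLICA",
)
```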
How ECS Decides How Many EC2s to Run
Let’s say:
`N` = current EC2s
`M` = desired EC2s (CAS output)
| Condition | Outcome |
| --- | --- |
| No pending tasks, all EC2s used | M = N |
| Pending tasks present | M > N (scale out) |
| Idle EC2s (only daemon tasks) | M < N (scale in) |
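Under the hood, CAS publishes a `CapacityProviderReservation` metric (roughly M / N x 100) and a target-tracking policy steers it toward your `targetCapacity`. A tiny illustration of how the table reads in metric form:

```python
def capacity_provider_reservation(m_desired: int, n_current: int) -> float:
    """M / N * 100; the N = 0 cases are special-cased as documented for CAS."""
    if n_current == 0:
        return 200.0 if m_desired > 0 else 100.0
    return 100.0 * m_desired / n_current

# Reading the table above, with targetCapacity = 100:
print(capacity_provider_reservation(3, 3))   # 100 -> steady state
print(capacity_provider_reservation(5, 3))   # >100 -> scale out
print(capacity_provider_reservation(2, 3))   # <100 -> scale in
```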
ECS in Real Life: Before and After CAS
Once we added CAS:
We linked services to a capacity provider
Enabled managed scaling (target = 100%)
Switched placement to `binpack`
Finally, tasks scaled. And so did the infra. No more 502s.
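Roughly what that change looked like, as a hedged sketch (names are placeholders; switching the capacity provider strategy on an existing service requires a new deployment):

```python
import boto3

ecs = boto3.client("ecs")

ecs.update_service(
    cluster="gis-image-processing",
    service="tile-api",
    capacityProviderStrategy=[{"capacityProvider": "gis-ec2-cp", "weight": 1}],
    placementStrategy=[
        {"type": "binpack", "field": "cpu"},   # pack tasks tightly by CPU
    ],
    forceNewDeployment=True,                   # needed when changing the strategy
)
```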
What If You're Launching a New App and Don’t Know the Load Yet?
Start lean. Scale for learnings.
When launching something new—like we are with NuShift—it’s often unclear what kind of user load or traffic patterns to expect. In such cases, make decisions based on expected concurrency, your framework’s behaviour, and instance characteristics.
Here are some tips to guide early capacity planning:
Estimate concurrency: If you expect 50–100 concurrent users, and each user triggers multiple API calls, try to estimate peak call concurrency.
Understand your app behaviour: Flask or FastAPI-based apps usually work well with 0.25 vCPU and 512MB, especially if I/O bound (e.g., API calls, DB reads). If your app does image processing or CPU-intensive work, start with 0.5 vCPU.
Choose your EC2 wisely: We use `t4g.medium` (2 vCPU, 4 GB RAM) for its cost-efficiency and support for multiple small tasks (6–8 per instance).
Monitor early patterns: Let metrics shape your scaling curve: track `CPUUtilization`, `MemoryUtilization`, and task startup times.
Example initial config:
Flask API: 1–3 tasks (0.25 vCPU, 512 MB)
WebSocket: 1–2 tasks (depends on socket concurrency)
EC2: t4g.medium in an ASG with ECS capacity provider
CAS: enabled with 80% targetCapacity for buffer
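And a hedged sketch of registering the Flask API task at that size (the family, image, and port are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="flask-api",                        # hypothetical task family
    requiresCompatibilities=["EC2"],
    networkMode="bridge",
    cpu="256",                                 # 0.25 vCPU
    memory="512",                              # 512 MB
    containerDefinitions=[{
        "name": "flask-api",
        "image": "example-registry/flask-api:latest",   # placeholder image
        "essential": True,
        "portMappings": [{"containerPort": 8000}],
    }],
)
```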
Use New Relic, CloudWatch, or X-Ray to track CPU, memory, latency, and pending counts.
Final Thought
Scaling your application is easy to talk about. But infrastructure scaling is where things quietly break.
If you’re only watching task counts and CPU graphs, you might miss deeper issues:
PENDING tasks with nowhere to run
EC2s running agents, not apps
Cold starts caused by infra lag
Auto scaling isn’t just about adding containers—it’s about giving them somewhere to live.
Top comments (4)
been there, absolutely nailed the pain - always makes me wonder tho, you think long-term scaling is more about monitoring patterns or about just learning from the screwups?
It should be a combination. Sometimes we're in a budget crunch, and in that case it's monitoring patterns plus waiting for the screwups. It's up to the engineering team and the business how much heat they can take, either on budget or on user experience.
same story applies for Fargate?
Yes. Fargate also needs capacity providers enabled under ECS.