
feat: add per-node untaint grace period #286

Open: FocalChord wants to merge 1 commit into atlassian:master from FocalChord:feat/untaint-grace-period

@FocalChord (Contributor) commented Mar 6, 2026

What

This PR adds an optional config field, untaint_grace_period, that prevents a node from being re-tainted within a configurable duration after it was untainted. When Escalator untaints a node, it records a timestamp in an in-memory map on NodeGroupState. When taintInstances selects taint candidates, it skips any node whose untaint timestamp falls within the grace period.

When the field is omitted, empty, or set to "0s", no grace period applies, so existing deployments are unaffected.

Why

When Escalator untaints a node, the Kubernetes scheduler can place new pods on it within seconds. If utilisation dips on the next tick, Escalator can re-taint that same node. The freshly scheduled pods are then left on a NoSchedule node: it receives no further work, yet it cannot be terminated until those pods complete or hard_delete_grace_period expires.

We observed this on one of our external clusters, where a node group was oscillating at 28 taint/untaint cycles per hour. One node was untainted at 22:37, received 6 pipeline pods while untainted, and was re-tainted at 22:43. Those pods kept the node alive at single-digit useful utilisation for hours.

A grace period of 5 to 10 minutes makes recently untainted nodes ineligible for re-tainting, giving the scheduler time to fill them and giving Escalator a more stable signal before deciding whether to remove them again.

Design considerations

In-memory map vs Kubernetes annotations. We considered using node annotations to store the untaint timestamp. This would survive Escalator restarts but requires an extra API write on every untaint and an extra annotation read per taint candidate per tick. For clusters with 80+ nodes, that's meaningful API load. The in-memory approach is a hashmap lookup with zero API calls.

NodeGroupState already carries in-memory state between scans in the same pattern: taintTracker, forceTaintTracker, scaleDelta, lastScaleOut, scaleUpLock, cpuCapacity, memCapacity. This is another instance of that, not a new concept.

The tradeoff is that the map is lost on restart, so after a restart every node is immediately eligible for tainting. In the worst case, a node untainted shortly before the restart is re-tainted on the first tick afterwards. That is a brief window of reduced protection, not a correctness problem.

Map growth. A cleanup step runs at the start of each scaleNodeGroup call and removes entries for nodes that no longer exist. The map is bounded by the number of nodes in the node group.

Testing

Six test cases cover: a recently untainted node is not re-tainted; an expired grace period allows re-tainting; an empty or "0s" grace period preserves existing behavior; stale tracker entries are cleaned up; wet mode records timestamps; and dry mode records timestamps.

@atlassian-cla-bot commented

Thank you for your submission! Like many open source projects, we ask that you sign our CLA (Contributor License Agreement) before we can accept your contribution.
If your email is listed below, please ensure that you sign the CLA with the same email address.

The following users still need to sign our CLA:
❌nbhatt-atlassian

Already signed the CLA? To re-check, try refreshing the page.

@FocalChord changed the title from "(feat) Add configurable per-node re-taint grace period" to "feat: add per-node untaint grace period" on Mar 6, 2026
@FocalChord force-pushed the feat/untaint-grace-period branch from ed756da to b43d33b on March 10, 2026 23:34
@FocalChord requested a review from awprice on March 13, 2026 18:02

Labels

None yet

1 participant