macos_run_puppet: make bootstrap reboot-survivable via LaunchDaemon #1210
Open
rcurranmoz wants to merge 2 commits into
Today, bootstrapping a fresh M4 worker requires an external SSH session
that babysits the run across at least two reboots (TCC.db detection +
final post-puppet reboot), and a third if MDM drives an OS upgrade
mid-bootstrap.
This commit adds a self-registering LaunchDaemon
(com.mozilla.ronin-puppet-bootstrap) so run-puppet.sh fires on every
boot until two conditions are met:
1. Puppet apply has succeeded cleanly (existing `while/run_puppet`
loop already gates this)
2. Puppet's regular at-boot mechanism (com.mozilla.atboot_puppet) is
installed
When both are satisfied, the script writes /var/tmp/semaphore/run-buildbot
so generic-worker can start, then unloads and removes its own
LaunchDaemon. Future boots are handled by the regular puppet at-boot
mechanism with no overlap.
Two helpers added:
- `install_bootstrap_launchd`: copies the script to
/usr/local/sbin/ronin-puppet-bootstrap.sh and writes
/Library/LaunchDaemons/com.mozilla.ronin-puppet-bootstrap.plist. No-op
if puppet's at-boot LaunchDaemon is already present.
- `finalize_bootstrap`: writes the run-buildbot semaphore and removes
the bootstrap LaunchDaemon (only after confirming puppet's at-boot
mechanism is in place, so we don't leave the host with no puppet
trigger).
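The two helpers described above might look roughly like the following. This is a minimal sketch, not the PR's actual code. `BOOTSTRAP_ROOT` and `BOOTSTRAP_SELF` are hypothetical knobs added so the functions can be pointed at a scratch directory, the at-boot plist path is assumed from its label, the plist keys are an illustrative minimum, and an empty file stands in for the semaphore. The sketch deletes the plist before asking launchd to drop the job, on the assumption that unloading the job that is running the script could end the script early.

```shell
#!/bin/bash
# Sketch only; see the assumptions listed in the lead-in.

BOOTSTRAP_LABEL="com.mozilla.ronin-puppet-bootstrap"

# Paths are resolved on each call so BOOTSTRAP_ROOT (hypothetical, empty on a
# real host) can redirect them to a scratch directory.
bootstrap_plist() {
    echo "${BOOTSTRAP_ROOT:-}/Library/LaunchDaemons/${BOOTSTRAP_LABEL}.plist"
}

# Assumed location of puppet's regular at-boot LaunchDaemon.
atboot_plist() {
    echo "${BOOTSTRAP_ROOT:-}/Library/LaunchDaemons/com.mozilla.atboot_puppet.plist"
}

install_bootstrap_launchd() {
    local script="${BOOTSTRAP_ROOT:-}/usr/local/sbin/ronin-puppet-bootstrap.sh"
    local src="${BOOTSTRAP_SELF:-$0}"   # hypothetical override; defaults to the running script
    local plist
    plist="$(bootstrap_plist)"

    # No-op once puppet's own at-boot mechanism is present.
    if [ -f "$(atboot_plist)" ]; then
        return 0
    fi

    mkdir -p "$(dirname "$script")" "$(dirname "$plist")"
    # Skip the copy when already running from the installed location.
    if [ ! "$src" -ef "$script" ]; then
        cp "$src" "$script"
    fi
    chmod 755 "$script"

    # launchd loads plists in /Library/LaunchDaemons at boot, and RunAtLoad
    # starts the job as soon as it is loaded.
    cat > "$plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>${BOOTSTRAP_LABEL}</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>${script}</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF
    chmod 644 "$plist"
}

finalize_bootstrap() {
    local semaphore="${BOOTSTRAP_ROOT:-}/var/tmp/semaphore/run-buildbot"

    # Signal that generic-worker may start.
    mkdir -p "$(dirname "$semaphore")"
    touch "$semaphore"

    # Keep the bootstrap daemon unless puppet's at-boot trigger exists;
    # otherwise the host would be left with no puppet trigger at all.
    if [ ! -f "$(atboot_plist)" ]; then
        return 0
    fi

    rm -f "$(bootstrap_plist)"
    if command -v launchctl >/dev/null 2>&1; then
        launchctl bootout "system/${BOOTSTRAP_LABEL}" 2>/dev/null || true
    fi
}
```

With `BOOTSTRAP_ROOT` pointed at a temporary directory, both functions can be exercised without root or a real launchd, which makes the "semaphore written, daemon kept until the at-boot plist exists" behaviour easy to check by hand.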
`install_bootstrap_launchd` is called once role/puppet/facter
preconditions are confirmed; `finalize_bootstrap` is called at the
existing `exit 0` after the puppet retry loop has broken.
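The call order described above can be sketched with stubs. Everything here is illustrative: `bootstrap_main`, `ROLE_FILE`, and `RETRY_DELAY` are hypothetical names, the helpers and `run_puppet` are stand-ins that only print or count, and the shape of the retry loop is an assumption since the existing loop is not shown.

```shell
#!/bin/bash
# Sketch of the ordering only; all functions below are stubs.

install_bootstrap_launchd() { echo "install"; }
finalize_bootstrap() { echo "finalize"; }

# Stub: fails on the first attempt, succeeds on the second.
attempts=0
run_puppet() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 2 ]
}

bootstrap_main() {
    # Precondition: without a role file, bail out before registering the
    # LaunchDaemon, so the host cannot enter a reboot loop.
    if [ ! -f "${ROLE_FILE:-/etc/puppet_role}" ]; then
        echo "no role file; not installing bootstrap LaunchDaemon" >&2
        return 1
    fi

    install_bootstrap_launchd

    # Retry until puppet applies cleanly (assumed loop shape).
    while ! run_puppet; do
        sleep "${RETRY_DELAY:-60}"
    done

    # Reached only after the loop has broken; in the real script this is
    # where the existing `exit 0` sits.
    finalize_bootstrap
    return 0
}
```

The point of the ordering is that the LaunchDaemon is registered only after the preconditions pass and is removed only after a clean apply, so a reboot anywhere in between simply re-enters the same path.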
Result: an MDM script-job (or one-shot `bolt run`) can kick the
bootstrap and walk away. The host finishes provisioning across however
many reboots are needed.
Fixes #1206
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes #1206.
Today a fresh M4 worker bootstrap needs a babysitter SSH/Bolt session that survives across all reboots — TCC.db reboot, MDM-driven OS upgrade, post-puppet reboot. We just walked through this with macmini-m4-130..149 on 2026-05-12 and spent ~3 hours nursing it.
This PR has `run-puppet.sh` register a self-removing LaunchDaemon (`com.mozilla.ronin-puppet-bootstrap`) on first invocation. The LaunchDaemon re-runs the script on every boot until:
1. puppet apply has succeeded cleanly (the existing `while/run_puppet` loop already gates this); and
2. puppet's regular at-boot mechanism (`com.mozilla.atboot_puppet`) is installed, indicating the host is fully managed.
Once both hold, the script writes `/var/tmp/semaphore/run-buildbot` and unloads + removes its own LaunchDaemon. The host is in the pool, and future boots are handled by the regular puppet at-boot mechanism with no overlap.
Implementation
Two helpers in `modules/macos_run_puppet/files/run-puppet.sh`:
- `install_bootstrap_launchd`: copies the script to `/usr/local/sbin/ronin-puppet-bootstrap.sh` and writes `/Library/LaunchDaemons/com.mozilla.ronin-puppet-bootstrap.plist`. No-op if puppet's at-boot LaunchDaemon is already in place.
- `finalize_bootstrap`: writes `/var/tmp/semaphore/run-buildbot`, then (only if puppet's at-boot LaunchDaemon exists) unloads and removes the bootstrap LaunchDaemon.
`install_bootstrap_launchd` is called after the existing role-file / puppet-binary preconditions, so a host without `/etc/puppet_role` won't install the LaunchDaemon and start a reboot loop. `finalize_bootstrap` is called at the existing `exit 0` after the puppet retry loop breaks.
Operational impact
Before: orchestrator must SSH-and-wait across N reboots.
After: a single MDM script-job (or `bolt run`) kicks `run-puppet.sh` and walks away.
Test plan
- [...] `run-puppet.sh` once, observe it hits TCC.db reboot trigger, comes back, applies cleanly, writes `run-buildbot`, removes itself
- [...] `running_in_test_kitchen` fact path is unchanged)
Related
Pairs naturally with #1208 (TCC.db cltbld-session gate) — together they'd eliminate the babysitter pattern entirely.
🤖 Generated with Claude Code