
Add cleanup-r2-orphans command for bulk R2 deletion #109

Merged
retlehs merged 1 commit into main from cleanup-r2-orphans-cli
May 11, 2026

@retlehs retlehs commented May 11, 2026

Follow-up to #108. The band-aid added there uses a serial loop in SyncToR2() to delete R2 files for deactivated packages — fine for the handful of deactivations per 5-min pipeline cycle, but far too slow for the one-time backfill needed after #108 ships.

The backfill needs to clear ~67k packages × 2 keys = ~135k DeleteObject calls. At ~135ms per call, the serial loop projects to ~5 hours and blocks the pipeline (oneshot service skips timer ticks while running). Confirmed empirically on prod: ~3.7 packages/sec.
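The projection above can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures from the description (67k packages, 2 keys each, 135ms per call); `serialProjection` is a hypothetical helper, not something in the repository.

```go
package main

import (
	"fmt"
	"time"
)

// serialProjection estimates a one-request-per-key deletion loop:
// the total number of calls, and the wall time at a fixed latency per call.
func serialProjection(packages, keysPerPackage int, perCall time.Duration) (int, time.Duration) {
	calls := packages * keysPerPackage
	return calls, time.Duration(calls) * perCall
}

func main() {
	const packages = 67_000 // approximate figure from the description
	calls, total := serialProjection(packages, 2, 135*time.Millisecond)
	fmt.Printf("calls=%d serial=%.1fh rate=%.1f packages/sec\n",
		calls, total.Hours(), float64(packages)/total.Seconds())
}
```

With these rounded inputs the result is 134,000 calls, about 5.0 hours, and about 3.7 packages/sec, which matches the rate observed on prod.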

This PR adds wppackages cleanup-r2-orphans, a one-off CLI command that uses the S3 DeleteObjects batch API (up to 1000 keys per request) parallelized across cfg.R2.Concurrency workers. ~135 batches at ~50-wide concurrency should finish in seconds, not hours.
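To see where the batch count comes from, here is a minimal sketch of splitting a flat key list into requests of at most 1000 keys. The `chunk` helper and the key names are assumptions for illustration; the actual code in internal/deploy/cleanup.go may be structured differently.

```go
package main

import "fmt"

// maxKeysPerRequest is the per-request key limit of the S3 DeleteObjects API.
const maxKeysPerRequest = 1000

// chunk splits keys into consecutive slices of at most size elements.
// The returned slices share the backing array of keys.
func chunk(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	batches := make([][]string, 0, (len(keys)+size-1)/size)
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		batches = append(batches, keys[start:end])
	}
	return batches
}

func main() {
	// Roughly the backfill size from the description; key names are made up.
	keys := make([]string, 135_000)
	for i := range keys {
		keys[i] = fmt.Sprintf("hypothetical/key-%d.json", i)
	}
	batches := chunk(keys, maxKeysPerRequest)
	fmt.Printf("%d keys -> %d batches\n", len(keys), len(batches))
}
```

At that size the list splits into 135 requests, so even a modest worker pool has only a few requests per worker to get through.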

What it does

internal/deploy/cleanup.go — new BulkDeleteDeactivated():

  • Queries is_active = 0 AND deployed_hash IS NOT NULL (same predicate as SyncToR2)
  • Builds the flat list of (package_id, key) pairs
  • Splits into 1000-key batches, runs DeleteObjects across an errgroup capped at cfg.R2.Concurrency
  • Tracks per-key failures from the DeleteObjectsOutput.Errors field
  • Clears deployed_hash per-batch (best effort) and again in a final pass for packages whose keys spanned multiple batches
  • Safe to interrupt and re-run — successfully-cleaned packages have their deployed_hash cleared, so a subsequent run only picks up what's left
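The steps above can be sketched roughly as follows. This is not the code from this PR: to stay standard-library only, it swaps the errgroup for a `sync.WaitGroup` plus a semaphore channel, and replaces the S3 client with a hypothetical `deleteBatchFunc` that returns the keys that failed (standing in for `DeleteObjectsOutput.Errors`). It shows only the final-pass decision of which packages may have `deployed_hash` cleared, not the per-batch clearing or the database update. All type, function, and key names are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// pkgKey pairs a package ID with one of its object keys.
type pkgKey struct {
	PackageID int64
	Key       string
}

// deleteBatchFunc stands in for one DeleteObjects request. It returns the
// keys that failed, or an error if the whole request failed.
type deleteBatchFunc func(keys []string) (failed []string, err error)

// bulkDelete runs del over batches with at most concurrency requests in
// flight and returns the IDs of packages whose keys were all deleted.
func bulkDelete(batches [][]pkgKey, concurrency int, del deleteBatchFunc) []int64 {
	if concurrency < 1 {
		concurrency = 1
	}
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed = map[string]bool{}                // keys that were not deleted
		sem    = make(chan struct{}, concurrency) // bounds in-flight requests
	)
	for _, batch := range batches {
		wg.Add(1)
		sem <- struct{}{} // blocks while concurrency requests are running
		go func(batch []pkgKey) {
			defer wg.Done()
			defer func() { <-sem }()
			keys := make([]string, len(batch))
			for i, pk := range batch {
				keys[i] = pk.Key
			}
			bad, err := del(keys)
			if err != nil {
				bad = keys // whole request failed: treat every key as failed
			}
			mu.Lock()
			for _, k := range bad {
				failed[k] = true
			}
			mu.Unlock()
		}(batch)
	}
	wg.Wait()

	// Final pass: a package is clean only if none of its keys failed,
	// even when those keys landed in different batches.
	seen, dirty := map[int64]bool{}, map[int64]bool{}
	var order, clean []int64
	for _, batch := range batches {
		for _, pk := range batch {
			if !seen[pk.PackageID] {
				seen[pk.PackageID] = true
				order = append(order, pk.PackageID)
			}
			if failed[pk.Key] {
				dirty[pk.PackageID] = true
			}
		}
	}
	for _, id := range order {
		if !dirty[id] {
			clean = append(clean, id)
		}
	}
	return clean
}

func main() {
	batches := [][]pkgKey{
		{{1, "hypothetical/p1-a.json"}, {1, "hypothetical/p1-b.json"}, {2, "hypothetical/p2-a.json"}},
		{{2, "hypothetical/p2-b.json"}}, // package 2 spans two batches
	}
	fake := func(keys []string) ([]string, error) {
		var bad []string
		for _, k := range keys {
			if k == "hypothetical/p2-b.json" {
				bad = append(bad, k)
			}
		}
		return bad, nil
	}
	fmt.Println(bulkDelete(batches, 2, fake))
}
```

In this example only package 1 is reported as clean. Package 2 keeps its `deployed_hash` because one of its keys failed in the second batch, so a later run would pick it up again, which is what makes the command safe to re-run.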

cmd/wppackages/cmd/cleanup_r2_orphans.go — thin Cobra wrapper that calls BulkDeleteDeactivated() and logs the result.

How to run (one-time, on prod)

# Stop the timer first so it cannot re-trigger the pipeline, then stop any running pipeline
sudo systemctl stop wppackages-pipeline.timer
sudo systemctl stop wppackages-pipeline.service

# Bulk cleanup
/srv/wp-packages/current/wppackages cleanup-r2-orphans --db /srv/wp-packages/shared/storage/wppackages.db

# Resume normal pipeline operation
sudo systemctl start wppackages-pipeline.timer

After this, the regular per-deploy serial loop in SyncToR2 is sufficient — only a few packages get deactivated per cycle.

🤖 Generated with Claude Code

Follow-up to #108. The serial loop added there is fine for the handful of deactivations per 5-min pipeline cycle, but the one-time historical backfill (~67k packages × 2 keys = ~135k DeleteObject calls) projects to ~5 hours at the observed prod rate of ~3.7 packages/sec.

Add wppackages cleanup-r2-orphans, a one-off command that uses the S3 DeleteObjects batch API (up to 1000 keys per request) parallelized across cfg.R2.Concurrency workers. Tracks per-key failures from DeleteObjectsOutput.Errors and clears deployed_hash per-batch with a final pass for packages whose keys spanned multiple batches. Safe to interrupt and re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@retlehs retlehs merged commit fa9d246 into main May 11, 2026
5 checks passed
@retlehs retlehs deleted the cleanup-r2-orphans-cli branch May 11, 2026 14:36