DEV Community

Alex Aslam
Alex Aslam

Posted on

The Phantom Deadlock: When Rails, Sidekiq, and PostgreSQL Collide

It was 3 AM when PagerDuty screamed into the void.

Our Sidekiq dashboard showed a nightmare: dozens of workers frozen mid-execution. No errors. No timeouts. Just… silence.

The logs revealed nothing.

After hours of forensic debugging, we found the culprit: an invisible database deadlock—one that didn’t even log its crime.

Here’s how we hunted it down and fixed it for good.


The Crime Scene

Symptoms

  • Sidekiq jobs hung indefinitely (but only sometimes).
  • PostgreSQL logs showed no deadlocks (log_lock_waits was off—mistake #1).
  • Workers stuck on trivial queries (e.g., UPDATE orders SET status = 'processed').

The Root Cause

A deadly embrace:

  1. Job A locked Order #123 (via with_advisory_lock).
  2. Job B (in the same process) tried to update Order #123 in a transaction.
  3. Job A then queried another table… which Job B had already locked.

Deadlock. But PostgreSQL didn’t report it because advisory locks bypass the deadlock detector.


The Investigation

1. Diagnosing the Invisible

We used:

SELECT pid, query, wait_event_type, wait_event  
FROM pg_stat_activity  
WHERE state = 'active';  
Enter fullscreen mode Exit fullscreen mode

Found: Workers waiting on Lock events—but no blocking queries in sight.

Lesson: Always enable in postgresql.conf:

log_lock_waits = on  
deadlock_timeout = 1s  
Enter fullscreen mode Exit fullscreen mode

2. The with_advisory_lock Trap

This convenience method turns deadly when:

  • Used inside transactions.
  • Nested with other locks.

Bad:

Order.transaction do  
  with_advisory_lock("order_#{id}") do  
    # ...  
  end  
end  
Enter fullscreen mode Exit fullscreen mode

Good:

with_advisory_lock("order_#{id}") do  
  Order.transaction do  
    # ...  
  end  
end  
Enter fullscreen mode Exit fullscreen mode

The Fixes

1. Shorten Transactions

  • Move non-critical ops (e.g., logging) outside transactions.
  • Split mega-jobs into idempotent micro-jobs.

2. Use SKIP LOCKED for Queues

For job queues competing for the same rows:

Order.lock("FOR UPDATE SKIP LOCKED").where(status: "pending").first  
Enter fullscreen mode Exit fullscreen mode

3. Nuclear Option: Pessimistic Lock Timeouts

Order.transaction do  
  Order.lock!("FOR UPDATE NOWAIT") # Fails instantly if locked  
rescue ActiveRecord::LockWaitTimeout  
  retry_later  
end  
Enter fullscreen mode Exit fullscreen mode

Lessons Learned

  1. Advisory locks are sharp knives: Never nest them in transactions.
  2. PostgreSQL won’t always save you: Enable log_lock_waits proactively.
  3. Idempotency is king: Design jobs to survive retries.

Final Tip: Test concurrency locally with:

10.times.map { Thread.new { Job.perform_now } }.each(&:join)  
Enter fullscreen mode Exit fullscreen mode

Have you battled phantom deadlocks? Share your war stories below.

Read more? Here’s how Shopify detonates deadlocks.

Top comments (0)