"Replaying 10,000 events just to check a balance? There’s a better way."
Event sourcing gives you time-travel superpowers—until you realize:
- Rebuilding an aggregate from 1M events takes minutes.
- Your read API times out waiting for a replay.
- Testing becomes painfully slow.
Snapshots fix this by periodically caching state, so you only replay recent events.
Here’s how to implement them without breaking event-sourcing principles.
1. When Do You Need Snapshots?
Problem Signs
- Slow reads: GET /users/123 triggers a 5-second replay.
- High memory usage: Event processing OOMs your pods.
- Frequent replays: The same aggregate is rebuilt repeatedly.
Rule of Thumb
Snapshots make sense when replay time > 100ms for hot aggregates.
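Not sure where you stand? Time a cold rebuild directly. Here's a minimal sketch using Ruby's Benchmark module, where EventStore and AccountProjection stand in for the (hypothetical) store and projection APIs used throughout this post:
require "benchmark"

account_id = "acct-123" # any hot aggregate

elapsed = Benchmark.realtime do
  events = EventStore.for(account_id)   # full stream, no snapshot
  AccountProjection.new(events).state   # replay from scratch
end

puts "Cold replay took #{(elapsed * 1000).round}ms"
If that number consistently exceeds ~100ms for aggregates you read often, keep reading.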
2. Snapshot Strategies
Strategy 1: Scheduled Snapshots
How it works:
- Every N events (e.g., 1,000), save the current state.
- On replay, load the latest snapshot + newer events.
Implementation:
class AccountSnapshot
  def self.take(account_id)
    events = EventStore.for(account_id) # full stream so far
    state  = AccountProjection.new(events).state
    # Record the exact version this snapshot covers; a hardcoded
    # count would break every snapshot after the first.
    Snapshot.create!(aggregate_id: account_id, state: state, version: events.last.version)
  end
end
# On replay
snapshot = Snapshot.for(account_id)
new_events = EventStore.after(account_id, snapshot.version)
current_state = AccountProjection.apply(snapshot.state, new_events)
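The take method still needs a trigger. One common approach, sketched here as an assumption (the post doesn't prescribe one, and append returning the new version is hypothetical), is to hook the write path:
SNAPSHOT_INTERVAL = 1_000

# Snapshot whenever a version crosses an interval boundary.
def append_event(account_id, event)
  version = EventStore.append(account_id, event) # assumes append returns the new version
  AccountSnapshot.take(account_id) if (version % SNAPSHOT_INTERVAL).zero?
end
Running take in a background job instead keeps snapshot cost off the write path.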
Best for:
- Predictable workloads (e.g., daily batch updates).
Strategy 2: On-Demand Snapshots
How it works:
- Cache state after first replay (like memoization).
- Subsequent requests use the cached snapshot.
Implementation:
class AccountProjection
  def initialize(account_id)
    # The first reader pays for the full replay; later readers reuse the cached snapshot.
    @snapshot = Rails.cache.fetch("snapshot/#{account_id}") do
      events = EventStore.for(account_id)
      { state: build_state(events), version: events.last.version }
    end
    # Only events appended since the snapshot still need replaying.
    @new_events = EventStore.after(account_id, @snapshot[:version])
  end

  def current_balance
    apply_new_events(@snapshot[:state], @new_events).balance
  end
end
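One caveat: the cached snapshot never advances, so @new_events grows on hot aggregates until the cache entry expires. A small refinement, sketched here as an extra method on the same class (the threshold is an assumption, tune it to your replay budget):
REFRESH_THRESHOLD = 500 # events

def refresh_snapshot_if_stale(account_id)
  return if @new_events.size < REFRESH_THRESHOLD

  # Fold the tail into a fresh snapshot so the next reader starts near the head.
  state = apply_new_events(@snapshot[:state], @new_events)
  Rails.cache.write("snapshot/#{account_id}",
                    { state: state, version: @new_events.last.version })
end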
Best for:
- Read-heavy APIs.
Strategy 3: Incremental Snapshots
How it works:
- Store deltas (changes since last snapshot).
- Rebuild by applying deltas to the last full snapshot.
Implementation:
class DeltaSnapshot
  def self.save(account_id, new_events)
    deltas = new_events.map { |e| { event_type: e.type, data: e.changes } }
    DeltaStore.append(account_id, deltas)
  end

  def self.load(account_id)
    base_snapshot = Snapshot.for(account_id)
    deltas = DeltaStore.since(account_id, base_snapshot.version)
    # apply_delta is application-specific: it folds one delta into the running state.
    deltas.reduce(base_snapshot.state) { |state, delta| apply_delta(state, delta) }
  end
end
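Deltas have the same growth problem as raw events, so incremental schemes usually compact periodically: fold the accumulated deltas into a new full snapshot and truncate the delta log. A rough sketch under the same assumed APIs (DeltaStore.truncate_before is hypothetical, and it assumes one delta per event version):
COMPACT_AFTER = 10_000 # deltas

def self.compact(account_id)
  base   = Snapshot.for(account_id)
  deltas = DeltaStore.since(account_id, base.version)
  return if deltas.size < COMPACT_AFTER

  # Fold the accumulated deltas into a new full snapshot...
  state = deltas.reduce(base.state) { |s, d| apply_delta(s, d) }
  new_version = base.version + deltas.size # one delta per event version
  Snapshot.create!(aggregate_id: account_id, state: state, version: new_version)
  # ...then drop the deltas the new base already covers (hypothetical API).
  DeltaStore.truncate_before(account_id, new_version)
end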
Best for:
- High-write systems (e.g., trading platforms).
3. Snapshot Pitfalls
Pitfall 1: Stale Snapshots
What happens:
- A snapshot is taken at version 100.
- Events 101-110 are lost (disk failure).
- Replay from snapshot skips lost events.
Fix:
- Checksum snapshots (e.g., SHA of all events up to the snapshot).
- Reject mismatches and rebuild from scratch.
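A sketch of that check, under the post's hypothetical APIs (the checksum column on Snapshot and the event fields fed into the digest are assumptions, and the scheme assumes sequential versions starting at 1):
require "digest"

# Fingerprint the event stream up to a given version.
def stream_checksum(account_id, version)
  events = EventStore.up_to(account_id, limit: version)
  Digest::SHA256.hexdigest(events.map { |e| "#{e.version}:#{e.type}:#{e.data}" }.join("|"))
end

# Verify before trusting a snapshot; fall back to a full replay on mismatch.
def load_verified_snapshot(account_id)
  snapshot = Snapshot.for(account_id)
  return snapshot if snapshot.checksum == stream_checksum(account_id, snapshot.version)

  snapshot.destroy! # diverged from the log: discard it
  nil               # caller rebuilds from scratch
end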
Pitfall 2: Over-Snapshotting
What happens:
- Snapshots every 10 events → storage bloat.
- Writes slow down due to snapshot overhead.
Fix:
- Adaptive snapshotting: Take snapshots less frequently for cold aggregates.
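What "adaptive" might look like in practice, as a rough sketch (AccessLog is a hypothetical read tracker; the tiers are illustrative):
# Widen the interval as an aggregate cools down; tighten it when hot.
def snapshot_interval(account_id)
  reads = AccessLog.count(account_id, since: 1.day.ago) # hypothetical tracker
  case reads
  when 0..10     then 10_000 # cold: snapshot rarely
  when 11..1_000 then 1_000  # warm: the default
  else                 200   # hot: keep replays short
  end
end

def maybe_snapshot(account_id, version)
  AccountSnapshot.take(account_id) if (version % snapshot_interval(account_id)).zero?
end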
Pitfall 3: Thread Safety
What happens:
- Two threads take snapshots at the same time.
- One overwrites the other → data corruption.
Fix:
- Optimistic locking: condition the write on the version you read, and treat zero updated rows as a lost race:
updated = Snapshot.where(aggregate_id: account_id, version: old_version)
                  .update_all(state: new_state, version: new_version)
# updated == 0 means another writer won; retry or drop this snapshot
4. When to Avoid Snapshots
🚫 Small event streams (< 1K events per aggregate)
🚫 Real-time systems (replays already complete in sub-millisecond time)
🚫 Immutable infrastructure (no persistent storage)
"But Snapshots Feel Like Cheating!"
They’re not. Event sourcing isn’t about purity—it’s about practical replayability.
Have you implemented snapshots? Share your lessons below.