"Your event store remembers everything—but the law demands forgetting."
Event sourcing thrives on immutability. Every change is preserved, enabling:
- Time-travel debugging ✅
- Audit trails ✅
- Rebuilding state at any point ✅
But then comes GDPR's right to erasure (Article 17, the "Right to Be Forgotten"), which requires you to delete a user's personal data on request.
How do you reconcile "never delete" with "must delete"?
Here’s how to comply without sacrificing event-sourcing benefits.
1. The Core Conflict
Problem: Events Are Forever
UserRegistered.new(
  user_id: "u123",
  email: "user@example.com", # PII (placeholder address)
  ip_address: "192.168.1.1"  # Also PII
)
Deleting this breaks:
- Projections (e.g., "Total users registered last month")
- Downstream workflows (e.g., "Welcome email sent to u123")
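To make the conflict concrete, here is a minimal sketch of a monthly-registrations projection. The `RegistrationEvent` struct and `registrations_in` helper are hypothetical names invented for this illustration. Hard-deleting an event changes the historical count, while blanking only the PII fields leaves it intact.

```ruby
require "date"

# Hypothetical event shape, for illustration only.
RegistrationEvent = Struct.new(:user_id, :email, :occurred_on, keyword_init: true)

# Projection: how many users registered in a given month?
def registrations_in(events, year, month)
  events.count { |e| e.occurred_on.year == year && e.occurred_on.month == month }
end

events = [
  RegistrationEvent.new(user_id: "u123", email: "ada@example.com", occurred_on: Date.new(2024, 5, 3)),
  RegistrationEvent.new(user_id: "u456", email: "bob@example.com", occurred_on: Date.new(2024, 5, 9))
]

# Option A: hard-delete u123's events. The projection silently changes.
hard_deleted = events.reject { |e| e.user_id == "u123" }

# Option B: keep the event and blank only the PII. The projection is unaffected.
scrubbed = events.map do |e|
  e.user_id == "u123" ? RegistrationEvent.new(**e.to_h.merge(email: nil)) : e
end

puts registrations_in(events, 2024, 5)       # 2
puts registrations_in(hard_deleted, 2024, 5) # 1
puts registrations_in(scrubbed, 2024, 5)     # 2
```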
Solution: Selective Forgetting
There are three ways to comply:
2. Strategy 1: Pseudonymization
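Replace PII with encrypted tokens, so events stay valid but cannot be traced to a person without the key. Use one key per user rather than a single shared key: destroying that user's key then makes their data unreadable.
Step 1: Tokenize on Ingestion
# Before storing (encrypt and KeyStore are application-level helpers, not a library API)
event = UserRegistered.new(
  user_id: "u123",
  email: encrypt("user@example.com", key: KeyStore.key_for("u123")),
  ip_address: nil # Discard non-essential PII
)
Step 2: Decrypt Only When Needed
# For legal audits (with authorization)
email = decrypt(event.email, key: KeyStore.key_for("u123"))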
Pros:
- Keeps event stream intact
- Reversible for legal requests
Cons:
- Key management complexity
Best for: Systems needing full replayability.
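One common way to make this strategy honor an erasure request is crypto-shredding: encrypt each user's PII with that user's own key, then destroy the key when they ask to be forgotten. The sketch below uses Ruby's standard `openssl` library with AES-256-GCM. `UserKeyStore`, `encrypt_pii`, and `decrypt_pii` are hypothetical names, and the in-memory hash stands in for a real key management service. Whether key destruction counts as erasure in your jurisdiction is a question for your legal team.

```ruby
require "openssl"
require "securerandom"
require "base64"

# Hypothetical in-memory key store; a real system would use a KMS or vault.
class UserKeyStore
  def initialize
    @keys = {}
  end

  # Returns the user's key, creating a random 256-bit key on first use.
  def key_for(user_id)
    @keys[user_id] ||= SecureRandom.random_bytes(32)
  end

  # Returns the key, or nil if none exists (never creates one).
  def lookup(user_id)
    @keys[user_id]
  end

  # "Forgetting" a user means destroying their key.
  def forget!(user_id)
    @keys.delete(user_id)
  end
end

# Encrypts a string and packs IV (12 bytes) + auth tag (16 bytes) + ciphertext.
def encrypt_pii(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  Base64.strict_encode64(iv + cipher.auth_tag + ciphertext)
end

# Returns the plaintext, or nil when the key has been destroyed.
def decrypt_pii(token, key)
  return nil if key.nil?

  raw = Base64.strict_decode64(token)
  iv, tag, ciphertext = raw[0, 12], raw[12, 16], raw[28..-1]
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag
  (cipher.update(ciphertext) + cipher.final).force_encoding(Encoding::UTF_8)
end

keys  = UserKeyStore.new
token = encrypt_pii("ada@example.com", keys.key_for("u123"))
decrypt_pii(token, keys.lookup("u123")) # readable while the key exists
keys.forget!("u123")
decrypt_pii(token, keys.lookup("u123")) # nil: the stored event is now opaque
```

The event stream itself is never rewritten, which is why this approach preserves replayability. Any backups of the key must be destroyed too, or the data remains recoverable.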
3. Strategy 2: Event Redaction
Scrub PII from existing events in place, leaving each event's type, ID, and timestamp intact.
Step 1: Flag Events for Redaction
class UserDataRedaction
  def call(user_id)
    events = EventStore.for(user_id)
    events.each(&:redact!)
    Projections.rebuild!(events) # Update read models
    events # Return the redacted events so callers and tests can inspect them
  end
end
Step 2: Apply Redaction Rules
class UserRegistered
  def redact!
    self.email = "[REDACTED]"
    self.ip_address = nil
  end
end
Pros:
- No cryptographic overhead
- Explicit compliance
Cons:
- Breaks replay if business logic depends on PII
Best for: Systems where PII isn’t critical to workflows.
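The snippets above leave out how redacted events get written back to the store. As a rough, self-contained sketch, assuming an in-memory store and hypothetical names (`RedactableEvent`, `InMemoryEventStore`, `redact_user`), redaction can build scrubbed copies and swap the whole stream rather than mutating events in place:

```ruby
# Hypothetical event and store, for illustration only.
RedactableEvent = Struct.new(:user_id, :email, :ip_address, keyword_init: true) do
  # Returns a scrubbed copy instead of mutating the original.
  def redacted
    self.class.new(**to_h.merge(email: "[REDACTED]", ip_address: nil))
  end
end

class InMemoryEventStore
  def initialize
    @streams = Hash.new { |streams, id| streams[id] = [] }
  end

  def publish(event)
    @streams[event.user_id] << event
  end

  # Returns a copy so callers cannot modify the stored stream directly.
  def events_for(user_id)
    @streams.fetch(user_id, []).dup
  end

  # Replaces a user's stream with the result of the block applied to each event.
  def rewrite_stream(user_id)
    @streams[user_id] = events_for(user_id).map { |event| yield event }
  end
end

# Redacts every event for one user and returns the rewritten stream.
def redact_user(store, user_id)
  store.rewrite_stream(user_id, &:redacted)
  store.events_for(user_id)
end

store = InMemoryEventStore.new
store.publish(RedactableEvent.new(user_id: "u123", email: "ada@example.com", ip_address: "192.168.1.1"))
redact_user(store, "u123").each { |event| puts event.to_h }
```

In a real event store the rewrite would need to be atomic and would have to cover snapshots, backups, and any downstream copies of the stream.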
4. Strategy 3: Legal Hold Buckets
Isolate sensitive events in a separate stream.
Implementation
# Store sensitive events separately
LegalHoldEventStore.publish(
  UserRegistered.new(user_id: "u123", email: "user@example.com")
)
# Main event store gets a PII-free version
EventStore.publish(
  UserRegistered.new(user_id: "u123", email: nil)
)
Pros:
- Granular control over retention
- Simplifies compliance audits
Cons:
- Dual-write complexity
Best for: Highly regulated industries (healthcare, finance).
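A rough sketch of the split, assuming events are plain hashes and using hypothetical names (`StreamStore`, `publish_split`, `PII_FIELDS`): each event is partitioned into a PII-free part for the main store and a PII-only part for the sensitive store, so an erasure request only has to drop one stream.

```ruby
# Hypothetical in-memory stream store, used for both the main and sensitive stores.
class StreamStore
  def initialize
    @streams = Hash.new { |streams, id| streams[id] = [] }
  end

  def publish(stream_id, event)
    @streams[stream_id] << event
  end

  def read(stream_id)
    @streams.fetch(stream_id, []).dup
  end

  def delete_stream(stream_id)
    @streams.delete(stream_id)
  end
end

# Assumption: these are the only PII-bearing fields in this example.
PII_FIELDS = %i[email ip_address].freeze

# Writes the PII-free part to the main store and the PII to the sensitive store.
def publish_split(event, main:, sensitive:)
  user_id = event.fetch(:user_id)
  pii, safe = event.partition { |field, _| PII_FIELDS.include?(field) }.map(&:to_h)
  sensitive.publish(user_id, pii.merge(type: event[:type])) unless pii.empty?
  main.publish(user_id, safe)
end

main      = StreamStore.new
sensitive = StreamStore.new
publish_split(
  { type: :user_registered, user_id: "u123", email: "ada@example.com", ip_address: "192.168.1.1" },
  main: main, sensitive: sensitive
)

# Erasure request: drop the sensitive stream, leave the main stream replayable.
sensitive.delete_stream("u123")
puts main.read("u123").inspect
puts sensitive.read("u123").inspect
```

The two writes here are not atomic. If the second one fails, the stores disagree, which is the dual-write complexity listed under Cons.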
5. Key Considerations
What to Keep
- Internal IDs (e.g., user_id) → Needed for projections
- Timestamps → Required for auditing
What to Remove
- Direct identifiers: Email, phone, IP
- Indirect identifiers: Geolocation, behavioral data
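One way to encode these keep/remove decisions is a field policy that code can enforce. The sketch below is illustrative only: `FIELD_POLICY` and `scrub` are hypothetical names, and dropping any field that is not explicitly listed is a deliberately conservative assumption, not a legal requirement.

```ruby
# Hypothetical policy table: which payload fields survive an erasure request.
FIELD_POLICY = {
  user_id:     :keep,   # internal ID, needed for projections
  occurred_at: :keep,   # timestamp, required for auditing
  email:       :remove, # direct identifier
  phone:       :remove, # direct identifier
  ip_address:  :remove, # direct identifier
  geolocation: :remove  # indirect identifier
}.freeze

# Returns a copy of the payload containing only fields marked :keep.
# Fields missing from the policy are treated as PII and dropped.
def scrub(payload)
  payload.select { |field, _| FIELD_POLICY.fetch(field, :remove) == :keep }
end

payload = {
  user_id: "u123",
  occurred_at: "2024-05-03T10:00:00Z",
  email: "ada@example.com",
  geolocation: "52.52,13.40",
  nickname: "ada" # not in the policy, so it is dropped
}

puts scrub(payload).inspect
```

Defaulting unknown fields to removal means a newly added field has to be classified before it can survive redaction.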
Testing Compliance
Automate checks:
it "redacts all PII for user123" do
  redacted_events = UserDataRedaction.new.call("user123")
  expect(redacted_events).to all(
    have_attributes(email: "[REDACTED]", ip_address: nil)
  )
end
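A per-field test like the one above only covers fields you already know about. As a complementary, heuristic check, a scanner can look for PII-shaped strings anywhere in an event payload. This sketch assumes events are plain hashes. `pii_leaks` and the two patterns are hypothetical and deliberately loose, so expect false positives and negatives.

```ruby
# Loose, illustrative patterns; they are not a complete definition of PII.
PII_PATTERNS = [
  /[^@\s]+@[^@\s]+\.[^@\s]+/,     # something shaped like an email address
  /\b(?:\d{1,3}\.){3}\d{1,3}\b/   # something shaped like an IPv4 address
].freeze

# Returns [user_id, field] pairs for every string value that looks like PII.
def pii_leaks(events)
  events.flat_map do |event|
    event.each_with_object([]) do |(field, value), leaks|
      next unless value.is_a?(String)

      leaks << [event[:user_id], field] if PII_PATTERNS.any? { |pattern| value.match?(pattern) }
    end
  end
end

events = [
  { user_id: "u123", email: "[REDACTED]", ip_address: nil },
  { user_id: "u456", email: "bob@example.com", note: "login from 10.0.0.2" }
]

puts pii_leaks(events).inspect
```

Running a check like this against a redacted stream in CI can catch fields that a `redact!` method forgot, such as the free-text `note` here.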
6. When to Avoid Event Sourcing
🚫 If you can’t pseudonymize early (e.g., third-party event producers)
🚫 For trivial systems where CRUD + soft-delete suffices
"But We’re Not a Bank!"
Even startups face GDPR. Start small:
- Pseudonymize one event type (e.g., UserRegistered).
- Test redaction workflows.
- Expand as compliance demands grow.
Have you tackled GDPR in event-sourced systems? Share your hacks below.