DEV Community

Alex Aslam

Posted on

Event Sourcing for GDPR: How to Forget Data Without Breaking History

"Your event store remembers everything—but the law demands forgetting."

Event sourcing thrives on immutability. Every change is preserved, enabling:

  • Time-travel debugging ✅
  • Audit trails ✅
  • Rebuilding state at any point ✅

But then comes GDPR’s "Right to Be Forgotten" (Article 17), which requires you to erase a user’s personal data on request.

How do you reconcile "never delete" with "must delete"?

Here’s how to comply without sacrificing event-sourcing benefits.


1. The Core Conflict

Problem: Events Are Forever

UserRegistered.new(
  user_id: "u123",
  email: "user@example.com", # PII
  ip_address: "192.168.1.1" # Also PII
)

Deleting this breaks:

  • Projections (e.g., "Total users registered last month")
  • Downstream workflows (e.g., "Welcome email sent to u123")
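A minimal sketch of why deletion hurts, using plain hashes as hypothetical events and a hypothetical `month` field. A projection is rebuilt by replaying the stream, so deleting an event silently changes the result, while scrubbing only the PII field keeps it intact:

```ruby
# Hypothetical events stored as plain hashes; :email is the only PII field.
events = [
  { type: "UserRegistered", user_id: "u123", email: "user@example.com", month: "2024-05" },
  { type: "UserRegistered", user_id: "u456", email: "other@example.com", month: "2024-05" }
]

# Projection: registrations per month, computed by replaying every event.
def registrations_by_month(events)
  events.select { |e| e[:type] == "UserRegistered" }
        .group_by { |e| e[:month] }
        .transform_values(&:count)
end

# Option A: physically delete u123's events.
deleted = events.reject { |e| e[:user_id] == "u123" }

# Option B: keep the events, scrub only the PII field.
redacted = events.map { |e| e[:user_id] == "u123" ? e.merge(email: "[REDACTED]") : e }

p registrations_by_month(events)   # both users counted
p registrations_by_month(deleted)  # history changed: u123 no longer counted
p registrations_by_month(redacted) # same count as the original, no email
```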

Solution: Selective Forgetting

Three approaches, each with different trade-offs:


2. Strategy 1: Pseudonymization

Replace PII with encrypted values (or tokens), keeping events valid but unreadable without the key.

Step 1: Encrypt PII on Ingestion

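# Before storing: encrypt PII and drop what you don't need.
# encrypt/decrypt are app-level helpers, not a library API; the key itself
# should come from a KMS or secret store ("GDPR_KEY" is only its name).
event = UserRegistered.new(
  user_id: "u123",
  email: encrypt("user@example.com", key: "GDPR_KEY"),
  ip_address: nil # Discard non-essential PII
)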
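The post doesn't define `encrypt` and `decrypt`. One possible sketch, assuming AES-256-GCM from Ruby's standard `openssl` library and a 32-byte binary key; the storage format (IV, then auth tag, then ciphertext, Base64-encoded) is an illustrative choice, not a standard:

```ruby
require "openssl"

# Hypothetical implementations of the encrypt/decrypt helpers used above.
# AES-256-GCM gives confidentiality plus an auth tag that detects tampering.
# Stored format (our own choice): base64(iv || tag || ciphertext).
IV_LEN = 12
TAG_LEN = 16

def encrypt(plaintext, key:)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key                 # 32 raw bytes
  iv = cipher.random_iv            # fresh random IV for every value
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv + cipher.auth_tag + ciphertext].pack("m0") # strict Base64, no newlines
end

def decrypt(token, key:)
  raw = token.unpack1("m0")
  iv = raw.byteslice(0, IV_LEN)
  tag = raw.byteslice(IV_LEN, TAG_LEN)
  ciphertext = raw.byteslice(IV_LEN + TAG_LEN, raw.bytesize)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag            # final raises CipherError on a wrong key or tampered data
  (cipher.update(ciphertext) + cipher.final).force_encoding(Encoding::UTF_8)
end

key = OpenSSL::Random.random_bytes(32)  # in practice: loaded from a KMS
token = encrypt("user@example.com", key: key)
puts token                              # safe to store in the event
puts decrypt(token, key: key)           # => user@example.com
```

Because GCM authenticates the ciphertext, a wrong key or a tampered event fails loudly instead of returning garbage.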

Step 2: Decrypt Only When Needed

# For legal audits only: verify the caller is authorized before decrypting
email = decrypt(event.email, key: "GDPR_KEY")

Pros:

  • Keeps event stream intact
  • Reversible for legal requests

Cons:

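  • Key management complexity
  • Encrypted PII still counts as personal data under GDPR, so it is only "forgotten" once the key is destroyed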

Best for: Systems needing full replayability.
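With one shared key, erasure is all or nothing: destroying it forgets every user, and keeping it means the data is still recoverable. A common variation is crypto-shredding: encrypt each user's PII with a key of their own and delete that key on request, leaving the events untouched. A sketch under those assumptions; `UserKeyStore`, `seal`, `unseal`, and `readable_email` are hypothetical names, and the in-memory hash stands in for a KMS:

```ruby
require "openssl"

# Hypothetical per-user key store. A real system would use a KMS or vault
# and make sure deleted keys are also gone from backups.
class UserKeyStore
  def initialize
    @keys = {}
  end

  # Creates the user's key on first use.
  def key_for(user_id)
    @keys[user_id] ||= OpenSSL::Random.random_bytes(32)
  end

  # nil once the key has been shredded.
  def find(user_id)
    @keys[user_id]
  end

  # Erasure request: destroy the key, leave the events alone.
  def shred!(user_id)
    @keys.delete(user_id)
  end
end

# AES-256-GCM helpers; the ciphertext is kept as a small hash in the event.
def seal(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  data = cipher.update(plaintext) + cipher.final
  { iv: iv, tag: cipher.auth_tag, data: data }
end

def unseal(box, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = box[:iv]
  cipher.auth_tag = box[:tag]
  (cipher.update(box[:data]) + cipher.final).force_encoding(Encoding::UTF_8)
end

# Read side: decrypt if the key still exists, otherwise show a placeholder.
def readable_email(event, keys)
  key = keys.find(event[:user_id])
  key ? unseal(event[:email], key) : "[FORGOTTEN]"
end

keys = UserKeyStore.new
event = { type: "UserRegistered", user_id: "u123",
          email: seal("user@example.com", keys.key_for("u123")) }

puts readable_email(event, keys)  # => user@example.com
keys.shred!("u123")
puts readable_email(event, keys)  # => [FORGOTTEN] (event itself unchanged)
```

Projections and replays keep working on the ciphertext; only code that needs the plaintext has to handle a missing key.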


3. Strategy 2: Event Redaction

Scrub PII from existing events in place, like a redacted document: the record stays, but the identifying details are blacked out.

Step 1: Flag Events for Redaction

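class UserDataRedaction
  def call(user_id)
    events = EventStore.for(user_id)
    events.each do |event|
      next unless event.respond_to?(:redact!) # only types with PII rules
      event.redact!
      EventStore.replace(event) # persist the scrubbed copy (hypothetical API)
    end
    Projections.rebuild!(events) # Update read models
    events # return the redacted events so callers and tests can inspect them
  end
end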

Step 2: Apply Redaction Rules

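class UserRegistered
  attr_accessor :email, :ip_address # writers needed by redact!

  # Overwrite PII in place; user_id is kept so projections still line up
  def redact!
    self.email = "[REDACTED]"
    self.ip_address = nil
  end
end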
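Put together, the redaction flow might look like this self-contained sketch. `InMemoryEventStore` and `RegistrationCount` are hypothetical stand-ins for the post's `EventStore` and `Projections`; this version rebuilds the read model from the whole stream, so other users' events still count:

```ruby
# Hypothetical event type with the redaction rules from above.
class UserRegistered
  attr_reader :user_id, :email, :ip_address

  def initialize(user_id:, email:, ip_address: nil)
    @user_id, @email, @ip_address = user_id, email, ip_address
  end

  def redact!
    @email = "[REDACTED]"
    @ip_address = nil
  end
end

# Hypothetical in-memory store. Because it holds object references,
# mutating an event here stands in for an in-place rewrite on disk.
class InMemoryEventStore
  def initialize
    @events = []
  end

  def publish(event)
    @events << event
  end

  def all
    @events.dup
  end

  def for(user_id)
    @events.select { |e| e.user_id == user_id }
  end
end

# Hypothetical read model: total registrations, rebuilt by full replay.
class RegistrationCount
  attr_reader :count

  def rebuild!(events)
    @count = events.count { |e| e.is_a?(UserRegistered) }
  end
end

class UserDataRedaction
  def initialize(store, projection)
    @store = store
    @projection = projection
  end

  def call(user_id)
    events = @store.for(user_id)
    events.each { |e| e.redact! if e.respond_to?(:redact!) }
    @projection.rebuild!(@store.all) # replay the whole stream, not just this user
    events
  end
end

store = InMemoryEventStore.new
store.publish(UserRegistered.new(user_id: "u123", email: "user@example.com", ip_address: "192.168.1.1"))
store.publish(UserRegistered.new(user_id: "u456", email: "other@example.com"))
projection = RegistrationCount.new

UserDataRedaction.new(store, projection).call("u123")
puts projection.count               # => 2, both registrations still counted
puts store.for("u123").first.email  # => [REDACTED]
```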

Pros:

  • No cryptographic overhead
  • Erasure is explicit and easy to demonstrate to auditors

Cons:

  • Breaks replay if business logic depends on PII
  • Rewrites stored events, giving up the append-only guarantee

Best for: Systems where PII isn’t critical to workflows.


4. Strategy 3: Legal Hold Buckets

Isolate sensitive events in a separate stream that can be purged on its own, and publish a PII-free copy to the main event store.

Implementation

# Store sensitive events separately
LegalHoldEventStore.publish(
  UserRegistered.new(user_id: "u123", email: "user@example.com")
)

# Main event store gets a PII-free version
EventStore.publish(
  UserRegistered.new(user_id: "u123", email: nil)
)
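A minimal sketch of the split, assuming events arrive as hashes and that `email` and `ip_address` are the only PII fields; `PiiVault` and `publish_split` are hypothetical names. Erasure becomes a delete in the vault, and the main stream never changes:

```ruby
# Hypothetical PII store: the only place raw PII lives, keyed by user_id.
class PiiVault
  def initialize
    @records = {}
  end

  def put(user_id, pii)
    (@records[user_id] ||= {}).merge!(pii)
  end

  def fetch(user_id)
    @records[user_id]
  end

  # Erasure request: drop the user's PII; the main stream is never touched.
  def purge!(user_id)
    @records.delete(user_id)
  end
end

PII_FIELDS = %i[email ip_address].freeze

# Split one incoming event: PII goes to the vault, everything else to the
# main stream. A real system must make both writes succeed or fail together
# (e.g. with a transactional outbox); this sketch ignores failures.
def publish_split(event, main_stream, vault)
  pii = event.slice(*PII_FIELDS)
  vault.put(event[:user_id], pii) unless pii.empty?
  main_stream << event.reject { |field, _| PII_FIELDS.include?(field) }
end

main_stream = []
vault = PiiVault.new
publish_split({ type: "UserRegistered", user_id: "u123", email: "user@example.com" },
              main_stream, vault)

p main_stream          # type and user_id only
p vault.fetch("u123")  # the email, held separately
vault.purge!("u123")
p vault.fetch("u123")  # => nil
```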

Pros:

  • Granular control over retention
  • Simplifies compliance audits

Cons:

  • Dual-write complexity (if one write fails, the two stores disagree)

Best for: Highly regulated industries (healthcare, finance).


5. Key Considerations

What to Keep

  • Internal IDs (e.g., user_id) → Needed for projections, as long as they are opaque (not derived from an email or name)
  • Timestamps → Required for auditing

What to Remove

  • Direct identifiers: Email, phone, IP
  • Indirect identifiers: Geolocation, behavioral data

Testing Compliance

Automate checks:

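RSpec.describe UserDataRedaction do
  it "redacts all PII for u123" do
    # Assumes #call returns the redacted events
    redacted_events = UserDataRedaction.new.call("u123")
    expect(redacted_events).not_to be_empty # `all` passes vacuously on []
    expect(redacted_events).to all(
      have_attributes(email: "[REDACTED]", ip_address: nil)
    )
  end
end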
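Attribute checks only cover the fields you remembered. A complementary check, sketched here with a hypothetical `pii_leaks` helper and sample events, serializes the user's events and searches for the original PII values:

```ruby
require "json"

# Hypothetical complement to the attribute check above: serialize every
# stored event and search for the user's original PII values. This also
# catches copies hiding in metadata or nested payloads.
def pii_leaks(events, known_values)
  dump = events.map { |event| JSON.generate(event) }.join("\n")
  known_values.select { |value| dump.include?(value) }
end

events = [
  { type: "UserRegistered", user_id: "u123", email: "[REDACTED]", ip_address: nil },
  { type: "WelcomeEmailSent", user_id: "u123", metadata: { to: "user@example.com" } }
]

p pii_leaks(events, ["user@example.com", "192.168.1.1"])
# => ["user@example.com"]: redaction missed the nested copy
```

In an RSpec suite this could become `expect(pii_leaks(events, known_values)).to be_empty`.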

6. When to Avoid Event Sourcing

🚫 If you can’t pseudonymize early (e.g., third-party event producers)
🚫 For trivial systems where CRUD + soft-delete suffices


"But We’re Not a Bank!"

Even startups face GDPR. Start small:

  1. Pseudonymize one event type (e.g., UserRegistered).
  2. Test redaction workflows.
  3. Expand as compliance demands grow.

Have you tackled GDPR in event-sourced systems? Share your hacks below.
