Governance becomes infrastructure once AI SDLCs scale.
Three flagship demos and a benchmark suite. Start with the product loop: a Python agent proposes violating code, Mneme catches it, the agent retries with compliant output. Then see how drift compounds without governance, and how invariants hold across multiple actors. The remediation is a governance layer that lives outside any single coding tool.
Open source · Mneme dogfoods this on its own repo · 117 passing benchmark scenarios
What review-based governance can no longer absorb
The same three pathologies show up everywhere AI assistance is taken seriously: reasonable-looking code that quietly violates architecture, reviewers who cannot keep pace with parallel agents, and drift that propagates faster than humans can detect it. The flagship demos below each isolate one of these and show what changes when a governance layer sits upstream of generation.
Reasonable code, wrong architecture
Agents pattern-match against training data, not against the decisions your team already made. The output is fluent, and it confidently violates an ADR no one has reread this week.
Reviewers cannot scale linearly
One reviewer cannot evaluate the architectural implications of ten parallel agent-produced PRs per hour. The bottleneck is structural, not a matter of effort.
Local fixes become systemic drift
Agent A introduces a divergence. Agent B builds on it. Agent C adds infrastructure around it. By the time anyone notices, the architecture has silently forked.
Mneme compiles your Architectural Decision Records (ADRs) into an executable governance corpus. Every agent proposal is evaluated against this corpus before generation, ensuring architectural invariants hold across tools, sessions, and actors.
Three manifestations of the same governance problem
Each flagship is a category-level narrative: what fails without a governance layer, what holds with one, and where the evidence lives. They are designed to be cited, demoed, and walked through end-to-end — not skimmed.
Governed Python agent — from bad code to compliant output
A Python coding agent proposes from MnemeHQ.memory_store import MemoryStore. Locally reasonable — MnemeHQ is the brand name the agent has seen everywhere. Architecturally invalid — it violates ADR-005 (Brand vs Package Namespace Enforcement). Mneme retrieves the compiled decision, blocks the violation, injects the context, and the agent retries with from mneme.memory_store import MemoryStore. This is the product loop in full.
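The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mneme's actual implementation: the names check_imports, governed_generate, toy_agent, and FORBIDDEN_ROOTS are hypothetical, and the rule table is a hand-written stand-in for the compiled ADR-005 decision.

```python
import ast

# Hypothetical stand-in for a compiled decision:
# forbidden import root -> (ADR id, required root).
FORBIDDEN_ROOTS = {"MnemeHQ": ("ADR-005", "mneme")}


def check_imports(source: str) -> list[str]:
    """Return one violation message per import that uses a forbidden root package."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            root = module.split(".")[0]
            if root in FORBIDDEN_ROOTS:
                adr, required = FORBIDDEN_ROOTS[root]
                violations.append(f"{adr}: import from '{required}', not '{root}'")
    return violations


def governed_generate(agent, max_attempts: int = 3) -> str:
    """Call agent(feedback) until its output passes the check or attempts run out."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        proposal = agent(feedback)
        feedback = check_imports(proposal)
        if not feedback:
            return proposal
    raise RuntimeError(f"no compliant proposal after {max_attempts} attempts: {feedback}")


def toy_agent(feedback: list[str]) -> str:
    """Toy agent: uses the brand name first, then corrects itself once given feedback."""
    root = "mneme" if feedback else "MnemeHQ"
    return f"from {root}.memory_store import MemoryStore\n"


result = governed_generate(toy_agent)
```

The first proposal is rejected with a message naming the decision, that message is passed back to the agent as context, and the second proposal passes. In a real setup the agent would be a model call and the rule table would come from the compiled corpus rather than a literal dictionary.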
python examples/demo-adr-import.py
Architectural drift prevention
A six-step timeline. An agent proposes reasonable-looking code that violates ADR-001. Three downstream changes amplify the divergence. A human reviewer would plausibly miss it. Mneme detects the invariant violation upstream, emits an enforcement trace explaining why, and the agent retries within the constraints. The system converges instead of forking.
python examples/architectural-drift/run.py
Governance continuity across multiple actors
Three agents act sequentially against the same codebase. Agent A introduces a divergence. Agent B builds on it. Agent C tries to remediate. Mneme evaluates the architectural invariants at every step. The point isn't multi-agent runtime sophistication — it's that the governance layer remains coherent across actors, sessions, and retries. As AI execution becomes distributed and persistent, governance becomes the coordination layer.
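One way to picture "the same invariants at every step" is a single gate that every actor's change passes through, regardless of who proposed it. The sketch below is illustrative only: Change, GovernedRepo, and the storage_backend field are hypothetical names, and the JSON-only rule is borrowed from the storage example on this page as a stand-in invariant.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    actor: str
    storage_backend: str  # hypothetical property the invariant inspects


@dataclass
class GovernedRepo:
    # Hypothetical invariant: only these storage backends are allowed.
    allowed_backends: frozenset = frozenset({"json"})
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def propose(self, change: Change) -> bool:
        """Apply the same invariant to every change, whoever the actor is."""
        ok = change.storage_backend in self.allowed_backends
        (self.accepted if ok else self.rejected).append(change)
        return ok


repo = GovernedRepo()
changes = [
    Change("agent-a", "postgres"),  # introduces a divergence
    Change("agent-b", "postgres"),  # tries to build on it
    Change("agent-c", "json"),      # remediates
]
results = [repo.propose(change) for change in changes]
```

Because the divergent changes never land in accepted, later actors have nothing divergent to build on. The gate holds no per-actor state, which is what lets it stay coherent across sessions and retries.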
python examples/multi-agent-governance/run.py
Is the enforcement real? Yes, here are the deterministic verdicts
If the flagships answer "why does this category exist?", the supporting examples answer "is the enforcement actually deterministic?". Each one is a single-violation walkthrough: a concrete ADR, the diff an agent would generate, and the exact mneme check verdict.
Storage decision enforcement
JSON-only storage. The agent extends the existing module instead of proposing a Postgres migration.
Dependency policy enforcement
An unapproved dependency (sqlalchemy) is flagged with a structured WARN and a tracked override path.
Repository pattern enforcement
An ADR-004 violation in user.service.ts hard-fails mneme check in CI.
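As a rough illustration of the dependency policy walkthrough above, the following sketch flags unapproved dependencies with a WARN and lets a recorded override clear it. Everything here is assumed for the example: the Finding type, the check_dependencies function, the approved set (including requests), and the shape of an override record are not taken from Mneme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    verdict: str     # "PASS" or "WARN"
    dependency: str
    detail: str


def check_dependencies(declared, approved, overrides):
    """Return one finding per declared dependency.

    approved  -- set of dependency names the policy allows (hypothetical)
    overrides -- mapping of dependency name to a recorded justification (hypothetical)
    """
    findings = []
    for dep in declared:
        if dep in approved:
            findings.append(Finding("PASS", dep, "approved"))
        elif dep in overrides:
            findings.append(Finding("PASS", dep, f"override: {overrides[dep]}"))
        else:
            findings.append(Finding("WARN", dep, "unapproved; record an override or remove"))
    return findings


without_override = check_dependencies(["requests", "sqlalchemy"], {"requests"}, {})
with_override = check_dependencies(
    ["requests", "sqlalchemy"], {"requests"}, {"sqlalchemy": "example justification"}
)
```

The same inputs always produce the same findings, which is the property the supporting examples are meant to demonstrate. A hard-failing rule, such as the repository pattern case, would return a FAIL verdict from the same kind of function.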
How the governance corpus is built
The flagship demos work because ADR decisions are compiled into an executable corpus before generation happens. The feature below is the foundation layer — it turns existing architectural documentation into the structured decision store the demos run against.
The benchmark, the integrations, the trace format
The flagships and supporting demos sit on top of three operational artifacts: a reproducible scenario benchmark, hook-level integrations with the tools teams already use, and a structured governance trace that drives the CI gate.
Governance Benchmark v1.1
Deterministic scenario suite, structured-output verification, pre-registered thresholds. 18 drift scenarios. 117 passing tests in v0.3.0.
Claude Code, Cursor, GitHub Actions, ADR import
Pre-generation hooks for editors, post-generation enforcement in CI, and a corpus importer for ADRs that already exist in docs/adr/.
Governance violations reference
PASS / WARN / FAIL with decision IDs. The same structured trace gates pull requests, feeds dashboards, and drives the retry loop.
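To make the idea of one trace serving several consumers concrete, here is a small sketch of a verdict record and a CI gate built on it. The field names, the severity ordering, and the messages are assumptions for illustration; the actual Mneme trace format is defined in its reference documentation, not here.

```python
import json
from dataclasses import asdict, dataclass

# Assumed ordering: the most severe verdict in a trace decides the outcome.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


@dataclass(frozen=True)
class TraceEntry:
    decision_id: str  # e.g. "ADR-004"
    verdict: str      # "PASS", "WARN", or "FAIL"
    message: str      # illustrative free-text explanation


def overall_verdict(entries) -> str:
    """Return the most severe verdict present, or PASS for an empty trace."""
    return max((e.verdict for e in entries), key=SEVERITY.__getitem__, default="PASS")


def exit_code(entries) -> int:
    """CI gate: a non-zero exit blocks the pull request only on FAIL."""
    return 1 if overall_verdict(entries) == "FAIL" else 0


trace = [
    TraceEntry("ADR-004", "FAIL", "user.service.ts violates the repository pattern"),
    TraceEntry("ADR-005", "PASS", "imports use the package namespace"),
]

# The same records can be serialized for a dashboard or fed back to an agent.
serialized = json.dumps([asdict(e) for e in trace], indent=2)
gate = exit_code(trace)
```

Keeping the decision ID on every entry is what allows a retry loop to tell the agent which decision it violated, rather than only that the check failed.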
Three lines to wire this up on your own repo
Open source, MIT. Same decision corpus drives the editor hook, the CI gate, and the ADR compiler.
pip install mneme && mneme init && mneme check
Want help wiring it up? Request a pilot — we'll compile your ADRs and walk the enforcement trace on your own repo.
Common questions about the demo structure
Why three flagship demos instead of a single feature tour?
What is the difference between flagship and supporting demos?
Are the runnable examples real or scripted?
examples/demo-adr-import.py in mneme-project-memory/) that imports, applies, and enforces end-to-end. The drift and multi-agent governance flagships ship lightweight reproducible scripts that simulate the orchestration; the enforcement and conflict-detection steps are the real Mneme pipeline. The point is to demonstrate governance coherence, not to claim a multi-agent runtime.
How is this different from CLAUDE.md or .cursor/rules?
CLAUDE.md and .cursor/rules are static text files the model is asked to respect. Mneme is a structured decision store with a precedence engine and hook-level enforcement, so compliance is not probabilistic. The full breakdown is in why prompt memory fails at scale; the head-to-head comparison is at Mneme vs Cursor Rules.