Govern AI coding agents
before they generate the code.
Stop architectural drift before it reaches review. Mneme catches violations at the moment AI generates code — so your standards are enforced, not just documented.
- Block banned frameworks, cross-boundary calls, and superseded decisions before generation
- No re-prompting — constraints apply on every call, every session, across every agent
- Surface violations before the PR, not during it — cut review overhead at the source
Works with direct API integrations, coding assistants, agent frameworks, and managed agent platforms.
AI increased code output.
Review capacity did not.
Coding assistants generate code faster than teams can review it.
But review bandwidth has not increased.
That means more surface area to validate, more architectural drift to catch,
and more governance pushed downstream into PR review.
AI agents do not just create more code. They expose intent debt: undocumented, stale, or unenforced architectural decisions that human reviewers used to catch manually.
The issue is not model quality.
It is that coding agents do not retain your architectural decisions by default.
Adjacent tools solve adjacent problems.
Mneme is not a memory tool, not a rules file, and not a RAG system. Each of those exists for a reason. None of them govern implementation.
The AI coding governance stack
SentRux tells you when the agent violated architecture. Mneme helps prevent the violation from being proposed in the first place. The two layers are complementary.
Five stages. No vector store. No ML.
Almost everyone is competing in layers 01–03. Mneme is layer 05 — the governance layer above the agent runtime. Read the full layer-by-layer breakdown →
Load
Your decisions become durable rules. Engineers edit a JSON file once; Mneme loads it on every call — no re-prompting, no session amnesia.
Retrieve
The right rules reach the agent every time. Deterministic scoring means the same task always surfaces the same constraints — no probabilistic gaps, no missed standards.
Build
Only relevant constraints reach the agent. A targeted packet keeps latency low and prevents rule dilution — the agent gets what applies, not everything you've ever decided.
Inject
Every AI call runs under your standards. The context packet is injected as the system prompt before generation — regardless of agent, IDE, or platform.
Evaluate
Violations surface before review, not during it. Responses are scored against the injected constraints — giving you a blocking gate before code reaches your PR queue.
Every current approach shares a common flaw: none of them enforce decisions before the model writes the code.
| Approach | Why It Breaks at Scale | Mneme HQ |
|---|---|---|
| Rules Files | Static, manually maintained, silently ignored by tools | Deterministic pre-generation enforcement. Structured decisions with a precedence engine, scope-aware retrieval, and hook-level blocking. |
| Prompt Templates | Drift between sessions, omitted by integrators, inconsistent across agents | |
| RAG / Vector Search | Probabilistic retrieval, no authority model, no enforcement | |
| Code Review | Reactive, linear capacity, too late to prevent architectural debt |
Concrete violations, not abstract rules.
Mneme injects your team's architectural decisions into AI-assisted generation. Below is what that catches in practice — the kinds of changes an agent will otherwise ship, because nothing told it not to.
A developer asks Claude Code to add analytics to a checkout route. The agent proposes importing the BigQuery client directly into the frontend service — violating your layered architecture decision that data-platform calls belong in a backend service only.
Mneme detects the cross-boundary call before generation completes. The violation is flagged and blocked — the agent never writes the code, and nothing reaches your PR queue.
Unauthorized framework introduction
Redux pulled into a Zustand-standardized app. Banned ORM imported into a service that already chose another.
Cross-boundary architecture violations
BigQuery client instantiated inside a frontend route. Business logic dropped into a controller. Layering decisions ignored.
ADR supersession conflicts
Celery re-introduced after the team moved to Pub/Sub. Old decisions reappearing because the agent didn't see the new one.
Restricted path modifications
Codegen agent writing to db/prod/migrations/*. Billing agent touching the auth package.
Security policy violations
Raw SQL string concatenation. Mock auth shipped in production paths. Credentials handled outside the approved surface.
Non-approved dependency usage
GPL packages added to a license-restricted repo. Internal-only libraries imported into externally-shipped services.
Three flagship demos.
One worldview.
Each flagship is a different manifestation of the same structural problem: AI accelerates entropy, review does not scale linearly with AI output, drift compounds. Together they sell the category, not a feature. Each ships with a runnable example that drives real Mneme enforcement against scripted diffs — deterministic, no LLM call required.
Architectural drift prevention — the AI SDLC entropy demo
Six-step timeline. An agent proposes reasonable-looking code that violates ADR-001. Three downstream changes amplify the divergence. A reviewer would plausibly miss it. Mneme blocks the first divergence upstream and the system converges instead of forking.
Walk the timeline →Governance continuity across multiple actors
Three actors act sequentially against the same codebase with no shared memory. The compiled corpus is the only thing they share. The architectural invariants stay coherent because they live outside any single actor — in the layer the governance evaluates against.
See the governance trace →All three flagships, supporting enforcement examples, and operational evidence on the demo hub →
Model-agnostic. Agent-agnostic.
Frontier and open-weight models. IDE agents, CLI agents, and orchestration frameworks. The decision corpus is the constant; everything upstream of it can change.
Models
OpenAI, Anthropic, Gemini, Llama, Qwen, DeepSeek, Mistral — direct APIs and OpenAI-compatible endpoints.
Coding agents
Claude Code & Cursor (native). Copilot, Aider, Cline, OpenHands designed-to-support.
Frameworks & CI
LangGraph, CrewAI, AutoGen, OpenAI Agents SDK. GitHub Actions (native), self-hosted runners.
Running in under two minutes.
$ git clone https://github.com/TheoV823/mneme $ cd mneme $ pip install -e .
# Runs the before/after demo without an API key $ python demo.py --dry-run
$ mneme check --memory .mneme/project_memory.json \ --input pr.diff --query "$PR_TITLE" --mode strict
Building the governance layer
for AI-assisted development.
Mneme is evolving from local governance tooling into the governance infrastructure layer for AI-assisted software development. As coding workflows mature, teams will need more than prompt files to maintain architectural consistency at scale.
Common questions.
What is Mneme HQ?
Mneme HQ is the architectural governance layer for AI-assisted development. It compiles architectural intent into enforceable constraints that govern AI coding agents before code is generated. As agent platforms proliferate, governance becomes infrastructure, and Mneme is positioned as the pre-generation governance layer of that stack.
How is Mneme different from Cursor Rules or CLAUDE.md?
Rules files document standards. Mneme enforces them. Cursor Rules and CLAUDE.md are prompt files that describe preferences to the model. Mneme is a governance layer that compiles architectural decisions into enforceable constraints, retrieves them at prompt time based on what the agent is doing, and validates outputs against them.
How is Mneme different from RAG or vector databases?
RAG retrieves knowledge. Mneme operationalizes decisions. RAG systems surface documents that the model may or may not act on. Mneme compiles architectural decisions into structured rules and evaluates AI-generated code against them. There is no embedding model, no vector store, and no probabilistic retrieval in the governance path.
How is Mneme different from observability tools like SentRux?
SentRux tells you when the agent violated architecture. Mneme helps prevent the violation from being proposed in the first place. Pre-generation governance and post-generation observability are complementary layers of the AI coding stack.
Does Mneme require a vector store or ML infrastructure?
No. Mneme uses deterministic, version-controlled decision graphs and tag-scoped retrieval. There is no vector store, no embedding model, and no ML dependency in the governance path. This is a deliberate architectural commitment.
What stacks does Mneme work with?
Mneme works with direct LLM API integrations, IDE coding assistants like Cursor and Claude Code, agent frameworks, managed agent platforms, and internal prompt pipelines. The enterprise framing is a governance control plane for AI coding agents operating within Azure and GitHub-based engineering workflows.
Methodology before metrics.
Mneme's governance benchmark is a deterministic, reproducible regression instrument — not an eval-score leaderboard play. Methodology, deterministic retrieval, and rule-text enforcement are pinned at the Layer 1 freeze (commit e73ff7d) so every change is visible and every PASS is reconstructible.
Same memory and same query produce byte-identical retrieval order on every run. Verdicts are observable facts. We are explicit about what the benchmark proves and what it does not.
Same prompt, same model, different answer. The example below shows what governance changes about agent behavior.