Mneme HQ is the architectural governance layer for AI-assisted development. It compiles your team's architectural intent into enforceable constraints that govern AI coding agents at the pre-generation stage, before architectural drift reaches review. Rules files document standards. Memory tools recall context. RAG retrieves knowledge. Mneme governs implementation.

Architectural governance for AI-assisted development

Govern AI coding agents
before they generate the code.

Stop architectural drift before it reaches review. Mneme catches violations at the moment AI generates code — so your standards are enforced, not just documented.

Block banned frameworks, cross-boundary calls, and superseded decisions before generation
No re-prompting — constraints apply on every call, every session, across every agent
Surface violations before the PR, not during it — cut review overhead at the source

Works with direct API integrations, coding assistants, agent frameworks, and managed agent platforms.

Request pilot access Walk the flagship demos View on GitHub

Works with

Claude Code Cursor GitHub Actions GitHub Copilot Windsurf OpenAI Aider + more →

The bottleneck

AI increased code output.
Review capacity did not.

Coding assistants generate code faster than teams can review it.
But review bandwidth has not increased.

That means more surface area to validate, more architectural drift to catch, and more governance pushed downstream into PR review.

AI agents do not just create more code. They expose intent debt: undocumented, stale, or unenforced architectural decisions that human reviewers used to catch manually.

The issue is not model quality. It is that coding agents do not retain your architectural decisions by default.

Throughput vs. review capacity · 2023–2026

More PR Surface Area

AI increases the amount of code reviewers must validate per change.

Reactive Governance

Architectural violations are caught after generation, during review.

Session Amnesia

Coding agents forget prior decisions unless re-prompted every time.

Where Mneme sits

Adjacent tools solve adjacent problems.

Mneme is not a memory tool, not a rules file, and not a RAG system. Each of those exists for a reason. None of them govern implementation.

Rules files document standards.

Mneme enforces them.

Memory tools recall context.

Mneme governs implementation.

RAG retrieves knowledge.

Mneme operationalizes decisions.

The AI coding governance stack

Pre-generation governance

Mneme. Compiles architectural intent into enforceable constraints before the agent generates code.

Generation and runtime

Agent frameworks and runtime harnesses. Cursor, Claude Code, agent platforms.

Post-generation observability

Tools like SentRux. Detect violations after the agent has acted.

SentRux tells you when the agent violated architecture. Mneme helps prevent the violation from being proposed in the first place. The two layers are complementary.

How it works

Five stages. No vector store. No ML.

Where Mneme sits · generative AI software engineering stack

07 Human oversight review · approvals

06 Validation & eval benchmarks · tracing

05 Governance & control Mneme HQ

04 Tooling & execution MCP · CI/CD · shells

03 Agent runtime LangGraph · Claude Code

02 Context & retrieval RAG · vectors · memory

01 Foundation models OpenAI · Anthropic · Gemini

Almost everyone is competing in layers 01–03. Mneme is layer 05 — the governance layer above the agent runtime. Read the full layer-by-layer breakdown →

project_memory.json → MemoryStore → Retriever → ContextBuilder → LLMAdapter → Evaluator

Load

Your decisions become durable rules. Engineers edit a JSON file once; Mneme loads it on every call — no re-prompting, no session amnesia.

Retrieve

The right rules reach the agent every time. Deterministic scoring means the same task always surfaces the same constraints — no probabilistic gaps, no missed standards.

Build

Only relevant constraints reach the agent. A targeted packet keeps latency low and prevents rule dilution — the agent gets what applies, not everything you've ever decided.

Inject

Every AI call runs under your standards. The context packet is injected as the system prompt before generation — regardless of agent, IDE, or platform.

Evaluate

Violations surface before review, not during it. Responses are scored against the injected constraints — giving you a blocking gate before code reaches your PR queue.

Why Existing Approaches Fail

Every current approach shares a common flaw: none of them enforce decisions before the model writes the code.

Approach	Why It Breaks at Scale	Mneme HQ
Rules Files	Static, manually maintained, silently ignored by tools	Deterministic pre-generation enforcement. Structured decisions with a precedence engine, scope-aware retrieval, and hook-level blocking.
Prompt Templates	Drift between sessions, omitted by integrators, inconsistent across agents
RAG / Vector Search	Probabilistic retrieval, no authority model, no enforcement
Code Review	Reactive, linear capacity, too late to prevent architectural debt

Why RAG fails → Why code review doesn't scale →

What Mneme prevents

Concrete violations, not abstract rules.

Mneme injects your team's architectural decisions into AI-assisted generation. Below is what that catches in practice — the kinds of changes an agent will otherwise ship, because nothing told it not to.

Example scenario

A developer asks Claude Code to add analytics to a checkout route. The agent proposes importing the BigQuery client directly into the frontend service — violating your layered architecture decision that data-platform calls belong in a backend service only.

Mneme detects the cross-boundary call before generation completes. The violation is flagged and blocked — the agent never writes the code, and nothing reaches your PR queue.

Unauthorized framework introduction

Redux pulled into a Zustand-standardized app. Banned ORM imported into a service that already chose another.

Cross-boundary architecture violations

BigQuery client instantiated inside a frontend route. Business logic dropped into a controller. Layering decisions ignored.

ADR supersession conflicts

Celery re-introduced after the team moved to Pub/Sub. Old decisions reappearing because the agent didn't see the new one.

Restricted path modifications

Codegen agent writing to db/prod/migrations/*. Billing agent touching the auth package.

Security policy violations

Raw SQL string concatenation. Mock auth shipped in production paths. Credentials handled outside the approved surface.

Non-approved dependency usage

GPL packages added to a license-restricted repo. Internal-only libraries imported into externally-shipped services.

See all twelve examples across five governance categories →

Operational proof

Three flagship demos.
One worldview.

Each flagship is a different manifestation of the same structural problem: AI accelerates entropy, review does not scale linearly with AI output, drift compounds. Together they sell the category, not a feature. Each ships with a runnable example that drives real Mneme enforcement against scripted diffs — deterministic, no LLM call required.

Flagship 01 · Centerpiece Runnable Mneme dogfoods this

The ADR compiler — turn architectural decisions into infrastructure

Most teams already have ADRs. They sit in docs/adr/ and are quietly ignored by every AI coding agent. The compiler reads the same files, parses an optional ## Constraints section, and emits enforceable, precedence-aware decisions that govern generation and CI. No rewrite.

Walk through the compiler →

Flagship 02 Runnable

Architectural drift prevention — the AI SDLC entropy demo

Six-step timeline. An agent proposes reasonable-looking code that violates ADR-001. Three downstream changes amplify the divergence. A reviewer would plausibly miss it. Mneme blocks the first divergence upstream and the system converges instead of forking.

Walk the timeline →

Flagship 03 Forward-looking Runnable

Governance continuity across multiple actors

Three actors act sequentially against the same codebase with no shared memory. The compiled corpus is the only thing they share. The architectural invariants stay coherent because they live outside any single actor — in the layer the governance evaluates against.

See the governance trace →

All three flagships, supporting enforcement examples, and operational evidence on the demo hub →

Works with

Model-agnostic. Agent-agnostic.

Frontier and open-weight models. IDE agents, CLI agents, and orchestration frameworks. The decision corpus is the constant; everything upstream of it can change.

Models

OpenAI, Anthropic, Gemini, Llama, Qwen, DeepSeek, Mistral — direct APIs and OpenAI-compatible endpoints.

Coding agents

Claude Code & Cursor (native). Copilot, Aider, Cline, OpenHands designed-to-support.

Frameworks & CI

LangGraph, CrewAI, AutoGen, OpenAI Agents SDK. GitHub Actions (native), self-hosted runners.

Full compatibility surface → · Native integrations →

Get started

Running in under two minutes.

install

$ git clone https://github.com/TheoV823/mneme
$ cd mneme
$ pip install -e .

run demo

# Runs the before/after demo without an API key
$ python demo.py --dry-run

governance gate (CI)

$ mneme check --memory .mneme/project_memory.json \
    --input pr.diff --query "$PR_TITLE" --mode strict

Full CLI reference → · Run the benchmark → · Python API →

Vision & roadmap

Building the governance layer
for AI-assisted development.

Mneme is evolving from local governance tooling into the governance infrastructure layer for AI-assisted software development. As coding workflows mature, teams will need more than prompt files to maintain architectural consistency at scale.

Phase 1 — Current

OSS Developer Wedge

Architectural governance for individual developers and early engineering adopters.

Phase 2

Team Governance Layer

Shared policy and decision stores for teams adopting AI-assisted development.

Phase 3

Agent Platform Integrations

Governance for enterprise agent workflows and managed coding platforms.

Phase 4

Governance Infrastructure

Policy-as-code enforcement and drift analytics across engineering organizations.

See full roadmap →

Frequently asked

Common questions.

What is Mneme HQ?

Mneme HQ is the architectural governance layer for AI-assisted development. It compiles architectural intent into enforceable constraints that govern AI coding agents before code is generated. As agent platforms proliferate, governance becomes infrastructure, and Mneme is positioned as the pre-generation governance layer of that stack.

How is Mneme different from Cursor Rules or CLAUDE.md?

Rules files document standards. Mneme enforces them. Cursor Rules and CLAUDE.md are prompt files that describe preferences to the model. Mneme is a governance layer that compiles architectural decisions into enforceable constraints, retrieves them at prompt time based on what the agent is doing, and validates outputs against them.

How is Mneme different from RAG or vector databases?

RAG retrieves knowledge. Mneme operationalizes decisions. RAG systems surface documents that the model may or may not act on. Mneme compiles architectural decisions into structured rules and evaluates AI-generated code against them. There is no embedding model, no vector store, and no probabilistic retrieval in the governance path.

How is Mneme different from observability tools like SentRux?

SentRux tells you when the agent violated architecture. Mneme helps prevent the violation from being proposed in the first place. Pre-generation governance and post-generation observability are complementary layers of the AI coding stack.

Does Mneme require a vector store or ML infrastructure?

No. Mneme uses deterministic, version-controlled decision graphs and tag-scoped retrieval. There is no vector store, no embedding model, and no ML dependency in the governance path. This is a deliberate architectural commitment.

What stacks does Mneme work with?

Mneme works with direct LLM API integrations, IDE coding assistants like Cursor and Claude Code, agent frameworks, managed agent platforms, and internal prompt pipelines. The enterprise framing is a governance control plane for AI coding agents operating within Azure and GitHub-based engineering workflows.

Benchmark

Methodology before metrics.

Mneme's governance benchmark is a deterministic, reproducible regression instrument — not an eval-score leaderboard play. Methodology, deterministic retrieval, and rule-text enforcement are pinned at the Layer 1 freeze (commit e73ff7d) so every change is visible and every PASS is reconstructible.

deterministic retrieval two-layer scoring structured-fixture verification recall@1 reported, never optimized

Same memory and same query produce byte-identical retrieval order on every run. Verdicts are observable facts. We are explicit about what the benchmark proves and what it does not.

Why the benchmark is intentionally constrained → Full methodology spec →

Illustrative example

Same prompt, same model, different answer. The example below shows what governance changes about agent behavior.

Without Mneme HQ

Prompt: "Should we rebuild the retrieval system from scratch with embeddings?"

We could consider rebuilding the system with a vector database and embedding model. This would improve semantic matching and scale better long-term. Sentence-transformers is a good option for generating embeddings...

With Mneme HQ

Prompt: "Should we rebuild the retrieval system from scratch with embeddings?"

Do not rebuild from scratch. The project has an explicit rule to extend current infrastructure before rebuilding (rule-001). Keyword scoring was chosen intentionally — it is deterministic, has no ML dependencies, and is easy to debug. The team already declined adding sentence-transformers in v1. Extend the current retriever instead.

Without Mneme HQ — enforcer result

Severity: FAIL

Decision: mneme_retrieval_deterministic

Violation: "embeddings" + "vector database"

Mode: strict

With Mneme HQ — enforcer result

Severity: PASS

Retrieved decision: mneme_retrieval_deterministic

Enhanced response: no violations detected

Walk all three flagship demos →

Govern AI coding agents before they generate the code.

AI increased code output.Review capacity did not.