The Legal AGI Lab

Bringing AI agents into the legal fold through integrated AI & legal research

Mission

AI agents are becoming autonomous actors.

We are building agentic law to align them with societal values and enable safe, high-stakes deployments.

Research Areas

01

Legal Turing Test

We develop Turing Tests grounded in real, high-stakes legal workflows. Norm will continue to build out this suite of benchmarks, which only a technology company powering a full-service law firm can build.

02

As Intelligence Becomes Cheap, Trust Is the New Bottleneck

As AI drives the cost of intelligence toward zero, the bottleneck in the economy shifts to assurance of agentic legal systems. How do we ensure that AI systems act in ways that are legal, trustworthy, and enforceable?

03

Legal Infrastructure for the Agentic Economy

Legal systems were built for human actors. As AI agents become economic and societal actors, law is their real-time alignment infrastructure. We investigate the systems by which AI agents will transact, be governed, and be held liable.

04

Benchmarking Legal Reasoning

Legal reasoning spans rule extraction, statutory interpretation, analogical reasoning, judgment under ambiguity, and more. Most academic benchmarks do not measure reasoning. We build the evaluation infrastructure for getting to the right answers the right way.

Models compared: Opus 4.6, Opus 4, GPT-5.4, GPT-5, Sonnet 4.6, Sonnet 4, GPT-5.4 Mini, GPT-5 Mini

Even at 90% consistency, frontier models still contradict themselves at scale

Why the latest generation of models is not yet production-ready for high-stakes legal work

The latest generation of frontier models reaches the same conclusion on a legal question roughly 90% of the time. At scale, the remaining 10% still produces contradictory answers to the same question every single week.
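The arithmetic behind that claim can be sketched in a few lines. This is a minimal illustration, not the Lab's methodology: it assumes "consistency" means the share of repeated runs that match the most common answer, and the function names, the sample answers, and the figure of 1,000 queries per week are all hypothetical.

```python
from collections import Counter


def consistency(answers):
    """Share of repeated runs that agree with the most common answer.

    Assumed definition for illustration; the source does not specify one.
    """
    if not answers:
        raise ValueError("answers must be non-empty")
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)


def expected_contradictions(consistency_rate, queries_per_week):
    """Expected weekly answers that disagree with the most common answer."""
    return (1.0 - consistency_rate) * queries_per_week


# Hypothetical data: ten runs of the same legal question, nine of which agree.
runs = ["liable"] * 9 + ["not liable"]
rate = consistency(runs)

# Hypothetical volume: 1,000 repeats of the question per week.
weekly = expected_contradictions(rate, 1000)
print(f"consistency = {rate:.0%}, expected contradictions per week = {weekly:.0f}")
```

Under these assumptions, a 90% consistency rate leaves about one answer in ten at odds with the rest, so the count of contradictory answers grows in direct proportion to query volume.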

AI agents exhibit increasing levels of key aspects of intentionality

Domain experts award higher intentionality scores to newer models

At Norm Ai, we are planning for much of the economy to be run by AI agents. In fact, we're building the legal infrastructure to enable that. Under current law, there is no clarity about how AI agents can comply in many consequential areas.

Frontier model performance on advanced legal reasoning

Evaluating 8 frontier models across 1,456 questions spanning rule-application and interpretation tasks

Frontier models are getting measurably better at legal reasoning. But they are far from production-ready in high-stakes corporate environments, where perfect accuracy and consistency are table stakes.

How the Lab Compounds

01

Foundations

Legal ontologies, reasoning taxonomies, and the formal structures that underpin evaluation.

02

Benchmarks

Expert-designed evaluations spanning multiple legal domains with blind review protocols.

03

Deployment

Battle-tested agent architectures operating under real compliance and advisory constraints.

04

Horizon

Exploratory research into reasoning capabilities that don't yet have established measurement.

Existing Research

Benchmarking frontier AI agents on advanced legal reasoning tasks — with Norm Law Partners

Evaluating whether legal professionals can distinguish AI-agent work product from expert humans

Generating high-leverage training data for distilling domain-specific reasoning into models, using complex, multi-step decision traces

Simulating AI agent deployment in high-stakes corporate environments, e.g., contract negotiations and regulatory compliance

Publications

Legal Engineering: A Paradigm Shift in Law — Stanford Law

Directing with AI: Corporate Governance, AI Governance, and the Board — SSRN

Large Language Models as tax attorneys — Royal Society

Artificial intelligence & interspecific law — Science

Large Language Models as fiduciaries — Stanford Law

How Regulators Can Use AI — Vanderbilt Law Review

Aligning AI Agents with Humans Through Law as Information — Stanford Law

Can LLMs Follow Simple Rules? — arXiv

LegalBench: A collaboratively built benchmark for measuring legal reasoning in LLMs — arXiv

A holistic assessment of the reliability of AI — arXiv