feat: Implement Deterministic Code-Level Guardrails for Agent Safety by Saurav-Gupta-13 · Pull Request #7800 · microsoft/autogen

Saurav-Gupta-13 · 2026-06-04T13:00:49Z

Resolves #7770

The Problem

As reported in #7770, prompt-based safety rules and system instructions cannot be relied on as the only safeguard. LLMs are subject to context window degradation and jailbreaks, either of which can let an agent bypass prompt rules and execute destructive commands (resulting in massive infrastructure losses).

The Solution

This PR introduces a deterministic Code-Based Governance architecture.

GuardrailInterceptor: A middleware hook injected directly into _code_executor_agent.py that inspects each command's AST and matches it against regex patterns before the command reaches the OS.
Blast-Radius Protection: If an agent attempts multiple destructive commands in a single turn, the entire code block is rejected immediately.
Mandatory Dry-Runs: Destructive actions (rm -rf, terraform, aws) suspend the execution thread until a human types a CONFIRM token in the terminal.
Persistent SQLite State: If an agent violates safety rules, its GuardrailState is updated to RESTRICTED in a persistent SQLite database. Even if the LLM's memory is wiped or the agent is restarted, the agent remains locked down until a human resets it.
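The interception, blast-radius, and confirmation steps listed above could look roughly like the following. This is a minimal sketch and not the code in this PR: `DESTRUCTIVE_PATTERNS`, `classify`, `run_guarded`, and the `max_destructive` threshold are hypothetical names chosen for illustration, and the sketch uses regex matching only (no AST pass).

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical patterns for the three command families named in the PR
# description; the PR's real pattern list is not shown here.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),  # rm with both -r and -f flags
    re.compile(r"\bterraform\b"),
    re.compile(r"\baws\b"),
]


@dataclass
class Verdict:
    action: str         # "allow", "confirm", or "block"
    matches: List[str]  # the lines that matched a destructive pattern


def classify(code: str, max_destructive: int = 1) -> Verdict:
    """Classify a code block before it is handed to the executor."""
    hits = [
        line.strip()
        for line in code.splitlines()
        if any(p.search(line) for p in DESTRUCTIVE_PATTERNS)
    ]
    if len(hits) > max_destructive:
        # Blast-radius protection: too many destructive lines in one turn.
        return Verdict("block", hits)
    if hits:
        # A single destructive line needs human confirmation.
        return Verdict("confirm", hits)
    return Verdict("allow", hits)


def run_guarded(
    code: str,
    execute: Callable[[str], str],
    ask: Callable[[str], str] = input,
) -> str:
    """Run `execute(code)` only if the guardrail verdict permits it."""
    verdict = classify(code)
    if verdict.action == "block":
        raise PermissionError(f"blocked: {len(verdict.matches)} destructive commands")
    if verdict.action == "confirm":
        reply = ask(f"Type CONFIRM to run: {verdict.matches[0]}\n")
        if reply.strip() != "CONFIRM":
            raise PermissionError("human confirmation not given")
    return execute(code)
```

Because the decision is made by ordinary code rather than by the model, the same input always produces the same verdict. Passing `ask` as a parameter keeps the confirmation step testable without a real terminal.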

This completely isolates the "Brain" from the "Hands" using a zero-trust model.
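The persistent lock-down can be illustrated with the standard-library `sqlite3` module. This is a sketch under stated assumptions rather than the PR's schema: the table name `guardrail_state`, the `agent_id` column, the helper names, and the default `ACTIVE` state are all hypothetical; only the `RESTRICTED` value comes from the description above.

```python
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk guardrail state store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS guardrail_state ("
        "agent_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def get_state(conn: sqlite3.Connection, agent_id: str) -> str:
    """Return the stored state, defaulting to ACTIVE for unknown agents."""
    row = conn.execute(
        "SELECT state FROM guardrail_state WHERE agent_id = ?", (agent_id,)
    ).fetchone()
    return row[0] if row else "ACTIVE"


def set_state(conn: sqlite3.Connection, agent_id: str, state: str) -> None:
    """Persist a state change so it survives process restarts."""
    conn.execute(
        "INSERT OR REPLACE INTO guardrail_state (agent_id, state) VALUES (?, ?)",
        (agent_id, state),
    )
    conn.commit()
```

Since the state lives in a file rather than in the conversation context, a fresh process that reopens the same path still sees `RESTRICTED` for a locked agent. Only an explicit `set_state` call, made by a human operator, changes it back.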


Saurav-Gupta-13 · 2026-06-04T13:03:20Z

Hi team, a quick note that this PR introduces a structural fix for the safety vulnerabilities outlined in #7770. I have tested the blast-radius interceptor and the persistent SQLite state locally, and together they block destructive commands across context resets. Please let me know if you would like me to adjust the SQLite schema or the regex patterns during review!

Saurav-Gupta-13 · 2026-06-04T13:04:50Z

@microsoft-github-policy-service agree

feat: implement deterministic SQLite guardrail interceptor (Closes microsoft#7770)

0f80d34

Saurav-Gupta-13 wants to merge 1 commit into microsoft:main from Saurav-Gupta-13:feature/deterministic-guardrails
