Caching & Compression — LeanCTX Docs

lean-ctx uses a multi-layered caching and compression system to minimize token usage. Understanding these layers helps you get the most out of the system.

Session Cache

Every file read via ctx_read is stored in a per-session in-memory cache with a BLAKE3 content hash. When the same file is read again:

Content unchanged: Returns a compact cache-hit stub (~13 tokens) instead of the full file
Content changed: Returns the full new content and updates the cache
Different mode requested: Re-reads with the new mode

Cache Lifecycle

# 1. First read: full content cached
ctx_read src/auth.ts
→ F1=auth.ts 123L [full content...]  (~450 tokens)

# 2. Second read: cache hit
ctx_read src/auth.ts
→ F1=auth.ts cached 2t 123L  (~13 tokens, 97% saved)

# 3. File was edited externally, third read: detects change
ctx_read src/auth.ts
→ F1=auth.ts 125L [new full content...]

# 4. Force bypass cache
ctx_read src/auth.ts fresh=true
→ F1=auth.ts 125L [re-read full content...]
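
The lifecycle above can be modeled in a few lines of Python. This is only an illustrative sketch, not lean-ctx's implementation: `SessionCache`, `read`, and the stub text are hypothetical names, file content is passed in directly instead of being read from disk, and `hashlib.blake2b` stands in for BLAKE3, which is not part of the Python standard library.

```python
import hashlib


class SessionCache:
    """Toy per-session cache keyed by path; all names here are hypothetical."""

    def __init__(self):
        self._entries = {}  # path -> (content_hash, mode)

    @staticmethod
    def _digest(content: str) -> str:
        # blake2b is a stand-in; the document describes BLAKE3 hashes.
        return hashlib.blake2b(content.encode("utf-8")).hexdigest()

    def read(self, path: str, content: str, mode: str = "full", fresh: bool = False) -> str:
        """Return a short stub on a cache hit, otherwise the full content."""
        digest = self._digest(content)
        if not fresh and self._entries.get(path) == (digest, mode):
            return f"{path} cached"
        # Miss, changed content, new mode, or fresh=True: store and return in full.
        self._entries[path] = (digest, mode)
        return content


cache = SessionCache()
first = cache.read("src/auth.ts", "export const a = 1;\n")    # full content
second = cache.read("src/auth.ts", "export const a = 1;\n")   # cache-hit stub
edited = cache.read("src/auth.ts", "export const a = 2;\n")   # change detected
forced = cache.read("src/auth.ts", "export const a = 2;\n", fresh=True)
```

Keying the entry on both hash and mode means a request for a different mode is treated as a miss, which matches the "Different mode requested" case.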

Cache Management

Command	Effect
`ctx_cache action="status"`	Show cached files, sizes, and hit rates
`ctx_cache action="clear"`	Clear entire session cache
`ctx_cache action="invalidate" path="..."`	Invalidate a specific file
`ctx_read fresh=true`	Bypass cache for a single read

The cache auto-clears after 5 minutes of inactivity to prevent stale data.

Predictive Coding (v3.6.8)

Inspired by Rao & Ballard's Predictive Coding theory from neuroscience, lean-ctx transmits only prediction errors (structural deltas) when a file is re-read in the same session. Instead of resending the full content, only changes since the last read are sent — dramatically reducing token usage for iterative development workflows.

How It Works

First read: Full content is delivered and stored as the "prediction" baseline
Subsequent reads: Only lines that differ from the baseline are sent as a delta
Large refactors: If changes exceed 60% of lines, the system falls back to full output

# First read: full output (450 tokens)
ctx_read src/auth.ts mode="signatures"
→ pub fn authenticate(...) → Result<Token>
  pub fn validate_jwt(...) → bool
  pub fn refresh_token(...) → Token

# After editing: only delta sent (~30 tokens instead of 450)
ctx_read src/auth.ts mode="signatures"
→ [delta:signatures] unchanged:3
  + pub fn revoke_token(token_id: &str) → Result<()>

Token savings: 90-97% for typical edit cycles where 1-3 functions change between reads.
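
A minimal sketch of the delta-or-fallback decision, assuming a simple line-set comparison. The function name `delta_or_full`, the output format, and the way the changed ratio is computed are assumptions for illustration; only the 60% fallback threshold comes from the text above.

```python
def delta_or_full(baseline, current, max_changed=0.60):
    """Return a line delta against the baseline, or the full content for large changes."""
    old, new = set(baseline), set(current)
    added = [line for line in current if line not in old]
    removed = [line for line in baseline if line not in new]
    # Assumed definition: changed lines relative to the longer of the two versions.
    changed_ratio = (len(added) + len(removed)) / max(len(baseline), len(current), 1)
    if changed_ratio > max_changed:
        return list(current)  # large refactor: fall back to full output
    header = f"unchanged:{len(current) - len(added)}"
    return [header] + [f"+ {line}" for line in added] + [f"- {line}" for line in removed]


baseline = ["fn authenticate()", "fn validate_jwt()", "fn refresh_token()"]
small_delta = delta_or_full(baseline, baseline + ["fn revoke_token()"])
rewrite = delta_or_full(baseline, ["fn login()", "fn logout()", "fn refresh_token()"])
```

Here `small_delta` carries one added line plus a count of unchanged lines, while `rewrite` changes most of the file and is returned in full.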

Hebbian Co-Access Cache (v3.6.8)

Based on Hebb's rule ("neurons that fire together, wire together"), lean-ctx tracks which files are accessed together and strengthens their association over time. This enables intelligent cache eviction that preserves working-set coherence.

Co-Access Tracking

Files accessed within the same burst window (tool calls in rapid succession) build associative strength. When cache pressure requires eviction, files with strong associations to the current working set are preserved, while isolated files with weak associations are evicted first.
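
One way to picture co-access tracking is a pairwise weight table that grows whenever two files appear in the same burst. The sketch below is an assumption-laden toy: `CoAccessTracker`, the 2-second burst window, and the unit weight increment are all hypothetical and not taken from lean-ctx.

```python
from collections import defaultdict


class CoAccessTracker:
    """Toy Hebbian tracker: files seen in the same burst gain association strength."""

    def __init__(self, burst_window_s: float = 2.0):
        self.burst_window_s = burst_window_s  # assumed value, not from lean-ctx
        self.strength = defaultdict(float)    # frozenset({a, b}) -> weight
        self._burst = []                      # paths seen in the current burst
        self._last_ts = None

    def access(self, path: str, ts: float) -> None:
        """Record an access at time `ts` (seconds) and strengthen links within the burst."""
        if self._last_ts is not None and ts - self._last_ts > self.burst_window_s:
            self._burst = []  # gap too long: start a new burst
        self._last_ts = ts
        for other in self._burst:
            if other != path:
                self.strength[frozenset((path, other))] += 1.0
        if path not in self._burst:
            self._burst.append(path)

    def association(self, path: str, working_set) -> float:
        """Total strength linking `path` to the files in the working set."""
        return sum(self.strength.get(frozenset((path, w)), 0.0) for w in working_set if w != path)


tracker = CoAccessTracker()
tracker.access("auth.ts", 0.0)
tracker.access("token.ts", 0.5)
tracker.access("auth.ts", 1.0)
tracker.access("README.md", 60.0)  # isolated access, long after the burst
```

An eviction policy could then prefer to drop `README.md`, whose association with the working set is zero, before `token.ts`.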

Boltzmann Temperature Eviction

Cache eviction uses a Boltzmann distribution from statistical physics. Each cache entry has an "energy" score based on recency, access frequency, association strength, and graph centrality. Memory pressure sets the "temperature" T inversely, so T falls as pressure rises:

Low pressure (high T): Lenient eviction — even moderate-value entries survive
High pressure (low T): Deterministic eviction of lowest-energy entries

This prevents both premature eviction of potentially useful entries and cache bloat from unused files.
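
As a rough illustration, eviction probabilities can be derived from Boltzmann weights, where the chance of evicting entry i is proportional to exp(-E_i / T). The function name, the example energies, and the two temperature values are assumptions; how lean-ctx combines the energy terms or maps memory pressure to T is not specified here.

```python
import math


def eviction_probabilities(energies, temperature):
    """Boltzmann weights: P(evict i) is proportional to exp(-E_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    lowest = min(energies.values())
    # Subtracting the minimum energy avoids overflow; it cancels in the ratio.
    weights = {k: math.exp(-(e - lowest) / temperature) for k, e in energies.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical energy scores: higher means more valuable to keep.
energies = {"auth.ts": 3.0, "token.ts": 2.5, "notes.md": 0.5}
relaxed = eviction_probabilities(energies, temperature=5.0)   # low pressure, high T
strained = eviction_probabilities(energies, temperature=0.1)  # high pressure, low T
```

At low T nearly all probability mass falls on the lowest-energy entry, which is the deterministic behavior described above; at high T the distribution flattens and no single entry is singled out.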

Predictive Prefetch (v3.6.8)

Using the Free Energy Principle from neuroscience, lean-ctx learns file access transition patterns and proactively pre-loads files predicted to be needed next. This eliminates cold-read latency for common workflows.

How It Works

Observation: Every file access updates a transition probability matrix
Prediction: After accessing file A, the model predicts B is next (based on learned patterns)
Prefetch: High-confidence predictions trigger background pre-loading
Feedback: Hit/miss tracking adjusts confidence thresholds via Active Inference

The system's "free energy" (prediction error rate) is continuously minimized — when predictions are wrong, confidence thresholds increase; when predictions are correct, thresholds decrease for more aggressive prefetching.
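
The observe, predict, and feedback steps can be sketched with a first-order transition table. Everything named here (`PrefetchModel`, the starting threshold of 0.6, the 0.05 step, the clamping bounds) is a hypothetical choice for illustration, and the actual background pre-loading is left out.

```python
from collections import Counter, defaultdict


class PrefetchModel:
    """Toy first-order transition model with an adaptive confidence threshold."""

    def __init__(self, threshold: float = 0.6, step: float = 0.05):
        self.transitions = defaultdict(Counter)  # previous file -> counts of next files
        self.threshold = threshold               # assumed starting value
        self.step = step                         # assumed adjustment size
        self._prev = None

    def observe(self, path: str) -> None:
        """Record the transition from the previously accessed file to `path`."""
        if self._prev is not None:
            self.transitions[self._prev][path] += 1
        self._prev = path

    def predict(self, path: str):
        """Return (next_file, probability) if confident enough, else None."""
        counts = self.transitions.get(path)
        if not counts:
            return None
        candidate, hits = counts.most_common(1)[0]
        probability = hits / sum(counts.values())
        return (candidate, probability) if probability >= self.threshold else None

    def feedback(self, hit: bool) -> None:
        """Correct predictions lower the threshold; wrong ones raise it."""
        delta = -self.step if hit else self.step
        self.threshold = min(0.95, max(0.05, self.threshold + delta))


model = PrefetchModel()
for accessed in ["auth.ts", "token.ts", "auth.ts", "token.ts", "auth.ts", "db.ts"]:
    model.observe(accessed)
prediction = model.predict("auth.ts")  # token.ts followed auth.ts in 2 of 3 cases
```

After two misses the threshold rises above 2/3 and the same prediction is suppressed, which mirrors the more conservative behavior described for wrong predictions.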

Homeostatic Memory Guard (v3.6.8)

Inspired by biological homeostasis, lean-ctx maintains system equilibrium through a multi-level graduated response system. Rather than hard limits that cause abrupt failures, the memory guard applies proportional responses to resource pressure:

Level	Trigger	Action	Effect
Nominal	<70% memory	None	Normal operation
Elevated	70-85%	Trim outputs	Compress oversized responses
High	85-93%	Evict entries	Remove low-energy cache entries
Critical	93-95%	Unload indices	Drop search indices (rebuildable)
Emergency	>95%	Emergency drop	Aggressive cleanup to prevent OOM

A feedback loop tracks whether actions were effective. If pressure persists after intervention, the system escalates to the next level. When pressure subsides, it resets to nominal — no permanent degradation occurs.
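
The table and the escalation rule can be expressed as a small lookup plus a step function. This is a sketch under stated assumptions: the names `LEVELS`, `classify`, and `next_level` are hypothetical, and each range is treated as half-open, so exactly 95% is classified as Emergency here.

```python
# (upper bound of memory usage in percent, level, action), taken from the table above
LEVELS = [
    (70.0, "Nominal", "None"),
    (85.0, "Elevated", "Trim outputs"),
    (93.0, "High", "Evict entries"),
    (95.0, "Critical", "Unload indices"),
    (float("inf"), "Emergency", "Emergency drop"),
]


def classify(memory_pct: float):
    """Map memory usage to (level, action); boundary handling is an assumption."""
    for upper, level, action in LEVELS:
        if memory_pct < upper:
            return level, action
    return LEVELS[-1][1], LEVELS[-1][2]


def next_level(current: str, pressure_persists: bool) -> str:
    """Escalate one level if the last action did not help, otherwise reset to Nominal."""
    names = [level for _, level, _ in LEVELS]
    if not pressure_persists:
        return "Nominal"
    return names[min(names.index(current) + 1, len(names) - 1)]
```

For example, `classify(90.0)` selects the High level, and `next_level("High", True)` moves to Critical if eviction did not relieve the pressure.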

File References (F1, F2, ...)

Each file read in a session gets a persistent short ID: F1, F2, etc. These IDs survive across the entire session and can be used instead of full paths to save tokens.

F1=auth.ts 123L      → Use "F1" instead of "src/auth.ts"
F2=server.rs 262L    → Use "F2" instead of "src/http/server.rs"
F3=db.ts 64L         → Use "F3" instead of "src/database/db.ts"

In TDD mode, long identifiers inside file content are also mapped to short symbols (α1, α2, ...) for further compression.
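
A minimal sketch of how such a reference table might behave, assuming IDs are assigned in first-read order and never reused within a session. `FileRefs`, `ref_for`, and `resolve` are hypothetical names.

```python
class FileRefs:
    """Toy session-wide table assigning F1, F2, ... in first-read order."""

    def __init__(self):
        self._by_path = {}
        self._by_ref = {}

    def ref_for(self, path: str) -> str:
        """Return the existing short ID for `path`, or assign the next one."""
        if path not in self._by_path:
            ref = f"F{len(self._by_path) + 1}"
            self._by_path[path] = ref
            self._by_ref[ref] = path
        return self._by_path[path]

    def resolve(self, ref_or_path: str) -> str:
        """Expand a short ID back to its path; unknown strings pass through unchanged."""
        return self._by_ref.get(ref_or_path, ref_or_path)


refs = FileRefs()
f1 = refs.ref_for("src/auth.ts")
f2 = refs.ref_for("src/http/server.rs")
again = refs.ref_for("src/auth.ts")  # same file keeps its ID
```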

Shell Output Compression

ctx_shell applies pattern-based compression to the output of 60+ recognized developer tools. Each tool has a specialized compressor that preserves actionable information while stripping boilerplate.

How It Works

Command Detection: Identifies the tool from the command string (git, npm, docker, etc.)
Pattern Matching: Applies the tool-specific compression pattern
Structured Output: Returns only the essential information with token savings count
Fallback: Unrecognized commands get generic compression (ANSI stripping, empty line removal)
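
The detection and fallback steps can be sketched as a dispatch table with a generic default. The names `COMPRESSORS`, `compress`, and `generic_compress` are hypothetical, the tool-specific compressors are omitted, and the regular expression only covers ANSI CSI sequences such as color codes.

```python
import re

# Matches ANSI CSI escape sequences (for example "\x1b[32m" for green text).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def generic_compress(output: str) -> str:
    """Strip ANSI CSI sequences, trailing whitespace, and empty lines."""
    cleaned = ANSI_ESCAPE.sub("", output)
    lines = [line.rstrip() for line in cleaned.splitlines()]
    return "\n".join(line for line in lines if line)


# Tool name -> compressor; real tool-specific entries are left out of this sketch.
COMPRESSORS = {}


def compress(command: str, output: str) -> str:
    """Pick a compressor from the first word of the command, else use the fallback."""
    parts = command.split()
    tool = parts[0] if parts else ""
    return COMPRESSORS.get(tool, generic_compress)(output)


raw = "\x1b[32mPASS\x1b[0m test_a\n\n\n  \nDone in 1.2s\n"
compact = compress("unknown-tool --flag", raw)
```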

Compression Examples

Command	Raw Output	Compressed	Savings
`git status`	~600 tokens	~80 tokens	87%
`npm install`	~300 tokens	~85 tokens	71%
`npm test`	~2000 tokens	~200 tokens	90%
`docker compose ps`	~400 tokens	~100 tokens	75%
`kubectl get pods`	~800 tokens	~200 tokens	75%

Error Recovery (Tee)

When a command fails (non-zero exit code), the full uncompressed output is automatically saved to ~/.lean-ctx/tee/. Use lean-ctx tee last to recover the full output. This ensures compression never hides error details.

Tool Result Archive

When enabled, the archive system stores full tool results to disk when they exceed a token threshold. The compressed response includes an [ARCHIVE: <id>] reference that the agent can use with ctx_expand to retrieve the full content on demand.

Flow

Tool result exceeds threshold (default: 500 tokens)
Full result stored in ~/.lean-ctx/archive/
Compressed response + archive ID sent to the agent
Agent calls ctx_expand id="..." when full detail is needed
Archived entries auto-expire after TTL (default: 120 minutes)
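
The flow above can be approximated with an in-memory store. This is a sketch only: `Archive`, `store`, and `expand` are hypothetical names, entries live in a dictionary instead of on disk, token counts use a crude whitespace split, and the ID is a truncated SHA-256 digest chosen for illustration.

```python
import hashlib
import time
from typing import Optional


class Archive:
    """Toy in-memory archive; the document describes on-disk storage instead."""

    def __init__(self, threshold_tokens: int = 500, ttl_minutes: int = 120):
        self.threshold_tokens = threshold_tokens
        self.ttl_seconds = ttl_minutes * 60
        self._entries = {}  # id -> (stored_at, full_text)

    @staticmethod
    def count_tokens(text: str) -> int:
        return len(text.split())  # crude whitespace estimate, an assumption

    def store(self, text: str, now: Optional[float] = None) -> str:
        """Return small results unchanged, else a stub carrying an archive ID."""
        tokens = self.count_tokens(text)
        if tokens <= self.threshold_tokens:
            return text
        entry_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:6]
        self._entries[entry_id] = (time.time() if now is None else now, text)
        return f'[ARCHIVE:{entry_id}] ({tokens} tokens) Use ctx_expand id="{entry_id}"'

    def expand(self, entry_id: str, now: Optional[float] = None):
        """Return the full text, or None if the ID is unknown or past its TTL."""
        now = time.time() if now is None else now
        entry = self._entries.get(entry_id)
        if entry is None or now - entry[0] > self.ttl_seconds:
            self._entries.pop(entry_id, None)
            return None
        return entry[1]


archive = Archive(threshold_tokens=5, ttl_minutes=120)
small = archive.store("only three tokens", now=0.0)
big_text = "word " * 10
stub = archive.store(big_text, now=0.0)
entry_id = stub[len("[ARCHIVE:"):].split("]")[0]
```

The optional `now` argument exists only to make expiry easy to exercise without waiting for real time to pass.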

Configuration

# config.toml
[archive]
enabled = true
threshold_tokens = 500
ttl_minutes = 120

Zero-Loss Archive (ctx_expand)

The archive system stores large tool outputs to disk so they never consume context window space - but unlike simple truncation, nothing is lost. The full content is always available on demand via ctx_expand.

How It Works

A tool result exceeds the configured token threshold
The full output is written to ~/.lean-ctx/archives/ with a unique ID

The model receives a compact hint instead of the full output:

[ARCHIVE:a7f3c2] auth.ts analysis (2,847 tokens) - 14 functions, 3 classes
  Key exports: AuthService, TokenManager, validateJWT
  Use ctx_expand id="a7f3c2" for full content

The agent calls ctx_expand id="a7f3c2" only when full detail is actually needed

Configuration

# config.toml
[archive]
enabled = true
threshold_tokens = 500   # Archive results larger than this
ttl_minutes = 120        # Auto-expire after 2 hours
max_disk_mb = 256        # Disk space limit for archives
mask_secrets = true      # Redact detected secrets before archiving

Option	Default	Description
`enabled`	`true`	Enable/disable the archive system
`threshold_tokens`	`500`	Minimum token count to trigger archiving
`ttl_minutes`	`120`	Time-to-live before auto-expiration
`max_disk_mb`	`256`	Maximum total disk usage for archives
`mask_secrets`	`true`	Redact API keys, tokens, and passwords before writing to disk

Secret Masking

When mask_secrets is enabled, lean-ctx scans archived content for common secret patterns (API keys, JWT tokens, connection strings, private keys) and replaces them with [REDACTED:type] placeholders before writing to disk. This ensures sensitive data never persists in the archive directory.
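
To illustrate the idea, the sketch below applies a few regular expressions and substitutes typed placeholders. The patterns, the type labels, and the function name `mask_secrets` are illustrative assumptions; lean-ctx's actual detection rules are not shown in this document and are likely more thorough.

```python
import re

# Illustrative patterns only; real secret detection needs a broader rule set.
SECRET_PATTERNS = [
    ("private_key", re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S)),
    ("jwt", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    ("connection_string", re.compile(r"\b[a-z][a-z0-9+]*://[^\s:/@]+:[^\s@]+@[^\s]+")),
    ("api_key", re.compile(
        r"\b(?:api[_-]?key|token|password)\s*[=:]\s*['\"]?[^\s'\"]{8,}", re.I)),
]


def mask_secrets(text: str) -> str:
    """Replace each match with a [REDACTED:type] placeholder."""
    for kind, pattern in SECRET_PATTERNS:
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text


sample = (
    "DATABASE_URL=postgres://app:s3cr3tpw@db.internal:5432/main\n"
    "API_KEY=abcd1234efgh5678\n"
    "port=8080\n"
)
masked = mask_secrets(sample)
```

In this sketch the whole `API_KEY=...` assignment is replaced by the placeholder, while ordinary settings such as `port=8080` pass through untouched.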

Cache-Safe Guarantee

lean-ctx provides a cache-safe guarantee: content already present in the model's context window is never mutated or corrupted by lean-ctx operations. This is a critical invariant that prevents subtle bugs from stale or inconsistent data.

What This Means

No silent overwrites: Once a file is cached as F1 with a specific hash, the F1 reference always points to that exact content until the entry is invalidated, either explicitly or by a hash mismatch on the next read
Hash-based validation: Every cache hit verifies the BLAKE3 content hash - if the file changed on disk, the cache entry is invalidated and a full re-read occurs
Immutable archive entries: Archived content (ctx_expand IDs) is immutable once written - the same ID always returns the same content
No partial reads: If a read fails mid-stream, no partial content enters the cache

Doctor Cache-Safety Check

lean-ctx doctor includes a cache-safety validation step that verifies:

All cached file hashes match current disk content
No archive entries have been externally modified
Session state is consistent with the file reference table
No orphaned cache entries exist from crashed sessions

lean-ctx doctor
→ Cache safety: ✓ All 12 cached files verified
  Archive integrity: ✓ 8 entries, 0 corrupted
  Session state: ✓ Consistent
  Orphaned entries: ✓ None found

Context Compaction

ctx_compress creates a checkpoint of the current session state for long conversations. It summarizes all cached files, their signatures, and the session context into a compact format that can survive context window truncation.

When to use: After 15-20 tool calls, or when approaching context window limits. lean-ctx auto-triggers checkpoints at configurable intervals.

ctx_compress
→ Session checkpoint created:
  12 files cached (F1-F12)
  3 signatures preserved
  Session context: 2 tasks, 1 workflow
  Checkpoint size: ~800 tokens (vs ~15000 tokens for full state)