Caching & Compression

How lean-ctx caches files, compresses shell output, and manages the context window to achieve 74-99% token savings.

lean-ctx uses a multi-layered caching and compression system to minimize token usage. Understanding these layers helps you get the most out of the system.

Session Cache

Every file read via ctx_read is stored in a per-session in-memory cache with a BLAKE3 content hash. When the same file is read again:

  • Content unchanged: Returns a compact cache-hit stub (~13 tokens) instead of the full file
  • Content changed: Returns the full new content and updates the cache
  • Different mode requested: Re-reads with the new mode
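
The three cases above can be sketched as a small hash-keyed cache. This is an illustrative model, not lean-ctx's implementation: the `SessionCache` class and its `read` method are hypothetical names, and `hashlib.blake2b` stands in for BLAKE3, which is not part of Python's standard library.

```python
import hashlib


class SessionCache:
    """Hypothetical per-session cache keyed by (path, mode)."""

    def __init__(self):
        self._entries = {}  # (path, mode) -> content hash

    def read(self, path, content, mode="full"):
        # blake2b is a stand-in: BLAKE3 is not in the standard library.
        digest = hashlib.blake2b(content.encode("utf-8")).hexdigest()
        key = (path, mode)
        if self._entries.get(key) == digest:
            return "hit"   # caller would emit the compact cache-hit stub
        self._entries[key] = digest
        return "full"      # caller would emit the full content


cache = SessionCache()
first = cache.read("src/auth.ts", "export const a = 1;")
second = cache.read("src/auth.ts", "export const a = 1;")
edited = cache.read("src/auth.ts", "export const a = 2;")
other_mode = cache.read("src/auth.ts", "export const a = 2;", mode="signatures")
```

Keying on both path and mode is one simple way to model "different mode requested": a new mode misses the cache even when the file content is unchanged.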

Cache Lifecycle

# 1. First read: full content cached
ctx_read src/auth.ts
→ F1=auth.ts 123L [full content...]  (~450 tokens)

# 2. Second read: cache hit
ctx_read src/auth.ts
→ F1=auth.ts cached 2t 123L  (~13 tokens, 97% saved)

# 3. File was edited externally, third read: detects change
ctx_read src/auth.ts
→ F1=auth.ts 125L [new full content...]

# 4. Force bypass cache
ctx_read src/auth.ts fresh=true
→ F1=auth.ts 125L [re-read full content...]

Cache Management

| Command | Effect |
| --- | --- |
| ctx_cache action="status" | Show cached files, sizes, and hit rates |
| ctx_cache action="clear" | Clear entire session cache |
| ctx_cache action="invalidate" path="..." | Invalidate a specific file |
| ctx_read fresh=true | Bypass cache for a single read |

The cache auto-clears after 5 minutes of inactivity to prevent stale data.

Predictive Coding v3.6.8

Inspired by Rao & Ballard's Predictive Coding theory from neuroscience, lean-ctx transmits only prediction errors (structural deltas) when a file is re-read in the same session. Instead of resending the full content, only changes since the last read are sent — dramatically reducing token usage for iterative development workflows.

How It Works

  1. First read: Full content is delivered and stored as the "prediction" baseline
  2. Subsequent reads: Only lines that differ from the baseline are sent as a delta
  3. Large refactors: If changes exceed 60% of lines, the system falls back to full output

# First read: full output (~450 tokens)
ctx_read src/auth.ts mode="signatures"
→ pub fn authenticate(...) → Result<Token>
  pub fn validate_jwt(...) → bool
  pub fn refresh_token(...) → Token

# After editing: only delta sent (~30 tokens instead of 450)
ctx_read src/auth.ts mode="signatures"
→ [delta:signatures] unchanged:2
  + pub fn revoke_token(token_id: &str) → Result<()>

Token savings: 90-97% for typical edit cycles where 1-3 functions change between reads.
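
A minimal sketch of the delta-or-full decision, assuming the baseline and the new read are both available as lists of lines. The function name `delta_or_full`, the `+`/`-` line prefixes, and the way the 60% ratio is measured (changed lines divided by the longer of the two versions) are assumptions for illustration. The source only states that a fallback happens when changes exceed 60% of lines.

```python
import difflib

FALLBACK_RATIO = 0.60  # from the text: above 60% changed lines, send full output


def delta_or_full(baseline, current):
    """Return ("delta", lines) or ("full", lines) for two lists of lines."""
    matcher = difflib.SequenceMatcher(a=baseline, b=current, autojunk=False)
    changed, unchanged = [], 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
            continue
        changed += ["- " + line for line in baseline[i1:i2]]
        changed += ["+ " + line for line in current[j1:j2]]
    total = max(len(baseline), len(current), 1)
    if len(changed) / total > FALLBACK_RATIO:
        return "full", current
    return "delta", [f"unchanged:{unchanged}"] + changed


base = ["fn authenticate()", "fn validate_jwt()", "fn refresh_token()"]
small_kind, small_lines = delta_or_full(base, base + ["fn revoke_token()"])
big_kind, big_lines = delta_or_full(base, ["x", "y", "z"])
```

Adding one signature produces a one-line delta, while a complete rewrite crosses the threshold and returns the full content.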

Hebbian Co-Access Cache v3.6.8

Based on Hebb's rule ("neurons that fire together, wire together"), lean-ctx tracks which files are accessed together and strengthens their association over time. This enables intelligent cache eviction that preserves working-set coherence.

Co-Access Tracking

Files accessed within the same burst window (tool calls in rapid succession) build associative strength. When cache pressure requires eviction, files with strong associations to the current working set are preserved, while isolated files with weak associations are evicted first.
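
One way to picture co-access tracking is a counter per file pair that grows whenever two files appear in the same burst. The sketch below is hypothetical throughout: the `CoAccessTracker` class, the 2-second `BURST_WINDOW_S`, and the +1 increment are illustrative choices, since the source gives neither a window length nor an update rule.

```python
from collections import defaultdict

BURST_WINDOW_S = 2.0  # hypothetical: the text gives no concrete window length


class CoAccessTracker:
    def __init__(self):
        self.strength = defaultdict(float)  # frozenset({a, b}) -> weight
        self._burst = []                    # files seen in the current burst
        self._last_ts = None

    def record(self, path, ts):
        # A gap longer than the window starts a new burst.
        if self._last_ts is not None and ts - self._last_ts > BURST_WINDOW_S:
            self._burst = []
        for other in self._burst:
            if other != path:
                self.strength[frozenset((path, other))] += 1.0
        if path not in self._burst:
            self._burst.append(path)
        self._last_ts = ts

    def association(self, path, working_set):
        """Total strength between `path` and the files in the working set."""
        return sum(
            self.strength.get(frozenset((path, w)), 0.0)
            for w in working_set
            if w != path
        )


tracker = CoAccessTracker()
tracker.record("auth.ts", 0.0)
tracker.record("token.ts", 0.5)
tracker.record("auth.ts", 1.0)
tracker.record("readme.md", 60.0)  # arrives long after the burst ended
```

Here `token.ts` ends up strongly tied to `auth.ts`, while `readme.md` has no association and would be the first eviction candidate under this model.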

Boltzmann Temperature Eviction

Cache eviction uses a Boltzmann distribution from statistical physics. Each cache entry has an "energy" score based on recency, access frequency, association strength, and graph centrality. Memory pressure sets the "temperature" T inversely, so rising pressure lowers T:

  • Low pressure (high T): Lenient eviction, so even moderate-value entries survive
  • High pressure (low T): Deterministic eviction of lowest-energy entries

This prevents both premature eviction of potentially useful entries and cache bloat from unused files.
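
The effect of temperature can be illustrated with the standard Boltzmann weight exp(-E/T). The sketch assumes eviction probability is proportional to that weight, so low-energy entries are the likeliest to be evicted. The function name, the energy values, and the two temperatures are hypothetical, and how lean-ctx actually maps memory pressure to T is not specified in the source.

```python
import math


def eviction_probabilities(energies, temperature):
    """Normalised Boltzmann weights exp(-E/T) per cache entry."""
    lowest = min(energies.values())
    # Subtracting the minimum keeps the largest exponent at 0 for stability.
    weights = {
        key: math.exp(-(energy - lowest) / temperature)
        for key, energy in energies.items()
    }
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}


energies = {"F1": 0.9, "F2": 0.5, "F3": 0.1}  # hypothetical scores
cold = eviction_probabilities(energies, temperature=0.01)  # high pressure
warm = eviction_probabilities(energies, temperature=10.0)  # low pressure
```

At low T nearly all probability mass lands on the lowest-energy entry (`F3`), which matches the "deterministic" case. At high T the probabilities flatten out and differ only slightly between entries.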

Predictive Prefetch v3.6.8

Using the Free Energy Principle from neuroscience, lean-ctx learns file access transition patterns and proactively pre-loads files predicted to be needed next. This eliminates cold-read latency for common workflows.

How It Works

  1. Observation: Every file access updates a transition probability matrix
  2. Prediction: After accessing file A, the model predicts B is next (based on learned patterns)
  3. Prefetch: High-confidence predictions trigger background pre-loading
  4. Feedback: Hit/miss tracking adjusts confidence thresholds via Active Inference

The system's "free energy" (prediction error rate) is continuously minimized — when predictions are wrong, confidence thresholds increase; when predictions are correct, thresholds decrease for more aggressive prefetching.
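
The observe, predict, and feedback loop can be sketched with first-order transition counts. Everything named here is hypothetical: the `PrefetchModel` class, the starting threshold of 0.6, the step of 0.05, and the clamping range are illustrative values, not lean-ctx defaults.

```python
from collections import defaultdict


class PrefetchModel:
    def __init__(self, threshold=0.6, step=0.05):
        # counts[prev][next] = how often `next` followed `prev`
        self.counts = defaultdict(lambda: defaultdict(int))
        self.threshold = threshold
        self.step = step
        self._prev = None

    def observe(self, path):
        if self._prev is not None:
            self.counts[self._prev][path] += 1
        self._prev = path

    def predict(self, path):
        """Most likely next file, or None if it does not clear the threshold."""
        nexts = self.counts.get(path)
        if not nexts:
            return None
        best, n = max(nexts.items(), key=lambda kv: kv[1])
        return best if n / sum(nexts.values()) >= self.threshold else None

    def feedback(self, hit):
        # Hits lower the bar (more prefetching); misses raise it.
        delta = -self.step if hit else self.step
        self.threshold = min(0.95, max(0.1, self.threshold + delta))


model = PrefetchModel()
for p in ["a.ts", "b.ts", "a.ts", "b.ts", "a.ts", "c.ts"]:
    model.observe(p)
before = model.predict("a.ts")   # b.ts followed a.ts in 2 of 3 cases
model.feedback(hit=False)
model.feedback(hit=False)
after = model.predict("a.ts")    # threshold is now about 0.70
```

After two misses the same 2-in-3 pattern no longer clears the threshold, so the model stops prefetching until predictions start hitting again.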

Homeostatic Memory Guard v3.6.8

Inspired by biological homeostasis, lean-ctx maintains system equilibrium through a multi-level graduated response system. Rather than hard limits that cause abrupt failures, the memory guard applies proportional responses to resource pressure:

| Level | Trigger | Action | Effect |
| --- | --- | --- | --- |
| Nominal | <70% memory | None | Normal operation |
| Elevated | 70-85% | Trim outputs | Compress oversized responses |
| High | 85-93% | Evict entries | Remove low-energy cache entries |
| Critical | 93-95% | Unload indices | Drop search indices (rebuildable) |
| Emergency | >95% | Emergency drop | Aggressive cleanup to prevent OOM |

A feedback loop tracks whether actions were effective. If pressure persists after intervention, the system escalates to the next level. When pressure subsides, it resets to nominal — no permanent degradation occurs.
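
The level thresholds and the escalate-or-reset behavior can be modeled as two small functions. The boundary handling (lower bounds inclusive, exactly 95% still counted as Critical) and the `respond` rule are assumptions, since the table does not say which side of each boundary is inclusive and the escalation logic is only described in prose.

```python
LEVELS = ["Nominal", "Elevated", "High", "Critical", "Emergency"]


def pressure_level(used_fraction):
    """Map memory usage (0.0-1.0) to a level name from the table."""
    if used_fraction < 0.70:
        return "Nominal"
    if used_fraction < 0.85:
        return "Elevated"
    if used_fraction < 0.93:
        return "High"
    if used_fraction <= 0.95:
        return "Critical"
    return "Emergency"


def respond(current, used_fraction):
    """Pick the next level after an intervention at `current`."""
    measured = pressure_level(used_fraction)
    if measured == "Nominal":
        return "Nominal"  # pressure subsided: reset
    cur_i, meas_i = LEVELS.index(current), LEVELS.index(measured)
    if meas_i > cur_i:
        return measured   # pressure rose past the current level
    # Pressure persists despite the last action: escalate one step.
    return LEVELS[min(cur_i + 1, len(LEVELS) - 1)]
```

For example, `respond("Elevated", 0.80)` escalates to `"High"` because trimming outputs did not bring usage below 70%, while `respond("High", 0.50)` resets to `"Nominal"`.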

File References (F1, F2, ...)

Each file read in a session gets a persistent short ID: F1, F2, etc. These IDs survive across the entire session and can be used instead of full paths to save tokens.

F1=auth.ts 123L      → Use "F1" instead of "src/auth.ts"
F2=server.rs 262L    → Use "F2" instead of "src/http/server.rs"
F3=db.ts 64L         → Use "F3" instead of "src/database/db.ts"

In TDD mode, even longer identifiers within file content are mapped to short symbols (α1, α2...) for further compression.
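
The file-reference scheme amounts to a stable two-way mapping between paths and short IDs. The sketch below is a hypothetical model (the `FileRefs` class and its methods are not lean-ctx APIs). It assigns IDs in first-read order and renders the same `F1=auth.ts 123L` label format shown above.

```python
import os


class FileRefs:
    def __init__(self):
        self._by_path = {}
        self._by_id = {}

    def ref(self, path):
        """Return the existing ID for a path, or assign the next one."""
        if path not in self._by_path:
            ref_id = f"F{len(self._by_path) + 1}"
            self._by_path[path] = ref_id
            self._by_id[ref_id] = path
        return self._by_path[path]

    def resolve(self, ref_or_path):
        """Accept either a short ID or a full path."""
        return self._by_id.get(ref_or_path, ref_or_path)

    def label(self, path, line_count):
        return f"{self.ref(path)}={os.path.basename(path)} {line_count}L"


refs = FileRefs()
a = refs.ref("src/auth.ts")
b = refs.ref("src/http/server.rs")
again = refs.ref("src/auth.ts")
```

Because IDs are never reassigned within a session, `again` is still `F1`, and `refs.resolve("F2")` returns the full path for the second file.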

Shell Output Compression

ctx_shell applies pattern-based compression to the output of 60+ recognized developer tools. Each tool has a specialized compressor that preserves actionable information while stripping boilerplate.

How It Works

  1. Command Detection: Identifies the tool from the command string (git, npm, docker, etc.)
  2. Pattern Matching: Applies the tool-specific compression pattern
  3. Structured Output: Returns only the essential information with token savings count
  4. Fallback: Unrecognized commands get generic compression (ANSI stripping, empty line removal)
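
The detection and fallback steps can be sketched as a dispatch on the first word of the command. The `compress` function and the `compressors` registry are hypothetical, and the regular expression covers only CSI-style escape sequences (colors, cursor movement), which is a narrower definition of "ANSI stripping" than a production tool would likely use.

```python
import re

# CSI sequences: ESC [ parameters, optional intermediates, one final byte.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def generic_compress(output):
    """Fallback: strip CSI escapes, trailing whitespace, and empty lines."""
    text = ANSI_RE.sub("", output)
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def compress(command, output, compressors):
    """Use a tool-specific compressor if one is registered, else the fallback."""
    parts = command.split()
    tool = parts[0] if parts else ""
    handler = compressors.get(tool, generic_compress)
    return handler(output)


raw = "\x1b[32mok\x1b[0m\n\n   \nbuild done\n"
fallback_result = compress("mytool --verbose", raw, compressors={})
```

With an empty registry the command falls through to the generic path, and `fallback_result` keeps only the two meaningful lines.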

Compression Examples

| Command | Raw Output | Compressed | Savings |
| --- | --- | --- | --- |
| git status | ~600 tokens | ~80 tokens | 87% |
| npm install | ~300 tokens | ~85 tokens | 71% |
| npm test | ~2000 tokens | ~200 tokens | 90% |
| docker compose ps | ~400 tokens | ~100 tokens | 75% |
| kubectl get pods | ~800 tokens | ~200 tokens | 75% |

Error Recovery (Tee)

When a command fails (non-zero exit code), the full uncompressed output is automatically saved to ~/.lean-ctx/tee/. Use lean-ctx tee last to recover the full output. This ensures compression never hides error details.

Tool Result Archive

When enabled, the archive system stores full tool results to disk when they exceed a token threshold. The compressed response includes an [ARCHIVE: <id>] reference that the agent can use with ctx_expand to retrieve the full content on demand.

Flow

  1. Tool result exceeds threshold (default: 500 tokens)
  2. Full result stored in ~/.lean-ctx/archive/
  3. Compressed response + archive ID sent to the agent
  4. Agent calls ctx_expand id="..." when full detail is needed
  5. Archived entries auto-expire after TTL (default: 120 minutes)
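
The flow above can be modeled with a threshold check, an ID, and a TTL. This sketch makes several simplifying assumptions that are not taken from lean-ctx: an in-memory dict stands in for the on-disk archive, tokens are approximated by whitespace-separated words, and the ID is derived from a SHA-256 prefix of the content.

```python
import hashlib
import time


class Archive:
    def __init__(self, threshold_tokens=500, ttl_minutes=120):
        self.threshold_tokens = threshold_tokens
        self.ttl_seconds = ttl_minutes * 60
        self._entries = {}  # id -> (stored_at, content); stand-in for disk

    @staticmethod
    def count_tokens(text):
        # Crude stand-in: whitespace-separated words, not a real tokenizer.
        return len(text.split())

    def maybe_archive(self, content, now=None):
        """Return (archive_id or None, text to send to the agent)."""
        now = time.time() if now is None else now
        tokens = self.count_tokens(content)
        if tokens <= self.threshold_tokens:
            return None, content
        archive_id = hashlib.sha256(content.encode("utf-8")).hexdigest()[:6]
        self._entries[archive_id] = (now, content)
        return archive_id, f"[ARCHIVE:{archive_id}] ({tokens} tokens)"

    def expand(self, archive_id, now=None):
        """Return the full content, or None if unknown or expired."""
        now = time.time() if now is None else now
        stored_at, content = self._entries.get(archive_id, (None, None))
        if stored_at is None or now - stored_at > self.ttl_seconds:
            self._entries.pop(archive_id, None)
            return None
        return content


archive = Archive()
big = "word " * 600
big_id, hint = archive.maybe_archive(big, now=0.0)
small_id, passthrough = archive.maybe_archive("hello world", now=0.0)
```

Calling `archive.expand(big_id, now=60.0)` returns the full text, while the same call at `now=121 * 60` returns `None` because the 120-minute TTL has passed.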

Configuration

# config.toml
[archive]
enabled = true
threshold_tokens = 500
ttl_minutes = 120

Zero-Loss Archive (ctx_expand)

The archive system stores large tool outputs to disk so they never consume context window space. Unlike simple truncation, nothing is lost: the full content is always available on demand via ctx_expand.

How It Works

  1. A tool result exceeds the configured token threshold
  2. The full output is written to ~/.lean-ctx/archives/ with a unique ID
  3. The model receives a compact hint instead of the full output:
    [ARCHIVE:a7f3c2] auth.ts analysis (2,847 tokens) - 14 functions, 3 classes
      Key exports: AuthService, TokenManager, validateJWT
      Use ctx_expand id="a7f3c2" for full content
  4. The agent calls ctx_expand id="a7f3c2" only when full detail is actually needed

Configuration

# config.toml
[archive]
enabled = true
threshold_tokens = 500   # Archive results larger than this
ttl_minutes = 120        # Auto-expire after 2 hours
max_disk_mb = 256        # Disk space limit for archives
mask_secrets = true      # Redact detected secrets before archiving

| Option | Default | Description |
| --- | --- | --- |
| enabled | true | Enable/disable the archive system |
| threshold_tokens | 500 | Minimum token count to trigger archiving |
| ttl_minutes | 120 | Time-to-live before auto-expiration |
| max_disk_mb | 256 | Maximum total disk usage for archives |
| mask_secrets | true | Redact API keys, tokens, and passwords before writing to disk |

Secret Masking

When mask_secrets is enabled, lean-ctx scans archived content for common secret patterns (API keys, JWT tokens, connection strings, private keys) and replaces them with [REDACTED:type] placeholders before writing to disk. This ensures sensitive data never persists in the archive directory.
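
Pattern-based masking of this kind is typically a sequence of regular-expression substitutions. The three patterns below are illustrative assumptions, not lean-ctx's actual rule set, and are deliberately simple. Only the `[REDACTED:type]` placeholder format comes from the description above.

```python
import re

# Hypothetical patterns; a real scanner would use a larger, tuned set.
PATTERNS = [
    (
        "private_key",
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.S,
        ),
    ),
    ("jwt", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    ("api_key", re.compile(r"(?i)\b(?:api[_-]?key|token|password)\s*[=:]\s*\S+")),
]


def mask_secrets(text):
    """Replace each match with a [REDACTED:type] placeholder."""
    for kind, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text


masked_key = mask_secrets("API_KEY=sk-12345 and more")
masked_jwt = mask_secrets("auth eyJhbGciOi.eyJzdWIi.c2ln done")
```

Running the substitution before the write, rather than after, is what keeps the unredacted value from ever reaching the archive directory.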

Cache-Safe Guarantee

lean-ctx provides a cache-safe guarantee: content already present in the model's context window is never mutated or corrupted by lean-ctx operations. This is a critical invariant that prevents subtle bugs from stale or inconsistent data.

What This Means

  • No silent overwrites: Once a file is cached as F1 with a specific hash, the F1 reference points to that exact content until the entry is invalidated, either explicitly or because the hash check detects a change on disk
  • Hash-based validation: Every cache hit verifies the BLAKE3 content hash. If the file changed on disk, the cache entry is invalidated and a full re-read occurs
  • Immutable archive entries: Archived content (ctx_expand IDs) is immutable once written, so the same ID always returns the same content
  • No partial reads: If a read fails mid-stream, no partial content enters the cache

Doctor Cache-Safety Check

lean-ctx doctor includes a cache-safety validation step that verifies:

  • All cached file hashes match current disk content
  • No archive entries have been externally modified
  • Session state is consistent with the file reference table
  • No orphaned cache entries exist from crashed sessions

lean-ctx doctor
→ Cache safety: ✓ All 12 cached files verified
  Archive integrity: ✓ 8 entries, 0 corrupted
  Session state: ✓ Consistent
  Orphaned entries: ✓ None found

Context Compaction

ctx_compress creates a checkpoint of the current session state for long conversations. It summarizes all cached files, their signatures, and the session context into a compact format that can survive context window truncation.

When to use: After 15-20 tool calls, or when approaching context window limits. lean-ctx auto-triggers checkpoints at configurable intervals.

ctx_compress
→ Session checkpoint created:
  12 files cached (F1-F12)
  3 signatures preserved
  Session context: 2 tasks, 1 workflow
  Checkpoint size: ~800 tokens (vs ~15000 tokens for full state)