cost | Bernstein

your bill, audited

what cheapest-passing-test routing would shift.

enter your last month's spend on three of the bills bernstein users typically pay. the calculation below uses a documented heuristic and hardcoded model prices, and shows every step so you can audit it.

monthly spend (usd)

claude (anthropic api / max plan)

codex / openai api

cursor

estimated band

$180–$360 /mo

the calculation suggests bernstein could shift roughly 30%–60% of your llm spend. your real saving will vary because it depends on the test-pass rate of the cheaper models on your specific tasks.

show the math

total monthly llm spend	$400 + $200 + $0 = $600
fraction of tasks routable to a cheaper model that still passes tests	40%–80% (heuristic)
cost ratio: cheapest-passing model vs current premium model	~25% of original (see model-prices table below)
saving = total × routable% × (1 − cost ratio)	$180–$360 /mo

the routable% is a heuristic, not a measured number. real saving depends on whether the cheaper models pass your project's tests on each task. on a repo where tests are flaky or coverage is low, routing falls back to the premium model and the band shrinks toward zero. on a repo with tight tests and a lot of mechanical work, the routable fraction can exceed 80% and the saving lands above the band.
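a minimal python sketch of the same arithmetic, in case you want to plug in your own numbers. the function name `saving_band` and its parameter names are made up for this example and are not part of bernstein; the defaults are the heuristic values from the rows above.

```python
def saving_band(bills, routable=(0.40, 0.80), cost_ratio=0.25):
    """Return (low, high) estimated monthly saving in USD.

    bills: monthly spend per provider, in USD.
    routable: (low, high) fraction of tasks a cheaper model can take (heuristic).
    cost_ratio: cheapest-passing model cost as a fraction of the premium cost.
    """
    total = sum(bills)
    # saving = total x routable% x (1 - cost ratio), once per end of the band
    low, high = (total * r * (1 - cost_ratio) for r in routable)
    return low, high


# The worked example from the calculator: $400 + $200 + $0 = $600.
low, high = saving_band([400, 200, 0])
print(f"${low:.0f}-${high:.0f} /mo")  # $180-$360 /mo
```

if your tests are flaky, try a lower `routable` pair such as `(0.10, 0.30)` to see how quickly the band shrinks.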

your last month's bill was $600.

sponsoring at $25/mo is about 4% of $600.

bernstein keeps routing to the cheapest model that passes tests.

→ github.com/sponsors/chernistry

model prices

hardcoded snapshot from 2026-05-09. sorted by input price, cheapest first. prices are usd per 1m tokens.

family	model	input / 1m	cached input / 1m	output / 1m
google	`gemini-2.5-flash-lite`	$0.10	$0.01	$0.40
deepseek	`deepseek-v4-flash`	$0.14	$0.0028	$0.28
xai	`grok-4.1-fast`	$0.20	n/a	$0.50
openai	`gpt-5.4-nano`	$0.20	$0.02	$1.25
google	`gemini-2.5-flash`	$0.30	$0.03	$2.50
openai	`gpt-5.4-mini`	$0.75	$0.075	$4.50
anthropic	`claude-haiku-4.5`	$1	$0.10	$5
google	`gemini-2.5-pro`	$1.25	$0.125	$10
xai	`grok-4.3`	$1.25	n/a	$2.50
openai	`gpt-5.4`	$2.50	$0.25	$15
anthropic	`claude-sonnet-4.6`	$3	$0.30	$15
anthropic	`claude-opus-4.7`	$5	$0.50	$25

sources: claude.com/pricing, platform.openai.com/docs/pricing, ai.google.dev/gemini-api/docs/pricing, openrouter.ai/models.

prices update sporadically. the snapshot date above is the last manual update. since late 2025 the cadence has picked up: anthropic, openai, and google have all shipped a new tier within the last six months, and absolute numbers can shift by 20-40% between revisions even when relative ordering survives. check the source links before quoting these in a procurement conversation. cached input prices are 10% of the base input rate where the provider supports prompt caching, except `deepseek-v4-flash`, which is listed at 2% of base. n/a marks rows without a cached rate.
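a quick way to check the cached-input column against the base input column, sketched in python. the prices are copied from the table above; the variable names are only for this example, and rows with n/a cached pricing are left out.

```python
# (input, cached input) in USD per 1M tokens, copied from the table above.
ROWS = {
    "gemini-2.5-flash-lite": (0.10, 0.01),
    "deepseek-v4-flash": (0.14, 0.0028),
    "gpt-5.4-nano": (0.20, 0.02),
    "gemini-2.5-flash": (0.30, 0.03),
    "gpt-5.4-mini": (0.75, 0.075),
    "claude-haiku-4.5": (1.00, 0.10),
    "gemini-2.5-pro": (1.25, 0.125),
    "gpt-5.4": (2.50, 0.25),
    "claude-sonnet-4.6": (3.00, 0.30),
    "claude-opus-4.7": (5.00, 0.50),
}

# Cached price as a fraction of base price, rounded to absorb float noise.
ratios = {model: round(cached / base, 4) for model, (base, cached) in ROWS.items()}

# Rows whose cached rate is not 10% of the base input rate.
off = {model: r for model, r in ratios.items() if r != 0.10}
print(off)  # {'deepseek-v4-flash': 0.02}
```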

how the band is computed

the calculator multiplies your total monthly llm spend by the fraction of tasks bernstein could route to a cheaper model (40-80%, a heuristic) and by the cost gap between the premium and cheap models (about 75% saving on the routable tasks). for scale, at claude opus 4.7 vs gemini 2.5 flash-lite prices, opus is $5 per 1m input tokens and flash-lite is $0.10, so swapping one for the other on a routable task saves roughly 98% of input cost. the 75% figure is a blended average across mixed task types, not that best case. the result is a band, not a point. the math is shown step by step in the calculator block so you can substitute your own assumptions if the heuristic does not fit your repo.
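the per-task cost gap can be sketched in python as follows. prices come from the model-prices table. the helper names `task_cost` and `saving_fraction` and the 20k-in / 2k-out token counts are assumptions made up for this example, not measurements from bernstein.

```python
# USD per 1M tokens, copied from the model-prices table on this page.
PRICES = {
    "claude-opus-4.7": {"input": 5.00, "output": 25.00},
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
}


def task_cost(model, input_tokens, output_tokens):
    """USD cost of one task at the listed (uncached) rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def saving_fraction(premium, cheap, input_tokens, output_tokens):
    """Fraction of the premium cost saved by running the task on the cheap model."""
    premium_cost = task_cost(premium, input_tokens, output_tokens)
    cheap_cost = task_cost(cheap, input_tokens, output_tokens)
    return 1 - cheap_cost / premium_cost


# Input tokens only: the comparison quoted in the text.
input_only = saving_fraction("claude-opus-4.7", "gemini-2.5-flash-lite", 1_000_000, 0)
print(f"{input_only:.0%}")  # 98%

# A hypothetical task shape (assumed token counts) that also pays for output.
mixed = saving_fraction("claude-opus-4.7", "gemini-2.5-flash-lite", 20_000, 2_000)
print(f"{mixed:.1%}")
```

this pair is the widest gap in the table, so it is an upper bound; the calculator's 75% stays the more conservative blended assumption.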

the routable fraction is the load-bearing assumption. on a codebase with flaky tests it skews toward zero: the cheaper models do not pass, the bandit falls back to the premium model, and the saving collapses. on a codebase with tight tests and a lot of mechanical work (typed refactors, test scaffolding, lint fixes) it skews higher than the upper bound. neither extreme is a promise.

if you want to verify any of this on your own repo before you sponsor, install bernstein with `pipx install bernstein` and check the cost column in the run report after one parallel run. the numbers there are real, not heuristic.

frequently asked

How much does Bernstein save on LLM bills?

It depends on how much of your work is routable to a cheaper model that still passes your tests. The calculator on this page uses a heuristic: 40-80% of tasks are routable, and the cheapest passing model costs about a quarter of the premium model. On a $600/month combined Claude + Codex + Cursor bill, that suggests a band of roughly $180-360/month shifted. Real saving will be lower if your tests are flaky and higher if you have a lot of mechanical work in the repo.

How does Bernstein decide which model to route a task to?

Bernstein runs an epsilon-greedy contextual bandit over a pass-rate history kept per task type. Each task type (lint fix, test generation, refactor, architecture, tests-and-boilerplate) is its own context, and the candidate models are the arms. The bandit prefers the cheapest model whose recent pass rate on that task type is above a configurable threshold, and explores a more expensive model with probability epsilon.
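The selection rule described above might look roughly like the Python sketch below. This is an illustration under stated assumptions, not the code in Bernstein's scheduler: `pick_model`, the `history` layout, the three-model subset, and the default `threshold` and `epsilon` values are all hypothetical.

```python
import random

# USD per 1M input tokens, from the model-prices table; used only to rank models.
MODEL_COST = {
    "gemini-2.5-flash-lite": 0.10,
    "claude-haiku-4.5": 1.00,
    "claude-opus-4.7": 5.00,
}


def pick_model(history, task_type, threshold=0.8, epsilon=0.1, rng=random):
    """Choose a model for one task.

    history maps (task_type, model) -> (passes, attempts).
    """
    by_cost = sorted(MODEL_COST, key=MODEL_COST.get)

    def pass_rate(model):
        passes, attempts = history.get((task_type, model), (0, 0))
        return passes / attempts if attempts else 0.0

    # Exploit: cheapest model whose pass rate clears the threshold,
    # falling back to the most expensive model if none does.
    choice = next((m for m in by_cost if pass_rate(m) >= threshold), by_cost[-1])

    # Explore: with probability epsilon, step up to the next pricier model.
    if rng.random() < epsilon:
        idx = by_cost.index(choice)
        choice = by_cost[min(idx + 1, len(by_cost) - 1)]
    return choice


history = {
    ("lint fix", "gemini-2.5-flash-lite"): (19, 20),
    ("architecture", "gemini-2.5-flash-lite"): (2, 10),
    ("architecture", "claude-haiku-4.5"): (5, 10),
}
print(pick_model(history, "lint fix", epsilon=0.0))      # gemini-2.5-flash-lite
print(pick_model(history, "architecture", epsilon=0.0))  # claude-opus-4.7
```

With `epsilon=0.0` the choice is fully determined by the history, which makes the rule easy to test. Any positive epsilon adds a random draw on top.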

Why is the calculator output a band and not a single number?

A single number would be marketing, not an honest estimate. The actual saving depends on how many tasks route to a cheaper model (varies with task mix), how often the cheaper model passes your tests (varies with test quality), and how aggressively you tune the bandit's explore rate. The band spans the lower and upper bounds of a heuristic that assumes routing kicks in 40-80% of the time.

Does sponsoring Bernstein affect what it routes to?

No. Bernstein is on-prem only. It runs on your machine, calls the model APIs you configure with your own keys, and writes state to disk you own. Sponsorship funds the operator, not the routing logic. Apart from the epsilon exploration step, routing decisions are deterministic Python in `src/bernstein/scheduler.py`: which model wins is a function of the bandit history, the cost table, and your test results.