bernstein cost calculator

Question 1

How much does bernstein save on llm bills?

Accepted Answer

It depends on how much of your work is routable to a cheaper model that still passes your tests. The calculator on this page uses a heuristic: 40-80% of tasks are routable, and the cheapest passing model costs about a quarter of the premium model. On a $600/month combined claude+codex+cursor bill, that suggests a band of roughly $180-360/month shifted. Your real saving will be lower if your tests are flaky and higher if you have a lot of mechanical work in the repo. The math is shown step by step on the page so you can audit the inputs against your own situation.

Question 2

How does bernstein decide which model to route a task to?

Accepted Answer

Bernstein runs an epsilon-greedy contextual bandit over a per-task pass-rate history. Each task type (lint fix, test generation, refactor, architecture, tests-and-boilerplate) has its own arm. The bandit prefers the cheapest model whose recent pass rate on that task type is above a configurable threshold, and explores a more expensive model with probability epsilon. The router never blindly picks the cheapest model; it picks the cheapest model that has been passing recent tasks of the same shape. If no cheap model has a credible pass rate yet, the router falls back to the premium model and records the result so the bandit can learn.

Question 3

Are the model prices in this calculator current?

Accepted Answer

The price table is dated 2026-05-09 and lists per-million-token input and output prices for twelve models bernstein routes through, ranging from gemini 2.5 flash-lite at $0.10/m input to claude opus 4.7 at $5/m input. The lineup covers anthropic (opus 4.7, sonnet 4.6, haiku 4.5), openai (gpt-5.4 series including mini and nano), google (gemini 2.5 pro / flash / flash-lite), xai (grok 4.3, grok 4.1 fast), and deepseek (v4 flash). Source links go to the anthropic, openai, google ai, and openrouter pricing pages. Prices update sporadically; if the table date is more than a few months old, treat the saving figures as a rough guide and check the upstream pages.

Question 4

Why is the calculator output a band and not a single number?

Accepted Answer

A single number would be marketing, not honest. The actual saving on a real codebase depends on how many tasks are routable to a cheaper model (varies with task mix), how often the cheaper model passes your tests (varies with test quality and coverage), and how aggressively you tune the bandit explore rate (varies with risk tolerance). The band represents the lower and upper bounds of a heuristic that assumes routing kicks in 40-80% of the time. Picking a single point inside that band is a personal call about how aggressive your local situation can be.

Question 5

Does sponsoring bernstein affect what it routes to?

Accepted Answer

No. Bernstein is on-prem only. It runs on your machine, calls the model apis you configure with your own keys, and writes state to disk you own. Sponsorship through github sponsors funds the operator, not the routing logic. The routing decisions are deterministic python in src/bernstein/scheduler.py — what model wins is a function of the bandit history, the cost table, and your test results, not of who sponsors the project.

family	model	input / 1m	cached input / 1m	output / 1m
google	`gemini-2.5-flash-lite`	$0.10	$0.01	$0.40
deepseek	`deepseek-v4-flash`	$0.14	$0.0028	$0.28
xai	`grok-4.1-fast`	$0.20	n/a	$0.50
openai	`gpt-5.4-nano`	$0.20	$0.02	$1.25
google	`gemini-2.5-flash`	$0.30	$0.03	$2.50
openai	`gpt-5.4-mini`	$0.75	$0.075	$4.50
anthropic	`claude-haiku-4.5`	$1	$0.10	$5
google	`gemini-2.5-pro`	$1.25	$0.125	$10
xai	`grok-4.3`	$1.25	n/a	$2.50
openai	`gpt-5.4`	$2.50	$0.25	$15
anthropic	`claude-sonnet-4.6`	$3	$0.30	$15
anthropic	`claude-opus-4.7`	$5	$0.50	$25

cost

what cheapest-passing-test routing would shift.

show the math

model prices

how the band is computed