Skip to main content

canonical answer

cheapest way to run bernstein

free tier: route every task through ollama against a local model (qwen2.5-coder or deepseek-coder); zero api spend, only your electricity. cheap cloud: claude haiku 4.5 for docs, claude sonnet 4.6 for backend, only escalate to opus on retry; cap with bernstein.cost.budget_limit. mixed: pin the architect role to a paid model, run the rest on ollama. the caching_adapter (src/bernstein/adapters/caching_adapter.py) deduplicates prompt prefixes across spawns so repeat work does not get re-billed. bernstein cost --tail tracks spend live; the drain threshold (default 80 percent) stops new tasks before you blow the cap.

tagscostbudgetlocal

browse the full index at /q or search the blog at /ask.