Most agent frameworks send everything to the same model. Renaming a variable, designing an auth system: same prompt budget, same model either way. Bernstein's router learns which model handles each task type best. In our own self-development runs, that roughly halved the per-session bill once the bandit warmed up.
the uniform problem
Routing a docstring task to Opus instead of Haiku is, at current pricing, about a 30x overspend. At forty tasks a session, that's a real number - the kind that turns "side project" into a line item your spouse asks about.
three layers
Heuristic. Every task carries a complexity (low/medium/high) and a role (backend, qa, security, docs…). A rule-based classifier gives an opening guess: low → Haiku/Sonnet, high → Opus.
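As a rough illustration of that opening guess, here is a minimal sketch of a rule-based classifier. The `Task` dataclass, the `opening_candidates` function, and the medium → Sonnet rule are assumptions made for the example; the post only states the low and high mappings, and Bernstein's real classifier may look quite different.

```python
from dataclasses import dataclass


@dataclass
class Task:
    role: str        # e.g. "backend", "qa", "security", "docs"
    complexity: str  # "low", "medium", or "high"


def opening_candidates(task: Task) -> list[str]:
    """Hypothetical heuristic: map complexity to a first-guess model list."""
    if task.complexity == "low":
        return ["haiku", "sonnet"]   # low -> Haiku/Sonnet, as in the post
    if task.complexity == "high":
        return ["opus"]              # high -> Opus, as in the post
    return ["sonnet"]                # medium: an assumption, not from the post


print(opening_candidates(Task(role="docs", complexity="low")))
```

The role is carried along but unused here; in this layering it only starts to matter once the bandit keeps per-role estimates.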
Bandit. Epsilon-greedy, per-role reward estimates. Exploit the best-known model 80% of the time, explore alternatives 20%. Reward = task completed, tests green, no retries.
```python
# Simplified selection
candidates = ["sonnet", "opus"] if task.complexity == "high" else CASCADE
selected = bandit.select(role=task.role, candidate_models=candidates)
```

The cascade is cheapest-to-strongest. High-complexity tasks never get sent to Haiku. Wasting an agent's hour on the wrong model isn't actually cheap.
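For readers who want to see what sits behind a `bandit.select(...)` call, the following is a minimal epsilon-greedy sketch with per-role reward estimates. The class name, the `update` method, and the binary reward are assumptions for illustration, not Bernstein's actual implementation.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Illustrative per-role epsilon-greedy bandit (not Bernstein's real class)."""

    def __init__(self, epsilon=0.2, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = defaultdict(int)    # (role, model) -> number of observations
        self.values = defaultdict(float)  # (role, model) -> running mean reward

    def select(self, role, candidate_models):
        # Explore: with probability epsilon, pick any candidate uniformly.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidate_models)
        # Exploit: pick the best estimate; ties go to the earliest candidate.
        return max(candidate_models, key=lambda m: self.values[(role, m)])

    def update(self, role, model, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        key = (role, model)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


bandit = EpsilonGreedyBandit(epsilon=0.2)
# Assumed binary reward: 1.0 if the task completed with green tests and no retries.
bandit.update("qa", "sonnet", 1.0)
bandit.update("qa", "haiku", 0.0)
print(bandit.select(role="qa", candidate_models=["haiku", "sonnet", "opus"]))
```

Because ties resolve to the first candidate, passing models in cascade order means an untested role defaults to the cheapest option until the data says otherwise.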
Effectiveness seeding. Cold-start data lives in `.sdd/metrics/`. If last week's backend tasks succeeded 95% of the time on Sonnet and 70% on Haiku, the bandit starts from those estimates rather than from scratch.
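One way to picture seeding: turn historical success counts into initial reward estimates before the first task is routed. The post does not specify the file format under `.sdd/metrics/`, so this sketch takes already-parsed tuples; the `seed_estimates` function and the tuple layout are hypothetical.

```python
def seed_estimates(history):
    """Build initial (role, model) estimates from past outcomes.

    history: iterable of (role, model, successes, attempts) tuples.
    The tuple layout is an assumption made for this example.
    """
    values, counts = {}, {}
    for role, model, successes, attempts in history:
        if attempts <= 0:
            continue  # no evidence for this arm, leave it unseeded
        values[(role, model)] = successes / attempts
        counts[(role, model)] = attempts
    return values, counts


values, counts = seed_estimates([
    ("backend", "sonnet", 19, 20),  # 95% success
    ("backend", "haiku", 14, 20),   # 70% success
])
print(values[("backend", "sonnet")], values[("backend", "haiku")])
```

Keeping the attempt counts alongside the rates matters: with a running-mean update, a seeded arm with 20 observations shifts only slightly after one new result, instead of being overwritten by it.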
what falls out
After a few sessions:
| Task type | Typical model | Why |
|---|---|---|
| Docs, docstrings | Haiku | Templated, low reasoning |
| Test writing | Sonnet | Code understanding, not creativity |
| Bug fixes | Sonnet | Pattern matching |
| Refactoring | Sonnet/Opus | Depends on scope |
| Architecture, security | Opus | Deep reasoning |
Not hardcoded. Learned. If your test suite is unusually gnarly, the bandit will figure that out and route accordingly.
config
Default-on once `.sdd/metrics/` exists. Tunable:

```yaml
# .sdd/config.yaml
routing:
  bandit_epsilon: 0.2
  cascade: [haiku, sonnet, opus]
  min_samples_per_arm: 5
```

Disable:

```yaml
routing:
  bandit_enabled: false
```

numbers
Across our self-development sessions - Bernstein refactoring its own codebase - bandit routing cut the bill roughly in half versus Sonnet-for-everything. Completion rate stayed within a couple of percent. A 10-agent / 50-task session that used to run $15-20 lands at $7-10. Measure your own with `bernstein cost`.
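To sanity-check the "roughly half" claim against the ranges quoted above, here is a small worked calculation. It uses only the dollar figures and the 50-task count from this post; the `savings_fraction` helper is made up for the example, and taking range midpoints is a simplifying assumption.

```python
def savings_fraction(before, after):
    """Fraction of the original bill that was saved."""
    return 1 - after / before


TASKS = 50
before_mid = (15 + 20) / 2  # midpoint of the $15-20 range
after_mid = (7 + 10) / 2    # midpoint of the $7-10 range

mid = savings_fraction(before_mid, after_mid)
worst = savings_fraction(15, 10)  # cheapest "before" vs priciest "after"
best = savings_fraction(20, 7)    # priciest "before" vs cheapest "after"

print(f"midpoint savings: {mid:.0%} (range {worst:.0%} to {best:.0%})")
print(f"per task: ${before_mid / TASKS:.2f} -> ${after_mid / TASKS:.2f}")
```

At the midpoints the saving comes out a little over half, with the per-task cost falling from about $0.35 to about $0.17; the spread between the worst and best pairings is wide, which is one more reason to measure your own sessions.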
further
- Architecture for where routing sits in the pipeline.
- Source.
- orchestrate the orchestrators covers leaf-node delegation: how Bernstein routes through wrapped sub-orchestrators (Composio AO, ralphex) as if each were a single agent.
- v2.0 release notes for the per-step `cli:` / `model:` directives that let a plan pin specific arms of the bandit per task.
- cost calculator lets you put your own monthly LLM spend in and see what the routing band would shift.