
Picking a cheaper model when the task allows

Most agent frameworks send everything to the same model. Renaming a variable, designing an auth system: same prompt budget, same model either way. Bernstein's router learns which model each type of task actually needs. On our own self-development runs, the per-session bill roughly halved once the bandit warmed up.

the uniform problem

Routing a docstring task to Opus instead of Haiku is, at current pricing, about a 30x overspend. At forty tasks a session, that adds up to a real number: the kind that turns a "side project" into a line item your spouse asks about.

three layers

Heuristic. Every task carries a complexity (low/medium/high) and a role (backend, qa, security, docs…). A rule-based classifier gives an opening guess: low → Haiku/Sonnet, high → Opus.
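A minimal sketch of that opening guess, assuming the classifier is a plain lookup on the complexity label. The names OPENING_GUESS and heuristic_guess are hypothetical, and the medium row is an assumption: the post only pins down low and high.

```python
from enum import Enum


class Complexity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical rule table. The post states low -> Haiku/Sonnet and
# high -> Opus; the MEDIUM row is an illustrative assumption.
OPENING_GUESS: dict[Complexity, list[str]] = {
    Complexity.LOW: ["haiku", "sonnet"],
    Complexity.MEDIUM: ["sonnet"],
    Complexity.HIGH: ["opus"],
}


def heuristic_guess(complexity: str) -> list[str]:
    """Return the rule-based opening candidates for a task's complexity label."""
    return OPENING_GUESS[Complexity(complexity)]


print(heuristic_guess("low"))   # ['haiku', 'sonnet']
print(heuristic_guess("high"))  # ['opus']
```

The guess is only a starting point; the role field isn't used here because, in the layering described in this post, per-role learning is the bandit's job.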

Bandit. Epsilon-greedy, per-role reward estimates. Exploit the best-known model 80% of the time, explore alternatives 20%. Reward = task completed, tests green, no retries.

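```python
# Simplified selection
CASCADE = ["haiku", "sonnet", "opus"]  # cheapest to strongest, as in the routing config
# High-complexity tasks skip Haiku entirely; everything else may use the full cascade.
candidates = ["sonnet", "opus"] if task.complexity == "high" else CASCADE
selected = bandit.select(role=task.role, candidate_models=candidates)
```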

The cascade runs cheapest to strongest, but high-complexity tasks never get sent to Haiku: a cheap model that burns an agent's hour failing a hard task isn't actually cheap.
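Here is a minimal sketch of the per-role epsilon-greedy bandit described above, written to match the select(role=..., candidate_models=...) call in the snippet. The class name, the reward helper, and the sample-mean estimate are assumptions for illustration, not Bernstein's actual implementation.

```python
import random
from collections import defaultdict
from typing import Optional


class EpsilonGreedyRouter:
    """Per-role epsilon-greedy bandit over model arms (illustrative sketch)."""

    def __init__(self, epsilon: float = 0.2, rng: Optional[random.Random] = None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        # (role, model) -> [total reward, number of pulls]
        self.stats = defaultdict(lambda: [0.0, 0])

    def estimate(self, role: str, model: str) -> float:
        total, pulls = self.stats[(role, model)]
        return total / pulls if pulls else 0.0

    def select(self, role: str, candidate_models: list[str]) -> str:
        # Explore: with probability epsilon, pick any allowed candidate uniformly.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidate_models)
        # Exploit: pick the candidate with the best mean reward for this role.
        # max() keeps the first of tied arms, i.e. the cheapest in cascade order.
        return max(candidate_models, key=lambda m: self.estimate(role, m))

    def update(self, role: str, model: str, reward: float) -> None:
        entry = self.stats[(role, model)]
        entry[0] += reward
        entry[1] += 1


def reward(completed: bool, tests_green: bool, retries: int) -> float:
    # Binary reward mirroring the post: completed, tests green, no retries.
    return 1.0 if completed and tests_green and retries == 0 else 0.0


router = EpsilonGreedyRouter(epsilon=0.2, rng=random.Random(42))
router.update("docs", "haiku", reward(completed=True, tests_green=True, retries=0))
router.update("docs", "opus", reward(completed=True, tests_green=True, retries=1))
choice = router.select(role="docs", candidate_models=["haiku", "sonnet", "opus"])
```

Because ties go to the first candidate, an exploit step with no data yet falls back to the cheapest model the cascade allows. Keying on role means a model that does well on docs earns no credit on security tasks.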

Effectiveness seeding. Cold-start data lives in .sdd/metrics/. If last week's backend tasks show Sonnet succeeding 95% of the time and Haiku 70%, the bandit starts from those estimates instead of from zero.
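A sketch of seeding, assuming a hypothetical record format of one dict per finished task with role, model, and success fields; the real layout of .sdd/metrics/ may differ. It turns past outcomes into (successes, attempts) pairs that a bandit could start from.

```python
from collections import Counter
from typing import Iterable, Mapping


def seed_from_history(records: Iterable[Mapping[str, object]]) -> dict[tuple[str, str], tuple[int, int]]:
    """Aggregate past task outcomes into (successes, attempts) per (role, model)."""
    successes: Counter = Counter()
    attempts: Counter = Counter()
    for rec in records:
        key = (str(rec["role"]), str(rec["model"]))
        attempts[key] += 1
        if rec["success"]:
            successes[key] += 1
    return {key: (successes[key], attempts[key]) for key in attempts}


def success_rate(seed: dict[tuple[str, str], tuple[int, int]], role: str, model: str) -> float:
    # Unseen (role, model) pairs start at 0.0, like an unseeded arm.
    s, n = seed.get((role, model), (0, 0))
    return s / n if n else 0.0


# Hypothetical week of backend history: Sonnet 19/20, Haiku 14/20.
history = (
    [{"role": "backend", "model": "sonnet", "success": i < 19} for i in range(20)]
    + [{"role": "backend", "model": "haiku", "success": i < 14} for i in range(20)]
)
seed = seed_from_history(history)
print(success_rate(seed, "backend", "sonnet"))  # 0.95
print(success_rate(seed, "backend", "haiku"))   # 0.7
```

Keeping the attempt count next to the rate tells the bandit how much evidence it starts from, so twenty old samples and two old samples don't look equally certain.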

what falls out

After a few sessions:

| Task type | Typical model | Why |
|---|---|---|
| Docs, docstrings | Haiku | Templated, low reasoning |
| Test writing | Sonnet | Code understanding, not creativity |
| Bug fixes | Sonnet | Pattern matching |
| Refactoring | Sonnet/Opus | Depends on scope |
| Architecture, security | Opus | Deep reasoning |

Not hardcoded. Learned. If your test suite is unusually gnarly, the bandit will figure that out and route accordingly.

config

Default-on once .sdd/metrics/ exists. Tunable:

```yaml
# .sdd/config.yaml
routing:
  bandit_epsilon: 0.2
  cascade: [haiku, sonnet, opus]
  min_samples_per_arm: 5
```

Disable:

```yaml
routing:
  bandit_enabled: false
```
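A sketch of how the routing keys above might be loaded and validated, assuming they arrive as a plain mapping (for example from PyYAML's yaml.safe_load on .sdd/config.yaml). RoutingConfig and under_sampled are hypothetical names, and reading min_samples_per_arm as "try each arm at least this many times before trusting its estimate" is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class RoutingConfig:
    # Defaults mirror the values shown in the config fragments above.
    bandit_enabled: bool = True
    bandit_epsilon: float = 0.2
    cascade: tuple[str, ...] = ("haiku", "sonnet", "opus")
    min_samples_per_arm: int = 5

    def __post_init__(self) -> None:
        if not 0.0 <= self.bandit_epsilon <= 1.0:
            raise ValueError("bandit_epsilon must be in [0, 1]")
        if self.min_samples_per_arm < 0:
            raise ValueError("min_samples_per_arm must be non-negative")
        if not self.cascade:
            raise ValueError("cascade must list at least one model")

    @classmethod
    def from_mapping(cls, routing: Mapping[str, Any]) -> "RoutingConfig":
        """Build a config from the parsed routing section, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        kwargs = {k: v for k, v in routing.items() if k in known}
        if "cascade" in kwargs:
            kwargs["cascade"] = tuple(kwargs["cascade"])
        return cls(**kwargs)


def under_sampled(pulls: Mapping[str, int], candidates: list[str], cfg: RoutingConfig) -> list[str]:
    """Assumed semantics: arms still below min_samples_per_arm get tried first."""
    return [m for m in candidates if pulls.get(m, 0) < cfg.min_samples_per_arm]


cfg = RoutingConfig.from_mapping({"bandit_epsilon": 0.1, "cascade": ["haiku", "sonnet", "opus"]})
print(under_sampled({"haiku": 7, "sonnet": 2}, list(cfg.cascade), cfg))  # ['sonnet', 'opus']
```

Validating at load time means a typo like bandit_epsilon: 2 fails loudly instead of silently turning every pick into exploration.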

numbers

Across our self-development sessions (Bernstein refactoring its own codebase), bandit routing cut the bill roughly in half versus Sonnet-for-everything, while the completion rate stayed within a couple of percent. A 10-agent, 50-task session that used to run $15-20 now lands at $7-10. Measure your own with bernstein cost.
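Pairing the ends of the two quoted ranges, low with low and high with high, shows where "roughly in half" comes from:

```latex
1 - \frac{7}{15} \approx 0.53, \qquad 1 - \frac{10}{20} = 0.50
```

That is a 50 to 53 percent reduction at the range endpoints.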

further

  • Architecture for where routing sits in the pipeline.
  • Source.
  • orchestrate the orchestrators covers leaf-node delegation: how Bernstein routes through wrapped sub-orchestrators (Composio AO, ralphex) as if each were a single agent.
  • v2.0 release notes for the per-step cli: / model: directives that let a plan pin specific arms of the bandit per task.
  • cost calculator lets you enter your own monthly LLM spend and see how routing would shift it.
Bernstein
