
why bernstein

this is the page you land on when you've already read the docs and you're trying to decide: is this thing actually for me, or am i better off staying with what i've got. honest answers below — including the cases where the answer is "stay where you are".

why bernstein over crewai or autogen

the one-line truth: their scheduler is an llm; bernstein's scheduler is plain python. the loop that decides "agent A gets task 17 next" is a state machine in src/bernstein/scheduler.py — zero llm tokens spent on coordination, replayable from the audit log, deterministic across re-runs of the same plan.
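to make the "plain python state machine" claim concrete, here is a minimal sketch of the shape — invented names, not bernstein's actual scheduler api. the point is that assignment is a pure function of state: same agents, same tasks, same completed set, same pick, every time.

```python
from dataclasses import dataclass

# hypothetical sketch, not bernstein's real code: a deterministic
# scheduler is just a pure function from (agents, tasks, done) to
# the next assignment -- no llm call, no randomness.

@dataclass(frozen=True)
class Task:
    id: int
    deps: tuple = ()

def next_assignment(ready_agents, tasks, done):
    """Same inputs always yield the same (agent, task) pick."""
    runnable = sorted(
        t.id for t in tasks
        if t.id not in done and all(d in done for d in t.deps)
    )
    # deterministic pairing: lowest runnable task id goes to the
    # lexicographically first idle agent
    if runnable and ready_agents:
        return (sorted(ready_agents)[0], runnable[0])
    return None

tasks = [Task(1), Task(2, deps=(1,)), Task(3)]
print(next_assignment({"agent-b", "agent-a"}, tasks, done={1}))
# -> ('agent-a', 2)
```

because the function is pure, replaying the same plan against the same completion history reproduces the same task graph — which is the replayability property the audit log relies on.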

bernstein is better for: regulated environments where every routing decision needs an auditable reason, multi-hour runs where llm-coordinator drift compounds, anyone who wants the same plan to produce the same task graph twice.

bernstein is worse for: free-form research-style swarms where "let the model figure out who does what" is the whole point. crewai and autogen are designed for that shape; bernstein is not. if you want emergent behavior, the determinism is a wall, not a feature.

crewai and autogen don't drive cli coding agents, either: they wire python tool-calls, while bernstein wraps actual terminal agents — claude code, codex, aider, gemini cli — and gives each one a real git worktree. different problem, different shape.

why bernstein over claude code alone

claude code can spawn sub-agents on its own; bernstein does the same thing across 44 different cli agents at once and verifies their output against your tests instead of trusting it. claude code is included as one of the 44, and it is the most common primary backend in real bernstein installs.

the multi-agent shape matters when: you have ≥3 independent tasks that could run in parallel (one task per cli agent in its own worktree, no merge conflicts during execution), you want a regulated review path (cross-model verifier passes a diff to a second model before the merge queue lands it), or you're paying for compute by the parallel task and want throughput rather than depth on a single thread.
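the "one task per cli agent in its own worktree" mechanic is plain git under the hood. a throwaway sketch of the idea (invented branch names, a temp repo — not bernstein's code):

```python
import os, subprocess, tempfile

# hypothetical sketch of "one git worktree per task": each task gets
# its own branch and its own working copy, so parallel agents never
# edit the same checkout and cannot conflict during execution.

repo = tempfile.mkdtemp()
subprocess.run(["git", "init", "-q", repo], check=True)
with open(os.path.join(repo, "README"), "w") as f:
    f.write("demo\n")
subprocess.run(["git", "-C", repo, "add", "README"], check=True)
subprocess.run(["git", "-C", repo, "-c", "user.email=demo@example.com",
                "-c", "user.name=demo", "commit", "-qm", "init"], check=True)

worktrees = tempfile.mkdtemp()
for task_id in (17, 18, 19):
    path = os.path.join(worktrees, f"task-{task_id}")
    subprocess.run(["git", "-C", repo, "worktree", "add", "-q",
                    "-b", f"task/{task_id}", path], check=True)

print(sorted(os.listdir(worktrees)))   # three isolated working copies
```

merging those branches back is where a serialised merge queue earns its keep: execution is conflict-free by construction, and conflicts only surface one at a time, at landing.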

the multi-agent shape is overkill when: you're a single dev on a small repo doing one task at a time. claude code alone is faster, simpler, and the orchestration overhead is wasted ceremony. bernstein adds a state machine, a task store, a merge queue — value when you're running 5 agents, dead weight when you're running 1.

short test: if your last week of claude code sessions had any moment where you wished you could split the work, bernstein helps. if every session was naturally one focused thread, you don't need it.

is it production-ready

yes, with specific caveats — not the marketing kind.

what works: the scheduler is deterministic, the merge queue lands what passes the gates, the audit log is hmac-signed and replays cleanly, there are 44 adapters in the registry, the install is one pipx install bernstein. people are running this against real codebases.
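"hmac-signed and replays cleanly" means each log row signs its event plus the previous row's signature, so editing any row breaks every signature after it. a minimal sketch of that property — illustrative only, not bernstein's on-disk format:

```python
import hmac, hashlib, json

# hypothetical sketch of an hmac-chained audit log -- demonstrates the
# tamper-evidence property, not bernstein's actual format.

KEY = b"operator-secret"

def append(log, event):
    prev = log[-1]["sig"] if log else ""
    payload = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    sig = hmac.new(KEY, payload.encode(), hashlib.sha256).hexdigest()
    log.append({"event": event, "prev": prev, "sig": sig})

def verify(log):
    prev = ""
    for row in log:
        payload = json.dumps({"event": row["event"], "prev": prev},
                             sort_keys=True)
        expect = hmac.new(KEY, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(row["sig"], expect):
            return False
        prev = row["sig"]
    return True

log = []
append(log, {"task": 17, "agent": "claude-code"})
append(log, {"task": 18, "agent": "aider"})
print(verify(log))                 # True
log[0]["event"]["task"] = 99       # tamper with history...
print(verify(log))                 # False -- the chain breaks
```

replaying is then just walking the verified rows in order and feeding each scheduling decision back through the deterministic scheduler.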

what to know before you commit: the team is one person (alex, the operator). 0.x means breaking changes are still on the table — pin your version. the audit format is hmac-signed but the vendor binary format is not yet a published standard, so cross-tool replay is bernstein-only for now. the cloudflare workers integration is solid for edge sandboxing, but the agents you wrap (claude code, codex) still call their own apis, so api outages upstream knock those agents out individually.

production-ready in the "this is solid open-source code i'd run on a real repo" sense. not production-ready in the "saas vendor with a soc 2 report and a 24/7 oncall rotation" sense. those are different things; pick the one that matches your situation.

what's the catch

solo dev, free time funded, openrouter token spend on the operator's own card. apache 2.0 repo, $0 to install, no signup, no forced telemetry, no premium tier. but expect to pay your own model api fees if you actually run it — claude code calls anthropic, codex calls openai, aider calls whichever provider you wired up. those costs are not bernstein's; bernstein just orchestrates the agents.

the docs bot on this site is also a real cost. every answer goes through the operator's own ai gateway, which routes to free openrouter models when it can. that's why github sponsors helps: github.com/sponsors/chernistry. it stays free either way.

if you're looking for a 12-person team, vc backing, and a roadmap meeting twice a quarter, this is not that project. it's "engineer scratching their own itch and shipping the result".

who is bernstein for

specific shapes where the value lands:

  • engineering teams running ≥3 cli coding agents in parallel — each agent gets its own git worktree, the merge queue serialises landings, no race conditions
  • regulated or on-prem environments — every routing decision is in plain text, the audit log is hmac-signed and tamper-evident, no saas hop, no third-party data plane
  • platform teams that need an audit log of agent decisions — the orchestrator writes one row per scheduling decision, you can grep it
  • anyone burning more than $1k/mo on cursor/aider/claude-max who wants determinism — you can replay yesterday's plan and get yesterday's task graph
  • forward-deployed engineers dropping into a client repo — credentials stay in your env, not the client's; agents you spawn are whichever cli tool the client already trusts

if you nodded at two of those bullets, this fits.

who is bernstein NOT for

equally specific. these are the cases where you should pick something else:

  • "i want one pair-programmer to chat with about my code" — claude code or cursor alone. bernstein adds orchestration overhead you don't need
  • prototypes where merge gates are overkill — the lint/types/tests/cross-model-review pipeline is value when the cost of a bad merge is real, friction when you're throwing the repo away on friday
  • non-coding tasks (research, writing, data analysis pipelines) — bernstein wraps cli coding agents specifically, not generic llm workflows. crewai or autogen are the right shape there
  • anyone who wants a fully-managed saas with a credit card form and no infra to think about — bernstein gives you cloud-runtime options (cloudflare workers, kubernetes cluster, sandbox-execution mode) but doesn't host them for you. the runtime is yours; if you want someone else to operate it, this is the wrong project
  • teams that need a vendor with a support sla and a contract — solo open-source project. github issues are how support happens
  • research-shape "let the agents collaborate emergently" use cases — the deterministic scheduler is a hard wall there

where bernstein runs

wherever you point it. four shapes that operators actually pick:

  • laptop — pipx install bernstein && bernstein init. the deterministic scheduler, audit log, and adapters all run locally. easiest way to feel the shape.
  • on-prem behind a firewall — air-gapped repos and internal CI runners. bernstein writes state to disk you own, no signup, no forced telemetry, no third-party data plane.
  • cloudflare workers cloud runtime — bernstein-spawned agents run inside cloudflare sandbox containers, paid through your cloudflare account. zero ssh, zero VPS to manage. the orchestrator can stay local while execution lives in the cloud, or both can sit on workers.
  • multi-node cluster — kubernetes-shaped fanout when one box can't hold the parallel agent count. queue + state in shared storage, agents spin up per-task pods.

your repo is the input. your tests are the gate. bernstein adapts to the host instead of forcing one. if your code can't leave the network it lives in, pick laptop or on-prem; if you want it elastic, pick workers or cluster.

this is also why there's no signup and no forced telemetry. the orchestrator runs where you run it, talks to the model apis you configure, writes state to disk you own. opt-in observability is full and audit-grade — hmac-chained run trail, per-task tool calls, model usage, token cost, latency percentiles, exportable to your own otel collector / datadog / splunk / s3. nothing leaves your box without your config.

what does it cost to actually run

specific numbers, not round ones.

the orchestrator itself: $0. apache 2.0, pipx install bernstein, no license fee.

the cli agents bernstein wraps cost what they cost. claude code on a max plan is ~$200/mo. codex cli on chatgpt plus rolls into the chatgpt subscription. aider with sonnet on a heavy day can hit $5–10 in api spend. gemini cli is free up to a quota. running all four together is what bernstein optimizes — the cost-aware bandit routes each task to the model that's been passing tasks in that shape.

the operator's own gateway (the thing answering this docs bot) runs on a small ovh vps at roughly $80/yr including hosting, qdrant, and the openrouter free-tier middleware. a self-hosted bernstein run is cheaper than that — you're not paying the openrouter middle hop, you're calling the model apis directly with your own keys.

if you're already burning >$1k/mo on coding agents, bernstein typically pays for itself in the first parallel run by letting you saturate your existing api budget instead of bottlenecking on one agent at a time.