Spec-driven development with Bernstein: a walkthrough

Why this page exists

The runtime surface - goal in, verified pull request out - is covered on the home page. What that summary skips is the loop a human actually drives: you state intent, you read the decomposed plan, and only then do you let the scheduler execute it. This page walks all five stages of that loop end to end, with the real command at each step.

Bernstein keeps all of its state in a per-project .sdd/ directory (the name is short for spec-driven development). That directory is the spine of everything below: the spec, the task backlog, the worktrees, the audit chain, and the cost ledger all live under it.

The loop at a glance

spec → checklist → tasks → implement → review

Each arrow is deterministic Python, not a prompt. The two required human decisions are the start (writing the spec) and the end (approving the merge), with the plan review in Stage 2 as the point where you can stop a run early; the plan and task graph in between are reproducible from the spec.

Stage 1

Spec - state the intent

bernstein init && bernstein run -g "<goal>"

bernstein init creates the .sdd/ workspace and a bernstein.yaml in the current directory. From there a spec is either a natural-language goal passed to run -g, or - when the goal is too coarse - a hand-written plan.yaml manifest that pins stages, roles, and models explicitly.

The spec is the only place a human writes prose. Everything downstream is derived from it deterministically, so the spec is also the artefact you version and review in a pull request before a single agent runs.

Stage 2

Checklist - review the decomposed plan

bernstein run --plan-only # or --dry-run for cost preview

run --plan-only emits the decomposed task plan as Markdown and exits before any agent spawns. run --dry-run adds the scheduling order and an estimated cost band. This is the checkpoint where you read what the planner intends to do and stop it if the decomposition is wrong.

Because the planner is plain Python rather than an LLM, the plan is reproducible: the same spec yields the same task graph. A decomposition bug shows up here as a wrong checklist you can read, not as a bad chain-of-thought you have to infer after the fact.
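One way to check the reproducibility claim yourself is to fingerprint two plans and compare them. The sketch below is an illustration only: it assumes the plan has already been parsed into a list of dictionaries, and the field names (id, role, depends_on) and the plan_fingerprint helper are hypothetical, not Bernstein's schema or API.

```python
import hashlib
import json


def plan_fingerprint(tasks: list[dict]) -> str:
    """Hash a task list in canonical form so two plans can be compared."""
    # Sort tasks by id and sort each dependency list, so that ordering
    # noise in the input does not change the resulting hash.
    canonical = sorted(
        ({**t, "depends_on": sorted(t.get("depends_on", []))} for t in tasks),
        key=lambda t: t["id"],
    )
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical plans, as if parsed from two separate planning runs.
first = [
    {"id": "api", "role": "backend", "depends_on": []},
    {"id": "tests", "role": "qa", "depends_on": ["api"]},
]
second = list(reversed(first))  # same tasks, different order

print(plan_fingerprint(first) == plan_fingerprint(second))  # True
```

If the fingerprints ever differ for an unchanged spec, the difference is in the plan itself rather than in how it was listed, which is the kind of bug this stage is meant to surface.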

Stage 3

Tasks - the graph the scheduler builds

bernstein run plan.yaml

The planner emits a task graph: each task carries a role, a model, and its dependencies. Tasks with no unmet dependency are eligible to run; the rest wait. Each task is assigned its own isolated git worktree so concurrent agents never share a working tree.
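The eligibility rule above is small enough to sketch directly. This is a minimal illustration, not Bernstein's scheduler: the Task dataclass, its field names, and the eligible function are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    # Hypothetical shape: the source only says a task carries a role,
    # a model, and its dependencies.
    id: str
    role: str
    model: str
    depends_on: frozenset[str] = field(default_factory=frozenset)


def eligible(tasks: list[Task], done: set[str]) -> list[Task]:
    """Return unfinished tasks whose dependencies have all completed."""
    return [t for t in tasks if t.id not in done and t.depends_on <= done]


tasks = [
    Task("schema", "architect", "strong"),
    Task("api", "backend", "mid", frozenset({"schema"})),
    Task("tests", "qa", "cheap", frozenset({"api"})),
]

print([t.id for t in eligible(tasks, set())])       # ['schema']
print([t.id for t in eligible(tasks, {"schema"})])  # ['api']
```

Each call is a pure function of the graph and the set of finished tasks, so the scheduling order can be replayed from the same inputs.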

State lives on disk under .sdd/runtime/ (the task backlog, per-agent tokens, the write-ahead log). External workers can claim eligible tasks from a shared same-host backlog with bernstein backlog claim --role <role>; the orchestrator and the worker pool read the same files.
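To see why a file-backed backlog can be shared safely on one host, consider a claim implemented as an atomic rename. This is a sketch of the general technique under stated assumptions, not a description of how bernstein backlog claim works internally: the directory layout, the one-JSON-file-per-task convention, and the claim function are all hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def claim(backlog: Path, claimed: Path, worker: str) -> Optional[Path]:
    """Try to claim one task file; return its new path, or None if none are left."""
    claimed.mkdir(parents=True, exist_ok=True)
    for task_file in sorted(backlog.glob("*.json")):
        target = claimed / f"{worker}--{task_file.name}"
        try:
            # A rename within one filesystem is atomic, so if two workers
            # race for the same file only one rename can succeed.
            os.rename(task_file, target)
        except FileNotFoundError:
            continue  # another worker got there first; try the next file
        return target
    return None


# Demo in a throwaway directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    backlog = Path(tmp) / "backlog"
    backlog.mkdir()
    (backlog / "001-api.json").write_text("{}")
    won = claim(backlog, Path(tmp) / "claimed", worker="w1")
    demo_claimed_name = won.name if won else None
    demo_second = claim(backlog, Path(tmp) / "claimed", worker="w2")

print(demo_claimed_name)  # w1--001-api.json
print(demo_second)        # None
```

The design point is that no lock server or database is needed: the filesystem arbitrates the race, which only holds when every worker sees the same filesystem, hence same-host.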

Stage 4

Implement - one agent per worktree

(agents run automatically as tasks become eligible)

Each eligible task gets one agent in its worktree. Model selection follows the task role - a stronger model for architecture, a mid-tier model for ordinary implementation, a cheap model for tests and boilerplate. An epsilon-greedy bandit then adjusts that routing based on the observed pass rate per task type.
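For readers unfamiliar with the term, an epsilon-greedy bandit picks the best-known option most of the time and a random one with small probability epsilon, so it keeps gathering data on the alternatives. The sketch below shows that generic technique applied to pass rates; the class name, the model tier labels, and the default epsilon are assumptions for illustration and are not taken from Bernstein's code.

```python
import random
from collections import defaultdict


class PassRateBandit:
    """Epsilon-greedy choice of model per task type, scored by pass rate."""

    def __init__(self, models: list[str], epsilon: float = 0.1, seed=None):
        self.models = list(models)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (task_type, model) -> [passes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def pass_rate(self, task_type: str, model: str) -> float:
        passes, attempts = self.stats[(task_type, model)]
        return passes / attempts if attempts else 0.0

    def choose(self, task_type: str) -> str:
        # Explore: with probability epsilon, pick any model at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)
        # Exploit: otherwise pick the model with the best observed pass rate.
        return max(self.models, key=lambda m: self.pass_rate(task_type, m))

    def record(self, task_type: str, model: str, passed: bool) -> None:
        entry = self.stats[(task_type, model)]
        entry[0] += int(passed)
        entry[1] += 1


# epsilon=0.0 disables exploration, which makes this demo deterministic.
bandit = PassRateBandit(["cheap", "mid", "strong"], epsilon=0.0)
bandit.record("tests", "cheap", passed=True)
bandit.record("tests", "cheap", passed=True)
bandit.record("tests", "mid", passed=False)
print(bandit.choose("tests"))  # cheap
```

Statistics are keyed by task type, so a model that does well on tests earns no credit for architecture work.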

The agent itself is whichever CLI tool you already trust (Claude Code, Codex, and other adapters). Bernstein owns the scheduling, scoping, and audit; the adapter owns the edit. Every routing and gate decision is written to the HMAC-chained audit log under .sdd/audit/.
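An HMAC-chained log means each entry's authentication code covers the previous entry's code, so editing or removing an earlier record invalidates every record after it. The following is a minimal sketch of that general construction using Python's standard hmac module; the entry layout, the genesis value, and the function names are assumptions, not the on-disk format of .sdd/audit/.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical starting value for the first entry


def _mac(key: bytes, prev: str, event: dict) -> str:
    # The MAC covers the previous MAC plus a canonical encoding of the event.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    message = f"{prev}\n{body}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_entry(log: list[dict], key: bytes, event: dict) -> None:
    prev = log[-1]["mac"] if log else GENESIS
    log.append({"event": event, "prev": prev, "mac": _mac(key, prev, event)})


def verify(log: list[dict], key: bytes) -> bool:
    """Recompute the chain from the start; any edit breaks it."""
    prev = GENESIS
    for entry in log:
        expected = _mac(key, prev, entry["event"])
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev = entry["mac"]
    return True


key = b"demo-key-not-for-production"
log: list[dict] = []
append_entry(log, key, {"kind": "routing", "task": "api", "model": "mid"})
append_entry(log, key, {"kind": "gate", "task": "api", "gate": "lint", "ok": True})
print(verify(log, key))  # True

log[0]["event"]["model"] = "strong"  # tamper with an earlier record
print(verify(log, key))  # False
```

Unlike a plain hash chain, verification here requires the key, so someone who can write to the log but does not hold the key cannot recompute a consistent chain after an edit.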

Stage 5

Review - gates, then the merge decision

bernstein approve <task-id> # or bernstein reject

Lint, type-check, tests, a security scan, and an optional cross-model review run on every diff. If a gate fails, the task is retried with a stronger model. Your branch only ever sees diffs that passed every gate. The final merge decision is an explicit bernstein approve or bernstein reject.
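The gate-then-escalate behaviour can be pictured as a loop over a ladder of models. The sketch below is illustrative only: the gates are stand-in predicates over a diff string, and run_gates, review, and the ladder of tier names are hypothetical rather than Bernstein's pipeline.

```python
from typing import Callable, Optional

Gate = Callable[[str], bool]  # takes a diff, returns True if it passes


def run_gates(diff: str, gates: dict[str, Gate]) -> list[str]:
    """Return the names of the gates that the diff fails."""
    return [name for name, gate in gates.items() if not gate(diff)]


def review(
    produce_diff: Callable[[str], str],
    gates: dict[str, Gate],
    ladder: list[str],
) -> Optional[tuple[str, str]]:
    """Try each model from weakest to strongest until every gate passes."""
    for model in ladder:
        diff = produce_diff(model)
        if not run_gates(diff, gates):
            return model, diff  # this diff is ready for a human decision
    return None  # nothing passed; no diff reaches the branch


# Stand-in agent: only the strong model produces a clean diff.
def fake_agent(model: str) -> str:
    return "clean change" if model == "strong" else "change with FIXME"


gates = {
    "lint": lambda d: "FIXME" not in d,
    "non_empty": lambda d: bool(d.strip()),
}

print(review(fake_agent, gates, ["cheap", "mid", "strong"]))
# ('strong', 'clean change')
```

Returning None when the ladder is exhausted matches the guarantee stated above: a diff that never passes every gate is never offered for approval.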

bernstein wrap-up closes the session with a summary, lineage, and cost report. Nothing about the run is implicit: the spec, the plan, every gate result, and the approve/reject decision are all recorded, so the path from intent to merged PR is auditable end to end.

Where the state lives

Everything the loop produces is on disk under .sdd/, so a run is inspectable without a database:

.sdd/runtime/ - the live task backlog, per-agent tokens, and the write-ahead log.
.sdd/audit/ - an HMAC-chained audit log; every routing and gate decision is appended.
.sdd/metrics/ - the per-model cost ledger surfaced by bernstein cost.

Because the planner is code and the state is files, neither property depends on extra tooling: the task graph can be regenerated from the spec, and a finished run can be audited by reading .sdd/.

Next steps

CLI quickstart - install and the first commands, step by step.
Full documentation - every command, flag, and configuration surface.
Source on GitHub - the deterministic planner and the gate pipeline are open.