Four commands that take the glue out of multi-agent runs

After a Bernstein session finishes, most teams end up running the same four shell snippets by hand: open a PR with the results, copy a ticket into a task, kick off a run on a beefier remote box, fire a post-merge hook. 1.8.14 turns each into a first-class command.

the glue tax

A healthy multi-agent workflow looks simple from outside: set a goal, four agents work in worktrees, the janitor verifies, the queue lands what passes. In practice every team that runs this daily ends up with a scripts/ directory full of wrappers:

  • A post-run.sh that runs git push && gh pr create --body "$(cat last-run-summary.md)" with a hand-rolled summary.
  • A Python one-liner that reads a Linear webhook and curls the task server.
  • A ssh-run.sh that rsyncs to a build box, opens screen, tails logs.
  • A .git/hooks/pre-commit that checks the janitor's JSON against a policy.

Each is ~30 lines. Each is slightly wrong somewhere: missing auth env var, wrong PATH, no retry, no graceful cleanup. The operator pack ships the four most common as composable subcommands sharing the same state model.

bernstein pr — open a PR with the janitor's receipts

bernstein pr --session-id last --draft

Reads .sdd/sessions/<id>-wrapup.json, gate results, cost tracker. Opens a GitHub PR with a conventional-commit title and body:

## Summary
- Add JWT auth with refresh tokens
- Cover the refresh endpoint with tests
- Document the new /auth/* routes
 
## Changes
src/auth/middleware.py   | 74 +++++
tests/test_auth.py       | 112 ++++++++
docs/auth.md             |  42 ++++
 
## Verification
✅ lint (ruff: 0 findings)
✅ types (pyright: 0 errors)
✅ tests (pytest: 48 passed)
✅ security (semgrep: clean)
 
## Cost
$0.38 · 182,410 tokens
  manager:   $0.04
  engineer:  $0.27
  qa:        $0.07
 
Generated from Bernstein session 7c4f1a3b9d22.
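The Cost section in that body is a plain roll-up of per-role spend. A minimal sketch of how such a block could be rendered, assuming a hypothetical mapping of role to dollars plus a token total (the real cost tracker's schema is not shown in this post, and `render_cost` is an illustrative name):

```python
def render_cost(cost_by_role: dict[str, float], total_tokens: int) -> str:
    """Render a cost block shaped like the sample PR body (hypothetical inputs)."""
    total = sum(cost_by_role.values())
    lines = [f"${total:.2f} · {total_tokens:,} tokens"]
    for role, cost in cost_by_role.items():
        label = f"{role}:"
        # Pad the label to 11 columns so the dollar amounts line up.
        lines.append(f"  {label:<11}${cost:.2f}")
    return "\n".join(lines)


print(render_cost({"manager": 0.04, "engineer": 0.27, "qa": 0.07}, 182_410))
```

Fed the numbers from the sample, this reproduces the four Cost lines shown above.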

Flags: --base, --title, --draft, --dry-run, --no-push, --session-id. Conventional-commit prefix is inferred from goal + role (a docs-heavy session gets docs:, a bugfix gets fix:, default is feat:).
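To make the prefix rule concrete, here is a rough sketch of one way such inference could work. The keyword lists and the `infer_prefix` name are assumptions for illustration only; the post only states the outcomes (docs:, fix:, default feat:), not the heuristic Bernstein actually uses.

```python
import re

# Illustrative keyword sets; the real heuristic is not documented in this post.
DOC_WORDS = {"docs", "document", "documentation", "readme"}
FIX_WORDS = {"fix", "bug", "bugfix", "regression", "hotfix"}


def infer_prefix(goal: str, role: str) -> str:
    """Guess a conventional-commit type from the session goal and agent role."""
    words = set(re.findall(r"[a-z]+", goal.lower()))
    if role == "docs" or words & DOC_WORDS:
        return "docs"
    if words & FIX_WORDS:
        return "fix"
    return "feat"  # default, as described above


def pr_title(goal: str, role: str) -> str:
    return f"{infer_prefix(goal, role)}: {goal}"


print(pr_title("Add JWT auth with refresh tokens", "engineer"))
```

Matching on whole words rather than substrings keeps a goal like "Add prefix support" from being classified as a fix.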

bernstein from-ticket <url>

You wrote the work description once, in the ticket. Pull it straight in:

bernstein from-ticket https://linear.app/acme/issue/ENG-412 --run

Three providers out of the box:

  • Linear — GraphQL via LINEAR_API_KEY.
  • GitHub Issues — local gh CLI when available, else GITHUB_TOKEN + REST.
  • Jira Cloud — REST v3 via JIRA_EMAIL + JIRA_API_TOKEN.
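Picking a provider from a pasted URL can be sketched as below. The URL shapes are assumptions based on how Linear, GitHub, and Jira Cloud links commonly look, and `detect_provider` and its external-ID format are hypothetical; the post does not describe Bernstein's actual parsing.

```python
import re
from urllib.parse import urlparse


def detect_provider(url: str) -> tuple[str, str]:
    """Return (provider, external_id) for a ticket URL, or raise ValueError."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path

    if host == "linear.app":
        # e.g. https://linear.app/acme/issue/ENG-412
        m = re.search(r"/issue/([A-Za-z0-9]+-\d+)", path)
        if m:
            return "linear", m.group(1)
    elif host == "github.com":
        # e.g. https://github.com/<owner>/<repo>/issues/<number>
        m = re.fullmatch(r"/([^/]+)/([^/]+)/issues/(\d+)/?", path)
        if m:
            return "github", f"{m.group(1)}/{m.group(2)}#{m.group(3)}"
    elif host.endswith(".atlassian.net"):
        # e.g. https://<site>.atlassian.net/browse/<KEY>-<number>
        m = re.search(r"/browse/([A-Z][A-Z0-9_]*-\d+)", path)
        if m:
            return "jira", m.group(1)
    raise ValueError(f"unsupported ticket URL: {url}")


print(detect_provider("https://linear.app/acme/issue/ENG-412"))
```

Raising on anything unrecognized keeps a mistyped URL from silently turning into an empty task.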

Labels drive role and scope: bug → qa, docs → docs, and epic bumps scope from medium to large. Provider, external ID, and URL get stashed on the task so downstream tooling can round-trip.
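One reading of the label rules (a bug label selects the qa role, a docs label selects the docs role, and epic raises scope from medium to large) is a small lookup. This is a sketch under assumptions: the default role of backend and the `classify` name are hypothetical, and the post does not say how conflicting labels are resolved, so the first matching label wins here.

```python
ROLE_BY_LABEL = {"bug": "qa", "docs": "docs"}


def classify(labels, default_role="backend", default_scope="medium"):
    """Map ticket labels to (role, scope); defaults are assumptions."""
    normalized = [label.lower() for label in labels]

    role = default_role
    for label in normalized:
        if label in ROLE_BY_LABEL:
            role = ROLE_BY_LABEL[label]
            break  # first matching label wins in this sketch

    scope = default_scope
    if "epic" in normalized and scope == "medium":
        scope = "large"
    return role, scope


print(classify(["docs", "epic"]))
```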

--run dispatches immediately. --dry-run previews:

Task preview
  goal:       "Migrate session store to Redis"
  role:       backend
  scope:      medium
  priority:   medium
  source:     linear / ENG-412
  assignee:   dmitri

bernstein remote — SSH sandbox

Is the job heavier than your laptop can handle (large test matrix, GPU calls, staging DB), and is a VPS faster than a cloud sandbox? bernstein remote wraps that setup:

bernstein remote test build-box-1
bernstein remote run build-box-1 ~/work/bernstein --user alex --port 22

Backed by an SSH SandboxBackend:

  • ControlMaster reuse. First call opens ~/.ssh/bernstein-<host>-<pid>.sock; subsequent commands reuse it. Per-call overhead drops from ~500ms to ~30ms.
  • ConnectTimeout=10, ServerAliveInterval=30 so a flaky network doesn't hang the run.
  • Error translation. Connection refused becomes SandboxConnectionError(host=..., hint="check that sshd is running on port X"). Permission denied suggests ssh-add or an IdentityFile entry.

Artifacts stay on the remote box for the duration. bernstein remote forget <host> tears the socket down.
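The connection behaviour described above maps onto standard OpenSSH client options. The sketch below only builds argument lists, assuming the socket naming from the first bullet; the helper names and the use of ControlPersist are assumptions, not Bernstein's confirmed implementation.

```python
import os
from typing import List, Optional


def control_path(host: str, pid: Optional[int] = None) -> str:
    """Socket path following the bernstein-<host>-<pid>.sock pattern."""
    pid = os.getpid() if pid is None else pid
    return os.path.expanduser(f"~/.ssh/bernstein-{host}-{pid}.sock")


def ssh_argv(host: str, command: str, user: Optional[str] = None,
             port: int = 22, pid: Optional[int] = None) -> List[str]:
    """Build an ssh command that creates or reuses a ControlMaster socket."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "ControlMaster=auto",        # first call opens the master, later calls reuse it
        "-o", f"ControlPath={control_path(host, pid)}",
        "-o", "ControlPersist=yes",        # assumption: keep the master alive between calls
        "-o", "ConnectTimeout=10",
        "-o", "ServerAliveInterval=30",
        "-p", str(port),
        target,
        command,
    ]


def forget_argv(host: str, pid: Optional[int] = None) -> List[str]:
    """`ssh -O exit` asks the running master to shut down and remove its socket."""
    return ["ssh", "-S", control_path(host, pid), "-O", "exit", host]


print(" ".join(ssh_argv("build-box-1", "uname -a", user="alex")))
```

Passing either list to `subprocess.run` would execute it; keeping construction separate from execution makes the options easy to inspect and test.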

bernstein hooks — pre/post lifecycle

Six events: pre_task, post_task, pre_merge, post_merge, pre_spawn, post_spawn. Hook any of them with a shell script, a Python callable, or a pluggy @hookimpl:

# bernstein.yaml
hooks:
  pre_task:
    - script: "scripts/check-branch.sh"
      timeout: 10
  post_merge:
    - script: "scripts/notify-slack.sh"
    - plugin: "bernstein_plugin_jira"

Shell hooks get a JSON payload on stdin plus BERNSTEIN_EVENT, BERNSTEIN_TASK_ID, BERNSTEIN_SESSION_ID, and BERNSTEIN_WORKDIR in the env. The env is whitelisted (PATH, HOME, USER, BERNSTEIN_*) so credentials don't leak into third-party scripts. Stdout is truncated at 10 MB. A non-zero exit from a pre_* hook aborts the event, which is useful for "don't spawn an agent if the working tree is dirty."
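The hook contract can be sketched in a few lines: filter the environment, pass the payload on stdin, and treat a failing pre_* hook as a veto. This is an illustrative runner, not Bernstein's code; `filtered_env`, `run_hook`, and the `dirty` payload field are hypothetical, and stdout truncation is left out.

```python
import json
import os
import subprocess
import sys

ALLOWED_NAMES = {"PATH", "HOME", "USER"}


def filtered_env(env):
    """Keep only whitelisted names and anything prefixed BERNSTEIN_."""
    return {k: v for k, v in env.items()
            if k in ALLOWED_NAMES or k.startswith("BERNSTEIN_")}


def run_hook(argv, event, payload, timeout=10.0):
    """Run one hook; return False when a pre_* hook should abort the event."""
    env = filtered_env(os.environ)
    env["BERNSTEIN_EVENT"] = event
    proc = subprocess.run(
        argv,
        input=json.dumps(payload),  # JSON payload arrives on the hook's stdin
        text=True,
        capture_output=True,
        env=env,
        timeout=timeout,            # raises subprocess.TimeoutExpired on overrun
    )
    if proc.returncode != 0 and event.startswith("pre_"):
        return False
    return True


# A stand-in hook that exits 1 when the (hypothetical) payload says the tree is dirty.
hook = [sys.executable, "-c",
        "import json, sys; sys.exit(1 if json.load(sys.stdin)['dirty'] else 0)"]
print(run_hook(hook, "pre_spawn", {"dirty": True}))  # False: the spawn would be aborted
```

The same failing exit code from a post_* hook returns True here, since only pre_* hooks are described as able to abort.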

Three subcommands round it out: hooks list, hooks run <event> (fires the event with an empty context for debugging), and hooks check (validates every script path).
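A path check of the kind hooks check performs might look like the following sketch. It assumes the hooks section of bernstein.yaml has already been parsed into a dict and only tests that each script exists; `missing_scripts` is a hypothetical name, and whatever else the real command validates is not covered in this post.

```python
import tempfile
from pathlib import Path


def missing_scripts(hooks: dict, root: Path) -> list[str]:
    """Return one message per hook script that does not exist under root."""
    problems = []
    for event, entries in hooks.items():
        for entry in entries:
            script = entry.get("script")
            if script is None:
                continue  # plugin entries have no script path to check
            if not (root / script).is_file():
                problems.append(f"{event}: {script} not found")
    return problems


config = {
    "pre_task": [{"script": "scripts/check-branch.sh", "timeout": 10}],
    "post_merge": [{"script": "scripts/notify-slack.sh"},
                   {"plugin": "bernstein_plugin_jira"}],
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scripts").mkdir()
    (root / "scripts" / "check-branch.sh").write_text("#!/bin/sh\n")
    print(missing_scripts(config, root))
```

In the demo only check-branch.sh is created, so the single reported problem is the post_merge Slack script.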

why they compose

All four read the same .sdd/ state. So:

  1. bernstein from-ticket https://linear.app/acme/issue/ENG-412 --run
  2. …agents run on the SSH sandbox via bernstein remote run build-box-1 .
  3. bernstein pr --session-id last once the janitor signs off.
  4. post_merge hook fires the Slack notification and closes the Linear ticket.

No ad-hoc glue, no script drift between ~/work/*/scripts/. The session metadata that flowed from ticket → task → merge is still there if you need to replay or audit.

what's missing

  • No GitLab or Bitbucket ticket providers yet; open an issue if you need one. The provider interface is one small file per source.
  • The SSH sandbox uses OpenSSH, not paramiko. It works everywhere OpenSSH does, but won't embed in pure-Python deployments. SandboxBackend is stable; a paramiko adaptation is ~200 lines.
  • The PR generator targets GitHub only. GitLab is a small follow-up; gate results and the cost tracker are already provider-agnostic.

To try it, run pipx install 'bernstein>=1.8.14', then bernstein pr --help. Open an issue with whichever shell snippet is next on your list.
