# Bernstein - Complete Technical Reference

> Forward-deployed engineering, with a coding swarm that fits in your `.sdd/`.
>
> Open-source multi-agent orchestration system for AI coding agents.
>
> Orchestrate any AI coding agent. Any model. One command.

## Overview

Bernstein is a Python-based orchestration system that coordinates multiple AI coding agents working in parallel on a single codebase. It decomposes goals into tasks, assigns them to the most appropriate agents and models, isolates work in git worktrees, verifies results through quality gates, and merges verified output.

It is built for the forward-deployed engineering pattern: parachute onto a client repo and stand up an AI engineering crew in minutes. State lives in `.sdd/` - there is no server to provision. Per-agent credential scoping keeps your keys out of the client's environment. The 37-adapter spread means the swarm runs on whichever CLI agent the client already trusts (Claude Code, Codex, Gemini CLI, Aider, and more). Every step is written as an HMAC-signed audit record, replayable for client compliance review.

The orchestrator itself is deterministic Python code - no LLM tokens are spent on coordination, scheduling, or task management. LLMs are only used by the agents themselves to write code.
- **License**: Apache 2.0
- **Language**: Python 3.12+
- **Package Manager**: uv / pip / pipx
- **Author**: Alex Chernysh
- **Website**: https://bernstein.run
- **Repository**: https://github.com/sipyourdrink-ltd/bernstein
- **PyPI**: https://pypi.org/project/bernstein/
- **Documentation**: https://bernstein.readthedocs.io/

---

## Installation

### Via pipx (recommended)

```bash
pipx install bernstein
```

### Via pip

```bash
pip install bernstein
```

### Via uv

```bash
uv tool install bernstein
```

### From source

```bash
git clone https://github.com/sipyourdrink-ltd/bernstein.git
cd bernstein
uv sync
```

### Requirements

- Python 3.12 or later
- At least one supported CLI coding agent installed (e.g., Claude Code, Codex CLI, Gemini CLI)
- Git (for worktree isolation)

---

## Quick Start

### One-liner

```bash
bernstein -g "add user authentication with JWT tokens"
```

### With a plan file

```bash
bernstein run plans/my-project.yaml
```

### Basic workflow

1. Install Bernstein: `pipx install bernstein`
2. `cd` into your project directory.
3. Run: `bernstein -g "your goal here"`
4. Bernstein decomposes the goal, spawns agents, and orchestrates the work.
5. Review the results in your git history.

---

## Architecture

### Design Principles

1. **Deterministic orchestrator**: The orchestrator is pure Python code. No LLM calls are made for scheduling, routing, or coordination. This makes behavior predictable, debuggable, and fast.
2. **File-based state**: All state lives in the `.sdd/` directory - backlog, runtime data, metrics, configuration. Nothing is held only in memory, so no state is lost on a crash.
3. **Short-lived agents**: Agents handle 1-3 tasks each, then exit. There are no long-running agent processes. A fresh context per task prevents hallucination drift.
4. **Agent-agnostic**: Works with any CLI coding agent. Bernstein currently ships 37 adapters. Adding a new agent requires implementing a simple adapter interface.
5. **Model-per-task routing**: A contextual bandit router learns which model works best for each task type and complexity level. In our own runs, the bandit router cut spend roughly in half compared to uniformly using expensive models. Measure yours with `bernstein cost`.

### Core Sub-packages

The system is organized into 22 sub-packages under `src/bernstein/core/`. The main ones are:

#### Orchestration (`orchestration/`)

- Orchestrator lifecycle management
- Tick pipeline (the main event loop)
- Manager prompts and evolution
- Graceful drain and shutdown
- Bootstrap initialization

#### Agents (`agents/`)

- Agent spawner (launches CLI agents with appropriate configuration)
- Agent discovery (finds installed agents on the system)
- Heartbeat monitoring
- Idle detection and reaping
- Agent recycling
- Warm pool management

#### Tasks (`tasks/`)

- Task store (persistent task state)
- Task lifecycle (created -> assigned -> in_progress -> completed/failed)
- Retry logic with model escalation
- Completion verification
- Batch mode for bulk operations
- Dead letter queue for permanently failed tasks
- Fair scheduler across task priorities

#### Quality (`quality/`)

- Quality gates: lint, type checking, test execution, security scanning
- CI monitor integration
- Janitor for cleanup tasks
- Cross-model verifier (second opinion from a different model)
- Semantic diff analysis

#### Server (`server/`)

- Task server (HTTP API on port 8052)
- API endpoints for task management
- Middleware for authentication and logging

#### Cost (`cost/`)

- Per-agent cost tracking
- Anomaly detection (alerts on unusual spending)
- Budget enforcement (hard limits per run)

#### Tokens (`tokens/`)

- Token usage monitoring per agent
- Context growth detection
- Auto-intervention when agents exceed token budgets

#### Security (`security/`)

- HMAC audit logs (tamper-evident)
- Policy engine for access control
- PII gating (prevents agents from accessing sensitive data)
- Credential scoping per agent

#### Configuration (`config/`)

- YAML-based configuration
- 150+ configurable parameters with sensible defaults
- Runtime configuration validation

#### Observability (`observability/`)

- Prometheus metrics export
- OpenTelemetry integration
- Grafana dashboard templates

#### Protocols (`protocols/`)

- MCP (Model Context Protocol) server mode
- A2A (Agent-to-Agent) protocol support
- Protocol negotiation for multi-system interop

#### Git and Sandboxes (`git/`, `sandbox/`)

- Git worktree management (one per agent, default backend)
- Pluggable `SandboxBackend` protocol - git worktrees, Docker, E2B, Modal, Blaxel, Cloudflare, Daytona, Runloop, Vercel
- Merge queue for ordering results
- Branch creation and cleanup

#### Persistence (`persistence/`, `storage/`)

- WAL (Write-Ahead Log) crash recovery
- File-based state persistence with pluggable sinks (local disk, Amazon S3, Google Cloud Storage, Azure Blob Storage, Cloudflare R2)
- `BufferedSink` wrapper batches writes and fans out to any configured backend
- Periodic checkpointing

#### Planning (`planning/`)

- Plan file loading (YAML format)
- Goal decomposition into tasks
- Dependency resolution between tasks

#### Routing (`routing/`)

- Contextual bandit router
- Model selection based on task complexity
- Effort level selection (low/medium/high)
- Learning from task outcomes

#### Communication (`communication/`)

- Bulletin board for cross-agent messaging
- Finding sharing (one agent discovers something, others benefit)
- Blocker reporting

#### Knowledge (`knowledge/`)

- Knowledge graph of codebase structure
- Impact analysis (which files affect which)

#### Plugins (`plugins_core/`)

- Pluggy-based plugin system
- Extension points for custom quality gates, routers, etc.
---

## Supported Agents (37 Adapters)

| Agent | Description | Models |
|-------|-------------|--------|
| Claude Code | Anthropic's CLI agent | Opus 4.7, Sonnet 4.6, Haiku 4.5 |
| Codex CLI | OpenAI's CLI agent | GPT-5 family |
| Gemini CLI | Google's CLI agent | Gemini 2.5 Pro |
| OpenAI Agents SDK | OpenAI Agents SDK v2 runtime | GPT-5, GPT-4.1 via Responses API |
| Cursor | AI-powered editor CLI | Any model via Cursor |
| Aider | Open-source AI pair programmer | Any OpenAI/Anthropic model |
| Amp | Sourcegraph Amp | Sourcegraph models |
| Kiro | AWS Kiro CLI | AWS models |
| Kilo | Kilo coding agent | Any OpenAI-compatible |
| Qwen | Alibaba Qwen Agent | Qwen models |
| Goose | Block Goose CLI | Various |
| Ollama | Local model runner | Any local model |
| Cody | Sourcegraph Cody | Various |
| Continue | Continue.dev CLI | Various |
| OpenCode | Open-source coding agent | Any OpenAI-compatible |
| Cloudflare Agents | Workers + Workflows + Durable Objects | Workers AI |
| IaC | Terraform/Pulumi agent | Various |
| GitHub Copilot | GitHub Copilot CLI | OpenAI-backed |
| Droid (Factory AI) | Factory AI Droid runtime | Various |
| Crush (Charm) | Charm Crush CLI | Any OpenAI-compatible |
| Auggie (Augment) | Augment Code agent | Augment models |
| Kimi | Moonshot Kimi agent | Kimi models |
| Rovo Dev (Atlassian) | Atlassian Rovo Dev CLI | Atlassian/OpenAI |
| Cline | Cline autonomous agent | Any OpenAI-compatible |
| Codebuff | Codebuff CLI | Various |
| Pi | Pi coding agent | Various |
| Mistral Vibe | Mistral Vibe CLI | Mistral models |
| Autohand | Autohand agent runtime | Various |
| Forge | Forge CLI | Various |
| Hermes | Hermes CLI | Various |
| Generic | Adapter for any CLI tool | Any |

The generic adapter allows wrapping any CLI tool that accepts prompts and produces output.

### Orchestrator delegation adapters (leaf-node)

A separate, smaller class of adapters that wrap **other CLI orchestrators** as if each were a single agent.
Bernstein hands the wrapped tool a prompt or plan and only sees its final exit code - sub-agent costs and quality gates *inside* the wrapped orchestrator are not visible. This is useful for migrating an existing workflow into one step of a larger Bernstein plan rather than rewriting it natively.

| Orchestrator | What it is | Models |
| --- | --- | --- |
| Composio | Composio Agent Orchestrator (`@aoagents/ao`) | Inherits from underlying agent plugin |
| Ralphex | umputun/ralphex Go binary that walks a markdown plan over Claude Code | Anthropic |

---

## CLI Commands

### `bernstein run [plan.yaml]`

Run orchestration with an optional plan file. Without a plan, it uses the default goal.

### `bernstein -g "goal"`

Set a goal in natural language and start orchestration.

### `bernstein stop`

Gracefully stop all running agents and the orchestrator.

### `bernstein status`

Show current orchestration status: running agents, task progress, cost.

### `bernstein agents`

List discovered agents and their availability.

### `bernstein evolve`

Trigger self-evolution: Bernstein plans and executes improvements to itself.

### `bernstein cost`

Show cost breakdown per agent, per model, per task.

### Operator commands

- `bernstein pr` - generate a pull request from the current worktree, with a janitor-cleaned diff and a cost-summary body.
- `bernstein from-ticket ` - materialize a run directly from a tracker ticket (GitHub, Linear, Jira) as the goal.
- `bernstein ticket` - create, list, and sync orchestration tickets against the configured tracker backend.
- `bernstein remote run ` - dispatch a run on a remote host over SSH with ControlMaster socket reuse for fast repeats.
- `bernstein hooks` - register and run lifecycle hooks (pre/post for run, task, merge - six slots total).
- `bernstein chat serve --platform=telegram` - run a chat bot (Telegram, Discord, Slack) that accepts `/run`, `/status`, `/approve`, `/reject`, `/switch`, `/stop` from a thread.
- `bernstein approve-tool` - interactively approve a pending tool-call via TUI, web, or CLI.
- `bernstein reject-tool` - reject a pending tool-call from the same three surfaces.
- `bernstein tunnel start ` - wrap cloudflared / ngrok / bore / tailscale to expose a local service.
- `bernstein daemon install` - install the orchestrator as a systemd (Linux) or launchd (macOS) unit for auto-start.
- `bernstein acp serve` - native Agent Client Protocol bridge; Zed and other ACP editors can dispatch tasks and stream agent output without leaving the editor. (v1.9.0)
- `bernstein autofix` - daemon that watches Bernstein-opened PRs, reads CI failure logs, spawns an agent against the failing worktree, and pushes a fix commit. (v1.9.0)
- `bernstein connect ` - OS keychain-backed credential vault; enter GitHub, Linear, Jira, Slack, or Telegram credentials once, and every subsequent run reads from the keychain. (v1.9.0)
- `bernstein preview start` - spin up the project's dev server inside the active worktree and expose it over a public HTTPS tunnel for review or webhook testing. (v1.9.0)

---

## Task Server API

The task server runs on `http://127.0.0.1:8052` during orchestration.

### Endpoints

#### `POST /tasks`

Create a new task.

```json
{
  "goal": "implement user login endpoint",
  "role": "backend",
  "priority": "high",
  "complexity": "medium"
}
```

#### `GET /tasks?status=open`

List tasks filtered by status. Statuses: `open`, `assigned`, `in_progress`, `completed`, `failed`.

#### `POST /tasks/{id}/complete`

Mark a task as completed with results.

#### `POST /tasks/{id}/fail`

Mark a task as failed with error details.

#### `POST /tasks/{id}/progress`

Report intermediate progress.

```json
{
  "files_changed": ["src/auth.py", "tests/test_auth.py"],
  "tests_passing": true,
  "errors": []
}
```

#### `POST /bulletin`

Post a cross-agent finding or blocker.
```json
{
  "type": "finding",
  "message": "Database schema uses UUID primary keys, not integers"
}
```

#### `GET /bulletin?since={timestamp}`

Read recent bulletin board entries.

#### `GET /status`

Dashboard summary: agent count, task progress, cost, quality metrics.

---

## Plan Files (YAML)

Plan files describe multi-step projects with stages and steps.

```yaml
name: "Add authentication"
stages:
  - name: database
    steps:
      - goal: "Create user table migration"
        role: backend
        complexity: low
      - goal: "Add password hashing utility"
        role: backend
        complexity: low
  - name: api
    depends_on: [database]
    steps:
      - goal: "Implement /login endpoint"
        role: backend
        complexity: medium
        priority: high
      - goal: "Implement /register endpoint"
        role: backend
        complexity: medium
  - name: testing
    depends_on: [api]
    steps:
      - goal: "Write integration tests for auth flow"
        role: qa
        complexity: medium
```

### Plan fields

- `name`: Project name
- `stages`: List of stages
  - `name`: Stage identifier
  - `depends_on`: List of stage names that must complete first
  - `steps`: List of tasks
    - `goal`: What to accomplish
    - `role`: Agent role (backend, frontend, qa, security, devops, architect, docs, etc.)
    - `priority`: low, medium, high, critical
    - `complexity`: low, medium, high
    - `scope`: File or directory scope hint

---

## Configuration

Bernstein has 150+ configurable parameters.
Key ones:

### Environment Variables

- `BERNSTEIN_MAX_AGENTS`: Maximum concurrent agents (default: 5)
- `BERNSTEIN_DEFAULT_MODEL`: Default model for tasks
- `BERNSTEIN_BUDGET_LIMIT`: Maximum cost per run in USD
- `BERNSTEIN_QUALITY_GATES`: Comma-separated list of quality gates to run

### Configuration File (`.sdd/config.yaml`)

```yaml
orchestration:
  max_agents: 5
  tick_interval: 10
  drain_timeout: 300

routing:
  strategy: contextual_bandit
  epsilon: 0.1
  default_model: sonnet

quality:
  gates:
    - lint
    - typecheck
    - test
    - security
  retry_on_failure: true
  max_retries: 2
  escalate_model_on_retry: true

cost:
  budget_limit: 50.00
  alert_threshold: 0.8
  track_per_agent: true

git:
  worktree_base: .worktrees
  auto_merge: true
  merge_strategy: squash
```

---

## Quality Gates

Quality gates run automatically on every task result before merge.

### Built-in Gates

1. **Lint** (ruff): Code style and common errors
2. **Type check** (pyright/mypy): Static type verification
3. **Tests** (pytest): Run the test suite and check for regressions
4. **Security** (bandit/semgrep): Security vulnerability scanning
5. **Architecture conformance**: Verify changes follow project structure rules

### Gate Behavior

- Tasks that fail quality gates are retried
- On retry, the model may be escalated (e.g., Sonnet -> Opus)
- After max retries, tasks go to the dead letter queue
- Cross-model verification optionally gets a second opinion from a different model

---

## Cost Tracking

Bernstein tracks costs at multiple levels:

- **Per-agent**: How much each spawned agent costs
- **Per-model**: Breakdown by model (Opus vs Sonnet vs Haiku)
- **Per-task**: Cost attributed to each task
- **Per-run**: Total cost for the entire orchestration run

### Budget Enforcement

- Set a hard budget limit per run
- An alert threshold warns when spend approaches the budget
- Automatic drain mode when the budget is exhausted (finish current tasks, don't start new ones)

### Anomaly Detection

- Detects unusual cost spikes
- Alerts on agents consuming more tokens than expected
- Auto-intervention for runaway agents (context growth detection)

---

## Security Features

- **HMAC audit logs**: Tamper-evident logging of all orchestration actions
- **Policy engine**: Define policies for what agents can and cannot do
- **PII gating**: Prevent agents from accessing files containing PII
- **Credential scoping**: Each agent gets only the credentials it needs
- **Git worktree isolation**: Agents cannot interfere with each other's work

---

## Observability

### Prometheus Metrics

Bernstein exports metrics to Prometheus:

- `bernstein_tasks_total` (counter, labels: status, role)
- `bernstein_agents_active` (gauge)
- `bernstein_cost_usd` (counter, labels: model, agent)
- `bernstein_quality_gate_results` (counter, labels: gate, result)
- `bernstein_tick_duration_seconds` (histogram)

### OpenTelemetry

Full distributed tracing support via OpenTelemetry:

- Spans for each orchestration tick
- Spans for agent spawn, task assignment, quality gate execution
- Trace context propagation to agents

### Grafana Dashboards

Pre-built dashboard templates for:

- Orchestration overview
- Cost analysis
- Quality trends
- Agent utilization

---

## Protocol Support

### MCP (Model Context Protocol)

Bernstein can run as an MCP server, exposing orchestration capabilities as tools:

- `bernstein_run`: Start orchestration
- `bernstein_status`: Check status
- `bernstein_tasks`: List and manage tasks
- `bernstein_cost`: View cost breakdown
- `bernstein_stop`: Stop orchestration

### A2A (Agent-to-Agent Protocol)

Support for Google's A2A protocol for inter-agent communication and task delegation.

---

## Roles

Bernstein assigns roles to agents based on task requirements:

| Role | Description |
|------|-------------|
| manager | High-level planning and decomposition |
| vp | Strategic oversight |
| backend | Backend implementation |
| frontend | Frontend implementation |
| qa | Testing and quality assurance |
| security | Security review and hardening |
| devops | CI/CD and infrastructure |
| architect | System design and architecture |
| docs | Documentation |
| reviewer | Code review |
| ml-engineer | Machine learning tasks |
| prompt-engineer | Prompt optimization |
| retrieval | Information retrieval |
| analyst | Analysis tasks |
| resolver | Conflict resolution |
| ci-fixer | CI/CD issue resolution |

---

## How It Compares

Bernstein is compared along two axes. LLM-orchestration frameworks (CrewAI / AutoGen / LangGraph) orchestrate LLM calls. CLI-agent orchestrators (ComposioHQ/agent-orchestrator, emdash) are the closer category.
### vs LLM-orchestration frameworks

| Feature | Bernstein | CrewAI | AutoGen | LangGraph |
|---------|-----------|--------|---------|-----------|
| Orchestrator | Deterministic code | LLM-driven (+ code Flows) | LLM-driven | Graph + LLM |
| CLI agent support | 37 adapters | No | No | No |
| Agent isolation | Worktrees or pluggable cloud sandbox | No | No | No |
| Quality gates | Built-in | Guardrails + Pydantic output | Termination conditions | Conditional edges |
| Cost tracking | Per-agent | `usage_metrics` | `RequestUsage` | Via LangSmith |
| Self-evolution | Built-in (experimental) | No | No | No |
| File-based state | Yes (.sdd/) | In-memory + SQLite checkpoint | In-memory | Checkpointer |
| Model routing | Contextual bandit | Per-agent LLM | Per-agent `model_client` | Per-node (manual) |

### vs CLI-agent orchestrators

| Feature | Bernstein | ComposioHQ/agent-orchestrator | emdash |
|---------|-----------|------------------------------|--------|
| Shape | Python CLI + library + MCP server | TypeScript CLI + dashboard | Electron desktop app |
| Primary language | Python | TypeScript | TypeScript |
| Install | `pipx install bernstein` | `npm install -g @aoagents/ao` | .dmg / .msi / .AppImage |
| Agent adapters | 37 | 3 (Claude Code, Codex, Aider) | 23 |
| MCP server mode (exposes self) | Yes (stdio + HTTP/SSE) | No | No |
| Coordinator | Deterministic Python scheduler | LLM-driven | Not documented |
| HMAC-chained audit replay | Yes | No | No |
| Autonomous CI-fix / PR flow | Yes (`bernstein autofix`, v1.9.0) | Yes | No |
| License | Apache 2.0 | MIT | Apache 2.0 |

Bernstein's wedge in the CLI-orchestrator category: a Python-native primitive, MCP-server-first (it exposes itself over MCP so any MCP client can invoke orchestration as tools), and the widest adapter coverage, including Qwen / Ollama / Goose / OpenAI Agents SDK / Cloudflare Agents. Composio's `@aoagents/ao` is the right pick for TypeScript shops wanting autonomous CI-fix and a dashboard.
emdash is the right pick for users wanting a downloadable desktop ADE.

---

## Cloud Execution (Cloudflare)

Bernstein can run agents on Cloudflare's edge network:

- **Workers Runtime**: Execute agents on Cloudflare Workers
- **Durable Workflows**: Map tasks to durable workflows with auto-retry and approval gates
- **V8 Sandbox Isolation**: Secure agent code execution in isolated V8 isolates
- **R2 Workspace Sync**: Upload/download workspace files during cloud execution
- **Workers AI**: Use Cloudflare's AI models for task decomposition and planning
- **D1 Analytics**: Serverless SQLite for usage tracking and billing
- **Vectorize Cache**: Semantic caching for LLM responses with embedding similarity
- **Browser Rendering**: Headless browser bridge for scraping and screenshots
- **MCP Remote Transport**: Expose Bernstein as an MCP server over HTTP
- **Cloud CLI**: `bernstein cloud init/deploy/run/status/cost` commands

## FAQ

### What is Bernstein?

Bernstein is an open-source multi-agent orchestration system that coordinates AI coding agents (like Claude Code, Codex, Gemini CLI) to work in parallel on your codebase. It decomposes goals into tasks, assigns them to agents, and verifies the results.

### How does Bernstein differ from CrewAI or AutoGen?

Bernstein's orchestrator is deterministic Python code - no LLM tokens are spent on coordination. It works with real CLI coding agents (not API-only models) and provides git worktree isolation, quality gates, and cost tracking out of the box.

### What agents does Bernstein support?

Bernstein ships 37 adapters for popular coding agents including Claude Code, Codex CLI, Gemini CLI, OpenAI Agents SDK, Cursor, Aider, Amp, Ollama, GitHub Copilot, Droid, Crush, and more. It also has a generic adapter for wrapping any CLI tool.

### How does task routing work?

Bernstein uses a contextual bandit (epsilon-greedy) router that learns which model works best for each task type and complexity.
Simple tasks go to cheaper models (Haiku, Flash), while complex architecture tasks go to expensive models (Opus). In our own runs, the bandit router cut spend roughly in half compared to using expensive models for everything. Measure yours with `bernstein cost`.

### Is Bernstein free?

Yes. Bernstein is open-source under the Apache 2.0 license. You pay only for the AI model API usage of the agents themselves.

### Can I use Bernstein with local models?

Yes. Use the Ollama adapter to run fully local models. You can also mix local and cloud models in the same run.

### How does quality gating work?

After each agent completes a task, Bernstein runs configurable quality gates: linting (ruff), type checking (pyright), test execution (pytest), and security scanning (bandit/semgrep). Failed tasks are retried, potentially with a more capable model. After max retries, tasks go to a dead letter queue for manual review.

### What happens if an agent crashes?

Bernstein monitors agents via heartbeat. If an agent stops responding, it is reaped and the task is reassigned. Work in progress in the agent's worktree is preserved for potential recovery.

### Can I define multi-step projects?

Yes. YAML plan files let you define stages with dependencies, and steps with roles, priorities, and complexity levels. Stages execute in dependency order; steps within a stage can execute in parallel.

### Does Bernstein support MCP?

Yes. Bernstein can run as an MCP (Model Context Protocol) server, exposing its orchestration capabilities as tools that other MCP-compatible systems can invoke.

### Does Bernstein work with the OpenAI Agents SDK?

Yes. The `openai_agents` adapter embeds OpenAI's Agents SDK v2 as a first-class runtime. Each task runs in an Agents SDK session against the Responses API, so you get OpenAI's tool-calling, handoffs, and guardrails inside Bernstein's orchestrator without shelling out to a CLI. Install with `pip install "bernstein[openai-agents]"`.
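The multi-step projects answer above says stages execute in dependency order while steps within a stage can run in parallel. Below is a minimal sketch of that ordering using the standard library's `graphlib.TopologicalSorter`, assuming the plan file has already been loaded into a dict with the same shape as the "Add authentication" example in Plan Files (YAML loading is omitted to keep it stdlib-only). `stage_waves` is a hypothetical helper, not Bernstein's planner.

```python
from graphlib import CycleError, TopologicalSorter


def stage_waves(plan: dict) -> list[list[str]]:
    """Group stage names into waves; each wave only depends on earlier waves."""
    # graphlib expects {node: predecessors}, which is exactly what depends_on lists.
    graph = {s["name"]: set(s.get("depends_on", [])) for s in plan["stages"]}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if depends_on forms a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted so the output is deterministic
        waves.append(ready)
        sorter.done(*ready)
    return waves


plan = {
    "name": "Add authentication",
    "stages": [
        {"name": "database", "steps": [{"goal": "Create user table migration"}]},
        {"name": "api", "depends_on": ["database"], "steps": [{"goal": "Implement /login endpoint"}]},
        {"name": "testing", "depends_on": ["api"], "steps": [{"goal": "Write integration tests for auth flow"}]},
    ],
}
print(stage_waves(plan))  # [['database'], ['api'], ['testing']]
```

Stages in the same wave have no dependency on each other, so their steps can be dispatched together, subject to the agent limit. A cycle in `depends_on` is rejected up front by `prepare()` instead of stalling the run.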
### What sandbox backends does Bernstein support?

Bernstein exposes a `SandboxBackend` protocol. The default backend is a git worktree on the local machine. You can swap in Docker, E2B, Modal, Blaxel, Cloudflare Workers sandboxes, Daytona, Runloop, or Vercel sandboxes by setting `sandbox.backend` in `bernstein.yaml` and installing the matching extra (for example `pip install "bernstein[e2b]"`). The orchestrator and adapters do not change.

### Can I store `.sdd/` state and artifacts in the cloud?

Yes. The `BufferedSink` wrapper batches writes and forwards them to pluggable storage backends: local disk, Amazon S3, Google Cloud Storage, Azure Blob Storage, or Cloudflare R2. Configure under the `storage` block in `bernstein.yaml` and install the relevant extra (`pip install "bernstein[s3]"`, `[gcs]`, `[azure]`, or `[r2]`). Agents continue to read and write through the normal local-file API - only the persistence layer changes.

### What are progressive skill packs?

Bernstein ships its role guidance (backend, frontend, QA, security, DevOps, architect, reviewer, and so on) as progressive-disclosure skill packs instead of one giant system prompt. Agents start with a short bootstrap prompt and fetch individual skills on demand through the `load_skill` MCP tool. Only the skills a task actually touches are paid for in tokens, and packs can be versioned, added, or swapped without shipping a release. Skills live under `templates/skills/`.

---

## Links

- Website: https://bernstein.run
- GitHub: https://github.com/sipyourdrink-ltd/bernstein
- PyPI: https://pypi.org/project/bernstein/
- Documentation: https://bernstein.readthedocs.io/
- Issues: https://github.com/sipyourdrink-ltd/bernstein/issues
- Agent Card (A2A): https://bernstein.run/.well-known/agent-card.json
- MCP Server Card: https://bernstein.run/.well-known/mcp/server-card.json
- Contact: forte@bernstein.run

## About the Author

Bernstein is created and maintained by **Alex Chernysh**.
- Homepage: https://alexchernysh.com
- GitHub: https://github.com/chernistry
- X: https://x.com/alex_chernysh
- Email: forte@bernstein.run

### Other projects by the same author

- **HireEx** - Personal multi-agent AI workspace; career intelligence is the first vertical artefact. https://hireex.ai

---

## Canonical Q&A

Short, citable answers to the questions AI Overviews and LLM citers ask about Bernstein. Each entry also lives at `https://bernstein.run/q/` as a stand-alone page with QAPage JSON-LD.

### what is bernstein

URL: https://bernstein.run/q/what-is-bernstein

tags: intro, what-is | related: https://bernstein.run/q/how-does-bernstein-work, https://bernstein.run/q/self-hosted-ai-coding-agent-orchestrator, https://bernstein.run/cli-quickstart

bernstein is an open-source python orchestrator for cli coding agents. it decomposes a goal into tasks, picks a model and a cli agent (claude code, codex, gemini cli, aider, plus 33 more adapters), isolates each task in a git worktree, runs lint, type-check, tests, and security scans, and merges only the worktrees that pass. the orchestrator itself is deterministic python, not an llm, so scheduling and routing spend zero model tokens. state lives in .sdd/ on disk; there is no server to provision. apache 2.0, python 3.12+, install with pipx install bernstein. source: https://github.com/sipyourdrink-ltd/bernstein.

### how does bernstein work

URL: https://bernstein.run/q/how-does-bernstein-work

tags: architecture, tick-loop | related: https://bernstein.run/q/deterministic-ai-coding-orchestrator, https://bernstein.run/q/git-worktree-parallel-ai-agents, https://bernstein.run/q/hmac-chained-audit-log-for-ai-agents

the orchestrator runs a tick loop: read the open backlog under .sdd/backlog/open, pick the next ready task, spawn a fresh cli agent in its own git worktree, wait for the agent to exit, run quality gates against the worktree, retry with an escalated model on failure, dead-letter after max retries, merge passing worktrees.
agents are short-lived (1-3 tasks each) so context never grows. a contextual-bandit router learns which model fits which task class. every action is appended to an hmac-chained audit log. routing, scheduling, retries, and merge order are pure python; the model only writes code inside the worktree.

### bernstein vs claude flow

URL: https://bernstein.run/q/bernstein-vs-claude-flow

tags: compare, claude-flow | related: https://bernstein.run/q/bernstein-vs-aider, https://bernstein.run/q/bernstein-vs-openai-agents-sdk, https://bernstein.run/vs/claude-flow

claude-flow is a community-built orchestration layer for claude code, tightly coupled to claude. bernstein is agent-agnostic: 37 adapters including claude code, codex, gemini cli, aider, ollama, openai agents sdk, cloudflare agents. bernstein's coordinator is deterministic python; claude-flow uses claude itself to plan, which costs tokens on every coordination step. bernstein adds an hmac-chained audit log, pluggable sandbox backends (worktree, docker, e2b, modal, cloudflare, daytona), an mcp server mode, and a built-in eval harness with failure taxonomy. pick claude-flow if you only run claude code and want a claude-native path. pick bernstein if you want one orchestrator across many cli agents or need audit replay. full matrix at https://bernstein.run/vs/claude-flow.

### bernstein vs openai agents sdk

URL: https://bernstein.run/q/bernstein-vs-openai-agents-sdk

tags: compare, openai-agents-sdk | related: https://bernstein.run/q/bernstein-vs-claude-flow, https://bernstein.run/vs/openai-agents-sdk

openai agents sdk is a python runtime for tool-calling agents against the openai responses api: handoffs, guardrails, traces, one provider. bernstein is an orchestrator that ships an openai-agents adapter so you get the sdk runtime as one of 37 agents under a deterministic scheduler. bernstein adds worktree isolation, quality gates, multi-provider routing, mcp server mode, and an hmac audit chain.
you can also pin one task to openai agents sdk and another to claude code in the same plan. pick openai agents sdk alone for openai-only python apps. pick bernstein when you need parallel agents across providers, ci-style merges, or compliance-grade audit. matrix at https://bernstein.run/vs/openai-agents-sdk.

### bernstein vs aider

URL: https://bernstein.run/q/bernstein-vs-aider

tags: compare, aider | related: https://bernstein.run/q/bernstein-vs-claude-flow, https://bernstein.run/vs/aider

aider is an interactive ai pair programmer: one terminal, one model, one conversation, you stay in the loop. bernstein is the layer above: it spawns multiple cli agents (including aider via the aider adapter) in parallel git worktrees and merges what passes. if you want a fast solo loop with one model, use aider directly. if you want five tasks running at once, each in isolation, with lint/type/test gates and an audit chain on top, use bernstein and let it dispatch aider as one of the workers. they compose. full matrix at https://bernstein.run/vs/aider.

### how to run multiple claude code agents in parallel

URL: https://bernstein.run/q/how-to-run-multiple-claude-code-agents-in-parallel

tags: claude-code, parallel, how-to | related: https://bernstein.run/q/git-worktree-parallel-ai-agents, https://bernstein.run/cli-quickstart

install bernstein (pipx install bernstein), cd into your repo, write a plan.yaml with one stage that lists several steps, then run bernstein run plan.yaml. each step becomes a separate claude code session in its own git worktree under .worktrees/. bernstein spawns them concurrently up to bernstein.orchestration.max_agents (default 5), tracks per-agent cost, runs quality gates on each worktree, and merges only the worktrees that pass. for one-shot parallelism: bernstein -g "goal" --max-agents 5 lets bernstein decompose the goal into parallel tasks itself. step-by-step at https://bernstein.run/cli-quickstart.
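a minimal sketch of the fan-out described above, assuming one branch per task named `bernstein/<task-id>` (a hypothetical naming scheme) and a caller-supplied `run_agent` function that blocks until its agent exits. it builds the real `git worktree add -b <branch> <path>` command for each task and runs at most `max_agents` agents at once with a thread pool. it is not bernstein's spawner and does not run git itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def worktree_command(task_id: str, base: str = ".worktrees") -> list[str]:
    """Return a `git worktree add` argv giving one task its own branch and directory."""
    branch = f"bernstein/{task_id}"  # hypothetical branch naming scheme
    return ["git", "worktree", "add", "-b", branch, f"{base}/{task_id}"]


def run_parallel(task_ids: list[str], run_agent: Callable[[str], str], max_agents: int = 5) -> dict[str, str]:
    """Call run_agent(task_id) for every task, with at most max_agents in flight."""
    # Threads are enough: each worker mostly waits on an external CLI process.
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return dict(zip(task_ids, pool.map(run_agent, task_ids)))


for step in ["login-endpoint", "register-endpoint", "auth-tests"]:
    print(" ".join(worktree_command(step)))
# git worktree add -b bernstein/login-endpoint .worktrees/login-endpoint
# git worktree add -b bernstein/register-endpoint .worktrees/register-endpoint
# git worktree add -b bernstein/auth-tests .worktrees/auth-tests
```

unlike fixed batches, the pool starts the next task as soon as any slot frees up, which matches "up to max_agents" concurrency.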
### how to install bernstein
URL: https://bernstein.run/q/how-to-install-bernstein
tags: install, setup | related: https://bernstein.run/q/how-to-run-multiple-claude-code-agents-in-parallel, https://bernstein.run/cli-quickstart

pipx install bernstein is the recommended path: pipx isolates dependencies so bernstein never collides with the rest of your tooling. pip install bernstein works too, as does uv tool install bernstein for uv users. requires python 3.12 or later. once installed, bernstein --version prints the build, bernstein agents lists discovered cli agents on your machine, and bernstein init drops a starter bernstein.yaml plus .sdd/ into the current repo. apache 2.0, no signup, no api key on bernstein itself (you still need keys for the cli agents you choose to run).

### how to add a cli adapter
URL: https://bernstein.run/q/how-to-add-a-cli-adapter
tags: adapters, extension | related: https://bernstein.run/q/bernstein-vs-aider, https://bernstein.run/q/mcp-server-for-multi-agent-coding

subclass bernstein.adapters.base.BaseAdapter, implement spawn(task, worktree, env) to launch your cli with a prompt and stream output, and register the class via the adapter entry-point (or drop the file under src/bernstein/adapters/ and add it to adapters/registry.py for in-tree changes). the contract is a python protocol; bernstein.adapters._contract enforces capability checks. existing adapters under src/bernstein/adapters/ (aider.py, claude.py, codex.py, ollama.py, generic.py) are the reference. for unknown cli tools, use generic.py and parameterise it through bernstein.yaml. the plugin sdk in src/bernstein/adapters/plugin_sdk.py lets you ship adapters as separate pip packages.
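the shape of spawn(task, worktree, env) can be illustrated without importing bernstein. a minimal sketch, assuming an adapter launches one subprocess per task inside the worktree and streams its stdout; AdapterLike and EchoCliAdapter are hypothetical, and the real BaseAdapter also carries capability declarations checked by bernstein.adapters._contract, which are not modelled here. python itself plays the "cli" so the example runs anywhere (posix assumed, since the child gets an empty environment).

```python
import subprocess
import sys
import tempfile
from collections.abc import Iterator
from pathlib import Path
from typing import Protocol


class AdapterLike(Protocol):
    """Hypothetical shape of the spawn() contract; not bernstein's real base class."""

    def spawn(self, task: str, worktree: Path, env: dict[str, str]) -> Iterator[str]: ...


class EchoCliAdapter:
    """Toy adapter: runs a command with the task prompt appended as its last argument."""

    def __init__(self, argv_prefix: list[str]) -> None:
        self.argv_prefix = argv_prefix

    def spawn(self, task: str, worktree: Path, env: dict[str, str]) -> Iterator[str]:
        proc = subprocess.Popen(
            [*self.argv_prefix, task],
            cwd=worktree,  # the cli runs inside the task's own worktree
            env=env,  # only the scoped variables, never the orchestrator's environment
            stdout=subprocess.PIPE,
            text=True,
        )
        assert proc.stdout is not None
        for line in proc.stdout:  # stream output line by line as the cli produces it
            yield line.rstrip("\n")
        if proc.wait() != 0:
            raise RuntimeError(f"agent exited with code {proc.returncode}")


if __name__ == "__main__":
    adapter: AdapterLike = EchoCliAdapter(
        [sys.executable, "-c", "import sys; print('working on:', sys.argv[1])"]
    )
    with tempfile.TemporaryDirectory() as wt:
        for line in adapter.spawn("fix the login bug", Path(wt), env={}):
            print(line)  # working on: fix the login bug
```

returning a generator keeps the adapter lazy: the orchestrator can forward each line to logs or the tui as it arrives instead of waiting for the process to exit.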
### how does the audit chain work
URL: https://bernstein.run/q/how-does-the-audit-chain-work
tags: audit, security, hmac | related: https://bernstein.run/q/audit-grade-ai-coding-orchestrator-for-regulated-teams, https://bernstein.run/q/fail-closed-agent-orchestrator

every orchestration action (task created, agent spawned, gate passed, merge committed) is appended to .sdd/audit.log as a json line with a sha-256 hmac that chains over the previous entry's hmac and the current payload. one shared secret per repo, stored under .sdd/audit.key. bernstein audit verify walks the file front to back and fails fast on the first broken link, so a single edited line invalidates everything after it. that gives you tamper-evident replay: you can hand the log plus the key to a reviewer and they can re-verify every step offline. source: src/bernstein/core/security/audit.py.

### deterministic ai coding orchestrator
URL: https://bernstein.run/q/deterministic-ai-coding-orchestrator
tags: determinism, architecture | related: https://bernstein.run/q/how-does-bernstein-work, https://bernstein.run/q/audit-grade-ai-coding-orchestrator-for-regulated-teams

bernstein's orchestrator is pure python: tick loop, task store, scheduler, router, merge queue, quality gates. no llm call sits on the coordination path. given the same backlog, the same .sdd/ state, and the same model outputs, a re-run produces the same task ordering and the same merge sequence. that determinism is what makes the hmac audit chain meaningful (no model nondeterminism in the metadata) and what lets the eval harness replay an entire run from a recorded fixture. compare with crewai or claude-flow, where the coordinator is itself an llm and two runs of the same brief can diverge.
### audit grade ai coding orchestrator for regulated teams
URL: https://bernstein.run/q/audit-grade-ai-coding-orchestrator-for-regulated-teams
tags: compliance, audit, regulated | related: https://bernstein.run/q/how-does-the-audit-chain-work, https://bernstein.run/q/self-hosted-ai-coding-agent-orchestrator

bernstein writes an hmac-chained audit log of every orchestration action, scopes credentials per agent (each worker gets only the keys it needs), keeps all state on disk in .sdd/, supports policy-engine and pii-gate plugins under src/bernstein/core/security/, and ships a bernstein audit verify command that re-checks the chain offline. there is no hosted backend in the default install; you run the orchestrator on your own box. note: the project does not claim soc 2, hipaa, or fedramp coverage. those are organisational programs around a tool, not a property of the tool itself. what bernstein gives you is the technical primitives a regulated team needs to make its own case.

### self hosted ai coding agent orchestrator
URL: https://bernstein.run/q/self-hosted-ai-coding-agent-orchestrator
tags: self-hosted, privacy | related: https://bernstein.run/q/what-is-bernstein, https://bernstein.run/q/audit-grade-ai-coding-orchestrator-for-regulated-teams

bernstein runs entirely on your machine. pipx install bernstein, run bernstein in your repo, the orchestrator process and every agent it spawns are local. state is .sdd/ files on disk. there is no required cloud control plane, no signup, no telemetry call-home. the http task server binds 127.0.0.1:8052 by default. agents reach their llm providers directly with credentials you supply; bernstein does not proxy or store keys for them. for fleet management across multiple repos use bernstein fleet, still local. apache 2.0, source at https://github.com/sipyourdrink-ltd/bernstein.
### mcp server for multi agent coding
URL: https://bernstein.run/q/mcp-server-for-multi-agent-coding
tags: mcp, protocol | related: https://bernstein.run/q/how-to-add-a-cli-adapter, https://bernstein.run/q/how-does-bernstein-work

bernstein exposes itself as an mcp server. start it with bernstein mcp serve (stdio) or bernstein mcp serve --http (streamable http). the tools published are bernstein_run, bernstein_status, bernstein_tasks, bernstein_cost, bernstein_stop, bernstein_approve, plus scenario tools for the eval harness. any mcp client (claude desktop, claude code, codex, custom) can invoke bernstein orchestration as a tool call. it also consumes mcp: each agent worktree gets its own mcp config so workers can call third-party mcp servers. source: src/bernstein/mcp/server.py.

### git worktree parallel ai agents
URL: https://bernstein.run/q/git-worktree-parallel-ai-agents
tags: git, worktree, isolation | related: https://bernstein.run/q/how-does-bernstein-work, https://bernstein.run/q/how-to-run-multiple-claude-code-agents-in-parallel

each task gets its own git worktree under .worktrees/. the worktree shares the object store with the main repo but has its own working copy and head, so two agents can edit overlapping paths at the same time without stepping on each other. bernstein creates the worktree on assignment, the agent does its work, quality gates run inside the worktree, the merge queue picks up passing worktrees in dependency order and squash-merges them into the target branch. on failure or kill the worktree stays on disk so you can inspect or cherry-pick. source: src/bernstein/core/git/worktree.py.

### hmac chained audit log for ai agents
URL: https://bernstein.run/q/hmac-chained-audit-log-for-ai-agents
tags: audit, hmac | related: https://bernstein.run/q/how-does-the-audit-chain-work, https://bernstein.run/q/audit-grade-ai-coding-orchestrator-for-regulated-teams

bernstein writes one json line per orchestration event to .sdd/audit.log.
each line carries an hmac-sha256 over (previous-line-hmac || event-payload), using a per-repo key in .sdd/audit.key. mutating or deleting any line breaks the chain at that point. bernstein audit verify walks the file and reports the first broken link. that gives a tamper-evident record of which agent took which action under which model and at what cost, in a format you can replay offline against the source key. format and signing logic: src/bernstein/core/security/audit.py.

### fail closed agent orchestrator
URL: https://bernstein.run/q/fail-closed-agent-orchestrator
tags: safety, fail-closed | related: https://bernstein.run/q/how-does-the-audit-chain-work, https://bernstein.run/q/what-is-the-dead-letter-queue

fail-closed means: on any unrecoverable condition the orchestrator stops merging rather than risk a bad commit. budget exceeded -> drain mode (finish in-flight, start nothing new). quality gate fails after max retries -> task goes to the dead-letter queue, not main. agent heartbeat lost -> reaped, work-in-progress preserved in its worktree for inspection. audit chain broken -> bernstein refuses to start. policy-engine denies a tool call -> agent gets a denial, not a workaround. the default posture is to halt and surface the problem rather than auto-recover by guessing. config under .sdd/config.yaml controls thresholds; defaults live in src/bernstein/core/defaults.py.

### what is the bernstein dead letter queue
URL: https://bernstein.run/q/what-is-the-dead-letter-queue
tags: safety, dead-letter | related: https://bernstein.run/q/fail-closed-agent-orchestrator, https://bernstein.run/q/how-does-bernstein-work

tasks that fail their quality gates after max retries do not merge. instead they move to .sdd/backlog/dead-letter/, with the full agent transcripts, last-known worktree path, gate output, and a synth-generated failure-class label from the eval taxonomy. you can inspect each entry, fix the prompt or split the task, and re-queue.
the dead-letter pile is bernstein's escape hatch: rather than burn budget retrying a broken brief, the orchestrator parks it for a human. the eval harness also pulls from this directory to build regression cases for future runs. source: src/bernstein/core/tasks/dead_letter.py.

### how does bernstein cost tracking work
URL: https://bernstein.run/q/how-does-cost-tracking-work
tags: cost, budget | related: https://bernstein.run/q/fail-closed-agent-orchestrator, https://bernstein.run/q/what-models-does-bernstein-use

every agent run reports per-model token usage back to the task server. bernstein attributes that cost to the task, the role (backend, qa, security ...) and the model. bernstein cost prints the breakdown per run. .sdd/metrics/cost.jsonl is the raw record. set bernstein.cost.budget_limit in bernstein.yaml to cap a run; when 80 percent is consumed the orchestrator warns, at 100 percent it drains (finish in-flight tasks, start nothing new). per-agent anomaly detection flags context-growth or token-spike behaviour so a stuck agent gets reaped before it burns the budget. source: src/bernstein/core/cost/.

### what models does bernstein use
URL: https://bernstein.run/q/what-models-does-bernstein-use
tags: models, routing | related: https://bernstein.run/q/how-does-bernstein-work, https://bernstein.run/q/how-does-cost-tracking-work

bernstein itself does not call an llm; the cli agents you configure do. you decide which models to use per role in bernstein.yaml. the contextual-bandit router picks among the models you allow for a given task class and learns from outcomes. typical configurations: claude opus 4.7 for architect, claude sonnet 4.6 for backend, claude haiku 4.5 for docs, gpt-5 for review, gemini 2.5 pro for ml. local-only setups can route everything through ollama. bernstein cost prints the realised mix after each run. routing source: src/bernstein/core/routing/.
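the 80 / 100 percent thresholds from the cost-tracking entry amount to a small state machine fed by per-model cost reports. a minimal sketch; BudgetGuard, BudgetState, and their field names are hypothetical, only the thresholds and the warn/drain behaviour come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class BudgetState(Enum):
    OK = "ok"
    WARN = "warn"  # >= 80 percent of the limit: surface a warning, keep scheduling
    DRAIN = "drain"  # >= 100 percent: finish in-flight tasks, start nothing new


@dataclass
class BudgetGuard:
    limit_usd: float
    warn_ratio: float = 0.8
    spent_usd: float = 0.0
    by_model: dict[str, float] = field(default_factory=dict)

    def record(self, model: str, cost_usd: float) -> BudgetState:
        """Attribute one agent run's cost to its model and return the resulting state."""
        self.spent_usd += cost_usd
        self.by_model[model] = self.by_model.get(model, 0.0) + cost_usd
        return self.state()

    def state(self) -> BudgetState:
        ratio = self.spent_usd / self.limit_usd
        if ratio >= 1.0:
            return BudgetState.DRAIN
        if ratio >= self.warn_ratio:
            return BudgetState.WARN
        return BudgetState.OK

    def may_start_new_task(self) -> bool:
        # the scheduler would check this before dispatching anything new
        return self.state() is not BudgetState.DRAIN


if __name__ == "__main__":
    guard = BudgetGuard(limit_usd=10.0)
    for model, cost in [("sonnet", 4.0), ("opus", 4.0), ("sonnet", 2.0)]:
        print(model, cost, guard.record(model, cost).value)  # ok, then warn, then drain
    print(guard.may_start_new_task(), guard.by_model)
```

draining rather than killing means in-flight work can still push spend slightly past the cap; the guard only stops new work from starting.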
### does bernstein support local models
URL: https://bernstein.run/q/does-bernstein-support-local-models
tags: local, ollama, privacy | related: https://bernstein.run/q/self-hosted-ai-coding-agent-orchestrator, https://bernstein.run/q/what-models-does-bernstein-use

yes. the ollama adapter targets the ollama daemon (or any openai-compatible local endpoint), so you can route every task through a locally hosted model with zero outbound api calls. mix local and cloud in the same plan: cheap, low-stakes tasks on a local llama or qwen, harder tasks on a paid model. configure under agents: in bernstein.yaml with the endpoint url and model name. ollama adapter source: src/bernstein/adapters/ollama.py. the clm adapter (sovereign llm gateway) is the production-grade variant for teams running mtls-fronted internal endpoints.

### what quality gates does bernstein run
URL: https://bernstein.run/q/what-quality-gates-does-bernstein-run
tags: quality, gates | related: https://bernstein.run/q/how-does-bernstein-work, https://bernstein.run/q/what-is-the-dead-letter-queue

by default: ruff (lint), pyright or mypy (types), pytest (tests), bandit or semgrep (security). each gate runs inside the agent's worktree before the merge queue considers that worktree. failed gates trigger a retry, optionally with an escalated model (sonnet -> opus). after the retry budget is exhausted the task moves to the dead-letter queue. gates are pluggable via the pluggy hookspecs in src/bernstein/plugins/, so you can add a custom gate (terraform plan, openapi diff, license check) without forking the core. configure which gates run under quality.gates in bernstein.yaml.
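the gate, retry, and escalate loop can be sketched in a few lines. a minimal sketch, assuming gates are plain callables that inspect a worktree path and the escalation ladder is a list of model labels; run_with_gates, Outcome, and the worktree naming are hypothetical, and in bernstein the gates are registered through the pluggy hookspecs rather than passed in as a dict.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

Gate = Callable[[str], bool]  # takes a worktree path, returns True when the gate passes


@dataclass
class Outcome:
    passed: bool  # True: hand to the merge queue; False: move the task to dead-letter
    attempts: int
    model: str
    failed_gates: list[str] = field(default_factory=list)


def run_with_gates(
    attempt: Callable[[str], str],  # runs the agent on a model, returns its worktree path
    gates: dict[str, Gate],
    ladder: list[str],  # escalation order, e.g. ["sonnet", "opus"]
    max_retries: int = 2,
) -> Outcome:
    failed: list[str] = []
    model = ladder[0]
    for i in range(max_retries + 1):
        model = ladder[min(i, len(ladder) - 1)]  # escalate on each retry, then stay on the top rung
        worktree = attempt(model)
        failed = [name for name, gate in gates.items() if not gate(worktree)]
        if not failed:
            return Outcome(True, i + 1, model)
    return Outcome(False, max_retries + 1, model, failed)  # retry budget exhausted


if __name__ == "__main__":
    gates = {"ruff": lambda wt: True, "pytest": lambda wt: wt.endswith("opus")}
    print(run_with_gates(lambda m: f".worktrees/task-1-{m}", gates, ["sonnet", "opus"]))
```

here the sonnet attempt fails pytest, the retry escalates to opus and passes, so the outcome reports two attempts on opus with no failed gates.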
### how does bernstein pick tasks for agents
URL: https://bernstein.run/q/how-does-bernstein-pick-tasks-for-agents
tags: scheduler, routing | related: https://bernstein.run/q/how-does-bernstein-work, https://bernstein.run/q/what-models-does-bernstein-use

the scheduler reads .sdd/backlog/open/ each tick, filters out tasks whose dependencies are not yet satisfied, applies a fair-priority ordering (critical -> high -> medium -> low, with starvation guards), and emits the next ready task. the router then chooses the model and the cli agent: a contextual bandit with epsilon-greedy exploration over a feature vector (role, complexity, file scope, historical outcomes). bernstein routing explain prints the candidate set, the bandit's chosen arm, and the regret estimate. source: src/bernstein/core/tasks/scheduler.py and src/bernstein/core/routing/.

### can bernstein run in ci
URL: https://bernstein.run/q/can-bernstein-run-in-ci
tags: ci, github | related: https://bernstein.run/q/how-to-install-bernstein, https://bernstein.run/q/what-models-does-bernstein-use

yes. bernstein run plan.yaml --non-interactive --json-status exits non-zero if any task ends up in the dead-letter queue, which makes it a normal failing ci step. for the github app path, bernstein responds to /bernstein slash-commands in pr comments via src/bernstein/github_app/, and the autofix daemon (bernstein autofix) watches bernstein-opened prs, reads the ci log on failure, spawns an agent against the failing worktree, and pushes a fix commit. the runbook for ci-mode lives under templates/. running inside ephemeral runners works (worktrees are local files); long-running orchestration is happier on a persistent box.

### what is the bernstein eval harness
URL: https://bernstein.run/q/what-is-the-bernstein-eval-harness
tags: eval, benchmark | related: https://bernstein.run/q/deterministic-ai-coding-orchestrator, https://bernstein.run/q/what-is-self-evolution

an in-tree benchmark runner under src/bernstein/eval/.
ships a golden suite of curated coding tasks, an llm-as-judge for code quality, a closed failure taxonomy that sorts every failure into one of ~20 categories, and a vcr fixture system so an entire run can be replayed deterministically without re-calling the llm. swe-bench integration lives under src/bernstein/benchmark/. you run bernstein eval golden to score a config or a model swap against a baseline before promoting it. the harness is also what the self-evolution loop uses to gate proposed orchestrator changes.

### what is bernstein self evolution
URL: https://bernstein.run/q/what-is-self-evolution
tags: evolution, experimental | related: https://bernstein.run/q/what-is-the-bernstein-eval-harness, https://bernstein.run/q/deterministic-ai-coding-orchestrator

an experimental loop under src/bernstein/evolution/ that lets bernstein propose changes to itself. metrics from past runs flow into a detector, which surfaces opportunities (slow tick, expensive task class, poor gate signal). a creative pipeline drafts a change, an approval gate routes by risk score, the eval harness runs the change against the golden suite, and only proposals that improve the baseline without regressing protected invariants are applied. an invariants guard hash-locks safety-critical files so the loop cannot edit them. a circuit breaker halts evolution on any safety violation. opt-in; off by default.

### what sandbox backends does bernstein support
URL: https://bernstein.run/q/what-sandbox-backends-does-bernstein-support
tags: sandbox, isolation | related: https://bernstein.run/q/git-worktree-parallel-ai-agents, https://bernstein.run/q/self-hosted-ai-coding-agent-orchestrator

the default backend is a local git worktree. swap it via sandbox.backend in bernstein.yaml: docker (containerised), e2b (cloud sandbox), modal, blaxel, cloudflare workers sandboxes, daytona, runloop, vercel sandboxes. each is a plug-in implementing the sandboxbackend protocol in src/bernstein/core/sandbox/.
install the matching extra (pip install "bernstein[e2b]", [modal], [cloudflare], ...). the orchestrator and adapters are unchanged; only the place where the agent's filesystem and process live changes. useful when you want hard isolation, or when the agent needs a different os than the orchestrator host.

### does bernstein have a tui
URL: https://bernstein.run/q/does-bernstein-have-a-tui
tags: tui, ui | related: https://bernstein.run/q/how-does-bernstein-work

yes. bernstein dashboard or bernstein live opens a textual-based tui with the task list, live agent logs, cost sparkline, quality-gate panel, scheduling visualiser, and approval overlay. bindings are vim-friendly, themable, and configurable per project (tui keybindings live under src/bernstein/tui/). a rich-based fallback renders on terminals that do not support textual. for plain text, bernstein status prints a one-shot snapshot. the tui talks to the same task server (127.0.0.1:8052) the cli uses, so you can run it in a separate terminal against a long-running orchestration.

### how do i stop bernstein
URL: https://bernstein.run/q/how-do-i-stop-or-pause-bernstein
tags: lifecycle, stop | related: https://bernstein.run/q/fail-closed-agent-orchestrator, https://bernstein.run/q/how-does-bernstein-work

bernstein stop sends a graceful drain signal: in-flight tasks finish, no new tasks start, the orchestrator exits clean. bernstein stop --hard kills the process group via the pid file under .sdd/runtime/. never grep for the word bernstein and kill matches; you risk hitting unrelated processes that contain the substring. on relaunch the orchestrator reads .sdd/ back, sees the unfinished tasks, and resumes from the last persisted checkpoint (wal in src/bernstein/core/persistence/). worktrees and logs from the previous run are preserved.
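signalling a process group read from a pid file, instead of matching names, looks roughly like this on posix. a minimal sketch, assuming the orchestrator leads a process group that its agents inherit; hard_stop and the orchestrator.pid file name are hypothetical, not the real contents of .sdd/runtime/, and the demo uses a sleeping child as a stand-in orchestrator.

```python
import os
import signal
import subprocess
import sys
import tempfile
from pathlib import Path


def hard_stop(pid_file: Path, sig: int = signal.SIGTERM) -> int:
    """Signal the process group of the pid recorded in pid_file; return that group id."""
    pid = int(pid_file.read_text().strip())
    pgid = os.getpgid(pid)  # assumes the orchestrator's agents share this group
    if pgid == os.getpgrp():
        raise RuntimeError("refusing to signal our own process group")
    os.killpg(pgid, sig)  # one call reaches the whole group; nothing is matched by name
    return pgid


if __name__ == "__main__":
    # a sleeper stands in for the orchestrator, started as the leader of a new session
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"], start_new_session=True)
    with tempfile.TemporaryDirectory() as runtime:
        pid_file = Path(runtime) / "orchestrator.pid"  # hypothetical file name
        pid_file.write_text(f"{proc.pid}\n")
        hard_stop(pid_file)
    print("exit code:", proc.wait(timeout=10))  # negative signal number on posix, here -15
```

the self-group guard is an extra safety check in this sketch: a stale pid file that happens to point into the caller's own group would otherwise take the caller down too.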
### bernstein vs emdash
URL: https://bernstein.run/q/how-does-bernstein-compare-to-emdash
tags: compare, emdash | related: https://bernstein.run/q/bernstein-vs-claude-flow, https://bernstein.run/q/what-is-bernstein

emdash is an electron desktop app that wraps cli coding agents in a gui, 23 adapters, typescript. bernstein is a python cli plus library plus mcp server, 37 adapters, with a textual tui rather than a native window. emdash is the right pick if you want a downloadable desktop ade with one-click setup. bernstein is the right pick if you want headless ci / mcp / library use, a deterministic scheduler, an hmac audit chain, and pluggable cloud sandboxes. they are not direct substitutes; some teams run emdash for interactive work and bernstein for unattended batch runs.

### how does bernstein handle secrets
URL: https://bernstein.run/q/how-does-bernstein-handle-secrets
tags: secrets, credentials | related: https://bernstein.run/q/self-hosted-ai-coding-agent-orchestrator, https://bernstein.run/q/audit-grade-ai-coding-orchestrator-for-regulated-teams

credentials are scoped per agent under src/bernstein/core/credential_scoping.py: each spawned worker gets only the env vars it declares it needs, never the orchestrator's full environment. for long-lived credentials, bernstein connect writes to the os keychain (macOS keychain, gnome-keyring, kwallet) and every subsequent run reads from there, so api keys never live in the repo. pii gating in src/bernstein/core/security/ keeps flagged files out of agent context. the orchestrator never persists keys to .sdd/. bring-your-own-key model: bernstein adds no signup and stores no provider credentials on its side.
### bernstein vs crewai
URL: https://bernstein.run/q/bernstein-vs-crewai
tags: compare, crewai | related: https://bernstein.run/q/bernstein-vs-claude-flow, https://bernstein.run/q/bernstein-vs-langgraph, https://bernstein.run/q/deterministic-ai-coding-orchestrator

crewai is a python framework for building llm-driven multi-agent crews where the coordinator itself is an llm reasoning over role prompts. bernstein orchestrates external cli coding agents (claude code, codex, aider, 34 others) with a deterministic python scheduler and merges their git worktrees. crewai shines when the work is conversational reasoning over tools; bernstein shines when the work is actual code edits that need lint, type, test, security gates before merge. you can run a crewai agent under bernstein via the generic or openai-agents adapter if you want crewai's role logic under bernstein's audit chain and merge queue. compare matrix: https://bernstein.run/vs/crewai.

### bernstein vs langgraph
URL: https://bernstein.run/q/bernstein-vs-langgraph
tags: compare, langgraph | related: https://bernstein.run/q/bernstein-vs-crewai, https://bernstein.run/q/bernstein-vs-openai-agents-sdk, https://bernstein.run/q/mcp-server-for-multi-agent-coding

langgraph is langchain's graph runtime for stateful agent workflows: nodes, edges, checkpoints, time-travel debug. bernstein is a cli-agent orchestrator: it spawns claude code, codex, aider, ollama, and 33 other adapters in git worktrees and merges what passes lint, type, test, security gates. langgraph helps you design the agent's thinking; bernstein dispatches the agents that do the editing. they compose: a langgraph node can invoke bernstein via the mcp server to delegate a coding subtask to a parallel fleet. pick langgraph for orchestrating thought; pick bernstein for orchestrating commits. matrix at https://bernstein.run/vs/langgraph.
### bernstein vs openai swarm
URL: https://bernstein.run/q/bernstein-vs-openai-swarm
tags: compare, openai-swarm | related: https://bernstein.run/q/bernstein-vs-openai-agents-sdk, https://bernstein.run/q/bernstein-vs-claude-flow

openai swarm is a lightweight reference framework from openai for routing between agent objects via handoffs, in-process, openai-only. bernstein orchestrates real cli coding agents (claude code, codex, aider, gemini, plus 33 more) in parallel git worktrees with a deterministic python coordinator and quality gates. swarm is great for prototyping multi-step llm tool-use against the openai api. bernstein is what you reach for once those agents need to actually write code, get lint+type+test gates, and merge to a real branch with an audit trail. matrix at https://bernstein.run/vs/openai-swarm.

### how to resume bernstein after a crash
URL: https://bernstein.run/q/how-to-resume-a-killed-bernstein-run
tags: recovery, lifecycle | related: https://bernstein.run/q/how-do-i-stop-or-pause-bernstein, https://bernstein.run/q/fail-closed-agent-orchestrator

bernstein writes a write-ahead log to .sdd/persistence/ on every state transition. when you restart with bernstein run or bernstein resume, the orchestrator loads the wal, rebuilds the task table, re-attaches to live worktrees under .worktrees/, and continues from the last persisted checkpoint. tasks that were mid-flight when the process died are re-queued with a +1 attempt counter; the prior worktree is preserved so you can diff it before the retry overwrites the branch. never grep for the literal string bernstein and kill matches; use the pid file under .sdd/runtime/ instead (bernstein stop --hard does this for you).
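rebuilding the task table from a write-ahead log is a fold over its records: the last record per task wins, and anything still marked running died with the old process. a minimal sketch, assuming a json-lines wal whose records carry task and state fields; replay, the record shape, and the choice to stop at a torn final line are hypothetical, not the format under .sdd/persistence/.

```python
import json
from collections.abc import Iterable


def replay(wal_lines: Iterable[str]) -> dict[str, dict]:
    """Rebuild a task table from json-lines WAL records; the last record for a task wins."""
    table: dict[str, dict] = {}
    for line in wal_lines:
        if not line.strip():
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            break  # assumed: a torn final write from the crash, so stop at the last complete record
        entry = table.setdefault(rec["task"], {"state": "open", "attempts": 0})
        entry["state"] = rec["state"]
        entry["attempts"] = rec.get("attempts", entry["attempts"])
    for entry in table.values():
        if entry["state"] == "running":  # mid-flight when the process died
            entry["state"] = "open"  # re-queue it
            entry["attempts"] += 1  # with a +1 attempt counter
    return table


if __name__ == "__main__":
    wal = [
        '{"task": "t1", "state": "open"}',
        '{"task": "t1", "state": "running"}',
        '{"task": "t1", "state": "done"}',
        '{"task": "t2", "state": "running"}',
        '{"task": "t3", "st',  # torn write
    ]
    print(replay(wal))  # {'t1': {'state': 'done', 'attempts': 0}, 't2': {'state': 'open', 'attempts': 1}}
```

in a real restart the lines would come from the wal file on disk (its exact name is not given above); taking an iterable of strings keeps the fold easy to test.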
### how to keep claude code from blowing context window
URL: https://bernstein.run/q/how-to-keep-claude-code-from-blowing-context
tags: claude-code, context, tuning | related: https://bernstein.run/q/how-to-run-multiple-claude-code-agents-in-parallel, https://bernstein.run/q/how-does-bernstein-work

the usual cause of context blow-up is one long-lived agent doing many tasks. bernstein addresses it structurally: each task gets a fresh claude code subprocess in its own git worktree and exits when the task finishes (typically 1-3 tasks per agent). nothing carries over between tasks except the file system, so context starts near-empty every time. for tasks where you do want a longer thread (large refactors), bernstein.orchestration.context_window_strategy can be set to extend or stream, and the caching_adapter prefix-dedups across spawns. tokens still grow inside one task; the win is that you stop paying for hour-old prose to be re-attended. source: src/bernstein/core/streaming_merge.py, src/bernstein/adapters/caching_adapter.py.

### can bernstein use ollama for coding agents
URL: https://bernstein.run/q/can-bernstein-use-local-ollama-llama-qwen
tags: local, ollama, self-host | related: https://bernstein.run/q/does-bernstein-support-local-models, https://bernstein.run/q/what-models-does-bernstein-use, https://bernstein.run/q/self-hosted-ai-coding-agent-orchestrator

yes. set agents..adapter: ollama in bernstein.yaml with endpoint (default http://127.0.0.1:11434) and model (e.g. qwen2.5-coder:32b, llama3.1:70b, deepseek-coder:33b). the ollama adapter at src/bernstein/adapters/ollama.py speaks the ollama api or any openai-compatible /v1/chat/completions endpoint, so vllm, llama.cpp server, lm studio, and tabbyapi all work. mix-mode is supported: route docs through ollama and architecture through claude opus in the same plan; bernstein cost prints the realised split.
zero outbound api calls when every agent in a run points at ollama, which is what teams under strict data residency rules ask for.

### how to run bernstein in github actions
URL: https://bernstein.run/q/how-to-run-bernstein-in-github-actions
tags: ci, github-actions | related: https://bernstein.run/q/can-bernstein-run-in-ci, https://bernstein.run/q/how-to-install-bernstein

the supported pattern is bernstein run plan.yaml --non-interactive --json-status as a normal job step. provide the cli agent's api key via secrets (anthropic_api_key, openai_api_key) and let bernstein scope it per worker. the job exits non-zero on any dead-letter, which fails the workflow. worktrees and .sdd/ live in the runner's workspace and are torn down with it. for the slash-command flow (/bernstein fix on a pr), install the github app (src/bernstein/github_app/) and point it at the same secret store. recipe: https://bernstein.run/recipes/github-actions. heavy orchestration is happier on a persistent box than on ephemeral runners.

### how does bernstein handle merge conflicts between parallel agents
URL: https://bernstein.run/q/how-to-handle-merge-conflicts-between-agents
tags: git, merge, conflicts | related: https://bernstein.run/q/git-worktree-parallel-ai-agents, https://bernstein.run/q/fail-closed-agent-orchestrator

the merge queue serializes worktree merges in dependency order, so two passing worktrees never run their merges concurrently. if a worktree's branch has drifted from main while the agent worked, bernstein attempts a 3-way merge; on conflict it triggers the resolver role with a prompt that includes both sides of the diff and the merge-base. if the resolver fails, the task moves to dead-letter rather than committing a botched merge (fail-closed). speculative worktrees (the optional plan-ahead mode) are rebased and re-gated before they enter the queue. source: src/bernstein/core/git/ and src/bernstein/core/orchestration/merge_queue.py.
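the queue policy in the merge-conflict entry (one merge at a time, dependencies first, resolver on conflict, dead-letter if that fails) can be modelled with graphlib from the standard library. a minimal sketch; drain_merge_queue is hypothetical, merge and resolve are injected callables standing in for the real 3-way merge and resolver role, and holding back a task whose dependency has not merged is an assumption in keeping with the fail-closed posture.

```python
from collections.abc import Callable
from graphlib import TopologicalSorter


def drain_merge_queue(
    deps: dict[str, set[str]],  # task -> tasks it depends on
    passed: set[str],  # tasks whose worktrees passed every quality gate
    merge: Callable[[str], bool],  # attempt a 3-way merge; False means conflict
    resolve: Callable[[str], bool],  # resolver role; False means it could not fix the conflict
) -> tuple[list[str], list[str]]:
    """Merge one task at a time in dependency order; unresolved conflicts go to dead-letter."""
    merged: list[str] = []
    dead_letter: list[str] = []
    for task in TopologicalSorter(deps).static_order():  # dependencies always come first
        if task not in passed or not deps.get(task, set()) <= set(merged):
            continue  # not ready: gates failed, or a dependency has not merged yet
        if merge(task) or resolve(task):  # the resolver only runs when the merge conflicts
            merged.append(task)
        else:
            dead_letter.append(task)  # fail closed: never commit a botched merge
    return merged, dead_letter


if __name__ == "__main__":
    deps = {"api": {"schema"}, "ui": {"api"}, "docs": set()}
    print(drain_merge_queue(deps, {"schema", "api", "ui", "docs"}, lambda t: t != "ui", lambda t: False))
```

tasks that end up in neither list stay queued; in the demo ui conflicts, the resolver gives up, and only ui lands in dead-letter while schema, api, and docs merge.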
### cheapest way to run bernstein
URL: https://bernstein.run/q/what-is-the-cheapest-way-to-run-bernstein
tags: cost, budget, local | related: https://bernstein.run/q/how-does-cost-tracking-work, https://bernstein.run/q/can-bernstein-use-local-ollama-llama-qwen

free tier: route every task through ollama against a local model (qwen2.5-coder or deepseek-coder); zero api spend, only your electricity. cheap cloud: claude haiku 4.5 for docs, claude sonnet 4.6 for backend, only escalate to opus on retry; cap with bernstein.cost.budget_limit. mixed: pin the architect role to a paid model, run the rest on ollama. the caching_adapter (src/bernstein/adapters/caching_adapter.py) deduplicates prompt prefixes across spawns so repeat work does not get re-billed. bernstein cost --tail tracks spend live; the orchestrator warns at 80 percent of budget_limit and drains at 100 percent (in-flight tasks finish, no new tasks start), so no new work begins once the cap is hit.

### how to write a bernstein plan yaml
URL: https://bernstein.run/q/how-to-write-a-bernstein-plan-yaml
tags: plan, yaml, how-to | related: https://bernstein.run/q/how-to-run-multiple-claude-code-agents-in-parallel, https://bernstein.run/q/how-to-install-bernstein

minimal plan.yaml: a top-level stages list, each stage holds steps, each step has a name, a role (backend, qa, docs, ...), and an instruction string. file paths listed under files: scope the worktree, and depends_on: declares ordering between steps. example: stages: [{ name: feat-x, steps: [{ name: design, role: architect, instruction: ...}, { name: implement, role: backend, depends_on: [design], instruction: ...}]}]. run with bernstein run plan.yaml. for ad-hoc goals skip the yaml entirely: bernstein -g "goal" --max-agents 5 lets bernstein decompose the goal into a synthetic plan on the fly. full schema: https://bernstein.run/docs/plan-yaml.
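the example plan, once loaded into python, can be checked for missing fields, duplicate names, and unknown or circular depends_on entries, then grouped into waves of steps that could run in parallel. a minimal sketch using graphlib; execution_waves and REQUIRED are hypothetical, and treating all steps across stages as one dependency graph is an assumption about how stages interact, not bernstein's documented schema.

```python
from graphlib import TopologicalSorter

# the example plan from the text as a yaml loader would return it; "..." stays a placeholder
PLAN = {
    "stages": [
        {
            "name": "feat-x",
            "steps": [
                {"name": "design", "role": "architect", "instruction": "..."},
                {"name": "implement", "role": "backend", "depends_on": ["design"], "instruction": "..."},
            ],
        }
    ]
}

REQUIRED = ("name", "role", "instruction")  # the three fields every step has, per the text


def execution_waves(plan: dict) -> list[list[str]]:
    """Validate steps, then group them into waves whose members could run in parallel."""
    graph: dict[str, set[str]] = {}
    for stage in plan["stages"]:
        for step in stage["steps"]:
            missing = [k for k in REQUIRED if k not in step]
            if missing:
                raise ValueError(f"step {step.get('name', '?')!r} is missing {missing}")
            if step["name"] in graph:
                raise ValueError(f"duplicate step name {step['name']!r}")
            graph[step["name"]] = set(step.get("depends_on", []))
    unknown = {d for deps in graph.values() for d in deps} - set(graph)
    if unknown:
        raise ValueError(f"depends_on refers to unknown steps: {sorted(unknown)}")
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) if depends_on forms a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(execution_waves(PLAN))  # [['design'], ['implement']]
```

steps inside one wave have no dependency on each other, which is what lets a scheduler hand them to parallel agents up to max_agents.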
### what is the bernstein autofix daemon
URL: https://bernstein.run/q/what-is-the-bernstein-autofix-daemon
tags: autofix, ci | related: https://bernstein.run/q/can-bernstein-run-in-ci, https://bernstein.run/q/how-to-run-bernstein-in-github-actions

bernstein autofix watches prs opened by bernstein for ci failures. on red ci it pulls the failing job's logs, parses them via src/bernstein/adapters/ci/, spawns an agent against the worktree that opened the pr, gives it the failure trace as context, and pushes a fix commit to the same pr branch. the daemon respects the same budget, audit, and policy gates as a normal bernstein run; it does not bypass approval if approval is required for the touched paths. typical use: the operator goes to sleep, bernstein opens 3 prs overnight, autofix patches the two that go red so the morning queue has 3 green prs ready to review.

### does bernstein use mcp servers
URL: https://bernstein.run/q/does-bernstein-have-an-mcp-client
tags: mcp, client, config | related: https://bernstein.run/q/mcp-server-for-multi-agent-coding, https://bernstein.run/q/how-to-add-a-cli-adapter

yes. each agent worktree gets its own .mcp.json built from the merged config in bernstein.yaml mcp: section plus the agent's own discovery (claude code reads ~/.config/claude/mcp.json, codex reads its own). bernstein passes mcp tool calls through transparently; it does not proxy them. the merger lives in src/bernstein/adapters/claude_mcp_loader.py. you can also expose bernstein itself as an mcp server (see mcp-server-for-multi-agent-coding) so a claude desktop session can dispatch coding tasks into a bernstein fleet via tool calls. mcp config is per task, so a security task can be given a different toolset than a docs task.
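per-task mcp config can be pictured as a dict merge written into each worktree. a minimal sketch, assuming repo-wide servers are overridden by task-specific ones on a name clash and that a deny set trims the result; the top-level mcpServers key follows the project-level .mcp.json format that claude code reads, while build_mcp_config, write_mcp_config, and every server name and command below are hypothetical, not the logic in claude_mcp_loader.py.

```python
import json
import tempfile
from pathlib import Path


def build_mcp_config(
    global_servers: dict[str, dict],
    task_servers: dict[str, dict],
    deny: frozenset[str] = frozenset(),
) -> dict:
    """Merge repo-wide servers with task-specific ones; task entries win on a name clash."""
    merged = {**global_servers, **task_servers}
    return {"mcpServers": {name: cfg for name, cfg in merged.items() if name not in deny}}


def write_mcp_config(worktree: Path, config: dict) -> Path:
    """Write the merged config as .mcp.json at the root of one task's worktree."""
    path = worktree / ".mcp.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


if __name__ == "__main__":
    # hypothetical server names and commands, for illustration only
    repo_wide = {"docs-search": {"command": "docs-mcp"}, "shell": {"command": "shell-mcp"}}
    docs_task = build_mcp_config(repo_wide, {}, deny=frozenset({"shell"}))
    security_task = build_mcp_config(repo_wide, {"scanner": {"command": "scan-mcp", "args": ["--read-only"]}})
    with tempfile.TemporaryDirectory() as wt:
        print(write_mcp_config(Path(wt), docs_task).read_text())
    print(sorted(security_task["mcpServers"]))  # ['docs-search', 'scanner', 'shell']
```

keeping the merge a pure function makes it easy to give a docs task a narrower toolset than a security task without touching either worktree until spawn time.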
### how does bernstein scope tool permissions for agents
URL: https://bernstein.run/q/how-does-bernstein-scope-permissions
tags: permissions, approval, security | related: https://bernstein.run/q/how-does-bernstein-handle-secrets, https://bernstein.run/q/fail-closed-agent-orchestrator

two layers. first, credential scoping (src/bernstein/core/credential_scoping.py): each agent only sees the env vars it declared it needs, so a docs agent never gets the prod database url. second, tool-call approval (src/bernstein/core/approval/): every shell, file-write, network, and mcp call routes through a pluggable approval gate. defaults are offline-safe (writes inside the worktree are auto-approved, anything else surfaces for approval). policy plugins under src/bernstein/plugins/ let you wire pii gates, license checks, or organisation-specific denylists. the security_review plugin scans the produced diff before merge. progressive disclosure (plugins/permission_explain.py) shows why approval is being asked for, so you can make an informed decision rather than blanket-approving.

### what is the bernstein fleet dashboard
URL: https://bernstein.run/q/what-is-the-bernstein-fleet-dashboard
tags: fleet, tui, supervise | related: https://bernstein.run/q/does-bernstein-have-a-tui, https://bernstein.run/q/how-does-bernstein-work

bernstein fleet supervises multiple bernstein projects from one terminal. point it at a list of repos in ~/.config/bernstein/fleet.yaml and it polls each repo's task server (127.0.0.1:8052 by default), aggregates task counts, costs, and gate health, and renders them in a single textual panel. useful when one operator runs 4-6 parallel bernstein orchestrations across different products: instead of cycling through terminals, the fleet panel surfaces the one that needs attention. all reads are local; the dashboard never phones home. source: src/bernstein/core/fleet/.
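the default approval rule from the permissions entry (auto-approve writes that stay inside the worktree, surface everything else) fits in one function. a minimal sketch; ToolCall, default_decision, and the kind labels are hypothetical, and real policy plugins would layer denylists and pii checks on top of this default.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Literal

Kind = Literal["shell", "file_write", "network", "mcp"]


@dataclass(frozen=True)
class ToolCall:
    kind: Kind
    target: str  # file path for writes; command, url, or tool name otherwise


def default_decision(call: ToolCall, worktree: Path) -> Literal["approve", "ask"]:
    """Auto-approve file writes that stay inside the worktree; surface everything else."""
    if call.kind == "file_write":
        # resolve() normalises '..' and symlinked parents, so '../x' cannot slip out
        target = (worktree / call.target).resolve()
        if target.is_relative_to(worktree.resolve()):
            return "approve"
    return "ask"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wt = Path(d)
        for call in (
            ToolCall("file_write", "src/app.py"),
            ToolCall("file_write", "../escape.txt"),
            ToolCall("network", "https://example.com"),
        ):
            print(call.kind, call.target, default_decision(call, wt))  # approve, ask, ask
```

resolving both paths before comparing matters: a naive string prefix check would wave through src/../../escape.txt.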

---

## Blog Posts

- [bernstein 2.x recap: lineage, ten trackers, A2A capability cards, and a CI that started fixing itself](https://bernstein.run/blog/v2-x-recap) - Thirteen releases since the 1.10 recap consolidated into nine themes: a per-artefact transparency log with Ed25519 signatures, ten tracker adapters from Jira to Plane, A2A capability cards, MCP client and server hardening, a Playwright sandbox for UI agents, a secrets broker, supply-chain coverage with SBOM and OSSF Scorecard, calibrated cost guards, and a web UI plus PWA in the wheel.
- [bernstein 2.0.0: a web UI ships in the wheel, CLI unchanged](https://bernstein.run/blog/v2-0-release) - Bernstein 2.0 ships a FastAPI + React web UI inside the wheel. CLI and TUI surfaces are unchanged, configs do not move, agents and adapters keep working.
- [bernstein 1.10.x recap: agents.md sync, a2a, cost guards](https://bernstein.run/blog/v1-10-x-recap) - five point releases in five days: agents.md cross-cli sync, runtime cost guards, a2a v1.0 signed agent cards, four new cli adapters.
- [Shipping the orchestrator onto someone else's box](https://bernstein.run/blog/orchestrator-on-someone-elses-box) - On-prem deployment notes for Bernstein 1.10: cluster mTLS, signed lineage, air-gapped install, lethal-trifecta capability gate. Not an install guide.
- [We orchestrate the orchestrators now: Composio + ralphex adapters](https://bernstein.run/blog/orchestrate-the-orchestrators) - Bernstein adapters for Composio's @aoagents/ao and umputun/ralphex. Leaf-node delegation, not deep meta-orchestration: each runs as a single agent in a plan.
- [A daemon that closes its own pull requests](https://bernstein.run/blog/autofix-daemon) - How the Bernstein autofix daemon turns a red CI run on a Bernstein-opened PR into a fix commit. Capability gating and budget caps keep it from being a footgun.
- [bernstein 1.9.0: ACP bridge, CI autofix daemon, keychain creds](https://bernstein.run/blog/v1-9-release) - ACP bridge so Zed can dispatch tasks, a daemon that closes its own pull requests, OS keychain credentials, sandboxed preview server with a public URL.
- [Four commands that take the glue out of multi-agent runs](https://bernstein.run/blog/operator-pack) - Bernstein 1.8.14 ships pr, from-ticket, remote, hooks. The four shell snippets every multi-agent team ends up writing by hand, now built into the CLI.
- [Four commands that turn the orchestrator into a service](https://bernstein.run/blog/operator-commands) - 1.8.15 ships a chat bridge, mid-run approval, a tunnel wrapper, and a daemon installer. Less script you sit next to, more thing you install once.
- [The install we should have shipped at launch](https://bernstein.run/blog/frictionless-install) - Three months after launch Bernstein got a real curl | sh one-liner. It only happened because a community contributor picked up the issue we kept deferring.
- [orchestration primitive vs desktop ade: pick the right layer](https://bernstein.run/blog/orchestrator-vs-desktop-ade) - multi-agent coding split into two shapes: orchestration primitives (bernstein, workz) vs desktop ades (emdash, conductor). when to reach for each.
- [getting started: first multi-agent claude code run in 5 min](https://bernstein.run/blog/getting-started) - install bernstein, point it at claude code (or codex/gemini cli), run a goal in parallel, read the tui. five minutes if python 3.12 is ready.
- [community spotlight: april 2026 bernstein contributors](https://bernstein.run/blog/community-spotlight-april-2026) - first community spotlight. contributors who shaped bernstein's architecture decomposition, adapter list, windows support, cost-aware router.
- [agents on cloudflare: workers, durable objects, r2, d1](https://bernstein.run/blog/cloudflare-cloud-execution) - bernstein 1.8.4 cloudflare backend for ai coding agents: workers run agents, durable workflows handle multi-step tasks, r2 + d1 hold state.
- [refactor a 4,000-line python file with 11 parallel ai agents](https://bernstein.run/blog/module-decomposition) - 11 parallel ai coding agents split a 4,198-line python file into 22 sub-packages in three hours. how the decomposition pass actually ran.
- [Picking a cheaper model when the task allows](https://bernstein.run/blog/cost-aware-routing) - Bernstein's epsilon-greedy bandit picks a model per task. Internal runs cut cost roughly in half. Measure your own with bernstein cost.
- [bernstein 1.0: open-source orchestrator for ai coding agents](https://bernstein.run/blog/introducing-bernstein) - Orchestrate Claude Code, Codex, Gemini CLI + 40 other CLI coding agents in parallel git worktrees. Deterministic scheduler, HMAC-signed audit chain.