bernstein vs claude-flow

claude-flow is a Claude-Code-native swarm layer: hooks, memory, and self-organizing agents wired tightly into Claude. Bernstein is one layer up - a deterministic Python scheduler that routes work across 40+ CLI coding agents, with Claude Code as one of them. This page is for picking which abstraction fits the problem in front of you.

Last checked against the upstream README on 2026-05-17. Source: https://github.com/ruvnet/claude-flow.

tl;dr

Dimension	claude-flow	Bernstein
Target agent	Claude Code first; multi-provider routing as a sub-feature	40+ CLI agents (Claude, Codex, Aider, Cursor, Gemini, Goose, ...)
Coordinator	Swarm / hive-mind topologies inside Claude	Deterministic Python scheduler, no LLM in the control loop
Isolation model	Hooks + MCP inside a Claude session	One git worktree per agent, merged only after gates pass
Audit trail	Swarm telemetry, project-defined storage	HMAC-SHA256 hash-chained log on disk under `.sdd/`
Runtime	Node (npm package)	Python (pipx install)
License	MIT	Apache-2.0

what each tool actually does

claude-flow

The upstream README describes claude-flow as multi-agent AI orchestration for Claude Code. In practice that means a Node-based runtime that registers Claude hooks, a vector-indexed memory store, and a set of swarm topologies (hierarchical, adaptive). Specialised agent personas can self-organise around a goal, share memory, and trigger background workers for testing, security audits, and optimisation. The whole thing is built around Claude Code as the execution surface; multi-provider routing exists but is downstream of that.

Source: github.com/ruvnet/claude-flow.

Bernstein

Bernstein is a Python CLI orchestrator. It reads a flat YAML file of tasks, picks the best CLI agent for each task based on per-role pass-rate history, spawns that agent in a dedicated git worktree, watches the run, and merges the worktree back only if the configured quality gates (lint, types, tests, security scan) pass. The coordinator is plain Python: no LLM in the control loop, no swarm metaphor. Forty-plus adapters (Claude Code, OpenAI Codex, Aider, Cursor, Gemini CLI, Goose, OpenHands, and so on) plug in behind the same interface. Every routing decision and gate outcome is appended to an HMAC-chained log under .sdd/.
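To make "picks the best CLI agent based on per-role pass-rate history" concrete, here is a minimal sketch of what such a deterministic router could look like. It is an illustration only, not Bernstein's code: the class name PassRateRouter, the smoothing rule, and the alphabetical tie-break are all assumptions made for this example.

```python
from collections import defaultdict


class PassRateRouter:
    """Hypothetical router: picks the agent with the best smoothed pass rate per role."""

    def __init__(self):
        # (role, agent) -> [passes, runs]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, role, agent, passed):
        """Store the outcome of one gated run."""
        entry = self._stats[(role, agent)]
        entry[0] += int(passed)
        entry[1] += 1

    def pass_rate(self, role, agent):
        """Laplace-smoothed pass rate, so an unseen agent scores 0.5 instead of 0."""
        passes, runs = self._stats.get((role, agent), (0, 0))
        return (passes + 1) / (runs + 2)

    def pick(self, role, candidates):
        """Deterministic choice: highest rate wins, ties go to the alphabetically first agent."""
        return max(sorted(candidates), key=lambda agent: self.pass_rate(role, agent))


router = PassRateRouter()
for outcome in (True, True, True):
    router.record("qa", "aider", outcome)
router.record("qa", "claude", True)
router.record("qa", "claude", False)

print(router.pick("qa", ["claude", "aider"]))
```

The point of the sketch is that the choice is a pure function of recorded history: the same history and the same candidate list always yield the same agent, which is what makes a run replayable.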

Source: github.com/sipyourdrink-ltd/bernstein.

when to pick claude-flow

Your stack is Claude-Code-first and you want to stay there. The hook system and memory loop are wired into Claude in a way nothing one layer up can replicate.
You want a swarm metaphor: many specialised Claude personas organising themselves around a goal, sharing memory through a vector store, and triggering background workers automatically.
You need agent federation across machines - encrypted, peer-to-peer Claude-to-Claude collaboration with a trust score.
You are happy on Node and want a project that bundles the agent runtime, the memory store, and the topology library as one install.

when to pick bernstein

You want to mix models. The same job can run Claude Opus on architect, GPT-5.5 on backend, and DeepSeek on docs - one scheduler, one trace.
The control loop has to be deterministic and replayable. Bernstein is Python with no LLM in the scheduler; a run is reproducible from its trace and the HMAC log catches tampering.
You need real isolation between concurrent agents. Each adapter spawns inside its own git worktree, so two agents editing the same file cannot corrupt each other - the merge step is the only place state combines.
You want to wrap Claude Code (and claude-flow, if you have it) as one player in a larger fleet rather than the whole stack.
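The worktree isolation mentioned in this list can be sketched with plain git commands. This is a rough illustration under stated assumptions, not Bernstein's implementation: the function names, the .worktrees/ directory, and the agent/<task-id> branch naming are hypothetical, and the sketch assumes the agent commits its work on its branch before the gates run.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo, task_id, base="HEAD"):
    """Build the git command that creates a dedicated worktree and branch for one task."""
    path = Path(repo) / ".worktrees" / task_id  # hypothetical location
    branch = f"agent/{task_id}"                 # hypothetical branch naming
    cmd = ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
    return cmd, path


def merge_cmd(repo, task_id):
    """Build the git command that merges the task branch back into the current branch."""
    return ["git", "-C", str(repo), "merge", "--no-ff", f"agent/{task_id}"]


def run_isolated(repo, task_id, agent_cmd, gate_cmds):
    """Run an agent in its own worktree and merge only if every gate exits with 0."""
    add_cmd, path = worktree_add_cmd(repo, task_id)
    subprocess.run(add_cmd, check=True)
    # The agent only ever sees its own checkout, never another agent's files.
    subprocess.run(agent_cmd, cwd=path, check=True)
    # all() stops at the first failing gate; later gates are not run.
    gates_ok = all(subprocess.run(gate, cwd=path).returncode == 0 for gate in gate_cmds)
    if gates_ok:
        subprocess.run(merge_cmd(repo, task_id), check=True)
    return gates_ok
```

Because each agent writes to a separate checkout on a separate branch, concurrent edits to the same file surface as an ordinary merge conflict at the merge step rather than as a corrupted working tree.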

same task in both tools

"Have an architect propose a refactor, then a QA agent write tests against it."

claude-flow - swarm prompt

npx claude-flow@alpha init
# inside Claude Code, with the claude-flow MCP server registered:
# > /swarm "refactor src/auth into stateless tokens; QA writes regression tests"

Pattern from the upstream README (2026-05-17). Exact command name may shift between releases.

Bernstein - YAML

# bernstein.yaml
tasks:
  - id: refactor-auth
    role: architect
    agent: claude          # adapter
    model: opus
    prompt: "Refactor src/auth into stateless tokens."
  - id: qa-auth
    role: qa
    agent: aider           # different adapter, different model
    model: sonnet
    depends_on: [refactor-auth]
    prompt: "Write regression tests covering the refactor."

Each task runs in its own git worktree, and its changes are merged only once the quality gates pass.
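The depends_on field in the YAML above implies an ordering that needs no LLM to resolve. A minimal sketch using Python's standard-library graphlib is shown here; the task dictionaries mirror the YAML, while the function name execution_order is an assumption for illustration and not part of Bernstein's API.

```python
from graphlib import TopologicalSorter

# Tasks as they might look after loading the YAML above (prompts omitted).
tasks = [
    {"id": "refactor-auth", "role": "architect", "depends_on": []},
    {"id": "qa-auth", "role": "qa", "depends_on": ["refactor-auth"]},
]


def execution_order(tasks):
    """Return task ids so that every task comes after the tasks it depends on."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {task["id"]: set(task.get("depends_on", [])) for task in tasks}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(tasks))  # ['refactor-auth', 'qa-auth']
```

Here the QA task cannot start until the architect task has finished, which matches the "architect first, then QA" shape of the example.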

honest gaps

claude-flow has a much richer Claude-side feature set than Bernstein will ever have: neural-pattern learning, agent federation, a large set of hook integration points, and a packaged plugin ecosystem. If the answer to "what orchestrates my agents" is meant to be intelligent itself, claude-flow is closer to that than Bernstein, whose scheduler is intentionally dumb. Bernstein is also younger, with a smaller community and fewer community plugins, and its feature surface is narrower by design. Pick the layer the problem actually lives at.

faq

Does Bernstein replace claude-flow?

Only partially. claude-flow is Claude-Code-specific - its hooks, memory store, and swarm topologies are wired into Claude. Bernstein is a layer above CLI agents in general; it spawns Claude Code (and 40+ others) as workers. If your stack is Claude-only, claude-flow will go deeper than Bernstein on Claude-internal coordination.

Can I run claude-flow under Bernstein?

Yes, indirectly. Bernstein's Claude Code adapter (adapters/claude.py) spawns Claude Code, and Claude Code can have claude-flow registered as an MCP server. Bernstein then owns the per-task git worktree, audit chain, and merge gates; claude-flow owns Claude-side coordination inside each spawned worker.

What does Bernstein not do that claude-flow does?

Bernstein has no neural/SONA-style learning loop on top of Claude, no built-in agent federation across machines, and no Claude-specific hook system. If those primitives are load-bearing for your design, claude-flow is the closer fit.

How is the audit trail different?

Bernstein writes an HMAC-SHA256 hash-chained log under .sdd/ covering routing decisions, agent spawns, tool calls, and quality-gate outcomes. claude-flow records swarm telemetry, but the storage format is project-defined rather than a fixed chain; if a regulator wants a replayable, tamper-evident chain on disk, that is a Bernstein primitive.
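For readers unfamiliar with the idea, the following is a minimal sketch of how an HMAC-SHA256 hash-chained log works in general. It is not Bernstein's on-disk format: the entry fields, the all-zero genesis value, and the function names are assumptions chosen for the example.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical "previous MAC" for the first entry


def _mac(key, prev, event):
    """HMAC-SHA256 over the previous MAC plus the event, serialized deterministically."""
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append(log, key, event):
    """Append an event whose MAC also covers the MAC of the entry before it."""
    prev = log[-1]["mac"] if log else GENESIS
    log.append({"prev": prev, "event": event, "mac": _mac(key, prev, event)})


def verify(log, key):
    """Recompute the chain from the start; any edited, dropped, or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if not hmac.compare_digest(entry["mac"], _mac(key, prev, entry["event"])):
            return False
        prev = entry["mac"]
    return True


key = b"demo-key"  # illustrative only; a real key would come from secure storage
log = []
append(log, key, {"type": "route", "task": "qa-auth", "agent": "aider"})
append(log, key, {"type": "gate", "task": "qa-auth", "passed": True})
print(verify(log, key))  # True

log[0]["event"]["agent"] = "claude"  # tamper with an earlier entry
print(verify(log, key))  # False
```

Because each MAC covers the previous MAC, changing one entry invalidates that entry's MAC and cannot be repaired without the key, which is what makes the chain tamper-evident rather than merely append-only.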

Which one should I pick if my stack is Claude-only?

If your code never leaves Claude Code and you want a swarm metaphor with hooks, vector memory, and self-organising personas, claude-flow is the closer fit. Bernstein adds value when you mix models, want git-worktree isolation between agents, or need a tamper-evident audit log on disk.