
bernstein vs openai agents sdk

These are not really competitors - they live at different abstraction levels. OpenAI Agents SDK is an in-process Python framework for building agents that call OpenAI models. Bernstein is one layer up: a scheduler that spawns whole CLI coding agents in parallel git worktrees. Bernstein ships an adapter that runs the Agents SDK as one of those agents.

Last checked against the upstream README and Quickstart on 2026-05-17. Sources: https://github.com/openai/openai-agents-python, https://openai.github.io/openai-agents-python/quickstart/.

tl;dr

Dimension	OpenAI Agents SDK	Bernstein
Abstraction	In-process Python framework for LLM agents	Out-of-process scheduler for CLI coding agents
What it spawns	OpenAI model calls (Responses API) inside your process	Subprocesses: Claude Code, Aider, Codex CLI, Cursor, ...
Coordination	Handoffs, guardrails, sessions (LLM-driven)	YAML task graph, deterministic Python scheduler (no LLM)
Provider lock-in	OpenAI-first (Responses API native; others via Chat Completions)	Provider-agnostic by design (40+ adapters across vendors)
Isolation	One process, one event loop	One git worktree per agent, merged on green gates
Install	`pip install openai-agents`	`pipx install bernstein`

what each tool actually does

OpenAI Agents SDK

The README describes it as a lightweight, powerful framework for multi-agent workflows. It gives you five primitives in Python: Agent (an LLM plus instructions, tools, and guardrails), Tool (function/MCP/hosted), Guardrail (input/output safety check), Handoff (one agent delegating to another), and Session (automatic conversation history). A Runner executes the graph. Tracing is built in. Everything runs inside one Python process; the model calls leave the box, the orchestration does not.

Source: github.com/openai/openai-agents-python.

Bernstein

Bernstein orchestrates external CLI coding agents - the actual claude, aider, codex, cursor binaries. Each task in bernstein.yaml is mapped to an adapter, spawned in its own git worktree, monitored, and merged back only after quality gates pass. The control loop is plain Python with no LLM. Every routing decision lands in an HMAC-SHA256 hash-chained log under .sdd/. Forty-plus adapters ship in-tree, including one for the OpenAI Agents SDK itself.

Source: github.com/sipyourdrink-ltd/bernstein.
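
As a rough illustration of the one-worktree-per-task idea, the sketch below builds the `git worktree add` command a scheduler could run before spawning an agent. The `agent/` branch prefix, the `.worktrees/` directory, and the helper name are assumptions made for this example, not Bernstein's actual layout or API.

```python
from pathlib import Path


def worktree_add_command(repo_root: Path, task_id: str, base_ref: str = "HEAD") -> list[str]:
    """Build the argv for `git worktree add` so one task gets its own checkout."""
    branch = f"agent/{task_id}"                 # hypothetical branch naming scheme
    path = repo_root / ".worktrees" / task_id   # hypothetical directory layout
    # `git worktree add -b <branch> <path> <commit-ish>` creates a new branch
    # and checks it out in a separate directory that shares the same repository.
    return ["git", "-C", str(repo_root), "worktree", "add", "-b", branch, str(path), base_ref]


# Print the commands instead of running them; pass a list to subprocess.run to execute.
for task_id in ("draft", "factcheck"):
    print(" ".join(worktree_add_command(Path("/repo"), task_id)))
```

Because each task gets a distinct directory and branch, two agents never write to the same checkout; their work can only meet at merge time.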

when to pick openai agents sdk

You are building one Python service that talks to OpenAI models. The Agents SDK is purpose-built for that shape and the API surface is small.
You want the SDK's handoffs / guardrails / session primitives - those are real engineering, not framework theatre, and you do not get them for free from Bernstein.
You want OpenAI's built-in tracing dashboard and Responses-API features (built-in web search, file search, code interpreter) wired up out of the box.
Your "agents" are LLM personas with tools, not external CLI tools that edit files. The SDK is at the right altitude for that.

when to pick bernstein

The agents you want to coordinate are themselves CLI coding tools - Claude Code, Aider, Codex CLI - not in-process LLM personas. Bernstein is built for that.
You want to mix providers per task: for example, Claude Opus for the architect role, GPT-5.5 for backend, and DeepSeek for docs, all under one scheduler.
You need isolation by git worktree, so two concurrent agents cannot corrupt each other's edits. The merge is the only place state combines.
You need a tamper-evident, replayable audit log on disk - the HMAC-SHA256 chain under .sdd/ is a Bernstein primitive.
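
To make "tamper-evident" concrete, here is a minimal sketch of an HMAC-SHA256 hash chain using only the Python standard library. It shows the general technique, in which each entry's MAC covers the previous entry's MAC. The entry fields, the genesis value, and the function names are assumptions for illustration; they do not describe the on-disk format Bernstein uses under .sdd/.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder "previous MAC" for the first entry


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus a canonical JSON form of the event."""
    payload = prev_mac + json.dumps(event, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, event: dict) -> None:
    """Append an event whose MAC is chained to the last entry in the log."""
    prev = log[-1]["mac"] if log else GENESIS
    log.append({"prev": prev, "event": event, "mac": _mac(key, prev, event)})


def verify_chain(log: list, key: bytes) -> bool:
    """Recompute every MAC in order; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        expected = _mac(key, prev, entry["event"])
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev = entry["mac"]
    return True


key = b"demo-key-not-for-production"
log = []
append_entry(log, key, {"task": "draft", "agent": "claude"})
append_entry(log, key, {"task": "factcheck", "agent": "codex"})
print(verify_chain(log, key))        # chain is intact

log[0]["event"]["agent"] = "aider"   # simulate tampering with an old entry
print(verify_chain(log, key))        # verification now fails
```

The keyed MAC means someone without the key cannot recompute a valid chain after editing an entry, which is what separates this from a plain hash chain.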

same task in both tools

"Answer a question, then have a second agent fact-check the answer."

OpenAI Agents SDK

import asyncio
from agents import Agent, Runner

agent = Agent(
    name="History Tutor",
    instructions="You answer history questions clearly and concisely.",
)

async def main():
    result = await Runner.run(agent, "When did the Roman Empire fall?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

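This is the single-agent Quickstart shape from the upstream docs (checked 2026-05-17), so it covers only the answering half of the task. Fact-checking is added with a second Agent that the first one delegates to through a Handoff.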

Bernstein

# bernstein.yaml
tasks:
  - id: draft
    role: analyst
    agent: claude
    model: opus
    prompt: "Draft a short answer to the user's question."
  - id: factcheck
    role: adversary
    agent: codex          # different binary, different provider
    model: gpt-5.5
    depends_on: [draft]
    prompt: "Read the draft from {{draft.output}} and flag anything unverified."

Each task runs in its own git worktree. The merge to the working branch only happens after the gate set (lint / types / tests / security) is green.
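
A minimal sketch of what a deterministic, LLM-free scheduler can mean for the YAML above: resolve `depends_on` into a run order with the standard library, and allow a merge only when every gate is green. The dict mirrors the two tasks above; `execution_order` and `may_merge` are hypothetical names for illustration, not Bernstein's internals.

```python
from graphlib import TopologicalSorter

# Same two tasks as the YAML above, reduced to the fields the ordering needs.
tasks = {
    "draft": {"agent": "claude", "depends_on": []},
    "factcheck": {"agent": "codex", "depends_on": ["draft"]},
}


def execution_order(tasks: dict) -> list:
    """Return task ids so that every task comes after the tasks it depends on."""
    graph = {task_id: set(spec["depends_on"]) for task_id, spec in tasks.items()}
    # TopologicalSorter takes {node: predecessors}; a cycle raises graphlib.CycleError.
    return list(TopologicalSorter(graph).static_order())


def may_merge(gate_results: dict) -> bool:
    """Allow the merge only if at least one gate ran and all of them passed."""
    return bool(gate_results) and all(gate_results.values())


print(execution_order(tasks))  # ['draft', 'factcheck']
print(may_merge({"lint": True, "types": True, "tests": False, "security": True}))  # False
```

Nothing in this loop asks a model what to do next: the order comes from the graph, and the merge decision is a boolean over gate results.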

honest gaps

The Agents SDK is the right answer when the work is one Python service and the OpenAI ecosystem is where you want to live. It has handoffs, guardrails, sessions, and tracing as first-class primitives; Bernstein does not reimplement those. The SDK is also backed by OpenAI, so support for new Responses-API features lands there first. Bernstein is younger, has a narrower contributor base, and is built for a different shape of work (coordinating real CLI tools in git, not orchestrating in-process LLM personas). Pick the layer the problem actually lives at; if you want both, wrap the SDK with the openai_agents adapter and let Bernstein schedule it alongside everything else.

faq

Are these competitors?

Not really. OpenAI Agents SDK is an in-process Python framework for building agents that call OpenAI models via the Responses API. Bernstein is one layer up: a scheduler that spawns whole CLI coding agents (Claude Code, Aider, Codex, etc.) in git worktrees. Bernstein's openai_agents.py adapter wraps the SDK as one of those agents.

Can I use the OpenAI Agents SDK inside Bernstein?

Yes - bernstein/adapters/openai_agents.py spawns an Agents SDK session as one worker among many. The SDK owns its handoffs / guardrails / sessions inside that worker; Bernstein owns the git worktree, the routing decision, and the merge gates around it.

When is the Agents SDK enough on its own?

When the work is a single Python service calling OpenAI models, you control the code end-to-end, you want the SDK's built-in tracing dashboard, and you need neither CLI coding tools (Claude Code, Aider, Codex CLI) nor a tamper-evident audit log on disk.

When is Bernstein the right layer instead?

When the agents you want to coordinate are themselves CLI coding tools, when you want to mix model providers per role, when isolation between concurrent edits must be by git worktree rather than by careful coding, or when you need an HMAC-chained audit log on disk for compliance review.

Does Bernstein support handoffs and guardrails like the SDK?

Bernstein has its own task graph (depends_on / role / agent routing), which is resolved deterministically in Python, outside any LLM. It does not reimplement the SDK's handoff or guardrail primitives. If you want the SDK's handoff and guardrail semantics inside one worker, run that worker through the openai_agents adapter and let the SDK handle them there.