# From 4,000 Lines to 200: Decomposing Bernstein's Core
Bernstein's `orchestrator.py` hit 4,198 lines. We used 11 parallel agents — orchestrated by Bernstein itself — to decompose it into 15 sub-packages, each under 400 lines. Here's how that worked and what we learned.
## How a file gets to 4,000 lines
It happens gradually. The orchestrator started as a clean 300-line module that managed a tick loop: check for tasks, spawn agents, collect results. Then it grew. Cost tracking logic. Quality gates. Token monitoring. Git worktree management. Heartbeat detection. Idle agent recycling. Shutdown coordination.
Each addition was small and reasonable. But after two months of active development, orchestrator.py was a 4,198-line monolith that imported 47 modules and had 23 public methods. The test file was 2,800 lines. IDE navigation was painful. Merge conflicts were constant because every feature touched the same file.
The rule we now follow: if a module crosses 600 lines, it's time to decompose.
## The plan
We defined 15 target sub-packages, each responsible for one concern:
| Sub-package | Responsibility | Lines (after) |
|---|---|---|
| `orchestration/` | Lifecycle, tick pipeline | ~350 |
| `agents/` | Spawner, discovery, heartbeat | ~380 |
| `tasks/` | Task store, retry, scheduling | ~340 |
| `quality/` | Quality gates, CI monitor | ~290 |
| `cost/` | Cost tracking, budgets | ~310 |
| `tokens/` | Token monitoring, intervention | ~250 |
| `security/` | Audit logs, policy engine | ~270 |
| `git/` | Worktree management, merge queue | ~280 |
| `persistence/` | WAL, checkpointing | ~220 |
| `planning/` | Plan loading, dependencies | ~200 |
| `routing/` | Model selection, bandit | ~320 |
| `communication/` | Bulletin board, messaging | ~180 |
| `server/` | Task server, API | ~260 |
| `config/` | Configuration, defaults | ~190 |
| `observability/` | Metrics, tracing | ~240 |
The decomposition needed to be backward-compatible. Existing code using `from bernstein.core.orchestrator import Orchestrator` had to keep working.
## 11 agents, 15 packages
Here's the recursive part: we used Bernstein to execute the decomposition. A YAML plan defined 15 extraction stages with dependency edges (e.g., tasks/ had to be extracted before agents/ because the spawner depends on the task store).
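A plan of that shape might look roughly like this (stage IDs and field names here are illustrative assumptions, not Bernstein's actual plan schema):

```yaml
# extraction-plan.yaml — illustrative sketch, not the real plan file
stages:
  - id: extract-tasks
    package: tasks/
    depends_on: []
  - id: extract-agents
    package: agents/
    depends_on: [extract-tasks]   # spawner depends on the task store
  - id: extract-git
    package: git/
    depends_on: [extract-tasks]   # merge queue references task-completion callbacks
```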
11 agents ran in parallel across independent sub-packages. Each agent:
- Extracted the relevant functions and classes from `orchestrator.py`
- Created the new sub-package with proper `__init__.py` exports
- Updated all internal imports
- Ran the sub-package's tests to verify nothing broke
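The import-updating step is mechanical enough to sketch. Here is a minimal illustration of the idea; the symbol-to-module mapping is hypothetical, and the real agents worked with full knowledge of the codebase rather than a regex:

```python
# rewrite_imports.py — illustrative sketch of the import-update step,
# not Bernstein's actual tooling. MOVED maps each extracted symbol to
# its assumed new home.
import re

MOVED = {
    "TaskStore": "bernstein.core.tasks.store",
    "Spawner": "bernstein.core.agents.spawner",
}

_OLD_IMPORT = re.compile(
    r"^from bernstein\.core\.orchestrator import (\w+)$", re.MULTILINE
)

def rewrite(source: str) -> str:
    """Point old `from bernstein.core.orchestrator import X` lines at the new sub-packages."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        new_home = MOVED.get(name)
        # Leave imports of symbols we didn't move untouched.
        return f"from {new_home} import {name}" if new_home else match.group(0)
    return _OLD_IMPORT.sub(repl, source)
```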
The whole decomposition took about 3 hours of wall time. A human doing this manually — carefully moving code, fixing imports, running tests after each change — would spend 2-3 days.
## The re-export shim pattern
Backward compatibility was the hardest constraint. We solved it with re-export shims. The original orchestrator.py became a thin file that imports from sub-packages and re-exports:
```python
# src/bernstein/core/orchestrator.py (after — ~200 lines, down from 4,198)
"""Orchestrator shim — re-exports from sub-packages for backward compat."""
from bernstein.core.orchestration.lifecycle import Orchestrator
from bernstein.core.orchestration.tick import TickPipeline
from bernstein.core.orchestration.manager import OrchestratorManager
from bernstein.core.orchestration.shutdown import ShutdownCoordinator

__all__ = ["Orchestrator", "TickPipeline", "OrchestratorManager", "ShutdownCoordinator"]
```

Every existing import path works unchanged. New code imports from the specific sub-package. Over time, the shims can be deprecated.
## What we learned
Dependency graphs matter more than you think. The extraction order was critical. Extracting git/ before tasks/ would have created circular imports because the merge queue references task completion callbacks. We had to map the dependency graph before writing the plan.
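Mapping and validating that graph is cheap to do up front. A sketch using the standard library's `graphlib`; the edges here are illustrative, taken from the examples above:

```python
# Sketch: validate extraction order before running the plan.
from graphlib import TopologicalSorter, CycleError

# package -> packages that must be extracted first (illustrative edges)
deps = {
    "agents/": {"tasks/"},  # spawner depends on the task store
    "git/": {"tasks/"},     # merge queue references task-completion callbacks
    "tasks/": set(),
}

def extraction_order(graph: dict[str, set[str]]) -> list[str]:
    """Return a valid extraction order, or abort on a circular dependency."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the offending cycle
        raise SystemExit(f"circular dependency: {exc.args[1]}")
```

Running this against a graph with a cycle (say, extracting `git/` before `tasks/` while each references the other) fails fast instead of failing three hours into the run.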
Tests are the safety net. Each extraction step ran the full test suite. We caught 14 import errors, 3 circular dependencies, and 1 subtle bug where a function relied on module-level state that moved to a different file. Without tests, at least half of those would have shipped broken.
600 lines is a good limit. After the decomposition, the largest sub-package is agents/ at ~380 lines. Every module is small enough to read in one sitting, grep effectively, and test in isolation. When a new file starts approaching 600 lines, we split it proactively.
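A guard for the limit can be a few lines in CI. A hedged sketch, not Bernstein's actual check:

```python
# check_module_size.py — hypothetical CI guard for the 600-line rule.
import pathlib

LIMIT = 600

def oversized(root: str, limit: int = LIMIT) -> list[tuple[str, int]]:
    """Return (path, line_count) for every .py file under `root` over the limit."""
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        lines = sum(1 for _ in path.open(encoding="utf-8"))
        if lines > limit:
            hits.append((str(path), lines))
    # Largest offenders first.
    return sorted(hits, key=lambda hit: -hit[1])
```

A CI job can call `oversized("src")` and fail the build when the list is non-empty, turning the rule from a convention into a gate.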
Orchestrators can orchestrate themselves. There's something satisfying about using your own tool to refactor itself. The decomposition was one of our most complex multi-agent runs, and it validated that the parallel execution model works for real refactoring tasks, not just greenfield code generation.
## The result
Before: 1 file, 4,198 lines, 47 imports, constant merge conflicts. After: 15 sub-packages, ~280 lines average, clean dependency boundaries, agents can work on different packages without conflicts.
The full source is on GitHub. The re-export shims are in the top-level files like `orchestrator.py`, `spawner.py`, and `task_lifecycle.py`.
## Further reading
- How Bernstein routes tasks to the right model — the routing sub-package in action
- Running agents on Cloudflare — cloud execution built on the decomposed architecture
- Getting started — try a multi-agent session yourself