01 - Executive Summary
A single LLM trying to do every task simultaneously is the wrong unit of compute. The Distributed Model splits work across purpose-fit models while keeping a single orchestrator responsible for decisions, code edits, and the final output. The result: faster sessions, lower token cost, and more headroom for hard reasoning, with no quality regression.
02 - The Issue
When one model is asked to read every byte, generate every line, and reason about every decision in a long-running session, three failure modes compound: token cost rises faster than value, context fills up with low-signal data, and the model spends premium reasoning capacity on grunt work. The Distributed Model resolves all three by routing work to whichever model is best suited to it.
The three failure modes that drive this
| Failure mode | What it looks like | How distribution fixes it |
|---|---|---|
| Token cost on grunt work | Premium model reads 200KB of logs at premium rates. | A specialist reads or compresses bulky inputs at a fraction of the cost; the orchestrator only reads what matters. |
| Context bloat | Half the conversation window is filled with tool output noise. | Pre-orchestrator hooks compress mechanically; what enters the context is signal-dense. |
| Reasoning monoculture | One model's blind spots become every blind spot. | Targeted second opinions from different models surface what the orchestrator alone would miss. |
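The deterministic compression lane in the table above can be sketched as a small pre-context filter. This is an illustrative sketch only, assuming the two operations the document names (ANSI stripping and repeat collapsing); the function name and the repeat threshold are not from the project.

```python
import re
from itertools import groupby

# Matches CSI escape sequences, e.g. color codes like \x1b[31m
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def compress_tool_output(raw: str, max_repeats: int = 3) -> str:
    """Deterministically shrink bulky tool output before it enters context.

    Strips ANSI escape codes and collapses runs of identical lines into one
    line plus a repeat marker. No LLM involved, so it is fast and free.
    """
    text = ANSI_ESCAPE.sub("", raw)
    out: list[str] = []
    for line, run in groupby(text.splitlines()):
        n = len(list(run))
        if n <= max_repeats:
            out.extend([line] * n)      # short runs pass through unchanged
        else:
            out.append(line)            # long runs collapse to one line + count
            out.append(f"[repeated {n - 1} more times]")
    return "\n".join(out)
```

Because the transform is pure string manipulation, it runs in milliseconds and its output is reproducible, which is what makes a "free (local hook)" cost class possible.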
03 - The Strategy
The architecture is intentionally narrow: one model holds the plan and writes the final output, three others handle the work that does not need premium reasoning. Delegation is automatic, never operator-driven; the orchestrator is the only model the user converses with.
Roles
The three triggering layers
04 - The Process
A simplified walk-through of one user request through the Distributed Model. Each row is one step in the turn; delegation events are called out in the description.
| Step | What happens | Cost class |
|---|---|---|
| 1 | User submits a prompt | n/a |
| 2 | UserPromptSubmit hook silently appends routing guidance | free (local hook) |
| 3 | Orchestrator delegates a bulky read to Gemini Flash | cheap specialist tokens |
| 4 | Specialist returns a compressed summary; orchestrator integrates | premium tokens, but on a small input |
| 5 | PostToolUse hook deterministically compresses any tool output before it enters context | free (local hook) |
| 6 | Orchestrator requests a code-review second opinion from OpenAI o-series | specialist tokens |
| 7 | Consultant returns; orchestrator weighs against its own analysis | premium reasoning on a small payload |
| 8 | Orchestrator delivers the final answer and any file edits | premium tokens, focused |
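The delegation decisions in steps 3 and 6 can be sketched as a routing function. The model names come from this document; the task kinds, size threshold, and function shape are illustrative assumptions, not the project's actual routing logic.

```python
def route_task(kind: str, payload_bytes: int) -> str:
    """Pick a lane for a unit of work. Thresholds are illustrative.

    Bulky reads go to a cheap reader, mechanical generation to a cheap
    generator, judgment calls to a second-opinion consultant; everything
    else stays with the orchestrator.
    """
    if kind == "read" and payload_bytes > 32_000:
        return "gemini-flash"       # cheap bulk reader / summarizer
    if kind == "scaffold":
        return "deepseek"           # mechanical code generation
    if kind == "second-opinion":
        return "openai-o-series"    # independent design review
    return "orchestrator"           # premium reasoning stays home
```

The key property is that routing is a function of the work item, not of operator input, which matches the "delegation is automatic" rule in the strategy section.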
05 - Use Cases
These are realistic patterns, not edge cases. Each shows what happens with a single model and what changes when the work distributes.
Debugging a slow API endpoint with a 200 KB log
The orchestrator needs to find why an endpoint is slow. Logs are bulky and most of them are noise.
Without distribution
The orchestrator reads the full log at premium rates, fills half its context window with noise, and burns reasoning capacity recognizing repeated entries. Diagnosis still happens, but expensively.
With Distributed Model
The PostToolUse hook strips ANSI and collapses repeats deterministically. If the result is still huge, the orchestrator delegates a structured summary to Gemini Flash. The orchestrator reads a 5 KB digest with paths, line numbers, and error counts intact, then focuses reasoning on the actual hot path.
Generating CRUD scaffolding for a new resource
A standard hand-off: spec a resource, get a controller, route, validator, test stub.
Without distribution
Orchestrator generates the scaffold itself. Quality is high but cost is non-trivial because the work is mechanical and the orchestrator's premium reasoning is wasted on string assembly.
With Distributed Model
Orchestrator delegates the scaffold to DeepSeek V4 with explicit constraints, then reviews and edits. The orchestrator stays in the loop on shape and structure but pays roughly 1/25 the price for the bulk generation step.
Sanity-checking an architecture decision before committing
The orchestrator has decided how to refactor a service boundary. The decision has blast radius.
Without distribution
The orchestrator reasons alone, possibly inheriting its own training-set blind spots. The user has to ask a separate model in a separate session for an independent check.
With Distributed Model
The orchestrator calls OpenAI o-series for a structured second opinion on the design. Differences are surfaced, areas of agreement are confirmed, and the orchestrator either reaffirms, modifies, or reverses its plan with the consultant's reasoning on record.
06 - Outcomes & Metrics
Every claim about distribution should be testable against observable session data, not vibes. The metrics below are what the project will track during the validation window and into operation.
| Metric | Target | Source |
|---|---|---|
| Tokens entering orchestrator context per session | Down materially vs single-model baseline; exact ratio set by week-1 data | Hook log: bytes-in vs bytes-out per tool call |
| Specialist call success rate | > 95% completed without fallback | MCP tool call telemetry |
| Orchestrator decisions overturned by consultant | Surfaced and logged; not zero (zero means consultant adds no signal) | Tribunal verdict files in audits/ |
| Compression hook latency | < 50 ms p95 deterministic; < 3 s p95 if LLM compression enabled | Hook log timing |
| Specialist API monthly spend | Total spend at least 30% below the orchestrator-only baseline | Provider billing |
| Quality regressions traceable to delegation | Zero. Any traceable regression triggers reversion of the relevant lane | Manual review of delegated outputs vs orchestrator-only baseline |
The first seven days of operation are explicitly a validation window. Every hook invocation logs to .claude/hooks.log with timestamp, tool name, input bytes, output bytes, ms elapsed. At day seven, the project owner reviews the log against the targets above and either promotes the configuration to user-global or iterates on it. No promotion without data.
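The day-7 review described above can be automated from the hook log. A minimal sketch follows, assuming one tab-separated entry per line with the fields the document names (timestamp, tool name, input bytes, output bytes, ms elapsed); the exact log format and function name are assumptions.

```python
import math

def review_hook_log(lines: list[str]) -> dict:
    """Aggregate hook log entries into day-7 review metrics.

    Assumed entry format (field order matches the text; the delimiter
    is a guess): timestamp <TAB> tool <TAB> bytes_in <TAB> bytes_out <TAB> ms
    """
    bytes_in = bytes_out = 0
    latencies: list[float] = []
    for line in lines:
        _ts, _tool, b_in, b_out, ms = line.split("\t")
        bytes_in += int(b_in)
        bytes_out += int(b_out)
        latencies.append(float(ms))
    latencies.sort()
    # Nearest-rank p95: the value at the 95th-percentile position
    idx = min(len(latencies) - 1, math.ceil(len(latencies) * 0.95) - 1)
    return {
        "calls": len(latencies),
        "compression_ratio": bytes_out / bytes_in if bytes_in else 1.0,
        "p95_ms": latencies[idx],
    }
```

The three outputs map directly onto the metrics table: bytes-in vs bytes-out per tool call, compression hook latency at p95, and call volume for the spend review.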
07 - Risks & Mitigations
A multi-model architecture has more moving parts than a single-model one. The risks below were identified during adversarial review of the design and have explicit mitigations.
08 - Governance & Operation
A summary of the operational discipline. The full governance specification lives in the companion file Distributed-Model-Governance.md.
| Topic | Policy |
|---|---|
| Owner | TBK Labs. Single accountable owner for changes to the distribution lanes, model assignments, and hook logic. |
| Change control | All non-trivial changes go through the design-decisions HTML preview process: labeled options, recommendation, pros/cons. Trivial changes (typo, comment, rename) can be applied directly. |
| Versioning | Strategy and governance docs versioned with major.minor; major bumps when the layer architecture changes, minor when model assignments or thresholds change. |
| Decision log | All non-trivial decisions captured as ADRs in audits/ with date prefix and verdict (AFFIRM / MODIFY / REVERSE / DEFER). |
| Quality gates | Trifecta review (3-provider tribunal) runs on any change touching the orchestrator-specialist contract or routing logic. |
| Cost review | Provider spend reviewed monthly. Lanes that fail to demonstrate value over the orchestrator-only baseline are deprecated. |
| Deprecation | The system can be retired at any time by removing the hook directory and the MCP configuration. No upstream dependencies; no patches to the orchestrator runtime. |