Project Strategy

Distributed Model

A coordinated multi-LLM architecture where one orchestrator delegates specialist work to purpose-fit models, expanding throughput and lowering cost without compromising output quality.

Project: Distributed Model
Document: Strategy
Version: 1.0
Issued: 2026-05-04
Owner: TBK Labs

01 - Executive Summary

A small set of specialist models, coordinated by one.

A single LLM trying to do every task simultaneously is the wrong unit of compute. The Distributed Model splits work across purpose-fit models while keeping a single orchestrator responsible for decisions, code edits, and the final output. The result: faster sessions, lower token cost, and more headroom for hard reasoning, with no quality regression.

- 4 models in rotation - orchestrator plus three specialists, each with a defined lane.
- 3 triggering layers - hooks, tool delegation, and parallel sub-agents.
- 0 manual prompts - delegation runs without operator intervention.
- 11x cost floor vs solo - specialist tokens are 11 to 150 times cheaper than the orchestrator on like-for-like work.

02 - The Issue

A single model is overloaded with work it should not be doing itself.

When one model is asked to read every byte, generate every line, and reason about every decision in a long-running session, three failure modes compound: token cost rises faster than value, context fills up with low-signal data, and the model spends premium reasoning capacity on grunt work. The Distributed Model resolves all three by routing work to whichever model is best suited to it.

[Figure: before/after diagram. Before: a solo model handles tool outputs, file and code reads, and reasoning and edits, becoming a single point of contention. After: Claude orchestrates, Gemini handles bulk reads, DeepSeek handles cheap bulk generation, OpenAI handles code review. Specialists handle their lanes; the orchestrator decides.]
The single-model bottleneck versus a coordinated specialist team with one orchestrator.

The three failure modes that drive this

| Failure mode | What it looks like | How distribution fixes it |
| --- | --- | --- |
| Token cost on grunt work | Premium model reads 200 KB of logs at premium rates. | A specialist reads or compresses bulky inputs at a fraction of the cost; the orchestrator only reads what matters. |
| Context bloat | Half the conversation window is filled with tool output noise. | Pre-orchestrator hooks compress mechanically; what enters the context is signal-dense. |
| Reasoning monoculture | One model's blind spots become every blind spot. | Targeted second opinions from different models surface what the orchestrator alone would miss. |

03 - The Strategy

One orchestrator, three specialists, three triggering layers.

The architecture is intentionally narrow: one model holds the plan and writes the final output, three others handle the work that does not need premium reasoning. Delegation is automatic, never operator-driven; the orchestrator is the only model the user converses with.

Roles

Claude (Opus / Sonnet)
Orchestrator
Plans, reads relevant inputs, edits files, makes architectural decisions, writes the final code, and integrates output from specialists. The only model the user talks to. Final authority.
Gemini Flash
Bulk Reader
Compresses verbose tool output, summarizes long logs, scans large codebases. Cheap and large-context. Used by hooks for input pre-processing and by the orchestrator on demand for ad hoc summarization.
DeepSeek V4
Bulk Generator
Boilerplate, scaffolding, repetitive transforms. The orchestrator delegates mechanical work where quality drift is low and reviews the result before integrating. R1 variant available for non-coding reasoning verification.
OpenAI o-series / GPT-5
Code Consultant
Second opinions on hard logic, code reviews, debugging contested decisions. The orchestrator calls in this voice when stakes are high or when independent verification is more valuable than another pass from the same model.

The three triggering layers

Layer 1: Hooks - automatic, no decision needed. A PostToolUse hook strips noise from every Bash, Read, Grep, and Glob result before it enters the orchestrator's context. A UserPromptSubmit hook injects routing guidance silently.

Layer 2: Tool delegation - orchestrator decides. When a task fits a specialist, the orchestrator calls it as a regular tool. Specialists are exposed as MCP servers; the orchestrator picks based on cost, context size, and task type.

Layer 3: Sub-agent fan-out - parallel work. For exploration, the orchestrator spawns multiple sub-agents that route to different specialists in parallel.
The three layers stack from automatic, to orchestrated, to parallel. Most savings come from Layer 1.
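The Layer 2 decision can be sketched as a routing function. This is an illustrative sketch only: the model names, task-type labels, and byte threshold below are assumptions for the example, not values fixed by this strategy.

```python
# Hypothetical sketch of the Layer-2 routing decision. Names and the
# threshold are illustrative assumptions, not part of the strategy.

BULK_READ_THRESHOLD = 20_000  # bytes above which a read counts as "bulky" (assumed)

def pick_specialist(task_type: str, input_bytes: int) -> str:
    """Return the lane a task routes to, or 'orchestrator' to keep it local."""
    if task_type in ("read", "summarize") and input_bytes > BULK_READ_THRESHOLD:
        return "gemini-flash"        # Bulk Reader: cheap, large-context
    if task_type in ("scaffold", "boilerplate"):
        return "deepseek-v4"         # Bulk Generator: mechanical output
    if task_type in ("code-review", "second-opinion"):
        return "openai-o-series"     # Code Consultant: independent check
    return "orchestrator"            # decisions and file edits are never delegated
```

Note the fall-through: anything that is not an explicitly delegable task type stays with the orchestrator, which mirrors the rule that plans and edits are never delegated.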

04 - The Process

A typical request, end to end.

A simplified sequence of one user request through the Distributed Model. Each band is a turn; arrows mark delegation events.

[Figure: sequence diagram across four lanes - User, Orchestrator, Specialists, Hooks: 1. user prompt; 2. routing nudge appended (auto); 3. delegate bulk read; 4. compressed summary; 5. hook compresses tool output; 6. second-opinion request; 7. consultant verdict; 8. final answer + edits.]
Sequence of one request. The user only sees steps 1 and 8. Everything between is automatic.
| Step | What happens | Cost class |
| --- | --- | --- |
| 1 | User submits a prompt | n/a |
| 2 | UserPromptSubmit hook silently appends routing guidance | free (local hook) |
| 3 | Orchestrator delegates a bulky read to Gemini Flash | cheap specialist tokens |
| 4 | Specialist returns a compressed summary; orchestrator integrates | premium tokens, but on a small input |
| 5 | PostToolUse hook deterministically compresses any tool output before it enters context | free (local hook) |
| 6 | Orchestrator requests a code-review second opinion from OpenAI o-series | specialist tokens |
| 7 | Consultant returns; orchestrator weighs against its own analysis | premium reasoning on a small payload |
| 8 | Orchestrator delivers the final answer and any file edits | premium tokens, focused |

05 - Use Cases

Three concrete scenarios where the model earns its keep.

These are realistic patterns, not edge cases. Each shows what happens with a single model and what changes when the work is distributed.

Use Case A

Debugging a slow API endpoint with a 200 KB log

The orchestrator needs to find why an endpoint is slow. Logs are bulky and most of them are noise.

Without distribution

The orchestrator reads the full log at premium rates, fills half its context window with noise, and burns reasoning capacity recognizing repeated entries. Diagnosis still happens, but expensively.

With Distributed Model

The PostToolUse hook strips ANSI and collapses repeats deterministically. If the result is still huge, the orchestrator delegates a structured summary to Gemini Flash. The orchestrator then reads a 5 KB digest with paths, line numbers, and error counts intact, and focuses its reasoning on the actual hot path.
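The deterministic compression step can be sketched in a few lines. This is a minimal illustration of the technique (strip ANSI escapes, collapse consecutive duplicate lines while keeping a count), not the project's actual hook implementation.

```python
import re

# Matches SGR color/style escape sequences such as \x1b[31m (illustrative scope;
# a production hook would likely cover more escape classes).
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def compress_log(text: str) -> str:
    """Strip ANSI color codes and collapse runs of identical lines.

    A run of N identical lines becomes the line once plus a
    "[repeated Nx]" marker, so quantitative detail (how often an
    error fired) survives compression.
    """
    lines = ANSI_RE.sub("", text).splitlines()
    out, prev, count = [], None, 0
    for line in lines:
        if line == prev:
            count += 1
            continue
        if count > 1:
            out.append(f"  [repeated {count}x]")
        out.append(line)
        prev, count = line, 1
    if count > 1:
        out.append(f"  [repeated {count}x]")
    return "\n".join(out)
```

Because the transform is deterministic, it costs no tokens and its latency is bounded, which is what lets it run on every tool result.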
Use Case B

Generating CRUD scaffolding for a new resource

A standard hand-off: spec a resource, get a controller, route, validator, test stub.

Without distribution

Orchestrator generates the scaffold itself. Quality is high but cost is non-trivial because the work is mechanical and the orchestrator's premium reasoning is wasted on string assembly.

With Distributed Model

Orchestrator delegates the scaffold to DeepSeek V4 with explicit constraints, then reviews and edits. The orchestrator stays in the loop on shape and structure but pays roughly 1/25 the price for the bulk generation step.
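A constrained delegation of this kind might be phrased as a structured payload. The field names and constraint wording below are hypothetical; the point is that the request pins down structure and reserves review for the orchestrator.

```python
# Hypothetical sketch of a constrained scaffold delegation payload.
# Field names and prompt wording are assumptions for illustration.

def scaffold_request(resource: str, fields: list[str]) -> dict:
    """Build a delegation payload for the Bulk Generator lane."""
    return {
        "lane": "deepseek-v4",
        "task": "scaffold",
        "prompt": (
            f"Generate controller, route, validator, and test stub for "
            f"resource '{resource}' with fields {', '.join(fields)}. "
            "Follow the existing project layout exactly; add no new "
            "dependencies; leave TODO markers where business logic belongs."
        ),
        # The orchestrator always reviews before integrating (per the strategy).
        "review_required": True,
    }
```

Explicit constraints keep quality drift low, which is the precondition the strategy sets for delegating mechanical work at all.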
Use Case C

Sanity-checking an architecture decision before committing

The orchestrator has decided how to refactor a service boundary. The decision has blast radius.

Without distribution

The orchestrator reasons alone, possibly inheriting its own training-set blind spots. The user has to ask a separate model in a separate session for an independent check.

With Distributed Model

The orchestrator calls OpenAI o-series for a structured second opinion on the design. Differences are surfaced, areas of agreement are confirmed, and the orchestrator either reaffirms, modifies, or reverses its plan with the consultant's reasoning on record.

06 - Outcomes & Metrics

What success looks like, and how it gets measured.

Every claim about distribution should be testable against observable session data, not vibes. The metrics below are what the project will track during the validation window and into operation.

| Metric | Target | Source |
| --- | --- | --- |
| Tokens entering orchestrator context per session | Down materially vs single-model baseline; exact ratio set by week-1 data | Hook log: bytes-in vs bytes-out per tool call |
| Specialist call success rate | > 95% completed without fallback | MCP tool call telemetry |
| Orchestrator decisions overturned by consultant | Surfaced and logged; not zero (zero means the consultant adds no signal) | Tribunal verdict files in audits/ |
| Compression hook latency | < 50 ms p95 deterministic; < 3 s p95 if LLM compression enabled | Hook log timing |
| Specialist API monthly spend | < orchestrator-only baseline minus 30% | Provider billing |
| Quality regressions traceable to delegation | Zero; any traceable regression triggers reversion of the relevant lane | Manual review of delegated outputs vs orchestrator-only baseline |
Validation discipline

The first seven days of operation are explicitly a validation window. Every hook invocation logs to .claude/hooks.log with timestamp, tool name, input bytes, output bytes, ms elapsed. At day seven, the project owner reviews the log against the targets above and either promotes the configuration to user-global or iterates on it. No promotion without data.
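The week-1 review reduces to arithmetic over the log. Here is a sketch of computing the aggregate bytes-out/bytes-in ratio; the whitespace-separated line format is an assumption matching the fields listed above (timestamp, tool name, input bytes, output bytes, ms elapsed), not a documented format.

```python
def compression_ratio(log_lines: list[str]) -> float:
    """Aggregate bytes-out over bytes-in across hook invocations.

    Assumes the hypothetical line format:
        <timestamp> <tool> <in_bytes> <out_bytes> <ms>
    Malformed lines are skipped rather than failing the review.
    """
    total_in = total_out = 0
    for line in log_lines:
        parts = line.split()
        if len(parts) != 5:
            continue
        _, _, in_b, out_b, _ = parts
        total_in += int(in_b)
        total_out += int(out_b)
    # Ratio of 1.0 means no compression (or no data yet).
    return total_out / total_in if total_in else 1.0
```

A ratio well below 1.0 over the validation window is the evidence that supports (or refutes) promoting the configuration.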

07 - Risks & Mitigations

The honest tradeoffs.

A multi-model architecture has more moving parts than a single-model one. The risks below were identified during adversarial review of the design and have explicit mitigations.

Information loss in bulk-reader summaries
Specialist summarization can drop quantitative detail (counts, distributions) that matters for diagnosis. Mitigation: deterministic compression as the default; LLM summarization only above an explicit byte threshold and only with a prompt template that requires preserving paths, line numbers, and counts.
Inappropriate delegation
Routing guidance could push the orchestrator to delegate when its own reasoning is stronger. Mitigation: the routing nudge explicitly tells the orchestrator to remain the final editor and to delegate only specific task types. Decisions about file edits and plans are never delegated.
Subtle bugs from generated boilerplate
Specialist-generated scaffolds may have defects the orchestrator does not catch on review. Mitigation: orchestrator stays the primary author for anything user-visible; specialist generation is restricted to mechanical work with clear structural constraints; integration always passes through orchestrator review.
Silent configuration failure
A misnamed env var, missing key, or wrong matcher pattern can cause hooks to silently no-op. Mitigation: hook logging from day one. Every invocation writes one line. A missing log signals a misconfiguration. Validation on startup.
External provider downtime
A provider outage could break delegated tool calls mid-session. Mitigation: graceful degradation - the orchestrator falls back to its own work when a specialist returns an error or times out. Sessions slow but do not break.
API key exposure
Provider keys in MCP configuration files are sensitive. Mitigation: keys live in user-scope environment variables, not in committed files. The MCP configuration references variables, never contains literal keys. Configuration is gitignored.
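The variable-reference pattern might look like the fragment below. The server name and command are hypothetical; the point is that the configuration carries a `${...}` reference, never a literal key, and the file itself stays gitignored.

```json
{
  "mcpServers": {
    "gemini-flash": {
      "command": "gemini-bridge",
      "env": {
        "GEMINI_API_KEY": "${GEMINI_API_KEY}"
      }
    }
  }
}
```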

08 - Governance & Operation

How the system is owned, changed, and retired.

A summary of the operational discipline. The full governance specification lives in the companion file Distributed-Model-Governance.md.

| Topic | Policy |
| --- | --- |
| Owner | TBK Labs. Single accountable owner for changes to the distribution lanes, model assignments, and hook logic. |
| Change control | All non-trivial changes go through the design-decisions HTML preview process: labeled options, recommendation, pros/cons. Trivial changes (typo, comment, rename) can be applied directly. |
| Versioning | Strategy and governance docs versioned with major.minor; major bumps when the layer architecture changes, minor when model assignments or thresholds change. |
| Decision log | All non-trivial decisions captured as ADRs in audits/ with date prefix and verdict (AFFIRM / MODIFY / REVERSE / DEFER). |
| Quality gates | Trifecta review (3-provider tribunal) runs on any change touching the orchestrator-specialist contract or routing logic. |
| Cost review | Provider spend reviewed monthly. Lanes that fail to demonstrate value over the orchestrator-only baseline are deprecated. |
| Deprecation | The system can be retired at any time by removing the hook directory and the MCP configuration. No upstream dependencies; no patches to the orchestrator runtime. |