01 - Executive Summary
A single LLM trying to do every task simultaneously is the wrong unit of compute. The Distributed Model splits work across purpose-fit models while keeping a single orchestrator responsible for decisions, code edits, and the final output. The result: faster sessions, lower token cost, and more headroom for hard reasoning, with no quality regression.
02 - The Issue
When one model is asked to read every byte, generate every line, and reason about every decision in a long-running session, three failure modes compound: token cost rises faster than value, context fills up with low-signal data, and the model spends premium reasoning capacity on grunt work. The Distributed Model resolves all three by routing work to whichever model is best suited to it.
The three failure modes that drive this
| Failure mode | What it looks like | How distribution fixes it |
|---|---|---|
| Token cost on grunt work | Premium model reads 200KB of logs at premium rates. | A specialist reads or compresses bulky inputs at a fraction of the cost; the orchestrator only reads what matters. |
| Context bloat | Half the conversation window is filled with tool output noise. | Pre-orchestrator hooks compress mechanically; what enters the context is signal-dense. |
| Reasoning monoculture | One model's blind spots become every blind spot. | Targeted second opinions from different models surface what the orchestrator alone would miss. |
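The deterministic compression lane in the table above can be sketched as a small pre-context filter. This is an illustrative sketch only, assuming the two operations the document names (ANSI stripping and repeat collapsing); the function name and the repeat threshold are not from the project.

```python
import re
from itertools import groupby

# Matches CSI escape sequences, e.g. color codes like \x1b[31m
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def compress_tool_output(raw: str, max_repeats: int = 3) -> str:
    """Deterministically shrink bulky tool output before it enters context.

    Strips ANSI escape codes and collapses runs of identical lines into one
    line plus a repeat marker. No LLM involved, so it is fast and free.
    """
    text = ANSI_ESCAPE.sub("", raw)
    out: list[str] = []
    for line, run in groupby(text.splitlines()):
        n = len(list(run))
        if n <= max_repeats:
            out.extend([line] * n)      # short runs pass through unchanged
        else:
            out.append(line)            # long runs collapse to one line + count
            out.append(f"[repeated {n - 1} more times]")
    return "\n".join(out)
```

Because the transform is pure string manipulation, it runs in milliseconds and its output is reproducible, which is what makes a "free (local hook)" cost class possible.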
03 - The Strategy
The architecture is intentionally narrow: one model holds the plan and writes the final output, three others handle the work that does not need premium reasoning. Delegation is automatic, never operator-driven; the orchestrator is the only model the user converses with.
Roles
The three triggering layers
04 - The Process
A simplified walk-through of one user request through the Distributed Model. Each row is one step in the turn; delegation events are called out in the description.
| Step | What happens | Cost class |
|---|---|---|
| 1 | User submits a prompt | n/a |
| 2 | UserPromptSubmit hook silently appends routing guidance | free (local hook) |
| 3 | Orchestrator delegates a bulky read to Gemini Flash | cheap specialist tokens |
| 4 | Specialist returns a compressed summary; orchestrator integrates | premium tokens, but on a small input |
| 5 | PostToolUse hook deterministically compresses any tool output before it enters context | free (local hook) |
| 6 | Orchestrator requests a code-review second opinion from OpenAI o-series | specialist tokens |
| 7 | Consultant returns; orchestrator weighs against its own analysis | premium reasoning on a small payload |
| 8 | Orchestrator delivers the final answer and any file edits | premium tokens, focused |
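The delegation decisions in steps 3 and 6 can be sketched as a routing function. The model names come from this document; the task kinds, size threshold, and function shape are illustrative assumptions, not the project's actual routing logic.

```python
def route_task(kind: str, payload_bytes: int) -> str:
    """Pick a lane for a unit of work. Thresholds are illustrative.

    Bulky reads go to a cheap reader, mechanical generation to a cheap
    generator, judgment calls to a second-opinion consultant; everything
    else stays with the orchestrator.
    """
    if kind == "read" and payload_bytes > 32_000:
        return "gemini-flash"       # cheap bulk reader / summarizer
    if kind == "scaffold":
        return "deepseek"           # mechanical code generation
    if kind == "second-opinion":
        return "openai-o-series"    # independent design review
    return "orchestrator"           # premium reasoning stays home
```

The key property is that routing is a function of the work item, not of operator input, which matches the "delegation is automatic" rule in the strategy section.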
05 - Use Cases
These are realistic patterns, not edge cases. Each shows what happens with a single model and what changes when the work distributes.
Debugging a slow API endpoint with a 200 KB log
The orchestrator needs to find why an endpoint is slow. Logs are bulky and most of them are noise.
Without distribution
The orchestrator reads the full log at premium rates, fills half its context window with noise, and burns reasoning capacity recognizing repeated entries. Diagnosis still happens, but expensively.
With Distributed Model
The PostToolUse hook strips ANSI and collapses repeats deterministically. If the result is still huge, the orchestrator delegates a structured summary to Gemini Flash. The orchestrator reads a 5 KB digest with paths, line numbers, and error counts intact, then focuses reasoning on the actual hot path.
Generating CRUD scaffolding for a new resource
A standard hand-off: spec a resource, get a controller, route, validator, test stub.
Without distribution
Orchestrator generates the scaffold itself. Quality is high but cost is non-trivial because the work is mechanical and the orchestrator's premium reasoning is wasted on string assembly.
With Distributed Model
Orchestrator delegates the scaffold to DeepSeek V4 with explicit constraints, then reviews and edits. The orchestrator stays in the loop on shape and structure but pays roughly 1/25 the price for the bulk generation step.
Sanity-checking an architecture decision before committing
The orchestrator has decided how to refactor a service boundary. The decision has blast radius.
Without distribution
The orchestrator reasons alone, possibly inheriting its own training-set blind spots. The user has to ask a separate model in a separate session for an independent check.
With Distributed Model
The orchestrator calls OpenAI o-series for a structured second opinion on the design. Differences are surfaced, areas of agreement are confirmed, and the orchestrator either reaffirms, modifies, or reverses its plan with the consultant's reasoning on record.
06 - Outcomes & Metrics
Every claim about distribution should be testable against observable session data, not vibes. The metrics below are what the project will track during the validation window and into operation.
| Metric | Target | Source |
|---|---|---|
| Tokens entering orchestrator context per session | Down materially vs single-model baseline; exact ratio set by week-1 data | Hook log: bytes-in vs bytes-out per tool call |
| Specialist call success rate | > 95% completed without fallback | MCP tool call telemetry |
| Orchestrator decisions overturned by consultant | Surfaced and logged; not zero (zero means consultant adds no signal) | Tribunal verdict files in audits/ |
| Compression hook latency | < 50 ms p95 deterministic; < 3 s p95 if LLM compression enabled | Hook log timing |
| Specialist API monthly spend | Total spend at least 30% below the orchestrator-only baseline | Provider billing |
| Quality regressions traceable to delegation | Zero. Any traceable regression triggers reversion of the relevant lane | Manual review of delegated outputs vs orchestrator-only baseline |
The first seven days of operation are explicitly a validation window. Every hook invocation logs to .claude/hooks.log with timestamp, tool name, input bytes, output bytes, ms elapsed. At day seven, the project owner reviews the log against the targets above and either promotes the configuration to user-global or iterates on it. No promotion without data.
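The day-7 review described above can be automated from the hook log. A minimal sketch follows, assuming one tab-separated entry per line with the fields the document names (timestamp, tool name, input bytes, output bytes, ms elapsed); the exact log format and function name are assumptions.

```python
import math

def review_hook_log(lines: list[str]) -> dict:
    """Aggregate hook log entries into day-7 review metrics.

    Assumed entry format (field order matches the text; the delimiter
    is a guess): timestamp <TAB> tool <TAB> bytes_in <TAB> bytes_out <TAB> ms
    """
    bytes_in = bytes_out = 0
    latencies: list[float] = []
    for line in lines:
        _ts, _tool, b_in, b_out, ms = line.split("\t")
        bytes_in += int(b_in)
        bytes_out += int(b_out)
        latencies.append(float(ms))
    latencies.sort()
    # Nearest-rank p95: the value at the 95th-percentile position
    idx = min(len(latencies) - 1, math.ceil(len(latencies) * 0.95) - 1)
    return {
        "calls": len(latencies),
        "compression_ratio": bytes_out / bytes_in if bytes_in else 1.0,
        "p95_ms": latencies[idx],
    }
```

The three outputs map directly onto the metrics table: bytes-in vs bytes-out per tool call, compression hook latency at p95, and call volume for the spend review.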
07 - Risks & Mitigations
A multi-model architecture has more moving parts than a single-model one. The risks below were identified during adversarial review of the design and have explicit mitigations.
08 - Governance & Operation
A summary of the operational discipline. The full governance specification lives in the companion file Distributed-Model-Governance.md.
| Topic | Policy |
|---|---|
| Owner | TBK Labs. Single accountable owner for changes to the distribution lanes, model assignments, and hook logic. |
| Change control | All non-trivial changes go through the design-decisions HTML preview process: labeled options, recommendation, pros/cons. Trivial changes (typo, comment, rename) can be applied directly. |
| Versioning | Strategy and governance docs versioned with major.minor; major bumps when the layer architecture changes, minor when model assignments or thresholds change. |
| Decision log | All non-trivial decisions captured as ADRs in audits/ with date prefix and verdict (AFFIRM / MODIFY / REVERSE / DEFER). |
| Quality gates | Trifecta review (3-provider tribunal) runs on any change touching the orchestrator-specialist contract or routing logic. |
| Cost review | Provider spend reviewed monthly. Lanes that fail to demonstrate value over the orchestrator-only baseline are deprecated. |
| Deprecation | The system can be retired at any time by removing the hook directory and the MCP configuration. No upstream dependencies; no patches to the orchestrator runtime. |