
AI Infrastructure · Personal Project · 2026

CTO Executor.

Five-phase team-lead orchestrator that runs multi-agent project execution under a strict audit gate — with a worker compliance covenant, task-graph watchdog, and a 4-cycle circuit breaker.

Execution phases · Agent roles per stage · Non-negotiable rules · 4×RED circuit breaker

01 · System overview

A CTO that delegates and never touches the code.

Per stage, a fresh five-agent team. The CTO orchestrator is strictly a team lead: it dispatches tasks, reads reports, makes decisions, and does no donkey work. Workers handle implementation (constrained by a verbatim Compliance Covenant), the Tester handles verification, the Scribe handles logging, and the Auditor handles quality. The Auditor's binary verdict is the only authority that can declare a contract complete.

Spawn order: Scribe → Auditor → Workers → Tester · watchdog runs continuously · rules layer scoped per role

The CTO's only job is to delegate, audit, and stay out of the way. Every donkey task it does is context it can never reclaim.

02 · Phase 0–1 — Initialize, spawn team

Pre-flight, then spawn order matters.

Phase 0 reads the active stage contract, project_summary.md, and CLAUDE.md (specifically the TELEMETRY: classification). It pre-validates every required artefact — if a contract is missing, a batch isn't defined, or the telemetry classification isn't set, the CTO STOPs and escalates rather than improvising. Phase 1 spawns the team in a specific order designed to prevent a real race that once produced a premature GREEN verdict on an empty TaskList.

01 · Init + Team

Scribe first, Auditor second, Workers third, Tester last.

The spawn order isn't arbitrary. Scribe goes first because the rest of the team starts writing events the moment they spawn, so the log needs a ready writer. Auditor goes second because it needs the full contract loaded before workers produce any output; spawning workers and auditor in parallel once produced a race where the auditor saw an empty TaskList and verdicted GREEN. Workers go third, with the Compliance Covenant block in every spawn prompt. Tester goes last because it only matters once implementation exists.

Each teammate gets only its applicable rule files on top of the shared team/*.md: the worker gets worker/*.md, the auditor gets auditor/*.md plus the full contract, the tester gets tester/*.md, and the scribe gets scribe/*.md. Context windows stay clean inside the 400K budget per role.
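A minimal sketch of the spawn order and per-role rule scoping, assuming a hypothetical `spawn(role, paths)` callback that stands in for the real Agent-tool call and returns only once the teammate is ready. The glob strings and the `contract_path` argument are illustrative assumptions; only the directory names come from the rules layout on this page.

```python
from typing import Callable

# Phase 1 order: log writer first, contract-loaded auditor second,
# workers third, tester last.
SPAWN_ORDER = ("scribe", "auditor", "worker", "tester")

# Hypothetical glob strings; directory names mirror this page's rules layout.
ROLE_RULES = {
    "scribe": ["team/*.md", "scribe/*.md"],
    "auditor": ["team/*.md", "auditor/*.md"],
    "worker": ["team/*.md", "worker/*.md"],
    "tester": ["team/*.md", "tester/*.md"],
}


def context_for(role: str, contract_path: str) -> list[str]:
    """Paths a role may load: shared team rules, its own rules, and (auditor only) the contract."""
    paths = list(ROLE_RULES[role])
    if role == "auditor":
        paths.append(contract_path)
    return paths


def spawn_team(spawn: Callable[[str, list[str]], None], contract_path: str, workers: int = 1) -> None:
    """Spawn roles strictly in order; `spawn` is assumed to block until the teammate is ready."""
    for role in SPAWN_ORDER:
        for _ in range(workers if role == "worker" else 1):
            spawn(role, context_for(role, contract_path))
```

Keeping the order in a single tuple makes the race fix a data change rather than a prose instruction the orchestrator might paraphrase.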

Phase 0

Pre-flight validation

Contract exists with deliverables + verification? project_summary readable? CLAUDE.md has TELEMETRY line? At least one batch defined? STOP if any answer is no.
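The pre-flight questions above map naturally onto a check that returns every blocking problem at once. This is a sketch only: the page does not say how a contract marks deliverables, verification, or batches, so the `## Deliverables`, `## Verification`, and `### Batch N` markers are assumptions, as is the exact shape of the TELEMETRY line.

```python
import re
from pathlib import Path

# Hypothetical markdown conventions for contracts; adjust to the real format.
REQUIRED_SECTIONS = ("## Deliverables", "## Verification")
BATCH_RE = re.compile(r"^### Batch \d+", re.MULTILINE)
TELEMETRY_RE = re.compile(r"^TELEMETRY:\s*(YES|EXEMPT|N/A)\s*$", re.MULTILINE)


def preflight(contract: Path, summary: Path, claude_md: Path) -> list[str]:
    """Return every blocking problem; an empty list means Phase 1 may start."""
    problems: list[str] = []
    if not contract.is_file():
        problems.append(f"contract missing: {contract}")
    else:
        text = contract.read_text(encoding="utf-8")
        problems += [f"contract lacks {s!r}" for s in REQUIRED_SECTIONS if s not in text]
        if not BATCH_RE.search(text):
            problems.append("contract defines no batch")
    if not summary.is_file():
        problems.append(f"project summary unreadable: {summary}")
    if not claude_md.is_file() or not TELEMETRY_RE.search(claude_md.read_text(encoding="utf-8")):
        problems.append("CLAUDE.md has no TELEMETRY classification")
    return problems
```

Any non-empty result means STOP and escalate; the CTO reports the list rather than patching the gap itself.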

Phase 1

Spawn order — Scribe → Auditor → Workers → Tester

Mitigates the auditor-race incident. Scribe ready before others write events; auditor loaded before workers produce output.

Rules scoping

Per-role rule loads

Workers see worker rules + team rules. Auditors see auditor rules + team rules + full contract. No role sees everything — context bloat avoided.

Telemetry signal

One line in every spawn prompt

"TELEMETRY: YES / EXEMPT / N/A." Workers and testers read this once and load the observability doc if needed — the CTO doesn't explain how telemetry works.

03 · Phase 2 — Task dependency graph

Every batch is a small DAG.

Per batch: implementation tasks first, then a test task whose blockedBy lists all of them, then a scribe-update task with the same blockers, then an audit task blocked by both test and scribe. Across batches: every Batch K+1 implementation task is blockedBy Batch K's audit. This is how the audit gate gets mechanical enforcement: workers literally cannot start the next batch until the previous one is GREEN.
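The per-batch DAG can be sketched as a small builder. The `Task` type, its `blocked_by` field (standing in for TaskList's blockedBy), and the `b{k}-…` id scheme are hypothetical; the dependency shape is the one described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    owner: str
    blocked_by: list[str] = field(default_factory=list)  # mirrors TaskList's blockedBy


def batch_tasks(k: int, impl_owners: list[str], prev_audit: Optional[str]) -> list[Task]:
    """One batch as a small DAG: impl -> (test, scribe) -> audit.
    Every impl task is also blocked by the previous batch's audit, if any."""
    cross = [prev_audit] if prev_audit else []
    impl = [Task(f"b{k}-impl-{i}", owner, list(cross))
            for i, owner in enumerate(impl_owners, start=1)]
    impl_ids = [t.id for t in impl]
    test = Task(f"b{k}-test", "tester", list(impl_ids))
    scribe = Task(f"b{k}-scribe", "scribe", list(impl_ids))  # same blockers as test
    audit = Task(f"b{k}-audit", "auditor", [test.id, scribe.id])
    return impl + [test, scribe, audit]
```

Passing Batch K's audit id into Batch K+1's builder is what turns "no building on unreviewed work" into an edge in the graph.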

Per batch: impl → test + scribe (parallel) → audit · cross-batch: next impl blockedBy current audit

02 · Tasks

Single-batch authorisation when workers are autonomous.

For high-autonomy worker subagents (the default), the CTO uses single-batch authorisation: it creates only the CURRENT batch's tasks at any time, not the full stage. Future-batch tasks are created after the current batch's audit returns GREEN. This removes the temptation surface entirely — there are no future-batch tasks visible in TaskList to pre-claim.
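Single-batch authorisation amounts to keeping future batches off the TaskList entirely. A minimal sketch, assuming a hypothetical `SingleBatchAuthoriser` that holds the contract's batches privately and only materialises the next one on GREEN:

```python
class SingleBatchAuthoriser:
    """Hypothetical CTO-side helper: only the current batch's tasks ever exist."""

    def __init__(self, batches: list[list[str]]):
        self._pending = [list(b) for b in batches]  # contract batches not yet materialised
        self.task_list: list[str] = []              # what workers can see in TaskList
        self._materialise_next()

    def _materialise_next(self) -> bool:
        if not self._pending:
            return False  # stage exhausted: hand over to Phase 5
        self.task_list.extend(self._pending.pop(0))
        return True

    def on_audit_verdict(self, verdict: str) -> bool:
        """Create the next batch's tasks only after GREEN; RED leaves TaskList unchanged."""
        return verdict == "GREEN" and self._materialise_next()
```

Because the next batch does not exist until GREEN, there is nothing for an eager worker to pre-claim, whatever it reads in blockedBy.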

Per batch

impl → test + scribe → audit

Implementation tasks at the front. Test + scribe block on impl completion. Audit blocks on test + scribe. Strict chain.

Cross batch

Next impl blockedBy current audit

Mechanical enforcement of "no building on top of unreviewed work." Workers can't physically start the next batch until the audit is GREEN.

Authorisation

Single-batch when worker is autonomous

Create only current batch's tasks. Future-batch tasks materialise after GREEN. Removes the pre-claim temptation entirely.

Task descriptions

Path-only context

Each task description carries the contract section path, applicable rule paths, verification commands, completion criteria. No inlined content — keeps spawn prompts small.
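A path-only task description might look like the sketch below. The `TaskDescription` type, its field names, and the rendered layout are hypothetical; the point is that every field is a path, a command, or a one-line criterion, never inlined rule or contract content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDescription:
    contract_section: str         # path (plus anchor) into the stage contract
    rule_paths: tuple[str, ...]   # applicable rule files, referenced by path only
    verify_cmds: tuple[str, ...]  # commands the worker runs, and the auditor re-runs
    done_when: str                # completion criteria, one line

    def render(self) -> str:
        lines = [f"Contract: {self.contract_section}", "Rules:"]
        lines += [f"  - {p}" for p in self.rule_paths]
        lines.append("Verify:")
        lines += [f"  $ {c}" for c in self.verify_cmds]
        lines.append(f"Done when: {self.done_when}")
        return "\n".join(lines)
```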

04 · Phase 3 — Monitor + contain

Workers will steamroll gates without active enforcement.

The earliest version of this system trusted workers to respect blockedBy. They didn't. A worker that finishes its current task, looks at the TaskList, sees an unblocked-looking next task and just claims it — bypassing the audit that's supposed to gate the batch — is the default failure mode of an autonomous subagent. The fix is mechanical, not motivational: a verbatim spawn-prompt block plus a watchdog that resets violators on every monitor tick.

03 · Monitor + Contain

The Compliance Covenant ships verbatim in every worker spawn prompt.

Five rules, repeated verbatim. Mailbox-first checking before every tool call. Pre-claim verification of blockedBy before any status flip. No autonomous batch chaining. No file work outside the authorized batch. Ack-required STOP. Repeated violations = termination + respawn with a hardened prompt.

==== COMPLIANCE SECTION (NON-NEGOTIABLE) ====

1. MAILBOX-FIRST — Before EVERY tool call (every TaskUpdate, every Edit,
   every Write, every Bash, every PowerShell), check your inbox for new
   messages from team-lead. Process them BEFORE continuing.

2. PRE-CLAIM VERIFICATION — Before changing ANY task status to in_progress:
   - Call TaskList. Read the task's blockedBy field.
   - If blockedBy is non-empty (any pending/in_progress blocker), DO NOT
     CLAIM. Stay idle.
   - If blockedBy is empty AND owned by you, only THEN flip to in_progress.

3. NO AUTONOMOUS BATCH CHAINING — After marking any batch task complete,
   IDLE explicitly. Do NOT look for next work. Do NOT claim the next batch.
   Wait for an explicit team-lead message ("Batch X is now ready")
   before claiming the next task.

4. NO FILE WORK OUTSIDE THE CURRENT AUTHORIZED BATCH — Only edit files
   belonging to the batch I have explicitly authorized.

5. ACK-REQUIRED STOP — If team-lead sends "STOP" or "HOLD", reply with
   "ACK STOP" via SendMessage and IDLE before any other action.

Repeated violations = termination + respawn with hardened prompt.
====
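On the worker side, rules 1, 2, and 5 of the covenant reduce to a guard around every tool call and every claim. This is a sketch under stated assumptions: the `CovenantWorker` class, the plain-string inbox, and the `send(to, text)` callback (standing in for SendMessage) are all hypothetical, and how a stopped worker is resumed is left out because the covenant does not specify it.

```python
from collections import deque
from typing import Callable, Optional


class CovenantWorker:
    """Hypothetical worker-side guard for covenant rules 1, 2 and 5."""

    def __init__(self, name: str, inbox: deque, send: Callable[[str, str], None]):
        self.name, self.inbox, self.send = name, inbox, send
        self.stopped = False

    def _drain_inbox(self) -> None:
        # Rule 1: process every pending team-lead message before acting.
        while self.inbox:
            if self.inbox.popleft() in ("STOP", "HOLD"):
                self.send("team-lead", "ACK STOP")  # Rule 5: acknowledge, then idle
                self.stopped = True

    def call_tool(self, tool: Callable[[], object]) -> Optional[object]:
        self._drain_inbox()
        return None if self.stopped else tool()

    def try_claim(self, task: dict) -> bool:
        # Rule 2: only an empty blockedBy on a task we own authorises in_progress.
        self._drain_inbox()
        if self.stopped or task["blockedBy"] or task["owner"] != self.name:
            return False
        task["status"] = "in_progress"
        return True
```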

The CTO complements the spawn-prompt rule with a runtime check. The task-graph watchdog scans the TaskList on every monitor tick — every team event, every idle notification, every message arrival — and immediately resets any task in in_progress whose blockedBy is non-empty. The worker that did it gets a violation message. Two violations and the CTO terminates the worker and respawns it with a prefix noting the prior termination. Empty blockedBy is the only authorisation for in_progress — period.
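The watchdog itself is a pure scan over the TaskList. A minimal sketch, assuming tasks are dicts with `owner`, `blockedBy`, and `status` keys and that an illegal claim is reset to `pending`; the `Watchdog` name, the action strings, and clearing the strike count after a respawn are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    max_violations: int = 2
    strikes: dict[str, int] = field(default_factory=dict)

    def tick(self, task_list: list[dict]) -> list[tuple[str, str]]:
        """Reset illegal claims; return (worker, action) pairs for the CTO to act on."""
        actions = []
        for task in task_list:
            if task["status"] == "in_progress" and task["blockedBy"]:
                task["status"] = "pending"  # reset immediately
                worker = task["owner"]
                self.strikes[worker] = self.strikes.get(worker, 0) + 1
                if self.strikes[worker] >= self.max_violations:
                    actions.append((worker, "terminate_and_respawn"))
                    self.strikes[worker] = 0  # assumption: the respawned worker starts clean
                else:
                    actions.append((worker, "violation_message"))
        return actions
```

Calling `tick` on every team event, idle notification, and message arrival is what makes the check continuous rather than periodic.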

Rule 1

Mailbox-first

Check inbox before EVERY tool call. STOP messages from the CTO must be processed before the worker continues with anything else.

Rule 2

Pre-claim verification

Read blockedBy. If non-empty, idle. Empty blockedBy is the only authorisation for in_progress. The watchdog enforces this if the worker forgets.

Rule 3

No autonomous chaining

After marking complete, IDLE. Don't look for next work. Don't claim the next batch. Wait for "Batch X is now ready."

Watchdog

Every monitor tick

Resets violators immediately. Two violations = terminate + respawn with hardened prompt. The pragmatic call belongs to the auditor at end-of-stage, not the worker at action-time.

05 · Phase 4 — Audit gate

Binary verdicts, four-cycle circuit breaker.

The Auditor is the sole authority that can declare a contract complete. Verdicts are binary — GREEN (every checklist item passes, every verification command succeeds) or RED (a numbered list of failures with the contract item each one violates). No caveated approvals. No "good enough." On the fourth consecutive RED for the same batch, the CTO stops the feedback loop and escalates to the human — the contract itself is probably wrong, not the implementation.
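The verdict rule and the circuit breaker fit in a few lines. A sketch, assuming findings arrive as a list where each entry names the contract item it violates; the `Finding` and `CircuitBreaker` names and the `"ESCALATE"` signal are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    contract_item: str
    detail: str


@dataclass
class CircuitBreaker:
    limit: int = 4
    reds: dict[str, int] = field(default_factory=dict)

    def record(self, batch: str, findings: list[Finding]) -> str:
        # Binary: any finding at all is RED; there is no partial credit.
        if not findings:
            self.reds[batch] = 0
            return "GREEN"
        self.reds[batch] = self.reds.get(batch, 0) + 1
        return "ESCALATE" if self.reds[batch] >= self.limit else "RED"
```

Counting per batch keeps one stubborn batch from tripping the breaker for the rest of the stage.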

Per-batch loop: impl → test + scribe → audit · GREEN unblocks · RED re-loops · 4× RED escalates

04 · Audit gate

The auditor reports findings, never fixes code.

The Auditor's rules are absolute. It grades only against the contract checklist, not against worker explanations. It re-runs every verification command the worker already ran (trust but verify). It never fixes code — mixing finding-problems and fixing-problems would erase the independence the audit gate exists to provide. Reports list every failure with the contract item it violates.
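"Trust but verify" can be sketched as re-running each command and comparing against what the worker reported. The `reverify` function and its claims format (command mapped to reported exit code) are hypothetical; `shell=True` is acceptable here only on the assumption that commands come from the contract, not from worker output.

```python
import subprocess


def reverify(claims: dict[str, int], timeout: float = 600) -> list[str]:
    """Re-run every verification command; failures and mismatches with the
    worker's reported exit code both become findings."""
    findings = []
    for cmd, reported in claims.items():
        try:
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            findings.append(f"{cmd!r} timed out after {timeout}s")
            continue
        if result.returncode != 0:
            findings.append(f"{cmd!r} failed (exit {result.returncode})")
        if result.returncode != reported:
            findings.append(f"{cmd!r}: worker reported exit {reported}, auditor saw {result.returncode}")
    return findings
```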

Test files are immutable milestones — workers caught editing them to make tests pass are flagged with a "test file tampering" finding, which is an automatic RED. Tests are the contract's enforcement mechanism; modifying them is changing the contract after signing.

On the 4th RED, the CTO invokes /cto-html-renderer:red-escalation to produce a Paper/ink-aesthetic page with the cycle history, the auditor's final-verdict pullquote, what remains unfixed, the contract items violated, and tailored intervention options for the human.

Verdict

Binary · GREEN or RED

No caveated approvals. Either all checklist items pass and all verification commands succeed, or the verdict is RED with a numbered findings list.

Trust but verify

Re-run all verification commands

The auditor doesn't trust worker reports; it runs every command itself. Discrepancies are findings.

Independence

Report, never fix

Mixing finding and fixing is a self-review trap. The auditor names failures; the worker fixes them; the auditor re-grades.

Circuit breaker

4th RED → escalate

Three RED cycles can still be a normal hard problem. A 4th means the contract is probably ambiguous or wrong, and iterating further wastes cycles.

06 · Phase 5 — Stage complete

Stage integration audit, then a fresh team.

When every batch in a stage has reached GREEN, the CTO triggers a stage integration audit covering the cross-batch criteria from the contract. If GREEN, the stage is complete; the CTO sends shutdown messages, verifies the scribe wrote the final entry to project_summary, and creates a brand-new team for the next stage. No worker from Stage N carries context to N+1 — the only continuity is the contracts, rules, and project_summary.
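The stage-complete handover can be sketched as a guarded sequence. The `complete_stage` function, the `shutdown` and `new_team` callbacks, and the `"Stage N complete"` entry format are hypothetical; the page only says the scribe's final entry must be in project_summary before the team is torn down.

```python
from pathlib import Path
from typing import Callable


def complete_stage(stage: int, integration_verdict: str, summary: Path,
                   shutdown: Callable[[], None], new_team: Callable[[int], None]) -> str:
    if integration_verdict != "GREEN":
        return "stage not complete"
    marker = f"Stage {stage} complete"  # hypothetical scribe entry format
    if marker not in summary.read_text(encoding="utf-8"):
        return "waiting for scribe final entry"  # crash-safe: don't tear down yet
    shutdown()
    new_team(stage + 1)  # fresh team: only files on disk carry over
    return "next stage"
```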

05 · Stage complete

A briefing page closes every stage.

The CTO invokes /cto-html-renderer:stage-briefing after the stage integration audit returns GREEN. The Editorial-aesthetic page is the closing artefact for the human and for any future reader returning to the project: shipped batches with audit history (and collapsible RED cycle detail for batches that hit RED), blockers resolved, integration test results, and a transition block describing what the next stage will do and why it depends on what just finished.

Stage integration audit

Cross-batch verification

Beyond per-batch checks: contract-level integration criteria, end-to-end tests, observability events flowing across boundaries.

Fresh team between stages

No cross-stage context

Workers from Stage N never carry context to N+1. The only continuity is contracts + rules + project_summary. Prevents drift across stages.

Shutdown sequence

Verify scribe wrote final entry

Before teammates ack shutdown, the scribe must have committed project_summary with the stage completion entry. Crash-safe handover.

Stage briefing

Editorial closing HTML

The retrospective document. Shipped + RED history + blockers + integration tests + transition note. Read by humans and future readers.

07 · Full stack

Four layers, all in version control.

Every layer lives under ~/.claude/ and is editable as markdown / inline HTML. No managed services, no external dashboards.

Skill layer

SKILL.md · Five-phase workflow
Spawn protocol · Order + telemetry signal
Task templates · Per-batch DAG
Watchdog logic · Resets on every tick

Rules · 28 files

team/ · Shared by every role
cto/ · No donkey work, audit gate
worker/ · Compliance Covenant
tester/ · Contract-only tests
auditor/ · Independence, verdict protocol
scribe/ · Append-only, crash markers

Renderers

status-snapshot · Blueprint operations
red-escalation · Paper/ink cycle history
stage-briefing · Editorial recap
design-system.md · Shared rule file

Coordination

Agent tool · Spawn teammates
TaskList · blockedBy chains
SendMessage · Blockers, STOP, RED feedback
project_summary.md · Append-only session log

08 · Design constraints

Every rule encodes a past failure mode.

The non-negotiable rules are not preferences — each one is a fence around a specific way the system has failed at least once.

No donkey work

CTO never writes code, runs tests, or audits

Every donkey task the CTO performs is context it can never reclaim. Pure delegation keeps the strategic loop clean across long sessions.

Fresh team per stage

No worker carries context from N to N+1

Cross-stage context pollution causes drift. The only artefacts that survive a stage shutdown are contracts, rules, and project_summary.

Compliance Covenant verbatim

No paraphrase, no summary

Spawn prompts that paraphrase the five rules drift across sessions. Verbatim insertion is the only form that survives subagent autonomy across thousands of tool calls.

Circuit breaker

4th RED escalates to human

If a batch can't pass audit in four cycles, the contract is probably wrong, not the implementation. Iterating further wastes cycles and erodes the audit's authority.

Test file integrity

Tests are immutable milestones

Workers caught editing test files to make assertions pass are flagged "test file tampering" → automatic RED. Tests are the contract's enforcement mechanism.

Auditor independence

Report findings, never fix code

An auditor who fixes is no longer independent — it becomes a self-reviewer of its own fixes. The separation is fundamental.

MD canonical

HTML overlays never enter git

HTML diffs are noisy. project_summary.md and contracts/* are the source of truth; the HTML is the regenerable read surface.

Snapshot only

No live auto-refreshing dashboards

Live dashboards would need state-mirror infrastructure (persisted RED counter, watchdog log, TaskList JSON). Snapshot-on-demand is the MVP.

09 · Lessons learned

What this build taught me.

Lesson 01

Autonomous subagents will steamroll dependency gates without active enforcement. The pattern is mechanical, not motivational — a verbatim spawn-prompt block plus a watchdog that resets violators every monitor tick. Trust without verification is a default failure mode, not a lapse.

Lesson 02

Binary verdicts beat caveated approvals. "Good enough" erodes standards over time. Either every contract item passes and every verification command succeeds (GREEN) or the verdict is RED with a numbered findings list. No middle ground.

Lesson 03

The auditor must be independent: it reports findings, it never fixes code, and it re-runs every verification command the worker ran. Mixing finding-problems and fixing-problems is a self-review trap that erases the audit's value.

Lesson 04

Fresh teams per stage prevent context pollution. Workers from Stage N carrying context to N+1 drift toward "what we did last time" rather than "what this stage's contract says." The only artefacts that survive shutdown are contracts, rules, and project_summary.

Lesson 05

The 4-RED circuit breaker stops feedback loops from eroding the audit's authority. By the fourth RED, the contract is probably wrong, not the code. Escalating to the human at that point is the cheapest fix: they amend the contract and the loop resumes.

Lesson 06

HTML overlays at the on-demand status snapshot, the 4th-RED escalation, and the stage-complete briefing keep the human in the loop without forcing them to grep project_summary.md. Snapshot-on-demand beats live auto-refresh at this scale; live would need state-mirror infrastructure that isn't earning its keep yet.