Santiago Isaza.

AI Infrastructure · Personal Project · 2026

CTO Executor.

Five-phase team-lead orchestrator that runs multi-agent project execution under a strict audit gate — with a worker compliance covenant, task-graph watchdog, and a 4-cycle circuit breaker.

5 · Execution phases
5 · Agent roles per stage
28 · Non-negotiable rules
4×RED · Circuit breaker

01 · System overview

A CTO that delegates and never touches the code.

Per stage, a fresh five-agent team. The CTO orchestrator is strictly a team lead: it dispatches tasks, reads reports, makes decisions, and does no donkey work. Implementation is by Workers (constrained by a verbatim Compliance Covenant); verification is by the Tester; logging is by the Scribe; quality is by the Auditor, whose binary verdict is the only authority that can declare a contract complete.

[Diagram: the CTO Executor (pure delegation) spawns, in order, the Scribe (append-only log, crash markers), the Auditor (sole authority, GREEN/RED verdict), Worker × N (Compliance Covenant, implementation), and the Tester (contract-only tests, no test modification). A Watchdog runs on every monitor tick and resets violators; 2 violations = respawn. Rules layer: ~/.claude/rules/cto-orchestration/ with team/, cto/, worker/, tester/, auditor/, scribe/, and design-system.md; 28 non-negotiable rule files, loaded per role.]

Spawn order: Scribe → Auditor → Workers → Tester · watchdog runs continuously · rules layer scoped per role

The CTO's only job is to delegate, audit, and stay out of the way. Every donkey task it does is context it can never reclaim.

02 · Phase 0–1 — Initialize, spawn team

Pre-flight, then spawn order matters.

Phase 0 reads the active stage contract, project_summary.md, and CLAUDE.md (specifically the TELEMETRY: classification). It pre-validates every required artefact — if a contract is missing, a batch isn't defined, or the telemetry classification isn't set, the CTO STOPs and escalates rather than improvising. Phase 1 spawns the team in a specific order designed to prevent a real race that once produced a premature GREEN verdict on an empty TaskList.
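The pre-flight gate can be pictured as a pure function that returns every failed check. This is a minimal sketch, not the skill's actual implementation: `StageInputs`, `preflight`, and `telemetry_classification` are hypothetical names, and the assumption is that the TELEMETRY classification appears in CLAUDE.md as a single `TELEMETRY: <value>` line.

```python
from dataclasses import dataclass, field
from typing import Optional

# Allowed classifications, taken from the spawn-prompt line described in the text.
VALID_TELEMETRY = {"YES", "EXEMPT", "N/A"}


@dataclass
class StageInputs:
    """Hypothetical snapshot of what Phase 0 has read from disk."""
    contract_deliverables: list[str] = field(default_factory=list)
    contract_verification: list[str] = field(default_factory=list)
    project_summary_readable: bool = False
    claude_md_text: str = ""
    batches: list[str] = field(default_factory=list)


def telemetry_classification(claude_md_text: str) -> Optional[str]:
    """Return the value of the first TELEMETRY line, or None if absent or invalid."""
    for line in claude_md_text.splitlines():
        if line.strip().startswith("TELEMETRY:"):
            value = line.split(":", 1)[1].strip().upper()
            return value if value in VALID_TELEMETRY else None
    return None


def preflight(inputs: StageInputs) -> list[str]:
    """Return the failed checks; an empty list means Phase 1 may start."""
    failures = []
    if not inputs.contract_deliverables:
        failures.append("contract has no deliverables")
    if not inputs.contract_verification:
        failures.append("contract has no verification commands")
    if not inputs.project_summary_readable:
        failures.append("project_summary.md is not readable")
    if telemetry_classification(inputs.claude_md_text) is None:
        failures.append("CLAUDE.md has no valid TELEMETRY line")
    if not inputs.batches:
        failures.append("no batch is defined")
    return failures
```

Any non-empty result corresponds to the STOP-and-escalate path; the sketch deliberately has no fallback branch, mirroring the "never improvise" rule.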

01 Init + Team

Scribe first, Auditor second, Workers third, Tester last.

The spawn order isn't arbitrary. Scribe goes first because the rest of the team starts writing events the moment they spawn, so the log needs a ready writer. Auditor goes second because it needs the full contract loaded before workers produce any output; spawning workers and auditor in parallel once produced a race in which the auditor saw an empty TaskList and returned a GREEN verdict. Workers go third, with the Compliance Covenant block in every spawn prompt. Tester goes last because it only matters once implementation exists.

Each teammate gets only its applicable rule files: the worker gets worker/*.md, the auditor gets auditor/*.md plus the full contract, and the scribe gets scribe/*.md, each alongside the shared team/ rules. Context windows stay clean inside the 400K budget per role.
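As a rough illustration of the spawn order and per-role rule scoping, the sketch below builds an ordered spawn plan. `spawn_plan` and the dictionary keys are hypothetical; the rules root is the path shown in the system diagram, and the `*.md` globs are an assumption about how the rule files are selected.

```python
RULES_ROOT = "~/.claude/rules/cto-orchestration"
# Order matters: log writer first, auditor before any worker output, tester last.
SPAWN_ORDER = ("scribe", "auditor", "worker", "tester")


def spawn_plan(worker_count: int, contract_path: str, telemetry: str) -> list[dict]:
    """Return spawn entries in the order the teammates should be created."""
    plan = []
    for role in SPAWN_ORDER:
        copies = worker_count if role == "worker" else 1
        for index in range(copies):
            # Every role sees the shared team rules plus its own directory only.
            context = [f"{RULES_ROOT}/team/*.md", f"{RULES_ROOT}/{role}/*.md"]
            if role == "auditor":
                context.append(contract_path)  # the auditor also loads the full contract
            plan.append({
                "name": f"{role}-{index + 1}",
                "role": role,
                "context": context,
                "covenant_verbatim": role == "worker",  # covenant block goes to workers
                "telemetry_line": f"TELEMETRY: {telemetry}",
            })
    return plan
```

Because the plan is an ordered list, a caller that spawns entries strictly in sequence cannot reproduce the parallel worker/auditor race described above.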

Phase 0
Pre-flight validation
Contract exists with deliverables + verification? project_summary readable? CLAUDE.md has TELEMETRY line? At least one batch defined? STOP if any answer is no.
Phase 1
Spawn order — Scribe → Auditor → Workers → Tester
Mitigates the auditor-race incident. Scribe ready before others write events; auditor loaded before workers produce output.
Rules scoping
Per-role rule loads
Workers see worker rules + team rules. Auditors see auditor rules + team rules + full contract. No role sees everything — context bloat avoided.
Telemetry signal
One line in every spawn prompt
"TELEMETRY: YES / EXEMPT / N/A." Workers and testers read this once and load the observability doc if needed — the CTO doesn't explain how telemetry works.

03 · Phase 2 — Task dependency graph

Every batch is a small DAG.

Per batch: implementation tasks first, then a test task that is blockedBy all of them, then a scribe-update task with the same blockers, then an audit task that is blockedBy test + scribe. Across batches: every Batch K+1 implementation task is blockedBy Batch K's audit. This is how the audit gate gets mechanical enforcement: workers literally cannot start the next batch until the previous one is GREEN.

[Diagram: batch DAG. Impl 1..N (Worker) → Test batch (Tester, blockedBy 1..N) and Scribe update (Scribe, blockedBy 1..N) → Audit batch (Auditor, blockedBy Test + Scribe). Batch K+1 impl tasks are blockedBy Audit K.]

Per batch: impl → test + scribe (parallel) → audit · cross-batch: next impl blockedBy current audit
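The per-batch DAG described above can be sketched as a small builder. The `Task` dataclass, the `build_batch` function, and the task id scheme are hypothetical stand-ins for whatever the TaskList actually stores; only the blockedBy relationships come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    owner_role: str
    blocked_by: list[str] = field(default_factory=list)


def build_batch(batch: int, impl_count: int, prev_audit_id: Optional[str] = None) -> list[Task]:
    """Build one batch: impl tasks, then test + scribe, then audit."""
    # Cross-batch gate: every impl task waits on the previous batch's audit.
    gate = [prev_audit_id] if prev_audit_id else []
    impl = [Task(f"b{batch}-impl-{i}", "worker", list(gate)) for i in range(1, impl_count + 1)]
    impl_ids = [t.id for t in impl]
    # Test and scribe share the same blockers, so they can run in parallel.
    test = Task(f"b{batch}-test", "tester", list(impl_ids))
    scribe = Task(f"b{batch}-scribe", "scribe", list(impl_ids))
    # The audit is the last node and waits on both.
    audit = Task(f"b{batch}-audit", "auditor", [test.id, scribe.id])
    return impl + [test, scribe, audit]
```

Passing the previous batch's audit id as `prev_audit_id` is what turns the audit gate into a structural dependency rather than a convention.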

02 Tasks

Single-batch authorisation when workers are autonomous.

For high-autonomy worker subagents (the default), the CTO uses single-batch authorisation: it creates only the CURRENT batch's tasks at any time, not the full stage. Future-batch tasks are created after the current batch's audit returns GREEN. This removes the temptation surface entirely — there are no future-batch tasks visible in TaskList to pre-claim.
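One way to picture single-batch authorisation is a small state holder that only materialises the next batch when a GREEN verdict arrives. `SingleBatchAuthoriser` and its method names are hypothetical; the sketch assumes each batch is given as a list of task names.

```python
class SingleBatchAuthoriser:
    """Keeps future-batch tasks out of the visible task list until GREEN."""

    def __init__(self, batches: list[list[str]]):
        self._batches = batches
        self._current = 0
        # Only the first batch exists in the (simulated) TaskList at the start.
        self.visible: list[str] = list(batches[0]) if batches else []

    def on_audit_verdict(self, verdict: str) -> list[str]:
        """Create the next batch's tasks on GREEN; return the newly created tasks."""
        if verdict != "GREEN":
            return []  # RED: nothing new becomes visible
        if self._current + 1 >= len(self._batches):
            return []  # last batch already authorised
        self._current += 1
        created = list(self._batches[self._current])
        self.visible.extend(created)
        return created
```

A worker inspecting `visible` during a RED cycle finds nothing from the next batch, so there is nothing to pre-claim.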

Per batch
impl → test + scribe → audit
Implementation tasks at the front. Test + scribe block on impl completion. Audit blocks on test + scribe. Strict chain.
Cross batch
Next impl blockedBy current audit
Mechanical enforcement of "no building on top of unreviewed work." Workers can't physically start the next batch until the audit is GREEN.
Authorisation
Single-batch when worker is autonomous
Create only current batch's tasks. Future-batch tasks materialise after GREEN. Removes the pre-claim temptation entirely.
Task descriptions
Path-only context
Each task description carries the contract section path, applicable rule paths, verification commands, completion criteria. No inlined content — keeps spawn prompts small.
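A path-only task description might look like the following sketch. The `TaskDescription` type, the field labels in `render`, and any paths passed to it are illustrative assumptions; the point is that the description carries references, not inlined contract or rule text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDescription:
    """Hypothetical path-only task description: pointers, never inlined content."""
    contract_section: str
    rule_paths: tuple[str, ...]
    verification_commands: tuple[str, ...]
    completion_criteria: tuple[str, ...]

    def render(self) -> str:
        """Render the description as a few short labelled lines."""
        lines = [f"CONTRACT: {self.contract_section}"]
        lines += [f"RULES: {path}" for path in self.rule_paths]
        lines += [f"VERIFY: {command}" for command in self.verification_commands]
        lines += [f"DONE WHEN: {criterion}" for criterion in self.completion_criteria]
        return "\n".join(lines)
```

The rendered text grows with the number of references, not with the size of the contract, which is what keeps spawn prompts small.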

04 · Phase 3 — Monitor + contain

Workers will steamroll gates without active enforcement.

The earliest version of this system trusted workers to respect blockedBy. They didn't. The default failure mode of an autonomous subagent is a worker that finishes its current task, looks at the TaskList, sees an unblocked-looking next task, and just claims it, bypassing the audit that's supposed to gate the batch. The fix is mechanical, not motivational: a verbatim spawn-prompt block plus a watchdog that resets violators on every monitor tick.

03 Monitor + Contain

The Compliance Covenant ships verbatim in every worker spawn prompt.

Five rules, repeated verbatim. Mailbox-first checking before every tool call. Pre-claim verification of blockedBy before any status flip. No autonomous batch chaining. No file work outside the authorized batch. Ack-required STOP. Repeated violations = termination + respawn with a hardened prompt.

==== COMPLIANCE SECTION (NON-NEGOTIABLE) ====

1. MAILBOX-FIRST — Before EVERY tool call (every TaskUpdate, every Edit,
   every Write, every Bash, every PowerShell), check your inbox for new
   messages from team-lead. Process them BEFORE continuing.

2. PRE-CLAIM VERIFICATION — Before changing ANY task status to in_progress:
   - Call TaskList. Read the task's blockedBy field.
   - If blockedBy is non-empty (any pending/in_progress blocker), DO NOT
     CLAIM. Stay idle.
   - If blockedBy is empty AND owned by you, only THEN flip to in_progress.

3. NO AUTONOMOUS BATCH CHAINING — After marking any batch task complete,
   IDLE explicitly. Do NOT look for next work. Do NOT claim the next batch.
   Wait for an explicit team-lead message ("Batch X is now ready")
   before claiming the next task.

4. NO FILE WORK OUTSIDE THE CURRENT AUTHORIZED BATCH — Only edit files
   belonging to the batch I have explicitly authorized.

5. ACK-REQUIRED STOP — If team-lead sends "STOP" or "HOLD", reply with
   "ACK STOP" via SendMessage and IDLE before any other action.

Repeated violations = termination + respawn with hardened prompt.
====

The CTO complements the spawn-prompt rule with a runtime check. The task-graph watchdog scans the TaskList on every monitor tick — every team event, every idle notification, every message arrival — and immediately resets any task in in_progress whose blockedBy is non-empty. The worker that did it gets a violation message. Two violations and the CTO terminates the worker and respawns it with a prefix noting the prior termination. Empty blockedBy is the only authorisation for in_progress — period.
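The watchdog pass can be sketched as a single function run on every monitor tick. `Task`, `open_blockers`, and `watchdog_tick` are hypothetical names. Following the covenant's wording, the sketch assumes a blocker stops counting once its status is `completed`, and it treats a blocker missing from the list as still open.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    owner: str
    status: str = "pending"  # pending | in_progress | completed
    blocked_by: list[str] = field(default_factory=list)


def open_blockers(task: Task, by_id: dict[str, Task]) -> list[str]:
    """Blockers that are not completed (unknown blockers count as open)."""
    return [b for b in task.blocked_by if b not in by_id or by_id[b].status != "completed"]


def watchdog_tick(tasks: list[Task], violations: Counter, limit: int = 2) -> dict[str, list[str]]:
    """Reset illegally claimed tasks and report owners due for termination + respawn."""
    by_id = {t.id: t for t in tasks}
    reset, respawn = [], []
    for task in tasks:
        if task.status == "in_progress" and open_blockers(task, by_id):
            task.status = "pending"  # undo the claim immediately
            reset.append(task.id)
            violations[task.owner] += 1
            if violations[task.owner] >= limit and task.owner not in respawn:
                respawn.append(task.owner)
    return {"reset": reset, "respawn": respawn}
```

The `violations` counter is passed in so it persists across ticks; the caller would send the violation message for each id in `reset` and handle termination for each owner in `respawn`.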

Rule 1
Mailbox-first
Check inbox before EVERY tool call. STOP messages from CTO must be processed before the worker continues anything.
Rule 2
Pre-claim verification
Read blockedBy. If non-empty, idle. Empty blockedBy is the only authorisation for in_progress. The watchdog enforces this if the worker forgets.
Rule 3
No autonomous chaining
After marking complete, IDLE. Don't look for next work. Don't claim the next batch. Wait for "Batch X is now ready."
Watchdog
Every monitor tick
Resets violators immediately. Two violations = terminate + respawn with hardened prompt. The pragmatic call belongs to the auditor at end-of-stage, not the worker at action-time.

05 · Phase 4 — Audit gate

Binary verdicts, four-cycle circuit breaker.

The Auditor is the sole authority that can declare a contract complete. Verdicts are binary — GREEN (every checklist item passes, every verification command succeeds) or RED (a numbered list of failures with the contract item each one violates). No caveated approvals. No "good enough." On the fourth consecutive RED for the same batch, the CTO stops the feedback loop and escalates to the human — the contract itself is probably wrong, not the implementation.

[Diagram: RED cycle. Workers (impl tasks) → Tester (verification) → Scribe (project_summary) → Auditor (verdict). GREEN unblocks the next batch; RED triggers fix + re-audit (cycles 2, 3, 4); the 4th RED trips the circuit breaker and escalates to the human, because the contract is probably wrong.]

Per-batch loop: impl → test → scribe → audit · GREEN unblocks · RED re-loops · 4× RED escalates
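The circuit breaker amounts to a per-batch counter of consecutive RED verdicts. The sketch below is one possible shape; `CircuitBreaker` and the returned action strings are hypothetical labels, not names from the skill.

```python
class CircuitBreaker:
    """Tracks consecutive RED verdicts per batch and escalates at the limit."""

    def __init__(self, limit: int = 4):
        self.limit = limit
        self.red_streak: dict[str, int] = {}

    def record(self, batch: str, verdict: str) -> str:
        """Record a verdict and return the CTO's next action for that batch."""
        if verdict == "GREEN":
            self.red_streak.pop(batch, None)  # streak ends, next batch unblocks
            return "UNBLOCK_NEXT"
        if verdict != "RED":
            raise ValueError(f"verdicts are binary, got {verdict!r}")
        self.red_streak[batch] = self.red_streak.get(batch, 0) + 1
        if self.red_streak[batch] >= self.limit:
            return "ESCALATE_TO_HUMAN"  # 4th consecutive RED by default
        return "FIX_AND_REAUDIT"
```

Rejecting anything other than GREEN or RED keeps caveated verdicts from slipping into the loop.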

04 Audit gate

The auditor reports findings, never fixes code.

The Auditor's rules are absolute. It grades only against the contract checklist, not against worker explanations. It re-runs every verification command the worker already ran (trust but verify). It never fixes code — mixing finding-problems and fixing-problems would erase the independence the audit gate exists to provide. Reports list every failure with the contract item it violates.

Test files are immutable milestones — workers caught editing them to make tests pass are flagged with a "test file tampering" finding, which is an automatic RED. Tests are the contract's enforcement mechanism; modifying them is changing the contract after signing.

On the 4th RED, the CTO invokes /cto-html-renderer:red-escalation to produce a Paper/ink-aesthetic page with the cycle history, the auditor's final-verdict pullquote, what remains unfixed, the contract items violated, and tailored intervention options for the human.

Verdict
Binary · GREEN or RED
No caveated approvals. Either all checklist items pass and all verification commands succeed, or the verdict is RED with a numbered findings list.
Trust but verify
Re-run all verification commands
The auditor doesn't trust worker reports; it runs every command itself. Discrepancies are findings.
Independence
Report, never fix
Mixing finding and fixing is a self-review trap. The auditor names failures; the worker fixes them; the auditor re-grades.
Circuit breaker
4th RED → escalate
Three RED cycles is a normal hard problem. The 4th means the contract is probably ambiguous or wrong. Iterating further wastes cycles.
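The binary verdict logic can be sketched as a function that collects findings and returns GREEN only when there are none. `Finding` and `audit` are hypothetical names, and the inputs (a checklist of pass/fail items, exit codes from the auditor's own re-runs, and the set of files workers edited) are assumptions about what the auditor has on hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    violates: str  # contract item, verification command, or file involved
    detail: str


def audit(
    checklist: dict[str, bool],
    rerun_exit_codes: dict[str, int],
    files_edited_by_workers: list[str],
    test_files: set[str],
) -> tuple[str, list[Finding]]:
    """Return ("GREEN", []) or ("RED", findings); there is no third outcome."""
    findings = []
    for item, passed in checklist.items():
        if not passed:
            findings.append(Finding(item, "checklist item not satisfied"))
    # Exit codes come from the auditor's own re-runs, not from worker reports.
    for command, code in rerun_exit_codes.items():
        if code != 0:
            findings.append(Finding(command, f"verification command exited with {code}"))
    # Any worker edit to a test file is an automatic RED.
    for path in sorted(set(files_edited_by_workers) & test_files):
        findings.append(Finding(path, "test file tampering"))
    return ("RED", findings) if findings else ("GREEN", [])
```

Note that the function only reports; nothing in it modifies code or tests, which is the independence property the section describes.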

06 · Phase 5 — Stage complete

Stage integration audit, then a fresh team.

When every batch in a stage has reached GREEN, the CTO triggers a stage integration audit covering the cross-batch criteria from the contract. If GREEN, the stage is complete; the CTO sends shutdown messages, verifies the scribe wrote the final entry to project_summary, and creates a brand-new team for the next stage. No worker from Stage N carries context to N+1 — the only continuity is the contracts, rules, and project_summary.
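The stage-complete sequence can be read as a short decision chain. The sketch below is an illustration only: `next_stage_action` and the action labels are hypothetical, and the handling of a RED integration audit (fix and re-audit) is an assumption, since the text only describes the GREEN path.

```python
from typing import Optional


def next_stage_action(
    batch_verdicts: dict[str, str],
    integration_verdict: Optional[str],
    scribe_final_entry_written: bool,
) -> str:
    """Return the CTO's next step at the end of a stage."""
    if not batch_verdicts or any(v != "GREEN" for v in batch_verdicts.values()):
        return "CONTINUE_BATCHES"  # some batch has not reached GREEN yet
    if integration_verdict is None:
        return "RUN_STAGE_INTEGRATION_AUDIT"
    if integration_verdict != "GREEN":
        return "FIX_AND_REAUDIT"  # assumption: RED integration re-enters the audit loop
    if not scribe_final_entry_written:
        return "WAIT_FOR_SCRIBE_FINAL_ENTRY"  # project_summary must hold the stage entry
    return "SHUTDOWN_AND_SPAWN_FRESH_TEAM"
```

The final action never reuses an existing teammate: the only inputs the next stage inherits are the contracts, the rules, and project_summary.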

05 Stage complete

A briefing page closes every stage.

The CTO invokes /cto-html-renderer:stage-briefing after the stage integration audit returns GREEN. The Editorial-aesthetic page is the closing artefact for the human and for any future reader returning to the project: shipped batches with audit history (and collapsible RED cycle detail for batches that hit RED), blockers resolved, integration test results, and a transition block describing what the next stage will do and why it depends on what just finished.

Stage integration audit
Cross-batch verification
Beyond per-batch checks: contract-level integration criteria, end-to-end tests, observability events flowing across boundaries.
Fresh team between stages
No cross-stage context
Workers from Stage N never carry context to N+1. The only continuity is contracts + rules + project_summary. Prevents drift across stages.
Shutdown sequence
Verify scribe wrote final entry
Before teammates ack shutdown, the scribe must have committed project_summary with the stage completion entry. Crash-safe handover.
Stage briefing
Editorial closing HTML
The retrospective document. Shipped + RED history + blockers + integration tests + transition note. Read by humans and future readers.

07 · Full stack

Four layers, all in version control.

Every layer lives under ~/.claude/ and is editable as markdown / inline HTML. No managed services, no external dashboards.

Skill layer

  • SKILL.md · Five-phase workflow
  • Spawn protocol · Order + telemetry signal
  • Task templates · Per-batch DAG
  • Watchdog logic · Resets on every tick

Rules · 28 files

  • team/ · Shared by every role
  • cto/ · No donkey work, audit gate
  • worker/ · Compliance Covenant
  • tester/ · Contract-only tests
  • auditor/ · Independence, verdict protocol
  • scribe/ · Append-only, crash markers

Renderers

  • status-snapshot · Blueprint operations
  • red-escalation · Paper/ink cycle history
  • stage-briefing · Editorial recap
  • design-system.md · Shared rule file

Coordination

  • Agent tool · Spawn teammates
  • TaskList · blockedBy chains
  • SendMessage · Blockers, STOP, RED feedback
  • project_summary.md · Append-only session log

08 · Design constraints

Every rule encodes a past failure mode.

The non-negotiable rules are not preferences — each one is a fence around a specific way the system has failed at least once.

No donkey work
CTO never writes code, runs tests, or audits
Every donkey task the CTO performs is context it can never reclaim. Pure delegation keeps the strategic loop clean across long sessions.
Fresh team per stage
No worker carries context from N to N+1
Cross-stage context pollution causes drift. The only artefacts that survive a stage shutdown are contracts, rules, and project_summary.
Compliance Covenant verbatim
No paraphrase, no summary
Spawn prompts that paraphrase the five rules drift across sessions. Verbatim insertion is the only form that survives subagent autonomy across thousands of tool calls.
Circuit breaker
4th RED escalates to human
If a batch can't pass audit in four cycles, the contract is probably wrong, not the implementation. Iterating further wastes cycles and erodes the audit's authority.
Test file integrity
Tests are immutable milestones
Workers caught editing test files to make assertions pass are flagged "test file tampering" → automatic RED. Tests are the contract's enforcement mechanism.
Auditor independence
Report findings, never fix code
An auditor who fixes is no longer independent — it becomes a self-reviewer of its own fixes. The separation is fundamental.
MD canonical
HTML overlays never enter git
HTML diffs are noisy. project_summary.md and contracts/* are the source of truth; the HTML is the regenerable read surface.
Snapshot only
No live auto-refreshing dashboards
Live dashboards would need state-mirror infrastructure (persisted RED counter, watchdog log, TaskList JSON). Snapshot-on-demand is the MVP.

09 · Lessons learned

What this build taught me.

Lesson 01
Autonomous subagents will steamroll dependency gates without active enforcement. The fix is mechanical, not motivational: a verbatim spawn-prompt block plus a watchdog that resets violators on every monitor tick. Trust without verification is a default failure mode, not a lapse.
Lesson 02
Binary verdicts beat caveated approvals. "Good enough" erodes standards over time. Either every contract item passes and every verification command succeeds (GREEN) or the verdict is RED with a numbered findings list. No middle ground.
Lesson 03
The auditor must be independent: it reports findings, it never fixes code, and it re-runs every verification command the worker ran. Mixing finding-problems and fixing-problems is a self-review trap that erases the audit's value.
Lesson 04
Fresh teams per stage prevent context pollution. Workers from Stage N carrying context to N+1 drift toward "what we did last time" rather than "what this stage's contract says." The only artefacts that survive shutdown are contracts, rules, and project_summary.
Lesson 05
The 4-RED circuit breaker stops feedback loops from eroding the audit's authority. By the fourth consecutive RED, the contract is probably wrong, not the code. Escalating to the human at that point is the cheapest fix: they amend the contract, and the loop resumes.
Lesson 06
Three HTML overlays (the on-demand status snapshot, the 4th-RED escalation, and the stage-complete briefing) keep the human in the loop without forcing them to grep project_summary.md. Snapshot-on-demand beats live auto-refresh at this scale; live would need state-mirror infrastructure that isn't earning its keep yet.