Santiago Isaza.

AI Infrastructure · Personal Project · 2026

Project
Architect.

Seven-phase planning skill that turns a fuzzy idea into a frozen set of stage contracts — with parallel research agents, 400K token budgeting, and HTML overlays at every human-review gate.

7
Planning phases
3
Parallel research agents
5
Review-gate overlays
7
Deliverable templates

01 · System overview

Seven phases, one frozen plan.

The architect walks a project from a blank slate to a frozen set of stage contracts. Every phase has a gate the human approves before the architect proceeds. Every gate writes a markdown checkpoint to disk so the next session can resume from the last approved phase — and renders an HTML overlay so the human actually reads what they're approving.

[Diagram: Project Architect pipeline. Phase 0 Classify (size, stages) → Phase 1 Disambiguate (3–5 questions) → Phase 2 Research (3 parallel agents) → Phase 3 Scope (tier classification) → Phase 4 Architect (400K budget) → Phase 5 Contracts (HIL draft) → Phase 6 Handover (CLAUDE.md), then on to a fresh executor team. Phases 0 and 1 write checkpoints phase-0.md and phase-1.md; Phases 2–6 each add an HTML overlay: tech-mapping, scope-tiers, arch-review, contract-review, decision-timeline. Every gate requires human approval; every approval freezes the markdown checkpoint; the HTML overlay is ephemeral.]

Seven phases, five HTML review gates, seven markdown checkpoints — all under one 400K context budget

Ask only the three to five questions whose answers would fork the architecture. Save the rest for Phase 3.

02 · Phase 0–1 — Classify, Disambiguate

Start narrow, ask only what forks.

A planning skill that asks 15 generic questions earns nothing but user fatigue. The architect's first move is to classify the project size (Small / Medium / Large → 1, 3, or 5 stages), then ask only the 3–5 questions whose answers would fork the technology stack, the deployment model, the data architecture, or the core API. Everything else is deferred to Phase 3.

01 · Classify + Disambig

The fork test, applied to every candidate question.

Each Phase 1 question must pass a single test: if the answer were different, would the architecture be different? If the answer is no, the question is logged for Phase 3 and skipped now. Each question must also explain its architectural impact in parentheses, so the user knows what they're committing to.

The example the skill carries in its own documentation: "Is this deployed to cloud or on-premise? (Forks: containerization strategy, database choice, auth model)." The parenthetical makes the question feel less like a survey and more like a decision with consequences.
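The fork test can be sketched as a simple filter. This is a minimal illustration, not the skill's actual implementation: the `CandidateQuestion` type, the `apply_fork_test` function, and the overflow rule (keep the questions that fork the most axes) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MAX_PHASE1_QUESTIONS = 5  # upper bound of the 3-5 range described above


@dataclass
class CandidateQuestion:
    text: str
    # Architectural axes that would change if the answer changed.
    # An empty list means the question fails the fork test.
    forks: list[str] = field(default_factory=list)


def apply_fork_test(candidates):
    """Split candidates into (ask in Phase 1, defer to Phase 3)."""
    forking = [q for q in candidates if q.forks]
    deferred = [q for q in candidates if not q.forks]
    # Hypothetical overflow rule: prefer questions that fork more axes.
    forking.sort(key=lambda q: len(q.forks), reverse=True)
    ask_now = forking[:MAX_PHASE1_QUESTIONS]
    deferred.extend(forking[MAX_PHASE1_QUESTIONS:])
    return ask_now, deferred


def render(question):
    """Format a Phase 1 question with its impact in parentheses."""
    return f"{question.text} (Forks: {', '.join(question.forks)})"
```

A question with no fork axes never reaches the user in Phase 1; it is only logged for later.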

Phase 0
Size classification
Small / Medium / Large map to 1, 3, and 5 stages. The classification drives the entire downstream batch count and team composition.
Phase 1
Critical-path questions only
Three to five at most. Each must list the architectural axes it forks. Anything that doesn't fork the architecture is deferred to Phase 3.
Discipline
Single message
All Phase 1 questions arrive in one message. No drip-feed. The user can answer them at their own cadence; the architect waits.
Insurance
Crash checkpoint
docs/architect-outputs/phase-1.md captures every Q+A verbatim. Session-recovery rereads it to resume without restarting the conversation.
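Checkpoint-based resume can be sketched in a few lines. The directory and the phase-N.md naming come from the text above; the function names and the rule "resume at the phase after the highest checkpoint found" are assumptions made for this illustration.

```python
import re
from pathlib import Path

CHECKPOINT_DIR = Path("docs/architect-outputs")  # path named in the text
PHASE_FILE = re.compile(r"^phase-(\d+)\.md$")
LAST_PHASE = 6


def write_checkpoint(phase, body, checkpoint_dir=CHECKPOINT_DIR):
    """Persist an approved phase as phase-N.md."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / f"phase-{phase}.md"
    path.write_text(body, encoding="utf-8")
    return path


def last_approved_phase(checkpoint_dir=CHECKPOINT_DIR):
    """Highest phase number with a checkpoint on disk, or None."""
    if not checkpoint_dir.is_dir():
        return None
    phases = [
        int(m.group(1))
        for p in checkpoint_dir.iterdir()
        if (m := PHASE_FILE.match(p.name))
    ]
    return max(phases, default=None)


def next_phase(checkpoint_dir=CHECKPOINT_DIR):
    """Phase to resume at: 0 on a fresh project, None when planning is done."""
    last = last_approved_phase(checkpoint_dir)
    if last is None:
        return 0
    return last + 1 if last < LAST_PHASE else None
```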

03 · Phase 2 — Parallel research

Three Explore agents, one synthesised report.

Technology selection is parallelised. The architect dispatches up to three Explore subagents — one per technology domain (backend, frontend, data/infra) — each instructed to rank candidates by performance first, capability second, power ceiling third, maturity fourth. Team familiarity is not a selection criterion: unfamiliar tools can be learned, but performance ceilings cannot be worked around.

[Diagram: Phase 2 research fan-out. The architect (dispatcher) sends work to three Explore agents: Backend (frameworks, ORMs, async patterns), Frontend (frameworks, state, styling, charts), and Data + Infra (databases, queues, containerization). Their results merge into the synthesised Tech Mapping Report.]

Three agents fan out, return ranked tables, architect synthesises into one comparable matrix

02 · Research

Performance first, familiarity never.

Each Explore agent receives the same context — project description, Phase 1 answers, hard constraints — and an explicit instruction: find the most efficient and powerful technology for the use case; team familiarity is NOT a factor. The output per agent is a comparison table ranked by performance, with benchmarks and capability scores cited.

The architect synthesises the three reports into a single Technology Mapping Report with the same columns across all domains. The Blueprint-aesthetic HTML overlay (/architect-html-renderer:tech-mapping) renders the comparison side-by-side so the user can pick decisively at the Phase 2 gate.

Dispatch
Agent tool · subagent_type Explore
Three parallel research subagents, one per domain. Each scoped to its category — no cross-pollination, no shared context bloat.
Criteria
Performance > capability > ceiling > maturity
Strict priority order. Familiarity appears only as a tie-breaker when all four upper criteria are equal across candidates.
Synthesis
Mapping Report
Cross-domain table with category / choice / performance edge / maturity / capability score / justification. One row per chosen tech.
Review
Blueprint comparison HTML
The tech-mapping renderer turns the MD into a per-domain comparison page with the recommended choice highlighted. User picks at the gate.
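The strict priority order maps naturally onto a lexicographic sort. The sketch below assumes each candidate carries numeric scores where higher is better; the `Candidate` type and its score fields are hypothetical, and how the Explore agents actually score candidates is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    performance: float
    capability: float
    ceiling: float
    maturity: float
    familiarity: float = 0.0  # consulted only when the four criteria above tie


def rank(candidates):
    """Sort best-first: performance > capability > ceiling > maturity.

    Tuples compare element by element, so a later criterion only matters
    when every earlier one is exactly equal. Familiarity sits last and
    therefore acts purely as a tiebreaker.
    """
    return sorted(
        candidates,
        key=lambda c: (c.performance, c.capability, c.ceiling,
                       c.maturity, c.familiarity),
        reverse=True,
    )
```

One consequence of a strict ordering is that any performance edge, however small, outranks every other criterion. If scores are noisy, bucketing them before sorting may be a reasonable variation.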

04 · Phase 3 — Refine scope

Numbered ambiguity, three-tier scope.

After the user provides domain context, the architect enumerates every remaining ambiguity as a numbered list with its architectural impact, then classifies every feature into one of three tiers: Essential v1 (system doesn't work without it), Valuable v2 (significant value, defer with extension point), Possible Future (note in timeline only).

03 · Scope

Stop asking when no remaining answer changes the architecture.

The skill defines explicit merge-point detection: stop asking ambiguity questions when (a) no remaining answer would change the architecture, (b) the user signals readiness ("just build it"), (c) the last three answers were "your call," or (d) all core entities and business rules are documented. Beyond that, asking more is just stalling.
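The four stop conditions can be sketched as a single check. The `ScopeState` fields and the string matching on user answers are assumptions for illustration; a real implementation would likely judge readiness less literally.

```python
from dataclasses import dataclass, field

READY_SIGNALS = ("just build it",)  # example signal from the text
DEFER_ANSWER = "your call"


@dataclass
class ScopeState:
    open_forking_questions: int  # questions whose answer would change the architecture
    answers: list[str] = field(default_factory=list)  # user answers, oldest first
    core_documented: bool = False  # core entities and business rules written down


def _norm(answer):
    return answer.strip().lower().rstrip(".!")


def stop_reason(state):
    """Return the matching condition ('a'-'d'), or None to keep asking."""
    answers = [_norm(a) for a in state.answers]
    if state.open_forking_questions == 0:
        return "a"
    if answers and any(sig in answers[-1] for sig in READY_SIGNALS):
        return "b"
    if len(answers) >= 3 and all(a == DEFER_ANSWER for a in answers[-3:]):
        return "c"
    if state.core_documented:
        return "d"
    return None
```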

The tier classification is shown to the user in an Editorial-aesthetic HTML overlay (/architect-html-renderer:scope-tiers) — three columns, each feature with its architectural-impact line, future-tier items annotated with their promotion condition.

Ambiguity resolution
Numbered impact list
Every open question gets a number and an "Impact:" line. The numbering makes it easy for the user to answer by reference.
Discovery tracks
Backend · frontend · data
Parallel sub-discoveries when needed — entities + relationships + business rules on the backend, user roles + workflows on the frontend, sources + transformations on the data side.
Tier classification
Essential / Valuable / Future
Every feature mentioned is sorted. Valuable items get an extension point documented; Future items get a promotion condition.
Review
Editorial 3-column HTML
Scope decisions feel like editorial choices about what the product is — the warm cream + serif aesthetic reflects that.

05 · Phase 4 — Staged architecture

A 400K token budget, drawn at design time.

The most consequential phase. The architect designs the staged architecture (1, 3, or 5 stages depending on project size), computes the token budget against the 400K context window, draws the dependency graph between stages, and produces stage docs in dual resolution: a compact version for 400K-context loads and a full version with ASCII diagrams for 1M-context loads. For projects with both backend and frontend, an observability layer is a mandatory part of the architecture, because telemetry is far harder to retrofit than to build in.

[Figure: 400K budget bar. Slices of 8%, 12%, 5%, 12%, and 4% cover the fixed load (CLAUDE.md + domain + .context.md) and the variable load (active stage doc + task prompt); 59% remains for code.]

In this example, fixed + variable take 41% of the context window and leave 59% for code · target ≥60% headroom for code per task
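The budget arithmetic is easy to check in code. The slice percentages are the ones in the budget figure; which percentage belongs to which document is an assumption (the figure's order is taken to match its legend), and the `budget` function is a hypothetical helper.

```python
CONTEXT_WINDOW = 400_000     # tokens
TARGET_HEADROOM_PCT = 60     # minimum share of the window left for code

# Assumed mapping of the figure's slices to its legend, in order.
FIXED_PCT = {"CLAUDE.md": 8, "domain": 12, ".context.md": 5}
VARIABLE_PCT = {"stage doc": 12, "task prompt": 4}


def budget(fixed, variable, window=CONTEXT_WINDOW):
    """Summarise how much of the window is used and what is left for code."""
    used_pct = sum(fixed.values()) + sum(variable.values())
    free_pct = 100 - used_pct
    return {
        "used_pct": used_pct,
        "free_pct": free_pct,
        "free_tokens": window * free_pct // 100,
        "meets_target": free_pct >= TARGET_HEADROOM_PCT,
    }
```

With the figure's numbers this gives 41% used and 59% free, or 236,000 tokens, which is one point short of the 60% target. Trimming any single slice by one point brings the example into line.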

04 · Architect

Dual-resolution stage docs and a Mermaid dependency graph.

Each stage has two docs: the 400K version (~150 lines, bullets and tables only) for compact context loads, and the 1M version (~400 lines, ASCII diagrams + code patterns + interface definitions) for the full-context model. The architect produces both. Workers and auditors choose which to load based on the model they're running on.
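A minimal sketch of the dual-resolution idea, combining the stage count per project size with doc selection by context window. The output file names (`stage-1-400k.md` and so on) are hypothetical; only the 400K / 1M split and the 1 / 3 / 5 stage counts come from the text.

```python
STAGES_BY_SIZE = {"small": 1, "medium": 3, "large": 5}
VARIANTS = ("400k", "1m")  # compact and full resolution


def docs_to_write(size):
    """Every stage gets both resolutions, so the architect writes 2 x stages docs."""
    stages = STAGES_BY_SIZE[size]
    return [
        f"stage-{n}-{variant}.md"
        for n in range(1, stages + 1)
        for variant in VARIANTS
    ]


def doc_for_model(stage, context_window_tokens):
    """Pick the full doc only when the model's window is at least 1M tokens."""
    variant = "1m" if context_window_tokens >= 1_000_000 else "400k"
    return f"stage-{stage}-{variant}.md"
```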

The architecture review HTML overlay (/architect-html-renderer:architecture-review) is load-bearing — every downstream contract depends on this approval. The page shows the cross-stage Mermaid dependency graph, the directory tree, per-stage drill-down panels, and the token budget visualisation in a single Blueprint-aesthetic view.

Stages
1 / 3 / 5 by project size
Small = 1 stage, Medium = 3 (backend / frontend / integration), Large = 5. Each stage independently testable.
Dual resolution
400K compact + 1M full
Workers on the 400K model load the compact doc. Auditors and architects on the 1M model load the full doc with ASCII diagrams.
Observability
Mandatory for backend + frontend
If the project has both layers, the architect MUST include a dual-destination event bus (JSONL + WebSocket) in the architecture. Retrofitting is 10× harder.
Review
Blueprint architecture-review HTML
The highest-stakes gate in the project — Mermaid dep graph + directory tree + per-stage panels + token budget. Must be reviewed as HTML, not chat text.
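A dual-destination event bus can be sketched with the standard library alone. Here the live destination is any callable that accepts a string, standing in for a WebSocket send function; the `EventBus` class, its event shape, and the best-effort handling of the live stream are assumptions made for this illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable


class EventBus:
    """Write every event to a JSONL file and forward it to a live sink."""

    def __init__(self, jsonl_path: Path, live_sink: Callable[[str], None]):
        self.jsonl_path = jsonl_path
        self.live_sink = live_sink  # stand-in for a WebSocket send function

    def emit(self, kind: str, **payload) -> dict:
        event = {"ts": time.time(), "kind": kind, **payload}
        line = json.dumps(event, sort_keys=True)
        # Durable destination: one JSON object per line, append-only.
        with self.jsonl_path.open("a", encoding="utf-8") as f:
            f.write(line + "\n")
        # Live destination: best effort, so a dropped viewer never
        # blocks or crashes the code that emitted the event.
        try:
            self.live_sink(line)
        except Exception:
            pass
        return event
```

Writing the file first means the JSONL log stays complete even when nobody is listening on the live side.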

06 · Phase 5–6 — Contracts, handover

Iterative contract drafting, then a clean handover.

Phase 5 drafts the stage contracts in tight HIL collaboration. Each contract has a deliverables table, interface contracts with exact signatures, verification commands with expected output, and a completion checklist. The architect iterates with the human until each contract passes a single quality test: can a worker with only this contract and the project rules determine, with zero ambiguity, whether their work is complete?
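The four contract parts suggest a structural pre-check that can run before the human review. This sketch only catches missing pieces; it cannot judge ambiguity, which remains the job of the HIL loop. The `StageContract` and `Verification` types and the `structural_gaps` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Verification:
    command: str
    expected_output: str


@dataclass
class StageContract:
    deliverables: list[str] = field(default_factory=list)
    interfaces: list[str] = field(default_factory=list)   # exact signatures
    verifications: list[Verification] = field(default_factory=list)
    checklist: list[str] = field(default_factory=list)


def structural_gaps(contract):
    """List missing parts; an empty list means the contract is ready for review."""
    gaps = []
    if not contract.deliverables:
        gaps.append("no deliverables")
    if not contract.interfaces:
        gaps.append("no interface signatures")
    if not contract.verifications:
        gaps.append("no verification commands")
    for v in contract.verifications:
        if not v.expected_output.strip():
            gaps.append(f"verification '{v.command}' has no expected output")
    if not contract.checklist:
        gaps.append("no completion checklist")
    return gaps
```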

05 · Contracts + Handover

Per-stage contracts, then CLAUDE.md as the directory.

The contract review HTML overlay (/architect-html-renderer:contract-review) is invoked once per stage. The Blueprint page renders the deliverables table, every interface signature in its own block with a parameter table and return type, each verification command paired with its expected output, and the batch dependency chain as a Mermaid diagram.

Phase 6 hands over three layers of documentation. Stage contracts are immutable after planning — workers and auditors grade against them. .context.md files are living docs updated by workers as they implement, one per architectural boundary, capped at 8–10 total. project_summary.md is the append-only session log written by the scribe. CLAUDE.md routes to all of them — it is a directory, not a warehouse.

Contract drafting
HIL iteration
Suggest breakdown → draft → human reviews HTML overlay → iterate until quality test passes → freeze in contracts/.
Three documentation layers
Contracts · context · summary
"What to deliver" (immutable) + "what was built" (living, per-module) + "what happened" (append-only session log).
CLAUDE.md
Directory, not warehouse
Routes to contracts and rules by file path. Never inlines them. Keeps the worker boot context small.
Decision timeline
Editorial filterable table
Every decision with its trigger-for-change. It is read across the project's lifetime, so it gets the Editorial aesthetic reserved for long-lived decisions.

07 · Five review-gate overlays

The architect's read-once review surface.

Markdown specs past ~100 lines stop being read — both by humans and by future-you. The HTML overlay at each gate is the intervention. Five purpose-built renderer commands in architect-html-renderer turn every architect output into a scannable page in the right aesthetic for that artefact. The markdown stays canonical and git-tracked; the HTML overlay is ephemeral, regenerable, and never enters the repo.

07 · Overlays

Two aesthetics. Five renderers. One design system.

The shared design system lives as a rule file at ~/.claude/rules/cto-orchestration/design-system.md. It explicitly assigns one aesthetic per artefact: Blueprint (deep slate + accent blue, IBM Plex Sans + IBM Plex Mono, subtle grid background) for engineering review, Editorial (warm cream + deep navy, Instrument Serif + JetBrains Mono) for decisions read across the project's lifetime. The discipline is what makes five different renderers feel like one system.

Phase 2
tech-mapping · Blueprint
Per-domain comparison table with performance metrics, capability scores, and the recommended choice highlighted.
Phase 3
scope-tiers · Editorial
Three-column tier breakdown (Essential v1 / Valuable v2 / Possible Future) with architectural-impact lines.
Phase 4
architecture-review · Blueprint
Mermaid dependency graph + directory tree + per-stage drill-down panels + token budget visualisation. The load-bearing gate.
Phase 5b · per stage
contract-review · Blueprint
Deliverables + interface contracts + verification commands + batch dependency chain. One invocation per contract.
Phase 6e
decision-timeline · Editorial
Filterable decision log grouped by phase with trigger-for-change column. Read across the project's lifetime.
Discipline
Anti-slop ruleset
Forbidden fonts (Inter, Roboto), forbidden palettes (indigo/violet, cyan-magenta-pink), forbidden patterns (emoji headers, gradient text). Two violations = regenerate.
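Part of the anti-slop ruleset can be checked mechanically. The sketch below covers only the forbidden fonts and a common gradient-text technique (`background-clip: text`); palette and emoji-header checks are left out. The regex heuristics and the reading of "two violations" as "two or more" are assumptions, not the ruleset's actual implementation.

```python
import re

FORBIDDEN_FONTS = ("Inter", "Roboto")  # from the ruleset above
# Heuristic: gradient text is usually built with background-clip: text
# (this pattern also matches the -webkit- prefixed property).
GRADIENT_TEXT = re.compile(r"background-clip\s*:\s*text", re.IGNORECASE)


def violations(html):
    """Return a list of detected rule violations in rendered overlay HTML."""
    found = []
    for font in FORBIDDEN_FONTS:
        # Match the font name as a whole word inside a font-family declaration.
        pattern = rf"font-family\s*:[^;}}]*\b{font}\b"
        if re.search(pattern, html, re.IGNORECASE):
            found.append(f"forbidden font: {font}")
    if GRADIENT_TEXT.search(html):
        found.append("gradient text")
    return found


def must_regenerate(html):
    """Two or more violations trigger a regeneration."""
    return len(violations(html)) >= 2
```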

08 · Full stack

Everything in version control.

Four layers, all editable as markdown / Astro / inline HTML, all in ~/.claude/.

Skill layer

  • SKILL.md: Seven-phase workflow
  • references/: Deep guides per phase
  • templates/: Seven deliverable templates
  • scripts/init-project-docs.sh

Templates

  • stage-doc-400k.template.md: Compact stage reference
  • stage-doc-1m.template.md: Full stage reference
  • stage-contract.template.md: HIL contract structure
  • domain-context.template.md: Entities + business rules
  • context-md.template.md: Per-module living doc
  • project-summary.template.md: Append-only session log
  • batch-plan.template.md: Execution batches

Renderers

  • tech-mapping: Blueprint comparison
  • scope-tiers: Editorial 3-column
  • architecture-review: Blueprint + Mermaid
  • contract-review: Blueprint per stage
  • decision-timeline: Editorial table

Agents

  • Explore × 3: Parallel research subagents
  • Plan agent: Architecture design (Phase 4)
  • Skill tool: Renderer invocation per gate
  • Crash recovery: Re-reads checkpoints on resume

09 · Design constraints

Every rule encodes a past failure mode.

The non-negotiable rules in the architect's behaviour are not preferences — each one is a fence around a specific way the skill has failed at least once.

Critical-path discipline
3–5 fork-the-architecture questions max
Generic surveys produce user fatigue and inferior answers. The fork test makes each question worth answering.
Performance first
Familiarity is not a selection criterion
Unfamiliar tools can be learned; performance ceilings cannot be worked around. Familiarity returns only as a tiebreaker.
400K token budget
≥60% headroom for code per task
Fixed (CLAUDE.md + domain + .context.md) plus variable (stage doc + task prompt) must leave at least 60% of the context window free.
Crash checkpoints
Every phase gate writes phase-N.md
Re-running the skill on a half-finished project reads the checkpoints and resumes — no recap, no restart, no lost decisions.
Contract quality test
Worker + rules → "done?" with zero ambiguity
If a contract leaves room for interpretation, the auditor can't grade against it. The HIL drafting loop catches ambiguity before freeze.
Observability mandatory
Backend + frontend → telemetry layer in architecture
Retrofitting telemetry is 10× harder than building it in. Even simple projects grow; debugging without it is blind.
MD canonical
HTML overlays never enter git
HTML diffs are noisy. Every overlay is regenerable from its source MD, so commits stay focused on the spec, not the rendering.
CLAUDE.md is a directory
Routes to contracts + rules · never inlines
Inlining bloats every worker spawn. The CLAUDE.md sends them to the right file; their context loads only what they need.

10 · Lessons learned

What this build taught me.

Lesson 01
Questions cost user time. Ask only the 3–5 that fork the architecture; defer everything else to Phase 3 where it can be enumerated as numbered ambiguity. A planning skill that asks 15 generic questions earns nothing but fatigue and inferior answers.
Lesson 02
For ambitious projects, performance-first selection beats team-familiarity. Unfamiliar tools can be learned in a weekend; a fundamental performance ceiling can't be worked around at any cost. Familiarity returns only as a tiebreaker.
Lesson 03
Token budgeting at design time prevents context starvation later. The 400K rule — fixed + variable ≤ 40% — gives every task at least 60% of the window free for code. Designing the budget AFTER the architecture is too late.
Lesson 04
HIL contract drafting catches ambiguity before it propagates. Once a contract is frozen and the executor team is running, ambiguity becomes a RED cycle. The quality test — "can a worker determine 'done?' with zero ambiguity" — is the cheapest filter.
Lesson 05
Crash checkpoints are insurance for the future-you who returns after a crashed session. The architect writes phase-N.md after every gate; recovery rereads them rather than restarting. The cost is small, and the first recovered session repays it.
Lesson 06
HTML overlays at the review gate dramatically raise the chance the spec actually gets read. A 400-line markdown stage doc gets skimmed; a Blueprint HTML page with Mermaid, directory tree, and token budget viz gets reviewed. The markdown stays canonical; the HTML is the read-once review surface.