Santiago Isaza.

Clever_PO · F1 Power-Unit Validation System · Solo Engineer · 2026

F1 Power-Unit Validation System.

A solo-engineer R&D project proving what frontier AI can do for F1 powertrain testing and validation: a full Windows-native app shipped end-to-end with Claude Code as the engineering partner.

4 · Surfaces, one backend
7 · Service modules
1 · Engineering-owned Data Folder
0 · Admin perms needed to install

01 · System overview

One backend, four surfaces, one rule.

Clever_PO is a monorepo with three deployment surfaces (web app, desktop app, and REST API) sharing a single FastAPI backend, plus a fourth deployment target, an MCP server, that exposes the same business logic as tools Claude Code and Claude Desktop can call directly. An external Data Folder, versioned by the engineering team, sits beside the codebase and carries the TR configs, limits, parameter lists, and post-process functions the runtime loads at startup. The rule that holds the whole thing together: business logic lives in FastAPI, never in a UI.

[Architecture diagram. The engineering team authors rules and reviews concessions in the engineering-owned CLEVER_PO Data Folder, which services/api reads at startup. Surfaces: apps/web (Next.js 15 · Tailwind v4 · shadcn/ui), apps/desktop (Tauri 2.x + NSIS installer), services/mcp_server (MCP tools for Claude Code / Desktop), REST API (third-party automation · OpenAPI 3.x). Core: services/api (FastAPI · Pydantic v2 · 7 independent service modules · single source of truth). Supporting: services/worker (Celery · Redis 7 · long-running jobs), packages/db (SQLModel schemas · shared contract · PostgreSQL · SQLite on desktop), infrastructure/ (Podman · Ansible · Caddy · backup cron), events.jsonl + WebSocket stream (live HITL dashboard).]

System architecture — engineering owns the rules, FastAPI owns the logic, every surface speaks the same language

The rule is that business logic lives in FastAPI. Once that holds, adding the desktop wrapper, the MCP server, and the REST API stops feeling like product work and starts feeling like wiring.

02 · How it works · seven steps

From dyno telemetry to certified pass-off.

A validation session has a fixed shape. An operator opens a subtest, the app generates the test-session metadata, ingests Parquet telemetry, evaluates engineering limits against the TR config, attributes any failures to either the Rig or the Unit Under Test, and then either certifies the unit, raises a concession with an audit trail, or fires off a structured notification email. Every step is logged through the telemetry layer; every step is callable from the MCP server.

01 · Configure

Engineering-owned Data Folder loads first.

At startup the app validates that a configured folder — on OneDrive, a NAS, or a local disk — contains the four engineering-owned subfolders: json_configs/ for TR definitions, limits/ for per-channel envelopes, parameter_lists/ for the allowed parameter sets, and post_process_functions/ for any custom analysis scripts the engineering team wants to drop in. Each is OK-checked before the app proceeds. The whole flow is designed so that changing a limit, adding a TR, or revising a parameter list does not require a code release. The right people own the right artefacts.

Clever_PO Data Folder configuration modal — engineering-owned TR configs, limits, parameter lists and post-process functions validated at startup
The startup validator checks all four engineering subfolders before the app proceeds.
Source
json_configs
Per-TR test definitions — what subtests run, in what order, against which channels.
Source
limits
Per-channel envelope rules — t-matrix expressions, scalar bounds, and per-channel Owner attribution (Rig | UUT).
Source
parameter_lists
Allowed parameter sets per test type — the form's drop-downs read from here, so adding a new dyno line doesn't need code.
Source
post_process_functions
Custom Python analysis scripts engineers can drop in — picked up at runtime, runnable per subtest.
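
The startup check described above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual validator: the function names `check_data_folder` and `data_folder_is_valid` are hypothetical, and the sketch only tests that each subfolder exists, whereas the real check may also inspect the contents.

```python
from pathlib import Path

# The four engineering-owned subfolders named in the text.
REQUIRED_SUBFOLDERS = (
    "json_configs",
    "limits",
    "parameter_lists",
    "post_process_functions",
)


def check_data_folder(root: Path) -> dict[str, bool]:
    """Return an OK flag per required subfolder (hypothetical helper)."""
    return {name: (root / name).is_dir() for name in REQUIRED_SUBFOLDERS}


def data_folder_is_valid(root: Path) -> bool:
    """True only when every required subfolder is present."""
    return all(check_data_folder(root).values())
```

Returning a per-folder flag rather than a single boolean mirrors the modal, where each subfolder shows its own OK status.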
02 · Metadata

The app generates the test-session metadata.

Every test session opens with a structured metadata surface: Serial Number, Sequence, Part #, Mileage, TR Number, Test Type (e.g. Test 02 — Full Pass Off), Dyno location, Operator, Assigned PU, PU Type, Run Number, and Associated Serial Number. The TR Number is auto-looked-up against the Data Folder the moment it is typed, so a wrong code never makes it past the form. Each subtest the TR defines (Subtest_01, Subtest_02, Subtest_03) is gated behind the previous one in the bottom tab strip, preventing an operator from skipping ahead. The right-hand RESULTS panel shows Rig and UUT in Pending until telemetry has been imported and the limits engine has run.

Subtest entry surface — operator generates the test metadata: Serial, Part, Mileage, TR Number, Test Type, Dyno location, Operator, Assigned PU, Run Number, Associated Serial
Every field is typed; the TR Number is auto-validated against the Data Folder.
Field
Serial / Part / Mileage
Unit identification — what is being tested, what hardware revision, how many cycles already on it.
Field
TR Number
Auto-looked-up against the Data Folder — a typo never propagates downstream.
Field
Test Type · Dyno · Operator
Session classification — which test, on which rig, by whom.
Field
Assigned PU · Run # · Assoc. Serial
Engineering linkage — connects this run to the powertrain identity, the run sequence, and any related serials.
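
A rough sketch of the two guards in this step: looking a TR Number up in the Data Folder, and gating each subtest behind the previous one. It assumes one JSON file per TR under json_configs/ named after the TR code, as in the config sample later on this page. `lookup_tr` and `unlocked_subtests` are hypothetical names, not the app's API.

```python
import json
from pathlib import Path
from typing import Optional


def lookup_tr(data_folder: Path, tr_number: str) -> Optional[dict]:
    """Return the TR definition, or None when no matching config exists."""
    config_path = data_folder / "json_configs" / f"{tr_number}.json"
    if not config_path.is_file():
        return None  # the form would reject this TR Number
    return json.loads(config_path.read_text(encoding="utf-8"))


def unlocked_subtests(tr_config: dict, completed: set[str]) -> list[str]:
    """Subtests the operator may open: completed ones plus the next one."""
    unlocked = []
    for subtest in tr_config["subtests"]:
        unlocked.append(subtest["id"])
        if subtest["id"] not in completed:
            break  # everything after the first incomplete subtest stays gated
    return unlocked
```

With nothing completed, only Subtest_01 is unlocked; completing it unlocks Subtest_02, and so on down the tab strip.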
03 · Evaluate

Polars parses, the t-matrix evaluates, the Owner column attributes.

The operator selects a Parquet log and the app's high-throughput Parquet processor — Polars with pl.scan_parquet streaming — ingests it. Pandas is explicitly forbidden in the codebase; the streaming path is what lets a dyno-cell workstation handle multi-GB telemetry files without paging out. Once parsed, the limits engine evaluates each engineering channel against its configured envelope per Pass-Off Point, computes the running value, and renders the comparison table. The most distinctive column is Owner — every failure is attributed to either the Rig (test rig issue — environment, instrumentation, harness) or the UUT (unit under test — the actual hardware being validated). That classification is what makes the rest of the workflow legitimate: a Rig failure does not mean the unit is bad, and the concession path treats the two cases categorically differently.

Limit comparison after Parquet import — channel-by-channel evaluation with Running Value vs Limit Value, FAIL/PASS attribution per Owner (Rig vs UUT), RESULTS panel flipped to FAIL
Running Value vs Limit Value per channel, with the Owner column attributing each failure to Rig or UUT.
Ingest
Polars · pl.scan_parquet
Streaming reader — handles multi-GB Parquet on a workstation without paging out. Pandas is forbidden codebase-wide.
Schema
t-matrix JSONB
Per-channel × per-pass-off-point limit expressions stored as JSONB — engineering edits in the Data Folder, the API evaluates against the running value.
Attribution
Rig | UUT owner
The single most important column the system adds. Routes a Rig failure to a different conversation than a UUT failure.
Output
Comparator table
Channel × Pass Off Point × Limit expression × Running Value × Limit Value × Owner — the artefact engineers act on.
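
To make the evaluation step concrete, here is a minimal sketch of checking a running value against a per-pass-off-point expression and carrying the Owner through to the verdict. It assumes expressions take the simple form `x <op> <number>` (as in the limit sample later on this page) and that the running value has already been reduced to one number per pass-off point. `Verdict`, `check_expression` and `evaluate_limit` are illustrative names; the real t-matrix grammar may be richer.

```python
import operator
from dataclasses import dataclass

# Comparison operators accepted in a limit expression such as "x == 2".
_OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge,
    "<": operator.lt, ">": operator.gt,
}


@dataclass(frozen=True)
class Verdict:
    pass_off_point: str
    expression: str
    running_value: float
    passed: bool
    owner: str  # "Rig" or "UUT", copied from the limit file


def check_expression(expression: str, x: float) -> bool:
    """Evaluate 'x <op> <number>' without calling eval()."""
    name, op, limit = expression.split()
    if name != "x" or op not in _OPS:
        raise ValueError(f"unsupported limit expression: {expression!r}")
    return _OPS[op](x, float(limit))


def evaluate_limit(limit: dict, running: dict[str, float]) -> list[Verdict]:
    """Compare each pass-off point's running value with its expression."""
    points = limit["envelope"]["pass_off_points"]
    return [
        Verdict(point, expr, running[point],
                check_expression(expr, running[point]), limit["owner"])
        for point, expr in points.items()
    ]
```

Parsing the expression instead of passing it to `eval()` keeps engineering-edited files from executing arbitrary code; an exact `==` on floats would likely need a tolerance in practice.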
04 · Concession

Audit-trailed exception handling.

When a failure is reviewable rather than rejectable, the engineer raises a concession. The dialog enforces a Standard Concession or Fail Concession type, a reason of at least two sentences (no one-line "looks fine" approvals), and a concession code. The decision is written to the database against the test session, so a later auditor can reconstruct exactly which channel failed, by how much, and on what justification the deviation was accepted. The concession is part of the audit trail, not a paper trail that lives separately in someone's inbox.

Concession Required dialog — Standard vs Fail Concession, structured reason (≥ 2 sentences), and a concession code recorded against the session
Standard vs Fail Concession, structured reason, concession code — all persisted against the session.
Classification
Standard | Fail Concession
Engineering distinguishes between deviations that fit known tolerances and deviations that require explicit failure acceptance.
Reason
≥ 2 sentences enforced
The form refuses one-liners. Forces the engineer to explain the deviation in enough detail that a future auditor can understand.
Identifier
Concession code
Traceable identifier — links the decision to engineering's broader concession ledger.
Persistence
Audit row on the session
Stored against the test session record — reconstructable years later without asking around.
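
The dialog's rules translate naturally into a validated record. The sketch below is an assumption about how such a check could look, not the app's schema: `Concession` and `count_sentences` are hypothetical names, and the sentence counter is a rough punctuation-based heuristic that would miscount text containing decimals or abbreviations.

```python
import re
from dataclasses import dataclass

CONCESSION_TYPES = ("Standard Concession", "Fail Concession")


def count_sentences(text: str) -> int:
    """Count chunks of text that end in '.', '!' or '?' (rough heuristic)."""
    return len(re.findall(r"[^.!?]+[.!?]+", text))


@dataclass(frozen=True)
class Concession:
    concession_type: str
    reason: str
    code: str

    def __post_init__(self) -> None:
        # Reject the record before it can reach the audit trail.
        if self.concession_type not in CONCESSION_TYPES:
            raise ValueError("type must be Standard Concession or Fail Concession")
        if count_sentences(self.reason) < 2:
            raise ValueError("reason must contain at least two sentences")
        if not self.code.strip():
            raise ValueError("concession code is required")
```

Validating in the constructor means the same rule applies whichever surface creates the record.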
05 · Notify

Deterministic fail-email notification.

When a fail email is needed, the app generates it for the operator instead of leaving them to write it by hand. The body is deterministic — TR Code, Serial Number, Sub-test, ISO timestamp, Total Failures, then a fixed ASCII table of every breached channel (Channel · NPOTP · Limit · Actual · Expected · Owner), then a summary line splitting Rig vs UUT failure counts. The email opens in Outlook addressed to the engineer and test-lead distribution lists, ready to send. The same email goes out the same way every time, so the receiving engineers can grep their inbox by TR Code and reconstruct a unit's history without having to ask around.

Auto-composed failure email body — TR Code / Serial / Sub-test / timestamp / total-failures notification with ASCII table of failed limits and Rig-vs-UUT summary, opened directly in Outlook
Deterministic body — the same shape every time, so the receiving inbox stays grep-able.
Subject
TR · Serial · Sub-test
A consistent subject line means the receiving engineers can filter, search, and triage by TR Code without parsing the body.
Body
ASCII fail table
Channel · NPOTP · Limit · Actual · Expected · Owner — reads in plain text in any inbox, no rendering surprises.
Summary
Rig vs UUT counts
The triage signal: five Rig failures and seven UUT failures mean something different from the reverse.
Destination
Outlook compose
Pre-addressed to the engineer + test-lead distribution lists. Ready to send; no copy-paste fingerprints.
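
A deterministic body is straightforward to sketch: fixed header lines, a fixed-column ASCII table, and a summary line. The field order follows the description above, but the exact labels, table borders and summary wording here are assumptions, and `build_fail_email` is a hypothetical name.

```python
from datetime import datetime

COLUMNS = ("Channel", "NPOTP", "Limit", "Actual", "Expected", "Owner")


def ascii_table(rows: list[dict[str, str]]) -> str:
    """Render rows as a plain-text table with fixed column order."""
    widths = [max([len(col)] + [len(row[col]) for row in rows]) for col in COLUMNS]

    def fmt(cells) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    rule = "+-" + "-+-".join("-" * w for w in widths) + "-+"
    lines = [rule, fmt(COLUMNS), rule]
    lines += [fmt([row[col] for col in COLUMNS]) for row in rows]
    lines.append(rule)
    return "\n".join(lines)


def build_fail_email(tr_code: str, serial: str, subtest: str,
                     when: datetime, failures: list[dict[str, str]]) -> str:
    """Same inputs always produce the same body text."""
    rig = sum(1 for f in failures if f["Owner"] == "Rig")
    uut = sum(1 for f in failures if f["Owner"] == "UUT")
    header = [
        f"TR Code: {tr_code}",
        f"Serial Number: {serial}",
        f"Sub-test: {subtest}",
        f"Timestamp: {when.isoformat()}",
        f"Total Failures: {len(failures)}",
    ]
    summary = f"Summary: {rig} Rig / {uut} UUT"
    return "\n".join(header + ["", ascii_table(failures), "", summary])
```

Because the timestamp is passed in rather than read from the clock, the function stays pure and the output can be compared byte for byte in a test.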
06 · Compare

Multi-system Comparator — cross-serial trend analysis.

Once a session is recorded, it becomes part of a queryable history. The Comparator surface lets an engineer pick N serial numbers, choose a channel (for example TemperatureChannelNeg), and overlay every selected unit's curve against the configured t-matrix envelope (the dashed yellow upper and lower bounds). The Raw / t-Matrix toggle controls whether the engineer is looking at raw measurements or matrix-normalised values, and the date and sub-test filters narrow the dataset. This is the surface where a development engineer answers "is this drift unit-specific or fleet-wide?" without exporting CSVs or opening Excel.

Clever_PO Comparator: select multiple serials, overlay a channel against the t-matrix envelope (dashed yellow), toggle Raw vs t-Matrix view, filter by date range and sub-test
Multi-serial overlay against the t-matrix envelope — the engineer's answer to "unit-specific or fleet-wide?"
Selection
N serial numbers
Side-by-side overlay across units. Hard to do in Excel; trivial here.
View
Raw / t-Matrix toggle
Switch between raw measurements and matrix-normalised values — different questions need different views.
Filter
Date · sub-test · status
Narrow the dataset to the question being asked. Filters stack.
Reference
t-matrix envelope
Dashed yellow upper/lower bounds — the engineering-defined acceptable region, drawn on every chart.
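
The Comparator answers the question visually, but the same overlay can be expressed as a small calculation: count, per serial, how many samples leave the envelope. This is only an illustrative sketch under the assumption that every curve and both bounds are sampled at the same points; `out_of_envelope` and `drifting_serials` are hypothetical names and not part of the app.

```python
def out_of_envelope(curves: dict[str, list[float]],
                    lower: list[float], upper: list[float]) -> dict[str, int]:
    """Count, per serial, the samples that fall outside [lower, upper]."""
    return {
        serial: sum(1 for v, lo, hi in zip(values, lower, upper) if not lo <= v <= hi)
        for serial, values in curves.items()
    }


def drifting_serials(curves: dict[str, list[float]],
                     lower: list[float], upper: list[float]) -> list[str]:
    """Serials with at least one sample outside the envelope, sorted."""
    counts = out_of_envelope(curves, lower, upper)
    return sorted(serial for serial, n in counts.items() if n > 0)
```

If one serial out of many is returned, the drift looks unit-specific; if most are, it looks fleet-wide.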
07 · Install

Native Windows installer — no admin password needed.

The desktop build ships as an NSIS Windows installer that bundles the Python runtime, the FastAPI backend, the Next.js UI assets, and the Tauri shell. The installer extracts everything into the user's profile so an admin password is not required, and the result is a Start-menu app that runs entirely offline against a local SQLite database — same UI, same limits engine, same MCP surface, just on a workstation instead of a server. The dyno cell does not need network connectivity to validate a unit.

Clever_PO Windows installer — Python runtime, FastAPI backend, Next.js UI and Tauri shell bundled into a single NSIS installer
Single NSIS bundle — Python + FastAPI + Next.js + Tauri, into the user profile, no admin needed.
Bundle
Python + FastAPI + Next.js + Tauri
Everything the desktop needs in one .exe. The Tauri shell runs the FastAPI backend as a sidecar process.
Install scope
User profile · no admin
Per-user install means a dyno-cell engineer doesn't need to chase IT for a privileged install. Friction removed.
Runtime
SQLite local DB
Single-file database in the user profile — offline-capable, no server connectivity required.
Distribution
NSIS Windows installer
Standard Windows installer flow — predictable for IT teams that need to whitelist the binary.

03 · MCP-native architecture

The MCP server is a first-class deployment target.

Clever_PO is MCP-native by design: the same FastAPI service modules that the web UI and the desktop sidecar call are also wrapped as Model Context Protocol tools the MCP server exposes to Claude Code and Claude Desktop. An engineer can run an entire pass-off conversation from a chat session — "open subtest 1 of TR CBA_001 for serial XXX_001, import the latest Parquet, evaluate limits, tell me which channels failed on the UUT side" — and the MCP server fulfils each step by calling the exact same code path the web UI uses. There is no separate "chat surface" codebase; every new capability added to FastAPI is one thin wrapper away from being conversational.

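[Sequence diagram, MCP-native. Engineer to Claude Desktop: "evaluate limits for XXX_001". Claude Desktop to services/mcp_server: tool evaluate_limits(serial, tr). services/mcp_server to services/api (FastAPI): call same service module. services/api back to services/mcp_server: comparator table + owner tags. services/mcp_server back to Claude Desktop: structured response. Claude Desktop to Engineer: "5 channels failed · UUT side".]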

MCP sequence — Claude Desktop drives a real validation, calling the same FastAPI service modules the web UI uses

Why MCP
Validation that can be driven from a chat
"Open subtest, evaluate limits, raise a concession with reason X" — a complete pass-off without touching the web UI.
Tool surface
Tools follow the workflow
Open subtest, import telemetry, evaluate limits, raise concession, draft fail email, query comparator, dump audit history.
No code duplication
One FastAPI module per business action
Each MCP tool is a thin wrapper around an existing service module. The chat path and the UI path cannot diverge.
Same persistence
Same SQLModel, same audit trail
A concession raised via chat lands in the database identically to one raised via UI. The auditor cannot tell which path was used — by design.
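
The "thin wrapper" idea can be shown without any MCP library at all. The sketch below uses a plain dictionary as a stand-in tool registry; a real server would register tools through an MCP SDK instead. `evaluate_limits_service`, `TOOLS`, `tool` and `http_evaluate_limits` are all hypothetical names, and only the `evaluate_limits(serial, tr)` tool signature comes from the sequence diagram.

```python
from typing import Callable


def evaluate_limits_service(serial: str, tr: str) -> dict:
    """Stand-in for a FastAPI service module; real logic lives in services/api."""
    return {"serial": serial, "tr": tr, "failed_channels": []}


# Stand-in tool registry (an MCP SDK would own this in a real server).
TOOLS: dict[str, Callable[..., dict]] = {}


def tool(func: Callable[..., dict]) -> Callable[..., dict]:
    """Register a function as a callable tool under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def evaluate_limits(serial: str, tr: str) -> dict:
    """Chat path: no business logic here, only delegation."""
    return evaluate_limits_service(serial, tr)


def http_evaluate_limits(serial: str, tr: str) -> dict:
    """UI/REST path: a route handler would delegate the same way."""
    return evaluate_limits_service(serial, tr)
```

Since both entry points call the same function, a change to the service module reaches the chat path and the UI path at once.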

04 · Telemetry + HITL development loop

Observability is the AI coder's nervous system.

A custom telemetry layer is built into the application from the beginning. While the app runs — both in production and during development — it emits structured events to a JSONL log file and broadcasts the same events live over a WebSocket stream. One vocabulary, two consumers. When Playwright drives the UI during development, Claude Code and a human reviewer both subscribe to the WebSocket; when a session passes silently but an internal error event fires, the loop catches the discrepancy in real time.

[Loop diagram, telemetry · HITL. Claude Code applies a code change and drives the Clever_PO app via Playwright; the app emits events as it runs to events.jsonl + WS; error/warning events flow back to Claude Code and the live event stream reaches the human reviewer; both Claude and the human see the same source of truth before the next iteration decision.]

The dev loop — telemetry collapses "Playwright says green" + "telemetry says error" into one observable stream
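
A minimal sketch of "one vocabulary, two consumers": each event is written as one JSON line and handed to every live subscriber. Plain callbacks stand in for the WebSocket broadcast here, and the event fields (`ts`, `level`, `event`) are assumptions rather than the app's actual schema; `Telemetry` and `run_is_clean` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from typing import Callable, TextIO


class Telemetry:
    def __init__(self, log_file: TextIO) -> None:
        self._log_file = log_file
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        """Add a live consumer (a WebSocket client in the real app)."""
        self._subscribers.append(callback)

    def emit(self, level: str, name: str, **fields) -> dict:
        """Append the event to the JSONL log and push it to subscribers."""
        event = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "level": level,
            "event": name,
            **fields,
        }
        self._log_file.write(json.dumps(event) + "\n")
        self._log_file.flush()
        for callback in self._subscribers:
            callback(event)
        return event


def run_is_clean(ui_passed: bool, events: list[dict]) -> bool:
    """A green UI run only counts if no error event fired alongside it."""
    return ui_passed and not any(e["level"] == "error" for e in events)
```

`run_is_clean` captures the discrepancy check from the text: Playwright reporting success is not enough when the event stream contains an error.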

05 · Engineering-owned config sample

A TR config, a per-channel limit, and a Polars evaluation.

The first two artefacts live in the Data Folder: they are owned by the engineering team, versioned in their Git repo, and loaded by the Clever_PO API at startup. The Python at the bottom lives in services/api and is what FastAPI runs against the rendered envelope.

// data-folder/json_configs/CBA_001.json — engineering-owned TR definition
{
  "tr_code": "CBA_001",
  "test_type": "Test 02 — Full Pass Off",
  "subtests": [
    { "id": "Subtest_01", "channels": ["TemperatureChannelNetPost", "newNPO"] },
    { "id": "Subtest_02", "channels": ["TemperatureChannelPos_mean"] }
  ]
}

// data-folder/limits/TemperatureChannelNetPost.json — per-channel envelope rule
{
  "channel": "TemperatureChannelNetPost",
  "owner": "UUT",
  "envelope": {
    "type": "tmatrix",
    "pass_off_points": {
      "1":   "x == 2",
      "101": "x == 2",
      "201": "x == 2",
      "202": "x == 2",
      "203": "x == 2"
    }
  }
}

# services/api/limits.py: Polars evaluates the envelope against the running value
import polars as pl

# parquet_path, channel, envelope and evaluate_envelope are not shown in this excerpt.
lf = pl.scan_parquet(parquet_path)  # lazy scan: nothing is read until collect()
running = lf.filter(pl.col("channel") == channel).select("value").collect()
verdict = evaluate_envelope(envelope, running, owner="UUT")
# → pass / fail with Rig|UUT attribution, ready for the comparator table

06 · Engineering constraints

The decisions that shaped the architecture.

Every choice below traces back to a specific constraint or principle that the project committed to from day one. Treating these as architecture, not implementation detail, kept the four-surface build coherent across months.

Data layer
Polars-only · no Pandas anywhere
pl.scan_parquet streaming for multi-GB Parquet on a workstation. The "no Pandas" rule is enforced in CI.
Failure attribution
Rig vs UUT Owner as a primitive
The single most useful column the system adds. Routes Rig failures to a different conversation than UUT failures.
Chat surface
MCP-as-deployment, not chat-skin
The MCP server is a first-class target alongside web + desktop + REST. Same FastAPI modules; cannot drift.
Desktop
Tauri sidecar runs FastAPI
The desktop app wraps the same Next.js UI; the Tauri shell launches the Python backend as a child process. No reimplementation.
Persistence
SQLModel · dual-DB (Postgres + SQLite)
One schema package targets PostgreSQL in server mode and SQLite in desktop mode. Engineering writes models once.
Engineering rules
JSONB t-matrix in DB · files in Data Folder
Engineering edits the limits as JSON files; the API reads them, stores t-matrix configurations in JSONB columns, evaluates them per session.
Distribution
NSIS installer · no admin needed
Per-user install removes IT friction. A dyno-cell engineer installs in 30 seconds; the app runs entirely offline.
Quality gate
400 / 9 / 0 pytest baseline · no regressions
Every bug becomes a failing test before the fix. Every commit holds the baseline. The auditor sees the test counts on every PR.
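
How the "no Pandas" rule is enforced in CI is not shown here; one plausible approach is a small script that parses every Python file and fails the build if it finds a pandas import. The sketch below assumes that approach, and `imports_pandas` and `find_offenders` are hypothetical names.

```python
import ast
from pathlib import Path


def imports_pandas(source: str) -> bool:
    """True if the source contains 'import pandas' or 'from pandas import ...'."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "pandas" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level == 0 excludes relative imports such as 'from .pandas import x'.
            if node.level == 0 and (node.module or "").split(".")[0] == "pandas":
                return True
    return False


def find_offenders(root: Path) -> list[Path]:
    """List every .py file under root that imports pandas, sorted by path."""
    return sorted(
        path for path in root.rglob("*.py")
        if imports_pandas(path.read_text(encoding="utf-8"))
    )
```

Parsing with `ast` rather than searching for the string avoids false positives from comments and string literals that merely mention pandas.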

07 · Full tech stack

Everything in the codebase.

Broken down by category. Choices favour typed contracts at every boundary, local execution where possible, and one backend per business action over per-surface duplication.

Backend

  • Python: FastAPI orchestrator
  • FastAPI: HTTP + OpenAPI surface
  • Pydantic v2: Boundary types
  • SQLModel: Postgres + SQLite schemas
  • Celery · Redis 7: Long-running jobs

Frontend

  • Next.js 15: Web app
  • TypeScript: Typed everywhere
  • Tailwind v4: Utility CSS
  • shadcn/ui: Composable primitives
  • Recharts: Comparator visualisation

Desktop · Data

  • Tauri 2.x: Native desktop shell
  • NSIS: Windows installer
  • SQLite: Local DB (desktop mode)
  • Polars: DataFrames (no Pandas)
  • DuckDB: Ad-hoc analytical SQL

Infra · Dev partner

  • Claude Code: Development partner
  • MCP server: Conversational deployment target
  • Podman: Container runtime
  • Ansible: Server provisioning
  • Caddy: Reverse proxy + TLS

08 · Lessons learned

What this build taught me.

Lesson 01
Configuration ownership is architecture, not file-system bureaucracy. Putting TR configs, limits, parameter lists and post-process functions in a Data Folder — owned by engineering, not by developers — collapsed an entire class of "ship a release to change a number" conversations. The right people own the right artefacts.
Lesson 02
Owner attribution is the single most valuable column the system adds. Classifying every breached limit as a Rig issue or a UUT issue is what makes the concession workflow legitimate. Without that split, every failed test ends up in the same blunt bucket and every concession reads like an excuse.
Lesson 03
MCP is a first-class deployment target, not chat-skin novelty. Treating Claude Code and Claude Desktop as legitimate users of the application, not as a wrapper over it, let me drive complete validation workflows from a conversation, against the same FastAPI service modules the web UI uses. Any new endpoint is one thin wrapper away from being a tool.
Lesson 04
Observability is the AI coder's nervous system. Pairing Playwright for visual reality with a structured telemetry stream gave both the agent and the human the same evidence at the same time. Claude could no longer claim "done" if the telemetry contradicted the UI; the human reviewer could no longer feel out of the loop, because every event was streamed live.
Lesson 05
Choose the data layer once, choose it well. Polars from day one kept the pipeline fast and predictable as the catalogue grew. No retrofit, no rewrite, no Pandas fallback — the same call to pl.scan_parquet handles a single subtest and a full multi-GB session.