How I ship software with agents
Working software shipped by an agent harness that refuses to lie about done. Every feature passes type-checks, tests, and a production build before it can call itself finished.
I'm not a software engineer. I'm an operator who builds working systems with AI. The honesty here is enforced by machinery, not trust.
0.99^100 (100 steps) ≈ 37%
An agent that is 99% reliable at each step finishes a 100-step feature correctly only about 37% of the time. Gates fix that, not a smarter prompt.
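The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch that assumes every step succeeds or fails independently of the others, which is a simplification; the function name `feature_success_rate` is illustrative and not part of the harness.

```python
def feature_success_rate(per_step: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps are independent."""
    return per_step ** steps


# Assumed inputs from the text: 99% per-step reliability, 100-step feature.
print(f"{feature_success_rate(0.99, 100):.1%}")  # 36.6%

# The same per-step reliability over shorter and longer features.
for steps in (10, 50, 100, 200):
    print(steps, f"{feature_success_rate(0.99, steps):.1%}")
```

The success rate falls with every added step, which is why the length of a feature matters as much as the quality of any single step.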
harness ▸ T-12 · saved-state hydration · enforce done-gate
done-gate ▸ passed: task is done
An agent's per-step reliability compounds to a success rate of roughly a third over a long feature; a gated pipeline whose done-gate runs a type-check, tests, and a production build is the fix.
Reliability comes from enforcement, not better prompts
An agent that's reliable per step still fails a long feature without enforcement. You don't fix that with a smarter prompt. You fix it with gates the agent cannot talk past.
- 01 A gated pipeline
  Every feature runs spec → plan → build ⇄ review → QA → security → ship. Each phase produces an artifact on disk, and that artifact is gated before the next phase begins.
- 02 Hooks, not the honor system
  Destructive commands are blocked, protected files are locked, database security is required on every change, and a done-gate runs type-check, tests, and a production build. A rule in a prompt drifts; a hook holds.
- 03 Honest status, fresh-eyes review
  Workers report done, done-with-concerns, blocked, or needs-context. Blocked is a valid answer; forced-green is forbidden. A reviewer that didn't write the code reads every diff first.
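The done-gate and the command hook described above could be sketched roughly as follows. This is an illustration under assumptions, not the harness's actual code: the names `Check`, `done_gate`, and `command_allowed` are hypothetical, the deny-list patterns are examples only, and downgrading a failed "done" claim to blocked is one possible policy. Only the four status values come from the text.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Status(Enum):
    DONE = "done"
    DONE_WITH_CONCERNS = "done-with-concerns"
    BLOCKED = "blocked"
    NEEDS_CONTEXT = "needs-context"


@dataclass(frozen=True)
class Check:
    name: str                # e.g. "type-check", "tests", "production build"
    run: Callable[[], bool]  # returns True only if the check really passed


def done_gate(claimed: Status, checks: Sequence[Check]) -> Tuple[Status, List[str]]:
    """Accept a 'done' claim only if every check passes."""
    if claimed not in (Status.DONE, Status.DONE_WITH_CONCERNS):
        # Blocked and needs-context are valid answers and pass through as-is.
        return claimed, []
    # Run every check (no short-circuit) so the report lists all failures.
    failed = [check.name for check in checks if not check.run()]
    if failed:
        # Assumed policy: a failed claim becomes blocked, never forced green.
        return Status.BLOCKED, failed
    return claimed, []


# Hypothetical deny-list; a real hook would use its own patterns.
DESTRUCTIVE = [
    re.compile(pattern)
    for pattern in (r"\brm\s+-rf\b", r"\bgit\s+push\s+--force\b", r"\bdrop\s+table\b")
]


def command_allowed(command: str) -> bool:
    """Return False for any command that matches a destructive pattern."""
    lowered = command.lower()
    return not any(pattern.search(lowered) for pattern in DESTRUCTIVE)
```

In practice each `Check.run` would wrap a real command, such as the project's type-checker, test runner, and build, and report success only on a zero exit code. The point of the design is that the worker's claim is an input to the gate, not the result: the status that gets recorded is whatever the checks allow.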
Live work, not screenshots
Real shipped products, each built or proven through this harness.
A public knowledge product with an authenticated saved library, shipped through this harness and live on Vercel.
A narrated demo of a supply-chain-finance prototype: smart contracts on a public testnet, ZK-proof linkage, role-based interfaces.
The harness's first validation build, where this pipeline was proven before it built anything else.
What your team gets
Features that are actually done when they say they're done. A delivery system where the non-negotiables are enforced, not hoped for, and an operator who built it and runs it daily.