
Stop shipping vibes. Ship verified code.

Describe what you need. Shipwright builds the service, runs the tests, audits for production failures, and fixes what breaks. You get code that works.

Free while in beta · Uses your existing API keys · Your code stays local


Backend

Multi-Model

Opus / Sonnet / Haiku / Gemini

Multi-CLI

Claude Code / Codex / Gemini CLI

Self-Correcting

Pipeline

The Reality

AI can plan, scaffold, and generate.
None of it is verified.

You prompt. Code appears. It looks right. It compiles. The tests pass.

Then you deploy and the routes aren't wired. The config reads from env vars that don't exist. The tests were asserting true. The auth middleware is in the file but nothing imports it.

So you write another prompt.

“Fix the routes.”
Done. Updated route registration.
Auth is broken now.
Fixed auth middleware import.
Two tests failing.

It goes in circles. The code was never verified. You're debugging output from something that doesn't know what working means.

Some people think the fix is more agents. A team of AI agents that manage each other, review each other's code, coordinate tasks. It's faster, sure. But faster at what? You get the same broken code, generated in parallel. Reviewing code with another LLM is not verification. It is a second opinion from the same source.

Others think the answer is better specs. Detailed PRDs, careful task breakdowns, acceptance criteria. That helps. But a spec tells an agent what to build. It doesn't verify what got built.

The hard part was never writing the code. It was knowing if the code works.

What if AI followed the same engineering process your team does?

Plan. Build. Test. Audit. Correct. Ship.

01
Input

Pick a pack (a verified, production-ready service you build on top of). Add your requirements.

Start from a verified, production-ready service. Shape it to your models and business logic with a PRD. Skip the boilerplate entirely.

Input
Pack: firebase-auth (auth, sessions, RBAC, rate limiting)
Your requirements
Custom claims: role, orgId, tier
Sessions expire after 24 hours
Admin can revoke any session
Audit trail on all auth events
Rate limit failed logins
Existing codebase
src/routes/ (4 existing routes)
src/middleware/ (2 middleware)
src/config/ (env + plugins)
Pack + requirements + codebase loaded
02
Discovery

Specialist agents. Coordinated output.

Each agent has a role and produces a specific artifact. Architecture feeds into test plans. Threat models feed into security checks. Live API research gets cached and shared. Every artifact is validated before planning starts.

Discovery · specialist agents
Product: 24 requirements, 9 assumptions
Architect: 14 endpoints, 16 ADRs
QA Lead: 42 test scenarios
Security: RBAC matrix, threat model
Domain: 6 domains detected
Doc Research: 8 API doc packets
PRD Validator: 94% coverage
spec/
architecture.md
product.contract.yaml
test_plan.md
rbac_matrix.md
threat_model.md
decisions/
ADR-001 … ADR-016
docs/
firebase-auth/overview.md
node-jwt/overview.md
Gate: PASS · 14 spec files · 16 ADRs · 8 doc packets
03
Plan

32 tickets. 5 dependency layers.

Ticket DAG generated from specs. Dependencies mapped. Six validation layers before a single line of code.

Ticket DAG · 32 tickets · 5 layers
Ticket types: Scaffold, Feature, Test, Config
L1: T-01, T-02, T-03, T-04, T-05
L2: T-06, T-07, T-08, T-09, T-10, T-11, T-12
L3: T-13, T-14, T-15, T-16, T-17, T-18, T-19
L4: T-20, T-21, T-22, T-23, T-24, T-25, T-26
L5: T-27, T-28, T-29, T-30, T-31, T-32
Validation: Dependencies, Requirements, Security, Scaffold, Tests, Integrity
Plan locked
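
One way to picture the dependency layers is a topological sort that groups tickets by depth. The following is a minimal Python sketch, not Shipwright's planner: the function name `layer_tickets`, the `deps` mapping, and the example edges are all hypothetical and chosen only for illustration.

```python
def layer_tickets(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tickets into layers so each layer depends only on earlier ones.

    `deps` maps a ticket id to the ids it depends on (hypothetical shape).
    """
    known = set(deps)
    for ticket, needs in deps.items():
        unknown = needs - known
        if unknown:
            raise ValueError(f"{ticket} depends on unknown tickets: {sorted(unknown)}")

    remaining = {ticket: set(needs) for ticket, needs in deps.items()}
    layers: list[list[str]] = []
    while remaining:
        # A ticket is ready once all of its dependencies have been placed.
        ready = sorted(t for t, needs in remaining.items() if not needs)
        if not ready:
            raise ValueError("dependency cycle detected")
        layers.append(ready)
        for ticket in ready:
            del remaining[ticket]
        for needs in remaining.values():
            needs.difference_update(ready)
    return layers


# Illustrative edges only; a real plan would derive these from the specs.
example = {
    "T-01": set(),
    "T-02": set(),
    "T-06": {"T-01"},
    "T-13": {"T-02", "T-06"},
}
print(layer_tickets(example))
```

Tickets within one layer have no dependencies on each other, so in principle they could run in parallel.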
04
Review

It knows when to ask.

Assumptions flagged. Permission diffs surfaced. You approve what matters, or let it run on auto.

Review Gate · 3 items
Flagged Assumptions
Session expiry: 24 hours [AUTO]
Rate limit: 5 attempts per minute [FLAG]
OAuth provider: Google only? [ASK]
Permission Diff
+ POST /auth/revoke
+ DELETE /sessions/:id
~ PATCH /users/:id/role
Mode: Auto or Confirm
Reviewed · Proceeding
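
The page does not define what the AUTO, FLAG, and ASK labels do in each mode. As a rough sketch under assumed semantics (AUTO is accepted silently, FLAG is surfaced only in Confirm mode, ASK always waits for a person), a review gate could be expressed like this in Python. The function name `needs_human` and the mode strings are hypothetical.

```python
VALID_TAGS = {"AUTO", "FLAG", "ASK"}
VALID_MODES = {"auto", "confirm"}


def needs_human(tag: str, mode: str) -> bool:
    """Return True if an assumption should wait for human approval.

    Assumed semantics, not confirmed by the source:
    ASK always blocks, FLAG blocks only in confirm mode, AUTO never blocks.
    """
    if tag not in VALID_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if tag == "ASK":
        return True
    if tag == "FLAG":
        return mode == "confirm"
    return False


# The three items shown in the review gate above.
items = [
    ("Session expiry: 24 hours", "AUTO"),
    ("Rate limit: 5 attempts per minute", "FLAG"),
    ("OAuth provider: Google only?", "ASK"),
]
blocking = [text for text, tag in items if needs_human(tag, "auto")]
print(blocking)
```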
05
Configure

Your models. Your keys. Your call.

Pick which model runs each tier. Bring your own API keys, connect your CLI, or let us handle it.

Configuration · constraints.yaml
Model Tiers (configurable)
fast: gemini-flash (scaffold, config, simple edits)
default: sonnet (features, tests, discovery)
critical: opus (auth, security, plan verification)
Execution Mode
CLI-Connected: your machine, your Claude sub
BYOK: your API keys, our infra
Fully Managed: we handle everything
constraints.yaml
model_tiers:
  fast: "haiku"
  default: "gemini"
  critical: "opus"
execution_tier: "byok"
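
To make the tier idea concrete, here is a small Python sketch of routing a ticket to a model. The dictionary mirrors the constraints.yaml values shown above; the `TIER_FOR_KIND` mapping and the `model_for` helper are hypothetical, loosely based on the tier descriptions on this page, and are not Shipwright's actual routing logic.

```python
# Mirrors the constraints.yaml example above, written as a plain dict
# so the sketch needs no YAML parser.
CONSTRAINTS = {
    "model_tiers": {"fast": "haiku", "default": "gemini", "critical": "opus"},
    "execution_tier": "byok",
}

# Hypothetical mapping from ticket kind to tier, based on the tier
# descriptions (scaffold/config -> fast, auth/security -> critical).
TIER_FOR_KIND = {
    "scaffold": "fast",
    "config": "fast",
    "feature": "default",
    "test": "default",
    "auth": "critical",
    "security": "critical",
}


def model_for(kind: str, constraints: dict = CONSTRAINTS) -> str:
    """Pick the configured model for a ticket kind, falling back to 'default'."""
    tier = TIER_FOR_KIND.get(kind, "default")
    return constraints["model_tiers"][tier]


print(model_for("scaffold"), model_for("feature"), model_for("auth"))
```

Changing a model for a whole class of work then only means editing one value under `model_tiers`.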
06
Execution

Agents spin up. Context is scoped.

Each ticket gets its own agent. The orchestrator assembles its context from previous agents' outputs: the relevant spec sections, domain docs, dependency files. No shared 200k-token window. No prompt soup.

Executing · 0/12 tickets
spawn · haiku · fast
T-01 scaffold [RUNNING]
T-02 error-classes
T-03 auth-adapter
T-04 route-hub
T-05 session-store
T-06 middleware
T-07 rate-limiter
Context for T-01 · 4.2KB
pack.contract.yaml
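
Scoped context can be sketched as selecting only the artifacts a ticket names, under a size budget. This Python example is an illustration under assumptions: `build_context`, the `needs` list, the artifact store, and the byte budget are hypothetical and not taken from Shipwright's orchestrator.

```python
def build_context(
    needs: list[str],
    artifacts: dict[str, str],
    budget_bytes: int = 8192,
) -> dict[str, str]:
    """Collect only the artifacts a ticket asks for, within a byte budget."""
    context: dict[str, str] = {}
    used = 0
    for name in needs:
        if name not in artifacts:
            raise KeyError(f"missing artifact: {name}")
        size = len(artifacts[name].encode("utf-8"))
        if used + size > budget_bytes:
            raise ValueError(f"context budget exceeded while adding {name}")
        context[name] = artifacts[name]
        used += size
    return context


# Placeholder contents; real artifacts would come from earlier pipeline stages.
store = {
    "pack.contract.yaml": "name: firebase-auth\n",
    "spec/architecture.md": "# Architecture\n",
    "spec/threat_model.md": "# Threat model\n",
}
scaffold_context = build_context(["pack.contract.yaml"], store)
print(list(scaffold_context))
```

The agent for a scaffold ticket never sees the threat model, which keeps its prompt small and relevant.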
07
Verification

Every ticket verified. Every check run.

Unit tests, lint, types, formatting, security audit, license scan, Docker build, container health, and a 47-criteria readiness check. Every check is deterministic tooling, not LLM review. Nothing ships until everything passes.

Verification · 11 checks
Code Quality
npm test
eslint .
tsc --noEmit
prettier --check
Security
npm audit
secrets scan
license check
Container
docker build
docker run → /health
docker run → /ready
Readiness
47-criteria scan
11 checks passed · ready for audit
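
A deterministic gate of this kind boils down to running each tool and trusting only its exit code. The Python sketch below shows the idea with the standard library; `run_checks` and `gate` are hypothetical names, and only three of the commands listed above are included as an example because the others need project-specific arguments.

```python
import subprocess


def run_checks(checks: list[tuple[str, list[str]]]) -> dict[str, bool]:
    """Run each command and record whether it exited with status 0."""
    results: dict[str, bool] = {}
    for name, argv in checks:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True)
            results[name] = proc.returncode == 0
        except OSError:
            # A tool that is missing or cannot be started counts as a failure.
            results[name] = False
    return results


def gate(results: dict[str, bool]) -> bool:
    """Pass only if at least one check ran and every check passed."""
    return bool(results) and all(results.values())


# Commands taken from the list above; not executed here.
CODE_QUALITY_CHECKS = [
    ("unit tests", ["npm", "test"]),
    ("lint", ["eslint", "."]),
    ("types", ["tsc", "--noEmit"]),
]
```

Calling `gate(run_checks(CODE_QUALITY_CHECKS))` inside a project directory would return `True` only when every tool exits cleanly. No model is asked for an opinion.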
08
Audit

Something breaks. The pipeline catches it.

47 criteria scanned. Security headers missing. Container running as root. Dead exports in plugins. Every other tool ships this. The pipeline flags it.

scaffold: DONE
error-classes: DONE
auth-adapter: DONE
route-hub: DONE
session-store: DONE
middleware: DONE
09
Self-Correction

Corrective tickets. Automatic re-run.

Each finding becomes a ticket with priority, assignee, and source criteria. The engine assigns the fix. The same automated checks re-run. The loop closes itself.
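
The loop described here can be sketched as "audit, turn findings into tickets, fix, audit again". The Python below is an illustration only: the `CorrectiveTicket` fields follow the sentence above (priority, assignee, source criteria), while the default values, the `audit` and `fix` callables, and the round limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CorrectiveTicket:
    title: str
    priority: str          # hypothetical default: every finding is "high"
    assignee: str          # hypothetical agent name
    source_criterion: str  # the audit criterion that produced the finding


def correct_until_clean(
    audit: Callable[[], list[tuple[str, str]]],
    fix: Callable[[CorrectiveTicket], None],
    max_rounds: int = 3,
) -> list[CorrectiveTicket]:
    """Re-run the same audit after each round of fixes until it is clean."""
    created: list[CorrectiveTicket] = []
    for _ in range(max_rounds):
        findings = audit()  # list of (criterion, description) pairs
        if not findings:
            return created
        for criterion, description in findings:
            ticket = CorrectiveTicket(description, "high", "fix-agent", criterion)
            created.append(ticket)
            fix(ticket)
    if audit():
        raise RuntimeError("findings remain after max_rounds")
    return created


# Toy stand-ins for the real audit and fix steps.
open_issues = {
    "security-headers": "Security headers missing",
    "container-user": "Container running as root",
}
tickets = correct_until_clean(
    audit=lambda: sorted(open_issues.items()),
    fix=lambda ticket: open_issues.pop(ticket.source_criterion),
)
print(len(tickets), "corrective tickets;", len(open_issues), "issues left")
```

The important property is that the check that found the problem is the same check that confirms the fix.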

10
Complete

Full trace. Every decision. Every dollar.

32 tickets completed, 4 corrective fixes, 43/47 criteria passed, $14.20 total cost. Token-level billing, model attribution, and a complete audit trail.

Run Report: PASS
Tickets completed: 28/28
Hardening tasks: 12 applied
Tests: 354 passing
Requirement coverage: 94%
Duration: 3h 12m
Total cost: $14.20
Artifacts
report.md
compliance_audit.json
events.ndjson (72 events)
decisions/ (ADR-001 … ADR-016)
Doc cache updated · 3 patterns saved · next run: faster
Real Builds

Read the output yourself.

Requirements, architecture, test plans, security models, and audit trails from real pipeline runs.

55/55 tickets
456/456 tests passing
5h 23m build time
11.2k lines
FAQ

Questions? Answers.

Ready to try it?

Run Shipwright on your own PRD. See what your AI-generated code is actually missing.
