Stop shipping vibes. Ship verified code.

Describe what you need. Shipwright builds the service, runs the tests, audits for production failures, and fixes what breaks. You get code that works.

Free while in beta · Uses your existing API keys · Your code stays local


Backend

Multi-Model

Opus / Sonnet / Haiku / Gemini

Multi-CLI

Claude Code / Codex / Gemini CLI

Self-Correcting

Pipeline

The Reality

AI can plan, scaffold, and generate.
None of it is verified.

You prompt. Code appears. It looks right. It compiles. The tests pass.

Then you deploy and the routes aren't wired. The config reads from env vars that don't exist. The tests were asserting true. The auth middleware is in the file but nothing imports it.

So you write another prompt.

“Fix the routes.”

Done. Updated route registration.

Auth is broken now.

Fixed auth middleware import.

Two tests failing.

It goes in circles. The code was never verified. You're debugging output from something that doesn't know what working means.

Some people think the fix is more agents. A team of AI agents that manage each other, review each other's code, coordinate tasks. It's faster, sure. But faster at what? You get the same broken code, generated in parallel. Reviewing code with another LLM is not verification. It is a second opinion from the same source.

Others think the answer is better specs. Detailed PRDs, careful task breakdowns, acceptance criteria. That helps. But a spec tells an agent what to build. It doesn't verify what got built.

The hard part was never writing the code. It was knowing if the code works.

Research: Veracode 2025 · Stack Overflow 2025

What if AI followed the same engineering process your team does?

Plan. Build. Test. Audit. Correct. Ship.

Input

Pick a pack (a verified, production-ready service you build on top of). Add your requirements.

Start from a verified, production-ready service. Shape it to your models and business logic with a PRD. Skip the boilerplate entirely.

Input

firebase-auth

Auth, sessions, RBAC, rate limiting

pack

Your requirements

Custom claims: role, orgId, tier

Sessions expire after 24 hours

Admin can revoke any session

Audit trail on all auth events

Rate limit failed logins

Existing codebase

src/routes/ · 4 existing routes

src/middleware/ · 2 middleware

src/config/ · env + plugins

Pack + requirements + codebase loaded

Discovery

Specialist agents. Coordinated output.

Each agent has a role and produces a specific artifact. Architecture feeds into test plans. Threat models feed into security checks. Live API research gets cached and shared. Every artifact is validated before planning starts.

Discovery: specialist agents

Product: 24 requirements, 9 assumptions

Architect: 14 endpoints, 16 ADRs

QA Lead: 42 test scenarios

Security: RBAC matrix, threat model

Domain: 6 domains detected

Doc Research: 8 API doc packets

PRD Validator: 94% coverage

spec/

architecture.md

product.contract.yaml

test_plan.md

rbac_matrix.md

threat_model.md

decisions/

ADR-001 … ADR-016

docs/

firebase-auth/overview.md

node-jwt/overview.md

Gate: PASS · 14 spec files · 16 ADRs · 8 doc packets

Plan

32 tickets. 5 dependency layers.

Ticket DAG generated from specs. Dependencies mapped. Six validation layers before a single line of code.
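A minimal sketch of how a ticket DAG could be split into dependency layers. It places each ticket one layer after its deepest dependency and rejects unknown dependencies and cycles. The `Ticket` shape, the sample ticket IDs, and `layerTickets` are hypothetical names for illustration, not Shipwright's actual planner.

```typescript
// Hypothetical ticket shape; the real schema is not shown on this page.
interface Ticket {
  id: string;
  dependsOn: string[];
}

// Group tickets into layers: a ticket lands one layer after its deepest dependency.
// Throws on unknown dependencies or cycles, so an invalid plan is never locked.
function layerTickets(tickets: Ticket[]): string[][] {
  const byId = new Map<string, Ticket>();
  for (const t of tickets) byId.set(t.id, t);

  const depth = new Map<string, number>();
  const visiting = new Set<string>();

  const resolve = (id: string): number => {
    const known = depth.get(id);
    if (known !== undefined) return known;
    const ticket = byId.get(id);
    if (!ticket) throw new Error(`Unknown ticket: ${id}`);
    if (visiting.has(id)) throw new Error(`Dependency cycle at ${id}`);
    visiting.add(id);
    let d = 0;
    for (const dep of ticket.dependsOn) d = Math.max(d, resolve(dep) + 1);
    visiting.delete(id);
    depth.set(id, d);
    return d;
  };

  const layers: string[][] = [];
  for (const t of tickets) {
    const d = resolve(t.id);
    while (layers.length <= d) layers.push([]);
    layers[d].push(t.id);
  }
  return layers;
}

// Four sample tickets: T-04 needs T-02 and T-03, which both need T-01.
const demoLayers = layerTickets([
  { id: "T-01", dependsOn: [] },
  { id: "T-02", dependsOn: ["T-01"] },
  { id: "T-03", dependsOn: ["T-01"] },
  { id: "T-04", dependsOn: ["T-02", "T-03"] },
]);
// demoLayers is [["T-01"], ["T-02", "T-03"], ["T-04"]]
```

Tickets in the same layer have no dependencies on each other, so in principle they could run in parallel.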

Ticket DAG: 32 tickets · 5 layers

Scaffold

Feature

Test

Config

T-01 … T-32

Validation

Dependencies

Requirements

Security

Scaffold

Tests

Integrity

Plan locked

Review

It knows when to ask.

Assumptions flagged. Permission diffs surfaced. You approve what matters, or let it run on auto.

Review Gate: 3 items

Flagged Assumptions

Session expiry: 24 hours [AUTO]

Rate limit: 5 attempts per minute [FLAG]

OAuth provider: Google only? [ASK]

Permission Diff

+POST /auth/revoke

+DELETE /sessions/:id

~PATCH /users/:id/role

Mode

Auto

Confirm

Reviewed · Proceeding

Configure

Your models. Your keys. Your call.

Pick which model runs each tier. Bring your own API keys, connect your CLI, or let us handle it.

Configuration: constraints.yaml

Model Tiers (configurable)

fast → gemini-flash: Scaffold, config, simple edits

default → sonnet: Features, tests, discovery

critical → opus: Auth, security, plan verification

Execution Mode

CLI-Connected: Your machine, your Claude subscription

BYOK: Your API keys, our infra

Fully Managed: We handle everything

constraints.yaml

model_tiers:
  fast: "haiku"
  default: "gemini"
  critical: "opus"
execution_tier: "byok"
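To make the tier idea concrete, here is a rough sketch of how a ticket category could be routed to a model. The tier values mirror the constraints.yaml sample above. The category names, the fallback to the default tier, and `modelFor` are assumptions made for illustration, not Shipwright's documented behavior.

```typescript
type Tier = "fast" | "default" | "critical";

// Mirrors the constraints.yaml sample on this page.
const modelTiers: Record<Tier, string> = {
  fast: "haiku",
  default: "gemini",
  critical: "opus",
};

// Hypothetical ticket categories, loosely based on the tier descriptions above.
const tierByCategory = new Map<string, Tier>([
  ["scaffold", "fast"],
  ["config", "fast"],
  ["feature", "default"],
  ["test", "default"],
  ["auth", "critical"],
  ["security", "critical"],
]);

// Unknown categories fall back to the default tier (an assumption of this sketch).
function modelFor(category: string): string {
  const tier: Tier = tierByCategory.get(category) ?? "default";
  return modelTiers[tier];
}

const scaffoldModel = modelFor("scaffold"); // "haiku"
const authModel = modelFor("auth");         // "opus"
```

Keeping the category-to-tier map separate from the tier-to-model map means swapping a model is a one-line config change.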

Execution

Agents spin up. Context is scoped.

Each ticket gets its own agent. The orchestrator assembles its context from previous agents' outputs: the relevant spec sections, domain docs, dependency files. No shared 200k-token window. No prompt soup.
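A small sketch of what scoped context assembly could look like: a ticket names the artifacts it needs, and only those are handed to its agent, within a size budget. `Artifact`, `TicketSpec`, the `needs` field, the sample contents, and the character budget are all hypothetical.

```typescript
interface Artifact {
  path: string;
  content: string;
}

interface TicketSpec {
  id: string;
  // Paths of the artifacts this ticket needs (hypothetical field).
  needs: string[];
}

// Build a ticket's context from only the artifacts it names,
// failing loudly if one is missing or the size budget is exceeded.
function assembleContext(
  ticket: TicketSpec,
  artifacts: Artifact[],
  maxChars: number,
): Artifact[] {
  const byPath = new Map<string, Artifact>();
  for (const a of artifacts) byPath.set(a.path, a);

  const selected: Artifact[] = [];
  let used = 0;
  for (const path of ticket.needs) {
    const artifact = byPath.get(path);
    if (!artifact) throw new Error(`${ticket.id}: missing artifact ${path}`);
    used += artifact.content.length;
    if (used > maxChars) throw new Error(`${ticket.id}: context over budget`);
    selected.push(artifact);
  }
  return selected;
}

// Placeholder contents; real artifacts would come from earlier pipeline stages.
const specArtifacts: Artifact[] = [
  { path: "pack.contract.yaml", content: "name: firebase-auth" },
  { path: "spec/architecture.md", content: "# Architecture" },
  { path: "spec/threat_model.md", content: "# Threat model" },
];

const scaffoldContext = assembleContext(
  { id: "T-01", needs: ["pack.contract.yaml"] },
  specArtifacts,
  4200,
);
// scaffoldContext holds a single artifact: pack.contract.yaml
```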

Executing: 0/12 tickets

spawn · haiku · fast

T-01 scaffold [RUNNING]

T-02 error-classes

T-03 auth-adapter

T-04 route-hub

T-05 session-store

T-06 middleware

T-07 rate-limiter

Context: T-01 · 4.2KB

pack.contract.yaml

Verification

Every ticket verified. Every check run.

Unit tests, lint, types, formatting, security audit, license scan, Docker build, container health, and a 47-criteria readiness check. Every check is deterministic tooling, not LLM review. Nothing ships until everything passes.
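The "everything passes or nothing ships" rule can be sketched as a gate over exit codes. In this simplified version each check is an injected function that returns an exit code, standing in for a real tool run. `Check`, `runGate`, and the simulated results are assumptions for illustration only.

```typescript
interface CheckResult {
  name: string;
  passed: boolean;
}

// A check is a name plus a function that returns an exit code (0 = success),
// the way a CLI tool would. The runner is injected so the sketch stays
// self-contained; a real pipeline would spawn the actual process.
interface Check {
  name: string;
  run: () => number;
}

function runGate(checks: Check[]): { passed: boolean; results: CheckResult[] } {
  // Run every check even after a failure so the report lists all problems.
  const results = checks.map((c) => ({ name: c.name, passed: c.run() === 0 }));
  return { passed: results.every((r) => r.passed), results };
}

const gate = runGate([
  { name: "npm test", run: () => 0 },
  { name: "tsc --noEmit", run: () => 0 },
  { name: "npm audit", run: () => 1 }, // simulated failure
]);
// gate.passed is false, and gate.results marks "npm audit" as the failing check.
```

The pass/fail decision depends only on exit codes, so the same inputs always produce the same verdict.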

Verification: 0/11 checks

Code Quality

npm test

eslint .

tsc --noEmit

prettier --check

Security

npm audit

secrets scan

license check

Container

docker build

docker run → /health

docker run → /ready

Readiness

47-criteria scan

11 checks passed · ready for audit

Audit

Something breaks. The pipeline catches it.

47 criteria scanned. Security headers missing. Container running as root. Dead exports in plugins. Every other tool ships this. The pipeline flags it.

scaffold [DONE]

error-classes [DONE]

auth-adapter [DONE]

route-hub [DONE]

session-store [DONE]

middleware [DONE]

Self-Correction

Corrective tickets. Automatic re-run.

Each finding becomes a ticket with priority, assignee, and source criteria. The engine assigns the fix. The same automated checks re-run. The loop closes itself.
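As an illustration of the loop described here, the sketch below turns each audit finding into a corrective ticket, applies a fix, and re-runs the audit for a bounded number of rounds. The `audit` and `fix` callbacks are injected stand-ins for the real tooling and agents. The type names, the `C-n` ticket IDs, and the round limit are hypothetical.

```typescript
interface Finding {
  criterion: string; // e.g. "container runs as non-root"
  priority: "high" | "medium" | "low";
}

interface CorrectiveTicket {
  id: string;
  priority: Finding["priority"];
  sourceCriterion: string;
}

// audit() returns the current findings; fix() attempts to resolve one ticket.
function selfCorrect(
  audit: () => Finding[],
  fix: (ticket: CorrectiveTicket) => void,
  maxRounds: number,
): { passed: boolean; tickets: CorrectiveTicket[] } {
  const tickets: CorrectiveTicket[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const findings = audit();
    if (findings.length === 0) return { passed: true, tickets };
    for (const f of findings) {
      const ticket: CorrectiveTicket = {
        id: `C-${tickets.length + 1}`,
        priority: f.priority,
        sourceCriterion: f.criterion,
      };
      tickets.push(ticket);
      fix(ticket);
    }
  }
  // One final re-run after the last round of fixes.
  return { passed: audit().length === 0, tickets };
}

// Simulated audit state: two open findings that the fix callback resolves.
const openFindings = new Set(["security headers present", "container runs as non-root"]);
const outcome = selfCorrect(
  () => Array.from(openFindings).map((criterion) => ({ criterion, priority: "high" as const })),
  (ticket) => {
    openFindings.delete(ticket.sourceCriterion);
  },
  3,
);
// outcome.passed is true after two corrective tickets (C-1 and C-2).
```

The round limit keeps the loop from spinning forever when a fix does not actually resolve its finding.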


Complete

Full trace. Every decision. Every dollar.

32 tickets completed, 4 corrective fixes, 43/47 criteria passed, $14.20 total cost. Token-level billing, model attribution, and a complete audit trail.

Run Report: PASS

Tickets completed: 28/28

Hardening tasks: 12 applied

Tests: 354 passing

Requirement coverage: 94%

Duration: 3h 12m

Total cost: $14.20

Artifacts

report.md

compliance_audit.json

events.ndjson (72 events)

decisions/ADR-001 … ADR-016

Doc cache updated · 3 patterns saved · next run: faster


Real Builds

Read the output yourself.

Requirements, architecture, test plans, security models, and audit trails from real pipeline runs.

55/55 tickets

456/456 tests passing

5h 23m build time

11.2k lines

Clone it. Run the tests. Use it as a template.

Full source, 354 tests, Dockerfile. Everything the pipeline generated. Yours to ship or build on.

.ai/product.contract.yaml · 92 lines

# Firebase Auth Pack v1.2 -- Requirements
# Source: PRD + domain research + Firebase Admin SDK docs
# 38 requirements extracted, validated against live API
# 9 capabilities, 22 authenticated endpoints + 2 public

requirements:
  - id: REQ-001
    title: Single token verification with optional revocation check
    risk: high
    acceptance_criteria:
      - POST /verify accepts Firebase ID token in request body
      - Optional checkRevoked flag triggers network revocation check
      - Returns uid, email, emailVerified, claims, iss, aud, iat, exp
      - Returns generic 401 for any verification failure
      - Does NOT distinguish between failure reasons in HTTP response

  - id: REQ-003
    title: User lookup by UID, email, and phone
    risk: medium
    acceptance_criteria:
      - GET /users/:uid returns full user profile
      - GET /users/by-email/:email returns user by email
      - GET /users/by-phone/:phone returns user by phone (E.164)
      - POST /users/batch accepts up to 100 mixed identifiers
      - Returns 404 for unknown user, 400 for malformed input

  - id: REQ-005
    title: User CRUD with batch operations
    risk: medium
    acceptance_criteria:
      - POST /users creates user (email, password, displayName, phone)
      - PATCH /users/:uid updates whitelisted properties
      - DELETE /users/:uid deletes single user (204)
      - POST /users/:uid/disable and /enable toggle account state
      - POST /users/batch-delete accepts up to 1000 UIDs
      - GET /users returns paginated listing (maxResults, pageToken)

  - id: REQ-007
    title: Custom claims management
    risk: high
    acceptance_criteria:
      - PUT /users/:uid/claims sets claims (replaces all existing)
      - DELETE /users/:uid/claims clears all claims (204)
      - Reserved claim names validated before SDK call
      - Claims size validated (max 1000 chars serialized)

  - id: REQ-009
    title: Session cookie lifecycle
    risk: high
    acceptance_criteria:
      - POST /sessions creates cookie from ID token + expiresIn
      - POST /sessions/verify validates cookie with optional revocation
      - Duration validated (5 min to 14 days per Firebase limits)

  - id: REQ-011
    title: Custom tokens and refresh token revocation
    risk: high
    acceptance_criteria:
      - POST /tokens/custom mints token for UID with optional claims
      - POST /users/:uid/revoke invalidates all refresh tokens
      - Returns tokensValidAfterTime for confirmation

  - id: REQ-013
    title: Email action link generation
    risk: medium
    acceptance_criteria:
      - POST /email-actions/password-reset generates reset link
      - POST /email-actions/verification generates verification link
      - POST /email-actions/sign-in generates passwordless sign-in link
      - sign-in requires actionCodeSettings with url + handleCodeInApp

  # ... 31 more requirements (REQ-002 through REQ-038)
  # Full file: github.com/useshipwright/shipwright/builds/firebase-auth/spec/

non_functional_requirements:
  - id: NFR-001
    title: Token verification latency
    target: "<50ms p99"
  - id: NFR-002
    title: Test coverage
    target: ">80% line coverage"
  - id: NFR-003
    title: Startup time
    target: "<3s to healthy"

constraints:
  framework: Fastify 5
  language: TypeScript (strict mode)
  runtime: Node.js 22 LTS
  test_runner: vitest
  package_manager: pnpm
  container: Docker (multi-stage, non-root)
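Acceptance criteria like the ones in REQ-007 and REQ-009 map naturally onto small pure checks that a test suite can assert against. The sketch below is one possible shape, not the generated service's code. The function names are hypothetical, and the reserved-claim list is an illustrative subset that should be confirmed against the Firebase documentation.

```typescript
const MIN_SESSION_MS = 5 * 60 * 1000;            // 5 minutes (REQ-009)
const MAX_SESSION_MS = 14 * 24 * 60 * 60 * 1000; // 14 days (REQ-009)
const MAX_CLAIMS_CHARS = 1000;                   // serialized size limit (REQ-007)

// Illustrative subset only; confirm the full reserved list against the Firebase docs.
const RESERVED_CLAIMS = new Set(["iss", "aud", "exp", "iat", "sub", "firebase"]);

// True when the requested session cookie duration is inside the allowed window.
function validateSessionDuration(expiresInMs: number): boolean {
  return (
    Number.isFinite(expiresInMs) &&
    expiresInMs >= MIN_SESSION_MS &&
    expiresInMs <= MAX_SESSION_MS
  );
}

// Returns a list of problems; an empty list means the claims may be sent to the SDK.
function validateClaims(claims: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(claims)) {
    if (RESERVED_CLAIMS.has(key)) problems.push(`reserved claim name: ${key}`);
  }
  if (JSON.stringify(claims).length > MAX_CLAIMS_CHARS) {
    problems.push("serialized claims exceed 1000 characters");
  }
  return problems;
}

const dayLongSession = validateSessionDuration(24 * 60 * 60 * 1000); // true
const claimProblems = validateClaims({ role: "admin", orgId: "org-1", tier: "pro" }); // []
```

Running these checks before any SDK call matches the "validated before SDK call" wording in REQ-007.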

FAQ

Questions? Answers.

Ready to try it?

Run Shipwright on your own PRD. See what your AI-generated code is actually missing.

Free while in beta · Uses your existing API keys · Your code stays local

Real builds: Identity Service · Job Ingestion Pipeline · Flight Deal Detector