Spec to Ship: A Multi-Agent Pipeline That Turns Feature Ideas into Production Code

View on GitHub

Pipeline Architecture

The Problem

The gap between having a spec and having working code is where most engineering time disappears. Writing the spec is fast. Implementing it, writing tests for it, and reviewing it for security issues takes most of the sprint. For well-defined features with clear acceptance criteria, most of that work is mechanical translating specs into code following established patterns yet it consumes senior engineering time that could be spent on harder problems.

NEO autonomously built a system to close that gap. Spec-to-Ship is a multi-agent pipeline that takes a feature idea as input and outputs tested, production-ready TypeScript code. No manual implementation steps. No separate test-writing pass.

Five Agents, One Pipeline

The pipeline coordinates five specialized agents in sequence. Each agent has a defined role and hands its output directly to the next.

The Architect Agent receives the raw feature description and produces a comprehensive technical specification. It determines the right approach, defines the data structures, maps out the implementation strategy, and documents the acceptance criteria. This output drives everything downstream.

The Planner Agent takes the spec and converts it into a dependency-ordered task list. Order matters here. Tasks that depend on each other need to be sequenced correctly, and tasks that are independent can be parallelized. The planner handles this analysis so the engineer agent doesn't have to.

The Engineer Agent implements the code. Full TypeScript with complete type safety. It works through the task list produced by the planner, generating the actual implementation files. This is the primary output of the pipeline.

The QA Agent writes Vitest test suites against the implementation. It tests to the acceptance criteria defined in the spec, not just to the code. That distinction matters: code coverage isn't the same as spec compliance.

The Reviewer Agent conducts the final audit. Security checks, performance review, code quality assessment. It's the last gate before the output is considered done.

Why Agent Specialization Works

You could attempt this with a single LLM prompt. It won't work well. A single prompt trying to simultaneously architect, plan, implement, test, and review produces mediocre results across all dimensions because the model can't hold all those concerns in focus at once.

Specialization changes this. Each agent's context is tuned for its task. The architect has the feature description. The engineer has the spec. The QA agent has both the spec and the implementation. Each agent receives exactly what it needs and nothing more.

The result is that each stage produces noticeably better output than a single-prompt approach. The architecture is more coherent. The implementation matches the spec. The tests actually test what they should.

Reliability Under Real Conditions

Production systems fail in predictable ways. Rate limits hit. JSON parsing fails on unexpected model output. Long-running jobs time out. NEO built explicit handling for all of these.

Exponential backoff retry logic handles rate limits automatically. The system waits, then retries with increasing delays to avoid hammering the API. JSON parsing failures trigger automatic recovery, re-prompting the model with clearer output constraints. The entire pipeline has a 20-minute timeout limit so runaway jobs don't accumulate.

Every run produces a timestamped output directory with all artifacts and logs preserved. If something goes wrong, you can inspect exactly what each agent produced and where the failure occurred.

Running It

The prerequisites are minimal: Node.js 20 or higher, and an OpenRouter API key in a .env file. Install dependencies with npm install.

Two interfaces are available. The interactive CLI (npm run start) walks you through providing the feature description and runs the pipeline. The web dashboard (npm run dev) provides a browser-based interface at localhost:3000 with a more visual representation of pipeline progress.

Both interfaces produce the same output: TypeScript implementation files, test suites, and a reviewer report, organized in a timestamped directory.

Where This Fits in Real Engineering

This isn't a replacement for senior engineers. The pipeline excels at the implementation and testing phases of well-defined features. Features with clear acceptance criteria and bounded scope are where it performs best.

It's particularly useful for:

For ambiguous problems or features requiring deep domain knowledge, human judgment remains essential for the architectural decisions. But once those decisions are made, Spec-to-Ship can handle the downstream execution.

The Broader Point

Software teams spend enormous amounts of time on implementation work that is largely mechanical: translating specs into code that follows established patterns, writing tests that verify spec compliance, reviewing for common issues. These tasks are high-volume and time-consuming without being intellectually demanding.

Multi-agent pipelines like this one are a better fit for mechanical implementation work than for open-ended architectural thinking. That's not a limitation. It's a design choice. Knowing what to automate and what to keep human is most of the engineering judgment here.

Production ML Engineering at Scale

How to Build This with NEO

Open NEO in VS Code or Cursor and describe what you want to build. A good starting prompt for this project:

"Build a Node.js multi-agent pipeline with five sequential agents Architect, Planner, Engineer, QA, and Reviewer that takes a plain-language feature description and produces typed TypeScript implementation files, Vitest test suites, and a security audit report. Use OpenRouter for LLM calls with exponential backoff for rate limits and a 20-minute hard timeout. Output all artifacts to a timestamped directory. Add both an interactive CLI and a Next.js web dashboard at localhost:3000 with live pipeline progress."

Build with NEO →

NEO generates the agent orchestration layer, per-agent prompt templates, retry logic, and the web dashboard. From there you iterate ask it to add dependency-ordered task sequencing in the Planner agent, add JSON schema validation for inter-agent handoffs to catch malformed outputs, or add a comparison mode that runs two different LLM models through the same spec and diffs their implementations.

To run the finished project:

git clone https://github.com/dakshjain-1616/Spec-To-Ship
cd Spec-To-Ship
npm install

Add OPENROUTER_API_KEY=sk-or-... to a .env file, then run:

npm run start

Enter a feature description and watch the five agents work in sequence producing TypeScript code, tests, and a reviewer report in a timestamped output directory, typically in 3 to 10 minutes.

NEO built a spec-to-ship multi-agent pipeline where a plain-language feature description automatically becomes typed TypeScript code, Vitest test suites, and a security audit produced by five specialized agents working in sequence. See what else NEO ships at heyneo.com.

Try NEO in Your IDE

Install the NEO extension to bring AI-powered development directly into your workflow: