Use cases

What people build with NEO

Real projects built by NEO — from LLM benchmarks to agent swarms. Pick a workflow below to browse, or start with a featured use case.

Featured

150+ tasks, 10 categories

Evaluate & Benchmark

Benchmarking LLMs on Real Tasks

An async LLM benchmarking platform that evaluates models from OpenAI, Anthropic, Google, and more across 150+ real-world tasks covering coding, reasoning, structured output, and...

Dual-LLM optimization loop

Evaluate & Benchmark

Auto prompt optimization

Closed-loop system: an optimizer LLM writes prompts and reads failure summaries, a target LLM runs batches against synthetic data, and a JSON ledger tracks every iteration until scores converge.
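A minimal sketch of what such a loop might look like, assuming nothing about the actual project beyond the description above. The names `optimize_prompt`, `stub_optimizer`, and `stub_target` are hypothetical, and the two stubs stand in for real LLM calls so the example runs offline.

```python
import json

def optimize_prompt(propose, run_batch, dataset, target_score=0.9, max_iters=10):
    """Closed loop: propose a prompt, score it on a batch, log it, repeat."""
    ledger = []                      # one JSON-serializable record per iteration
    prompt, failures = "", []
    for iteration in range(max_iters):
        prompt = propose(prompt, failures)          # optimizer step
        results = run_batch(prompt, dataset)        # target step: one bool per example
        score = sum(results) / len(results)
        failures = [ex for ex, ok in zip(dataset, results) if not ok]
        ledger.append({
            "iteration": iteration,
            "prompt": prompt,
            "score": score,
            "failures": failures,                   # the failure summary fed back next round
        })
        if score >= target_score:                   # simple convergence check (assumption)
            break
    return prompt, ledger

def stub_optimizer(previous_prompt, failures):
    # Hypothetical stand-in for the optimizer LLM: adds one instruction per round.
    if not previous_prompt:
        return "Answer the question."
    return previous_prompt + " Be precise."

def stub_target(prompt, dataset):
    # Hypothetical stand-in for the target LLM: each sentence in the prompt
    # lets two more examples pass, so the loop has something to converge on.
    passes = min(len(dataset), prompt.count(".") * 2)
    return [index < passes for index in range(len(dataset))]

if __name__ == "__main__":
    synthetic_data = ["q1", "q2", "q3", "q4"]       # placeholder synthetic examples
    best_prompt, ledger = optimize_prompt(stub_optimizer, stub_target, synthetic_data)
    print(json.dumps(ledger, indent=2))
```

In a real setup the two stubs would be replaced by calls to the optimizer and target models, and the ledger would typically be written to disk after each iteration so a run can be inspected or resumed.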

+4.62% returns, 10 agents

Build Agents

Trading Agent Swarm

10 specialized agents coordinating over an async message bus: +4.62% returns across 250 days of S&P 500 data.
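As a rough illustration of agents coordinating over an async message bus, here is a two-agent sketch built on `asyncio.Queue`. The `MessageBus` class, the agent functions, the topic name, and the price list are all hypothetical; this is not the project's code and says nothing about how its returns were obtained.

```python
import asyncio

class MessageBus:
    """Tiny in-process pub/sub bus: each subscriber gets its own queue."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic):
        queue = asyncio.Queue()
        self._subscribers.setdefault(topic, []).append(queue)
        return queue

    async def publish(self, topic, message):
        for queue in self._subscribers.get(topic, []):
            await queue.put(message)

async def signal_agent(bus, prices):
    # Emits a naive momentum signal per day (illustrative only, not a strategy).
    for day, (previous, current) in enumerate(zip(prices, prices[1:]), start=1):
        side = "buy" if current > previous else "sell"
        await bus.publish("signals", {"day": day, "side": side})
    await bus.publish("signals", None)   # sentinel: no more messages

async def execution_agent(inbox, orders):
    # Consumes signals until the sentinel arrives and records them as orders.
    while True:
        message = await inbox.get()
        if message is None:
            break
        orders.append(message)

async def run_swarm(prices):
    bus = MessageBus()
    inbox = bus.subscribe("signals")     # subscribe before anything is published
    orders = []
    await asyncio.gather(signal_agent(bus, prices), execution_agent(inbox, orders))
    return orders

if __name__ == "__main__":
    print(asyncio.run(run_swarm([100.0, 101.0, 99.0])))
```

Giving each subscriber its own queue keeps agents decoupled: a publisher does not need to know which agents are listening, and more agents can be added by subscribing to a topic.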

Browse by workflow

Same stack you're already debugging

Agents with brittle tool calls. Prompts that need another pass. Evals before you trust a model swap. NEO lives in VS Code or Cursor and helps you turn that work into real code and runs, so you iterate on behavior, not boilerplate.

Get started