Use cases

What people build with NEO

Real projects built using NEO, from LLM benchmarks to agent swarms. Pick a workflow below to browse, or start with a featured use case.

24 use cases · 7 workflows · production-tested

Featured

150+ tasks, 10 categories

Evaluate & Benchmark

Benchmarking LLMs on Real Tasks

An async LLM benchmarking platform that evaluates models from OpenAI, Anthropic, Google, and more across 150+ real-world tasks covering coding, reasoning, structured output, and long-context retrieval.

Read case study

Dual-LLM optimization loop

Evaluate & Benchmark

Auto prompt optimization

Closed-loop system: an optimizer LLM writes prompts and reads failure summaries, a target LLM runs batches against synthetic data, and a JSON ledger tracks every iteration until scores converge.

Read case study
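The closed loop described above can be sketched in a few lines. This is a minimal illustration, not NEO's implementation: `optimize_prompt`, `demo_propose`, and `demo_evaluate` are hypothetical names, and the two demo functions are deterministic stand-ins for the optimizer and target LLM calls so the loop can run without any API access.

```python
import json
from typing import Callable


def optimize_prompt(
    propose: Callable[[str, str], str],
    evaluate: Callable[[str], tuple[float, str]],
    seed_prompt: str,
    max_iters: int = 10,
    tol: float = 1e-3,
) -> list[dict]:
    """Run a propose/evaluate loop and return a JSON-serializable ledger.

    propose  : stand-in for the optimizer LLM (prompt, failure summary) -> new prompt
    evaluate : stand-in for the target LLM batch run, prompt -> (score, failure summary)
    """
    prompt = seed_prompt
    score, failures = evaluate(prompt)
    ledger = [{"iteration": 0, "prompt": prompt, "score": score, "failures": failures}]

    for i in range(1, max_iters + 1):
        candidate = propose(prompt, failures)
        cand_score, cand_failures = evaluate(candidate)
        ledger.append(
            {"iteration": i, "prompt": candidate, "score": cand_score, "failures": cand_failures}
        )
        gain = cand_score - score
        if gain > 0:
            # Keep the better prompt as the new baseline.
            prompt, score, failures = candidate, cand_score, cand_failures
        if gain < tol:
            # Treat a negligible (or negative) gain as convergence.
            break
    return ledger


# Hypothetical deterministic stand-ins so the sketch runs offline.
def demo_propose(prompt: str, failures: str) -> str:
    return prompt + " Answer in JSON."


def demo_evaluate(prompt: str) -> tuple[float, str]:
    score = min(1.0, 0.5 + 0.25 * prompt.count("JSON"))
    summary = "" if score == 1.0 else "outputs were not valid JSON"
    return score, summary
```

Calling `json.dumps(optimize_prompt(demo_propose, demo_evaluate, "Extract the fields."), indent=2)` yields the ledger as text. In a real setup the two callables would wrap model API calls and a synthetic-data scorer.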

+4.62% returns, 10 agents

Build Agents

Trading Agent Swarm

10 specialized agents coordinating over an async message bus: +4.62% returns across 250 days of S&P 500 data.

Read case study
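For readers unfamiliar with the pattern, agents coordinating over an async message bus might look roughly like the sketch below. It assumes a simple in-process, topic-based bus built on `asyncio.Queue`. The `MessageBus` class, the two agents, and the toy price list are all hypothetical and are not taken from the case study.

```python
import asyncio
from collections import defaultdict


class MessageBus:
    """Hypothetical in-process pub/sub bus: one queue per subscriber per topic."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        inbox: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].append(inbox)
        return inbox

    async def publish(self, topic: str, message) -> None:
        for inbox in self._subscribers[topic]:
            await inbox.put(message)


async def signal_agent(bus: MessageBus, prices: list[float]) -> None:
    # Emits a "buy" signal on any day the price rose, "hold" otherwise.
    for day, (prev, cur) in enumerate(zip(prices, prices[1:]), start=1):
        action = "buy" if cur > prev else "hold"
        await bus.publish("signals", {"day": day, "action": action})
    await bus.publish("signals", None)  # sentinel: no more signals


async def execution_agent(inbox: asyncio.Queue) -> list[int]:
    # Collects the days on which a buy signal arrived.
    orders = []
    while (msg := await inbox.get()) is not None:
        if msg["action"] == "buy":
            orders.append(msg["day"])
    return orders


async def run_demo(prices: list[float]) -> list[int]:
    bus = MessageBus()
    inbox = bus.subscribe("signals")
    _, orders = await asyncio.gather(signal_agent(bus, prices), execution_agent(inbox))
    return orders
```

Running `asyncio.run(run_demo([100, 101, 100, 102]))` drives both agents concurrently. A production swarm would likely add more topics, error handling, and a networked broker in place of in-memory queues.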

Browse by workflow

Same stack you're already debugging

Agents with brittle tool calls. Prompts that need another pass. Evals before you trust a model swap. NEO lives in VS Code or Cursor and helps you turn that work into real code and runs, so you iterate on behavior, not boilerplate.

Get started

Cursor

Claude Code

Codex

Windsurf

Zed

Antigravity

Continue


eval_pipeline.py

  def run_eval(model, tasks):
-     results = model.predict(tasks)
+     results = batch_eval(model, tasks)
+     log_regression(results)
      return score(results)