Low-Latency Model Router: Stop Hardcoding LLM Choice Per Request

Most teams begin with one model string in code and call it done. That works until traffic patterns change, latency spikes, or a cheaper model turns out to handle 80% of your workload just as well. This project treats model selection as a runtime decision instead of a hardcoded constant.
Why This Router Matters
The core value is not just model scoring. It is operational flexibility. You can optimize for speed in interactive chat, optimize for cost in batch jobs, and still preserve fallback behavior when a provider degrades.
Because routing is local and score-based, the routing decision itself adds under 1 ms of overhead, and cache hits cut the work for repeated requests further.
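To picture why a cache hit is cheap, consider a small in-memory map from a request fingerprint to a previous routing decision. The sketch below is illustrative only: the `DecisionCache` class, its SHA-256 key over priority and prompt, and the TTL are assumptions, not the project's actual cache implementation.

```python
import hashlib
import time


class DecisionCache:
    """Hypothetical TTL cache for routing decisions (not the project's real cache)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self.hits = 0
        self.misses = 0
        self._entries = {}      # key -> (expires_at, model_name)

    @staticmethod
    def key(prompt, priority):
        # Assumed fingerprint: hash of the priority mode plus the prompt text.
        return hashlib.sha256(f"{priority}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, prompt, priority):
        entry = self._entries.get(self.key(prompt, priority))
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        # Missing and expired entries both count as misses.
        self.misses += 1
        return None

    def put(self, prompt, priority, model_name):
        expires_at = self.clock() + self.ttl_seconds
        self._entries[self.key(prompt, priority)] = (expires_at, model_name)
```

A hit costs one hash and one dictionary lookup, so a repeated request with the same prompt and priority never reaches the scoring step. Whether the real cache is keyed this way is not stated in the post.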
Scoring Strategy
Each model gets a weighted composite score over latency, token cost, and quality. The same catalog can produce different winners depending on your priority mode.
| Mode | Latency | Cost | Quality |
|---|---|---|---|
| speed | 0.70 | 0.20 | 0.10 |
| cost | 0.20 | 0.70 | 0.10 |
| quality | 0.10 | 0.20 | 0.70 |
| balanced | 0.40 | 0.30 | 0.30 |
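The weights above can be turned into a composite score along these lines. This is a minimal sketch, not the project's scoring code: the catalog entries and their numbers are invented, and min-max normalization of each metric across the catalog is an assumption made so that the three terms share a 0 to 1 scale.

```python
# Weights copied from the table above.
WEIGHTS = {
    "speed":    {"latency": 0.70, "cost": 0.20, "quality": 0.10},
    "cost":     {"latency": 0.20, "cost": 0.70, "quality": 0.10},
    "quality":  {"latency": 0.10, "cost": 0.20, "quality": 0.70},
    "balanced": {"latency": 0.40, "cost": 0.30, "quality": 0.30},
}

# Hypothetical catalog: model names and metrics are made up for illustration.
CATALOG = {
    "fast-small":  {"latency_ms": 200.0,  "cost_per_1k": 0.50, "quality": 0.60},
    "cheap-mid":   {"latency_ms": 600.0,  "cost_per_1k": 0.10, "quality": 0.70},
    "big-quality": {"latency_ms": 1200.0, "cost_per_1k": 3.00, "quality": 0.95},
}


def normalize(value, values, higher_is_better):
    """Min-max scale to [0, 1] so that 1.0 is always the best value (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0
    fraction = (value - lo) / (hi - lo)
    return fraction if higher_is_better else 1.0 - fraction


def score_models(catalog, priority):
    """Return {model_name: composite score} for one priority mode."""
    weights = WEIGHTS[priority]
    latencies = [m["latency_ms"] for m in catalog.values()]
    costs = [m["cost_per_1k"] for m in catalog.values()]
    qualities = [m["quality"] for m in catalog.values()]
    return {
        name: (
            weights["latency"] * normalize(m["latency_ms"], latencies, False)
            + weights["cost"] * normalize(m["cost_per_1k"], costs, False)
            + weights["quality"] * normalize(m["quality"], qualities, True)
        )
        for name, m in catalog.items()
    }


def pick_model(catalog, priority):
    """Return the name of the highest-scoring model."""
    scores = score_models(catalog, priority)
    return max(scores, key=scores.get)
```

With this made-up catalog, `pick_model(CATALOG, "speed")` and `pick_model(CATALOG, "balanced")` select `fast-small`, `"cost"` selects `cheap-mid`, and `"quality"` selects `big-quality`, which illustrates how one catalog yields different winners per mode.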
API Surface
- POST /route
- GET /models
- GET /metrics
- GET /health
- GET /cache/stats
- DELETE /cache
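As a rough idea of how a client might call `POST /route`, here is a standard-library sketch. The post does not document the request or response schema, so the base URL, the port, and the `prompt` and `priority` field names are all assumptions (the field names simply mirror the CLI's positional prompt and `--priority` flag). Check the project's source before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: host and port are not stated in the post


def build_route_request(prompt, priority="balanced", base_url=BASE_URL):
    """Build (but do not send) a POST /route request with an assumed JSON payload."""
    body = json.dumps({"prompt": prompt, "priority": priority}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/route",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request and parse the body as JSON (assumes the router replies with JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the router running, `send(build_route_request("What is 2+2?", priority="quality"))` would issue the call. Keeping request construction separate from sending makes the payload easy to inspect or unit test without a live server.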
Run the Project
git clone https://github.com/dakshjain-1616/low-Latency-Model-Router
cd low-Latency-Model-Router
pip install -r requirements.txt
cp .env.example .env
python start_router.py
python -m src.cli.commands models
python -m src.cli.commands route 'What is 2+2?' --priority quality --dry-run
python -m src.cli.commands benchmark --iterations 10
If you are building against OpenRouter and want one stable orchestration layer instead of model churn in application code, this project is a pragmatic template.
Where This Fits in Production
Use this pattern when you have mixed workloads and no single model is consistently optimal. The router gives you policy-level control (speed, cost, quality) without changing application code for every model decision.
It is also a good foundation for adding budget guards, latency caps, request-class-based routing, and fallback observability over time.
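Two of those extensions, hard limits and observable fallback, could be layered on roughly as follows. This is a sketch under stated assumptions rather than anything the project ships: `apply_guards`, `call_with_fallback`, the dictionary fields, and the `call_model` callback are all hypothetical names.

```python
def apply_guards(candidates, max_latency_ms=None, max_cost_per_1k=None):
    """Drop models that violate a hard latency cap or budget limit before scoring."""
    kept = []
    for model in candidates:
        if max_latency_ms is not None and model["latency_ms"] > max_latency_ms:
            continue
        if max_cost_per_1k is not None and model["cost_per_1k"] > max_cost_per_1k:
            continue
        kept.append(model)
    return kept


def call_with_fallback(ranked_models, call_model):
    """Try models in ranked order and return (model_name, result, failures).

    `call_model` is a caller-supplied function taking a model name; any exception
    it raises is treated as a provider failure and triggers the next candidate.
    """
    failures = []  # (model_name, error message) pairs, kept for fallback observability
    for model in ranked_models:
        try:
            return model["name"], call_model(model["name"]), failures
        except Exception as exc:  # broad on purpose: any provider error means "try the next one"
            failures.append((model["name"], str(exc)))
    raise RuntimeError(f"all candidates failed: {failures}")
```

Guards act as filters that remove candidates outright, while the weighted score only ranks whatever survives, so a latency cap can never be outvoted by a strong cost or quality score. Returning the failure list alongside the result gives a natural hook for logging or metrics.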