RAG with Retrieval-Time Semantic Deduplication: 30-50% Fewer Tokens, Better Answers

Traditional RAG usually takes top-k chunks and forwards them straight into generation. In real corpora, those chunks are frequently repetitive, which wastes context window budget and dilutes signal.
This project inserts a semantic dedup stage between retrieval and generation. It preserves coverage while removing overlap, so the model sees more unique information per token.
Why It Works
After retrieval, chunks are embedded locally and compared with cosine similarity. A greedy filter keeps high-relevance passages and drops near-duplicates above a configurable threshold.
That one change often improves factual sharpness while lowering spend, especially in documentation-heavy datasets.
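The greedy filter described above can be sketched in a few lines. This is a minimal illustration, not the repository's actual code: the function names `cosine_similarity` and `greedy_dedup`, the `(text, relevance, embedding)` tuple layout, and the 0.95 default are assumptions chosen for the example. Embeddings are assumed to be precomputed by whatever local model you use.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical chunk layout for this sketch: (text, relevance score, embedding).
Chunk = Tuple[str, float, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_dedup(chunks: List[Chunk], threshold: float = 0.95) -> List[Chunk]:
    """Keep chunks in relevance order, dropping any whose similarity to an
    already-kept chunk is above the threshold."""
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    kept: List[Chunk] = []
    for chunk in ranked:
        # Keep the chunk only if it is not a near-duplicate of anything kept so far.
        if all(cosine_similarity(chunk[2], k[2]) <= threshold for k in kept):
            kept.append(chunk)
    return kept


if __name__ == "__main__":
    retrieved = [
        ("Dedup removes overlap.", 0.9, [1.0, 0.0]),
        ("Dedup removes overlapping text.", 0.8, [0.999, 0.01]),
        ("Thresholds are configurable.", 0.7, [0.0, 1.0]),
    ]
    for text, score, _ in greedy_dedup(retrieved):
        print(f"{score:.2f}  {text}")
```

Because the most relevant chunk is always considered first, a near-duplicate can only displace a lower-ranked copy, never a higher-ranked one. The comparison is quadratic in the number of retrieved chunks, which is usually acceptable for small top-k values.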
Run the Project
git clone https://github.com/dakshjain-1616/RAG-with-Retrieval-Time-Semantic-Deduplication
cd RAG-with-Retrieval-Time-Semantic-Deduplication
pip install -r requirements.txt
export OPENROUTER_API_KEY=sk-or-...
python run_rag.py ingest --docs ./data
python run_rag.py query --query 'What is semantic deduplication?' --log-metrics
python run_rag.py retrieve --query 'What is semantic deduplication?' --threshold 0.95 --log-metrics
The canonical CLI entrypoint is python run_rag.py with ingest, retrieve, query, and generate subcommands, so it is easy to benchmark each stage independently.
Architecture Walkthrough
The repository is organized around a clear pipeline: documents are ingested, relevant chunks are retrieved, near-duplicates are filtered out, and the remaining chunks are passed to generation. You can trace the full flow from input handling to final output without guesswork, which makes onboarding easier for new contributors and helps teams debug faster when behavior changes after updates.
Practical Use Cases
If you are evaluating retrieval-time semantic deduplication for production, start with a small real-world dataset, run the included commands end to end, and compare output quality, latency, and operational complexity with and without the dedup stage. This gives a stronger practical signal than a toy demo.
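One way to put a number on the token side of that comparison is to measure how much context the dedup stage removes. The sketch below is an assumption-laden illustration: `approx_tokens` and `token_reduction_percent` are hypothetical helpers, and whitespace splitting is only a crude proxy for a real tokenizer, so treat the result as a rough estimate.

```python
from typing import Iterable


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def token_reduction_percent(baseline: Iterable[str], deduped: Iterable[str]) -> float:
    """Percentage of approximate tokens removed by deduplication."""
    before = sum(approx_tokens(t) for t in baseline)
    after = sum(approx_tokens(t) for t in deduped)
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    top_k = ["a b c d", "a b c d", "e f"]   # chunks straight from retrieval
    filtered = ["a b c d", "e f"]           # chunks after the dedup stage
    print(f"{token_reduction_percent(top_k, filtered):.1f}% fewer tokens")
```

Tracking this figure per query alongside answer quality and latency makes it easier to see whether a given threshold is trading away coverage for savings.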
Implementation Notes
The project is useful as both a standalone tool and a reference implementation. You can copy patterns from this codebase into your own stack, especially around evaluation discipline, reproducibility, and operator visibility.