Show HN: Distill – Remove redundant RAG context in 12ms, no LLM calls
30-40% of RAG context is semantically redundant: the same information arrives from docs, code, memory, and tools, all competing for attention. The model gets confused. Outputs become non-deterministic.
Everyone frames this as "save tokens." That's the wrong framing. The real issue is reliability: same workflow, same data, different results every run.
You can't prompt your way out of bad input.
Distill fixes the input:
1. Over-fetch from the vector DB (50 chunks)
2. Agglomerative clustering groups similar chunks
3. Select the best representative from each cluster
4. MMR reranking for diversity
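To make steps 2 and 3 concrete, here is a minimal Go sketch of threshold-based single-linkage clustering plus per-cluster representative selection. The Chunk type, the union-find approach, and all names here are illustrative assumptions, not Distill's actual code (see the repo for that):

    package dedupe

    import "math"

    // Chunk pairs retrieved text with its embedding and retrieval score.
    // Hypothetical type for illustration, not Distill's actual API.
    type Chunk struct {
        Text  string
        Emb   []float64
        Score float64 // similarity score returned by the vector DB
    }

    // cosine returns the cosine similarity of two equal-length vectors.
    func cosine(a, b []float64) float64 {
        var dot, na, nb float64
        for i := range a {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        if na == 0 || nb == 0 {
            return 0
        }
        return dot / (math.Sqrt(na) * math.Sqrt(nb))
    }

    // clusterChunks does single-linkage agglomerative clustering via
    // union-find: any pair of chunks above the similarity threshold
    // ends up in the same cluster. Clusters come out in first-seen
    // index order, so output is deterministic for a given input.
    func clusterChunks(chunks []Chunk, threshold float64) [][]int {
        parent := make([]int, len(chunks))
        for i := range parent {
            parent[i] = i
        }
        var find func(int) int
        find = func(x int) int {
            if parent[x] != x {
                parent[x] = find(parent[x]) // path compression
            }
            return parent[x]
        }
        for i := 0; i < len(chunks); i++ {
            for j := i + 1; j < len(chunks); j++ {
                if cosine(chunks[i].Emb, chunks[j].Emb) >= threshold {
                    parent[find(i)] = find(j)
                }
            }
        }
        groups := map[int][]int{}
        var order []int // roots in first-seen order, for determinism
        for i := range chunks {
            r := find(i)
            if _, seen := groups[r]; !seen {
                order = append(order, r)
            }
            groups[r] = append(groups[r], i)
        }
        out := make([][]int, 0, len(order))
        for _, r := range order {
            out = append(out, groups[r])
        }
        return out
    }

    // bestRepresentative picks the highest-scoring chunk in a cluster;
    // ties break toward the lower index, again deterministic.
    func bestRepresentative(chunks []Chunk, cluster []int) int {
        best := cluster[0]
        for _, i := range cluster[1:] {
            if chunks[i].Score > chunks[best].Score {
                best = i
            }
        }
        return best
    }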
Result: 8-12 diverse chunks. ~12ms overhead. Zero LLM calls. Deterministic.
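The determinism hinges on step 4 too: greedy MMR is only deterministic if ties break consistently. Here's a sketch of that loop, continuing the hypothetical package above (the signature and lambda semantics are my assumptions, not the library's API):

    // mmrRerank greedily selects up to k chunk indices, scoring each
    // candidate by lambda*relevance minus (1-lambda)*max similarity to
    // anything already picked. lambda=1 is pure relevance, lambda=0 is
    // pure diversity. Index-ordered iteration breaks ties deterministically.
    func mmrRerank(query []float64, chunks []Chunk, k int, lambda float64) []int {
        selected := make([]int, 0, k)
        used := make([]bool, len(chunks))
        for len(selected) < k && len(selected) < len(chunks) {
            best, bestScore := -1, math.Inf(-1)
            for i := range chunks {
                if used[i] {
                    continue
                }
                rel := cosine(query, chunks[i].Emb)
                maxSim := 0.0
                for _, j := range selected {
                    if s := cosine(chunks[i].Emb, chunks[j].Emb); s > maxSim {
                        maxSim = s
                    }
                }
                if score := lambda*rel - (1-lambda)*maxSim; score > bestScore {
                    best, bestScore = i, score
                }
            }
            used[best] = true
            selected = append(selected, best)
        }
        return selected
    }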
Written in Go. Works with Pinecone today; Qdrant and Weaviate support is coming soon. Runs post-retrieval, pre-inference.
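That placement means the whole pipeline reduces to one pure function over the over-fetched chunks. Tying the sketches above together (the 0.9 threshold and 0.7 lambda here are placeholder values, not Distill's defaults):

    // distillSelect: cluster, keep one representative per cluster,
    // then MMR-rerank the survivors down to k diverse chunks.
    func distillSelect(query []float64, chunks []Chunk, k int) []Chunk {
        var reps []Chunk
        for _, cluster := range clusterChunks(chunks, 0.9) {
            reps = append(reps, chunks[bestRepresentative(chunks, cluster)])
        }
        out := make([]Chunk, 0, k)
        for _, i := range mmrRerank(query, reps, k, 0.7) {
            out = append(out, reps[i])
        }
        return out
    }

Feed it the 50 over-fetched chunks, build the prompt from what comes back, and the model sees 8-12 diverse chunks instead of 50 overlapping ones.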
GitHub: https://github.com/Siddhant-K-code/distill
Playground: https://distill.siddhantkhare.com
Happy to discuss the algorithms, tradeoffs, or use cases.