Show HN: Distill – Remove redundant RAG context in 12ms, no LLM calls
30-40% of RAG context is semantically redundant: the same information arrives from docs, code, memory, and tools, all competing for attention. The model gets confused. Outputs become non-deterministic.
Everyone frames this as "save tokens." That's the wrong framing. The real issue is reliability: same workflow, same data, different results every run.
You can't prompt your way out of bad input.
Distill fixes the input:
1. Over-fetch from the vector DB (50 chunks)
2. Agglomerative clustering groups similar chunks
3. Select the best representative from each cluster
4. MMR reranking for diversity
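To make steps 2 and 3 concrete, here is a minimal Go sketch of threshold-based single-linkage clustering plus per-cluster representative selection. The Chunk type, the union-find approach, and all names here are illustrative assumptions, not Distill's actual code (see the repo for that):

    package dedupe

    import "math"

    // Chunk pairs retrieved text with its embedding and retrieval score.
    // Hypothetical type for illustration, not Distill's actual API.
    type Chunk struct {
        Text  string
        Emb   []float64
        Score float64 // similarity score returned by the vector DB
    }

    // cosine returns the cosine similarity of two equal-length vectors.
    func cosine(a, b []float64) float64 {
        var dot, na, nb float64
        for i := range a {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        if na == 0 || nb == 0 {
            return 0
        }
        return dot / (math.Sqrt(na) * math.Sqrt(nb))
    }

    // clusterChunks does single-linkage agglomerative clustering via
    // union-find: any pair of chunks above the similarity threshold
    // ends up in the same cluster. Clusters come out in first-seen
    // index order, so output is deterministic for a given input.
    func clusterChunks(chunks []Chunk, threshold float64) [][]int {
        parent := make([]int, len(chunks))
        for i := range parent {
            parent[i] = i
        }
        var find func(int) int
        find = func(x int) int {
            if parent[x] != x {
                parent[x] = find(parent[x]) // path compression
            }
            return parent[x]
        }
        for i := 0; i < len(chunks); i++ {
            for j := i + 1; j < len(chunks); j++ {
                if cosine(chunks[i].Emb, chunks[j].Emb) >= threshold {
                    parent[find(i)] = find(j)
                }
            }
        }
        groups := map[int][]int{}
        var order []int // roots in first-seen order, for determinism
        for i := range chunks {
            r := find(i)
            if _, seen := groups[r]; !seen {
                order = append(order, r)
            }
            groups[r] = append(groups[r], i)
        }
        out := make([][]int, 0, len(order))
        for _, r := range order {
            out = append(out, groups[r])
        }
        return out
    }

    // bestRepresentative picks the highest-scoring chunk in a cluster;
    // ties break toward the lower index, again deterministic.
    func bestRepresentative(chunks []Chunk, cluster []int) int {
        best := cluster[0]
        for _, i := range cluster[1:] {
            if chunks[i].Score > chunks[best].Score {
                best = i
            }
        }
        return best
    }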
Result: 8-12 diverse chunks. ~12ms overhead. Zero LLM calls. Deterministic.
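The determinism hinges on step 4 too: greedy MMR is only deterministic if ties break consistently. Here's a sketch of that loop, continuing the hypothetical package above (the signature and lambda semantics are my assumptions, not the library's API):

    // mmrRerank greedily selects up to k chunk indices, scoring each
    // candidate by lambda*relevance minus (1-lambda)*max similarity to
    // anything already picked. lambda=1 is pure relevance, lambda=0 is
    // pure diversity. Index-ordered iteration breaks ties deterministically.
    func mmrRerank(query []float64, chunks []Chunk, k int, lambda float64) []int {
        selected := make([]int, 0, k)
        used := make([]bool, len(chunks))
        for len(selected) < k && len(selected) < len(chunks) {
            best, bestScore := -1, math.Inf(-1)
            for i := range chunks {
                if used[i] {
                    continue
                }
                rel := cosine(query, chunks[i].Emb)
                maxSim := 0.0
                for _, j := range selected {
                    if s := cosine(chunks[i].Emb, chunks[j].Emb); s > maxSim {
                        maxSim = s
                    }
                }
                if score := lambda*rel - (1-lambda)*maxSim; score > bestScore {
                    best, bestScore = i, score
                }
            }
            used[best] = true
            selected = append(selected, best)
        }
        return selected
    }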
Written in Go. Works with Pinecone today; Qdrant and Weaviate support is coming soon. Runs post-retrieval, pre-inference.
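That placement means the whole pipeline reduces to one pure function over the over-fetched chunks. Tying the sketches above together (the 0.9 threshold and 0.7 lambda here are placeholder values, not Distill's defaults):

    // distillSelect: cluster, keep one representative per cluster,
    // then MMR-rerank the survivors down to k diverse chunks.
    func distillSelect(query []float64, chunks []Chunk, k int) []Chunk {
        var reps []Chunk
        for _, cluster := range clusterChunks(chunks, 0.9) {
            reps = append(reps, chunks[bestRepresentative(chunks, cluster)])
        }
        out := make([]Chunk, 0, k)
        for _, i := range mmrRerank(query, reps, k, 0.7) {
            out = append(out, reps[i])
        }
        return out
    }

Feed it the 50 over-fetched chunks, build the prompt from what comes back, and the model sees 8-12 diverse chunks instead of 50 overlapping ones.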
GitHub: https://github.com/Siddhant-K-code/distill
Playground: https://distill.siddhantkhare.com
Happy to discuss the algorithms, tradeoffs, or use cases.