Show HN: Advanced Chunking in JavaScript/TypeScript with Chonkie
Hi HN,
We’re Shreyash and Bhavnick. We built Chonkie, an open-source library for advanced chunking and embedding of text and code. It was previously Python-only, but we just released a TypeScript version: https://github.com/chonkie-inc/chonkie-ts
Many AI projects in JS/TS (like those using Vercel's AI SDK or Mastra) rely on basic text splitters. But better chunking = better retrieval = better performance. That’s what Chonkie is built for.
Current native chunkers (in TS):
- Code Chunker – handles Python, TypeScript, and more
- Recursive Chunker – rule-based, hierarchical splitting
- Token Chunker – splits by token count (fully customizable)
- Sentence Chunker – splits on sentence boundaries; delimiters are customizable, so it works across languages
All chunkers support custom tokenizers, chunk overlap, delimiters, and more.
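To make the overlap and chunk-size options concrete, here is a minimal, self-contained sketch of what token chunking with overlap does conceptually. This is not Chonkie's actual API (see the repo for that); whitespace splitting stands in for a real tokenizer purely for illustration.

```typescript
// Toy token chunker: split text into windows of `chunkSize` tokens,
// where each window repeats the last `overlap` tokens of the previous one.
// Whitespace tokenization is a stand-in for a real tokenizer.
function tokenChunk(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each iteration
  for (let i = 0; i < tokens.length; i += step) {
    chunks.push(tokens.slice(i, i + chunkSize).join(" "));
    if (i + chunkSize >= tokens.length) break; // final window covers the tail
  }
  return chunks;
}

console.log(tokenChunk("one two three four five six seven eight", 4, 2));
// → ["one two three four", "three four five six", "five six seven eight"]
```

Each chunk shares two tokens with its neighbor, which is what keeps retrieval from losing context at chunk boundaries.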
Coming soon in native TS (already available via the API client):
- Semantic Chunker – splits text wherever it detects a shift in meaning
- SDPM Chunker – merges semantically similar but disjoint chunks
- Late Chunker – generates context-aware embeddings for each chunk
- Slumber Chunker – LLM-refined recursive chunking; significantly reduces token usage (and thus cost) while maximizing chunk quality
- Embeddings Refinery – embeds chunks with any embedding model
- Overlap Refinery – creates overlaps between consecutive chunks for better context preservation
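To illustrate the refinery idea, here is a conceptual sketch of what an overlap refinery does as a post-processing step: it takes chunks already produced by any chunker and carries some trailing context from each chunk into the next. The function name, character-based context window, and logic are assumptions for illustration, not Chonkie's implementation.

```typescript
// Conceptual overlap refinery (illustrative, not Chonkie's code):
// prepend the last `contextSize` characters of each chunk's predecessor,
// so every chunk retains some of the context that came before it.
function refineWithOverlap(chunks: string[], contextSize: number): string[] {
  return chunks.map((chunk, i) => {
    if (i === 0) return chunk; // first chunk has no predecessor
    const prev = chunks[i - 1];
    const context = prev.slice(Math.max(0, prev.length - contextSize));
    return context + " " + chunk;
  });
}
```

Because it operates on finished chunks, a refinery like this composes with any of the chunkers above.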
Chonkie is free, open-source, and MIT licensed. GitHub: https://github.com/chonkie-inc/chonkie-ts
We’d love your feedback, ideas, or contributions. Thanks!

> I love the TypeScript library. Python has always been a source of friction for me when building AI apps. I deploy on the web, so I'd prefer web-native tooling.

Glad you like it!

> Are there other ways to tune the chunkers, or to describe the data we want chunked, to get the best results? Or perhaps the playground could let you take one type of input data and run different chunkers side by side, or pipe them into each other, to compare results?

We don't have this yet, but we will soon. Finding the right setup for your data is definitely tougher than it needs to be.

> When do you think the Overlap Refinery will be available in the TS library? And how does it work?

We're aiming to launch it by Monday.