Semantic search using LLM context embeddings in a web worker / JS / browser
Background
To run semantic search entirely in the browser, you can use the GTE-Small model from Hugging Face, run with ONNX Runtime Web. Other embedding models work too, but this one is a good starting point.
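Once texts are embedded, the search step itself is just a vector comparison. A minimal sketch of that step, assuming each document already has an embedding stored as a plain array of numbers. The names `cosineSimilarity` and `rankBySimilarity` are illustrative and not part of this repo, and the 3-dimensional vectors are toy stand-ins for real model output:

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Rank documents ({ text, embedding }) against a query embedding,
// highest score first, and keep the top K results.
function rankBySimilarity(queryEmbedding, docs, topK = 3) {
  return docs
    .map((doc) => ({
      text: doc.text,
      score: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy data: in practice the embeddings would come from the model.
const exampleDocs = [
  { text: "cats", embedding: [0.9, 0.1, 0.0] },
  { text: "stocks", embedding: [0.0, 0.2, 0.9] },
  { text: "kittens", embedding: [0.8, 0.3, 0.1] },
];
const exampleResults = rankBySimilarity([1, 0, 0], exampleDocs, 2);
console.log(exampleResults.map((r) => r.text));
```

For a small corpus, a linear scan like this is usually enough; an index only becomes worthwhile once the number of documents grows large.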
Model
- GTE Small ONNX model
- 60 MB
- English only; inputs longer than 512 tokens are truncated
- Reportedly scores higher than OpenAI's text-embedding-ada-002 embeddings on the MTEB benchmark
- Takes roughly 500 ms per embedding
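Since each embedding is relatively expensive, it can help to avoid embedding the same text twice. A minimal sketch of one way to do that, assuming an async function that turns a string into an embedding. Both `createCachedEmbedder` and the `embedFn` parameter are hypothetical names used for illustration, not functions from this repo:

```javascript
// Wrap an embedding function so each distinct text is embedded only once.
// `embedFn` is a hypothetical stand-in for whatever function calls the model;
// it may return an embedding directly or a Promise of one.
function createCachedEmbedder(embedFn) {
  const cache = new Map(); // text -> Promise resolving to the embedding

  return function embedCached(text) {
    if (!cache.has(text)) {
      const pending = Promise.resolve(embedFn(text)).catch((err) => {
        // Do not keep failed attempts, so a later call can retry.
        cache.delete(text);
        throw err;
      });
      cache.set(text, pending);
    }
    return cache.get(text);
  };
}
```

Storing the promise rather than the resolved value means two concurrent requests for the same text share a single model call. The cache here is unbounded, so a long-running page may want an eviction policy on top.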
Build & Run
nvm use
yarn install
yarn start
Test
- Check the browser console for the output