Show HN: BM25opt – 30-40x faster BM25 search algorithms (FOSS)

github.com

1 points by jankovicsandras a year ago · 2 comments · 1 min read

BM25opt is a score-compatible, optimized rewrite of the popular https://github.com/dorianbrown/rank_bm25 , which is used by e.g. LangChain and LlamaIndex. It's much faster and also tries to fix some issues.

https://en.wikipedia.org/wiki/Okapi_BM25

compressedgas a year ago

I expected them to be API compatible. Not that it matters to me but I had looked to see.

  • jankovicsandrasOP a year ago

    This is a good point and was a difficult design decision. The reasons for changing the API are:

    - easier to use with untokenized corpus and questions

    - to fix issues with tokenization ( e.g. https://github.com/dorianbrown/rank_bm25/issues/38 ); also, rank_bm25 provides no default tokenizer, and a naive split-on-whitespace is the wrong choice

    - to considerably simplify the code (far fewer SLOC)

    - to point out the similarities of the algorithms for educational purposes / further development

    In practice, the differences are minimal (see Example 3: comparison with rank_bm25).
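
To illustrate the remark about naive split-on-whitespace tokenizing, here is a minimal sketch in plain Python. It does not use rank_bm25 or BM25opt, and the function names `whitespace_tokenize` and `simple_tokenize` are hypothetical, chosen only for this example. The regex tokenizer is just one possible alternative, not necessarily what BM25opt does.

```python
import re

def whitespace_tokenize(text):
    # Naive approach: split on whitespace only.
    # Punctuation stays attached and case is preserved.
    return text.split()

def simple_tokenize(text):
    # Assumed alternative for illustration: lowercase the text
    # and keep runs of word characters, dropping punctuation.
    return re.findall(r"\w+", text.lower())

doc = "BM25 is fast. Really fast!"
query = "fast"

# With whitespace splitting the tokens are "fast." and "fast!",
# so an exact-match lookup for "fast" finds nothing.
print(query in whitespace_tokenize(doc))  # False

# With the regex tokenizer both occurrences become "fast".
print(query in simple_tokenize(doc))      # True
```

Since BM25 scores depend on exact term matches between query and document tokens, a query term that never matches contributes nothing to the score, which is why the choice of tokenizer matters.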