Fuzzify: A tiny library for fuzzy search

23 points by rumblertumbler a year ago · 12 comments

Reader

iirc the go-to data structure for this kind of problem is called a “vantage point tree” (vp tree).

Another approach that I haven’t explored is to prep the array of words to search into a tree. normal words in Latin languages have lots of common prefixes and suffixes so you can dramatically reduce the amount of nodes (see my own old blog which compresses a scrabble dictionary https://williame.github.io/post/87682811573.html - same prefix suffix sharing our work on non-scrabble rotations too). Now walk the tree doing Levenshtein but checking multiple words at once?

jbaiter a year ago

Not for strings, no. The default algorithm would be Levenshtein edit distance and friends (i.e. what this library uses). If you want to get fancy, you could even go for something like Levenshtein automata (which are used by Lucene to implement fuzzy search across terms).
- myaccountonhn a year ago
  
  In my case I needed to match multiple names (consisting of 2-5 words), I had better matches and performance using the bag of words model (https://en.wikipedia.org/wiki/Bag-of-words_model).

rgbjoy a year ago

I’m a bit confused. “Unit” brings up Turkmenistan before United States.

Arch485 a year ago

Turkmenistan is 8 edits, United States is 9. Levenshtein Distance is not actually a good search algorithm :)
card_zero a year ago

That's correct, yes. Turkmenistan: 8 deletions (Trkmesan). United States: 9 deletions (ed States), including the space character.
d0100 a year ago

You'll have to add more weights to substring matches, fuzzy search by itself is usually not enough for intuitive search

mightyham a year ago

There are tons of js fuzzy search libraries with basically the same functionality. Is there any reason I should use this over others?

finger a year ago

Regarding the name, doesn’t it implify fuzzification of data?

Settings

Fuzzify: A tiny library for fuzzy search

Keyboard Shortcuts