Which vector similarity metric should I use?

imaurer.com

2 points by imaurer 3 years ago · 4 comments

sharemywin 3 years ago

Does this seem right?

| Task                           | Distance Measure            |
|--------------------------------|-----------------------------|
| Document classification        | Cosine Distance             |
| Semantic search                | Cosine Distance             |
| Recommendation systems         | Cosine Distance             |
| Image recognition              | Euclidean Distance (L2)     |
| Speech recognition             | Euclidean Distance (L2)     |
| Handwriting analysis           | Euclidean Distance (L2)     |
| Recommendation systems         | Inner Product (Dot Product) |
| Collaborative filtering        | Inner Product (Dot Product) |
| Matrix factorization           | Inner Product (Dot Product) |
| Image processing               | L2-Squared Distance         |
| Error detection and correction | Hamming Distance            |
| DNA sequence comparison        | Hamming Distance            |
| Taxicab geometry               | Manhattan Distance          |
| Chessboard distance            | Manhattan Distance          |
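
For anyone comparing the measures named in the table, here is a minimal sketch of each one in plain Python. The function names are illustrative assumptions, not taken from the linked article or from any particular vector database, and the code takes no position on which task each measure suits.

```python
import math


def dot(a, b):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between a and b."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_squared(a, b):
    """Squared Euclidean distance (skips the square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(l2_squared(a, b))


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

For example, `euclidean([0, 0], [3, 4])` gives `5.0`, `manhattan([0, 0], [3, 4])` gives `7`, and `hamming("GATTACA", "GACTATA")` gives `2`.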

messe 3 years ago

Even ignoring vector magnitudes, wouldn't cosine distance as a measure of similarity only make sense if you're working with a convex set? That seems far from guaranteed in a high-dimensional space.

  • imaurerOP 3 years ago

    Yes, cosine distance works best in convex or normalized sets. Thinking about adding this caveat. Thanks for the question.
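
    On the normalized case specifically, a small sketch may help. Assuming every vector is scaled to unit length, squared Euclidean distance equals 2 - 2 * (cosine similarity), so cosine distance, inner product, and Euclidean distance all rank neighbors in the same order. The helper names and sample vectors below are made up for illustration.

```python
import math


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical sample vectors, normalized before comparison.
a = normalize([3.0, 4.0])
b = normalize([1.0, 0.0])

cos_sim = dot(a, b)             # on unit vectors, dot product == cosine similarity
lhs = l2_squared(a, b)          # squared Euclidean distance
rhs = 2.0 - 2.0 * cos_sim       # identity that holds for unit vectors only

print(math.isclose(lhs, rhs))   # the two sides agree up to rounding
```

    Without the normalization step the identity no longer holds, and the three measures can order the same candidates differently.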
