Settings

Theme

Count–Min Sketch

en.wikipedia.org

2 points by deborahjacob 2 years ago · 2 comments

Reader

two_handfuls 2 years ago

How does this compare to HLL?

  • deborahjacobOP 2 years ago

    Count–Min Sketch requires each input to pass through multiple independent hash functions, which is computationally expensive. The workaround is to use a single hash function but use part of its output to split values into one of many buckets hence, simulating a situation in which we had m different hash functions. This costs nothing in terms of accuracy but saves computing many independent hash functions. This procedure is called stochastic averaging and has a predictable bias towards larger estimates. Durand-Flajolet corrected this bias using an algorithm called LogLog. HLL uses a different type of averaging. Instead of the geometric mean used in LogLog, Flajolet et al. proposed using the harmonic mean. https://engineering.fb.com/2018/12/13/data-infrastructure/hy...

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection