Settings

Theme

Consistently faster and smaller compressed bitmaps with Roaring (2016)

arxiv.org

54 points by hambandit a year ago · 8 comments

Reader

o11c a year ago

Title is confusing: this is not about the original "Roaring", but an extension of it called "Roaring+Run".

Here, "bitmap" = "set of sometimes-compact integers". The "uncompressed" and several "rle" implementations are obvious. Hm, except this only seems to be talking about a particularly-naive RLE approach (suitable for storage but not computation)? If you're doing computation I expect you to use absolute offsets rather than relative ones which means you can just do binary search (the only downside of this is that you can't use variable-length integers now).

Roaring is just a fixed 2-level trie, where the outer node is always an array of pointers and where the inner nodes can be either uncompressed bitvectors (if dense) or an array of low-half integers (if sparse). Also, it only works for 32-bit integers at a fundamental level; significant changes are needed for 64-bit integers.

This paper adds a third representation for the inner node, the bsearch'able absolute RLE method I mentioned earlier (before even reading the paper beyond the abstract).

Overall there's neither anything novel nor anything particularly exciting about this paper, but if you ignore all the self-congratulations it might work as a decent intro to the subject? Except maybe not since there are a lot of things it fails to mention (the ping-pong problem, deduplicated tries, the approach of using a separate set for sparse values in the same range, the "overall sparse but locally semi-dense" representation that uses fixed-size single-word bitsets, ...)

  • Drup a year ago

    You seem well versed into that corner. Do you have a good (and reasonably complete) introduction/exploration for these memory-efficient data-structure for computation ?

    I've been working on memory representation of algebraic data types quite a bit, and I've always wondered if we could combine them with succinct data-structures.

softwaredoug a year ago

Roaring bitmaps are really useful for doing phrase search in search engines.

Basically you can find cases where one term is immediately before another by left shifting the right terms roaring encoded positions in all documents and bitwise-anding the similarly roaring-encoded payload of the preceding term. All with a highly compressed representation of term positions.

With something like numpy you can do this in a handful of logical operations in python.

https://softwaredoug.com/blog/2024/01/21/search-array-phrase...

pncnmnp a year ago

Hey everyone, if you're looking for a more approachable guide on bitmap compression, I wrote a blog post on it this year: https://pncnmnp.github.io/blogs/roaring-bitmaps.html. It covers run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.

  • skybrian a year ago

    That’s a good read, thanks! The history is interesting, but in practice, I’m wondering if there’s any reason not to skip it and just learn about roaring bitmaps?

rurban a year ago

I found them consistently slower in unary unicode tables, with sizes from 1.000 to 30.000. Binary search and hashtables were both better.

The reason might be that ordered search tables and perfect hashtables can be dumped to C code statically, roaring not, so there is the dynamic alloc and deserialization overhead.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection