FeatureBase: Open-Source, Real-Time Database Built on Roaring Bitmaps

37 points by clowen 3 years ago · 13 comments

JAA1337 3 years ago

I've seen bitmaps mentioned a number of times lately. I must admit it is not something I am all that familiar with. Can someone explain to me why bitmaps are more valuable than standard column-oriented databases?

I haven't wrapped my head around how this helps speed up queries while data is being ingested.

kurosknight 3 years ago

They are useful for categorical variables. For example: is a record in the "Likes motorcycles" category? One reason they are fast is that bitwise logical operations are very cheap for CPUs.
Adtech is an example of a sector that benefits from this: they slice and dice datasets a lot to target ad campaigns and such, and being able to do that quickly is useful.
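A minimal sketch of the idea, assuming a toy in-memory setup: one bitmap per category, where bit i is set if record i is in that category. Questions like "likes motorcycles AND likes hiking" then reduce to single bitwise operations. Python ints stand in for bitmaps here; the records and the `build_index`/`record_ids` helpers are hypothetical, not FeatureBase's API.

```python
# Toy bitmap index: one bitmap per category, one bit per record.
# Python ints act as arbitrary-length bitsets (a simplification; real
# engines use fixed-size machine words and compressed containers).

records = [
    {"id": 0, "likes": {"motorcycles", "hiking"}},
    {"id": 1, "likes": {"hiking"}},
    {"id": 2, "likes": {"motorcycles"}},
    {"id": 3, "likes": {"motorcycles", "hiking", "cooking"}},
]

def build_index(records):
    """Map each category to an int whose bit i is set if record i has it."""
    index = {}
    for rec in records:
        for category in rec["likes"]:
            index[category] = index.get(category, 0) | (1 << rec["id"])
    return index

def record_ids(bitmap):
    """Decode a non-negative bitmap into the ids of its set bits."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids

index = build_index(records)

# "Likes motorcycles AND likes hiking" is one bitwise AND.
both = index["motorcycles"] & index["hiking"]
print(record_ids(both))       # [0, 3]

# "Likes motorcycles but NOT hiking" is AND with the complement.
moto_only = index["motorcycles"] & ~index["hiking"]
print(record_ids(moto_only))  # [2]
```

No row is scanned for either query: the work is proportional to the bitmap length, not to the number of attributes each record carries.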
- JAA1337 3 years ago
  
  So are you saying that the data is stored in categories, which allows those types of lookups to run faster? Do you have specifics on how the design of a bitmap-based database achieves this? How does it maintain these relationships? Just through 0s and 1s?
  I guess it's easy for me to visualize both row-based and column-based storage. I'm struggling with the bitmap concept.
  - kordlessagain 3 years ago
    
    Here's a good write-up on some of what you are asking, on their blog: https://www.featurebase.com/blog/bitmaps-making-real-time-an...
- witchbane 3 years ago
  
  So is the thinking that you run them alongside a traditional RDBMS, as a sort of cache or view optimized for bitwise operations?

seebs 3 years ago

I'm super hyped about this! I've been working on this for the last couple-few years, and I'm optimistic about the return to being primarily an open-source thing.

wentejing 3 years ago

Why is the bitmap database faster than other distributed databases? And what are the differences?

kordlessagain 3 years ago

Instead of storing values like "dog", "cat", or "mouse", it stores (in this example) one bit per value, giving three-bit binary numbers such as:
000 - the item can be associated with animals but currently has no associations
001 - the item is associated with "mouse"
111 - the item is associated with "dog", "cat", and "mouse"
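To make the three-bit example concrete, here is a small sketch assuming a fixed bit position per animal (dog = 100, cat = 010, mouse = 001, an assignment chosen here only to match the listed numbers). Encoding is an OR of the chosen bits, and a membership check is a single AND against one animal's bit. The `ANIMALS` table and helper names are hypothetical.

```python
# Hypothetical bit assignment matching the example: dog=100, cat=010, mouse=001.
ANIMALS = {"dog": 0b100, "cat": 0b010, "mouse": 0b001}

def encode(animals):
    """Pack a collection of animal names into one bitmask."""
    mask = 0
    for name in animals:
        mask |= ANIMALS[name]
    return mask

def decode(mask):
    """List the animal names whose bits are set in mask."""
    return [name for name, bit in ANIMALS.items() if mask & bit]

def has(mask, name):
    """Membership test: a single AND against the animal's bit."""
    return (mask & ANIMALS[name]) != 0

print(format(encode([]), "03b"))                       # 000
print(format(encode(["mouse"]), "03b"))                # 001
print(format(encode(["dog", "cat", "mouse"]), "03b"))  # 111
print(has(encode(["mouse"]), "cat"))                   # False
```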
In the past, high-cardinality data sets weren't a good fit for storing in binary form or in a binary index, but nowadays there are ways around this, such as compressed bitmap formats like Roaring. So that list of animals could be quite large.
The primary reason it's so much faster is that CPUs operate on whole machine words: a single 64-bit AND or OR evaluates 64 bit positions at once, and SIMD instructions handle even wider registers. That makes these set operations extremely fast.
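Compressed bitmap formats such as Roaring Bitmaps (the format named in the title) are what make high cardinality practical. The toy class below sketches the core Roaring idea under simplifying assumptions: each 32-bit id is split into its high 16 bits (which chunk) and low 16 bits (position within the chunk); sparse chunks are stored as sorted arrays, and chunks holding more than 4096 values become plain bitmaps. It is not FeatureBase's or the reference library's implementation: run containers, SIMD, and serialization are omitted, and `ToyRoaring` is a hypothetical name. Python ints stand in for the 65,536-bit chunk bitmaps, so the word-at-a-time ANDs described above happen inside the interpreter rather than as explicit 64-bit loops.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096  # Roaring array containers hold at most 4096 values

def _set_bits(bits):
    """Positions of the set bits in ascending order."""
    out = []
    while bits:
        lowest = bits & -bits              # isolate the lowest set bit
        out.append(lowest.bit_length() - 1)
        bits ^= lowest                     # clear it
    return out

def _to_bits(container):
    """Array container (sorted list) -> bitmap (int); bitmaps pass through."""
    if isinstance(container, int):
        return container
    bits = 0
    for v in container:
        bits |= 1 << v
    return bits

def _from_bits(bits):
    """Choose the container type for a chunk by its cardinality."""
    if bin(bits).count("1") > ARRAY_LIMIT:
        return bits
    return _set_bits(bits)

class ToyRoaring:
    """Simplified Roaring-style set of 32-bit ints."""

    def __init__(self, values=()):
        self.chunks = {}  # high 16 bits -> sorted list or int bitmap of low 16 bits
        for x in values:
            self.add(x)

    def add(self, x):
        high, low = x >> 16, x & 0xFFFF
        c = self.chunks.get(high, [])
        if isinstance(c, list):
            i = bisect_left(c, low)
            if i == len(c) or c[i] != low:
                c.insert(i, low)
            if len(c) > ARRAY_LIMIT:
                c = _to_bits(c)            # promote a dense chunk to a bitmap
        else:
            c |= 1 << low
        self.chunks[high] = c

    def __contains__(self, x):
        c = self.chunks.get(x >> 16)
        if c is None:
            return False
        low = x & 0xFFFF
        if isinstance(c, list):
            i = bisect_left(c, low)
            return i < len(c) and c[i] == low
        return ((c >> low) & 1) == 1

    def __and__(self, other):
        """Intersect chunk by chunk; only chunks present on both sides can match."""
        out = ToyRoaring()
        for high in self.chunks.keys() & other.chunks.keys():
            bits = _to_bits(self.chunks[high]) & _to_bits(other.chunks[high])
            if bits:
                out.chunks[high] = _from_bits(bits)
        return out

    def __iter__(self):
        for high in sorted(self.chunks):
            c = self.chunks[high]
            for low in (c if isinstance(c, list) else _set_bits(c)):
                yield (high << 16) | low

evens = ToyRoaring(range(0, 200_000, 2))
threes = ToyRoaring(range(0, 200_000, 3))
both = evens & threes
print(sorted(both)[:5])                  # [0, 6, 12, 18, 24]
print(type(evens.chunks[0]).__name__)    # int  (dense chunk stored as a bitmap)
print(type(evens.chunks[3]).__name__)    # list (sparse tail chunk stays an array)
```

The design choice Roaring makes is per chunk: a sorted array costs 2 bytes per value, a bitmap a flat 8 KB, and 4096 values is where the two break even.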

elina123 3 years ago

Any ideas on real-life use cases?

clowen (OP) 3 years ago

I am planning to use it in a project to make the new Congressional API data more approachable for people.
https://api.congress.gov/#/
Hopefully it will make it easy to find, for example, all the bills that your specific congressperson was involved in.
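One way such a lookup could map onto bitmaps, sketched under assumptions: keep one bitmap per member in which bit n is set if that member was involved in bill n. "All bills for my representative" is then a decode of one bitmap, and "bills two members both worked on" is an AND. The member names and bill numbers below are hypothetical placeholders, not real Congress.gov data, and nothing here calls the Congress.gov API or FeatureBase.

```python
# Hypothetical toy data: placeholder members and bill numbers.
involvement = [
    ("member_a", 101), ("member_b", 101),
    ("member_a", 205), ("member_c", 205),
    ("member_b", 330),
]

def build_member_bitmaps(pairs):
    """One bitmap per member; bit n set means involved in bill n."""
    bitmaps = {}
    for member, bill in pairs:
        bitmaps[member] = bitmaps.get(member, 0) | (1 << bill)
    return bitmaps

def bills(bitmap):
    """Decode a bitmap into sorted bill numbers."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]

bm = build_member_bitmaps(involvement)
print(bills(bm["member_a"]))                   # [101, 205]
print(bills(bm["member_a"] & bm["member_b"]))  # [101]
print(bills(bm["member_a"] | bm["member_c"]))  # [101, 205]
```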

kordlessagain 3 years ago

From what I understand, machine learning systems may use this both in ETL pipelines and as part of the models themselves. There's an article on that here: https://medium.com/analytics-and-data/overview-of-the-differ...
FeatureBase could be the "feature store" in the middle of the batch prediction section's diagram, or simply be a drop-in replacement for the model's registry.

jaffee 3 years ago

Many! It was originally developed for marketing use cases: helping marketers understand up-to-date user behavior and find interesting segments.
But really, it's useful anytime you need low-latency analytics on fresh data.
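A minimal sketch of segmentation on fresh data, assuming a hypothetical stream of (user id, attribute) events: each event sets one bit in the bitmap for its attribute, so it is visible to the very next query without a batch rebuild, and a segment's size is the popcount of an AND. Attribute names and user ids are invented for illustration; this is not how FeatureBase ingests data internally.

```python
from collections import defaultdict

# Hypothetical attribute -> bitmap of user ids.
segments = defaultdict(int)

def ingest(user_id, attribute):
    """Each event sets exactly one bit in one bitmap; no rebuild step."""
    segments[attribute] |= 1 << user_id

def count(bitmap):
    """Popcount: segment size without materializing the user ids."""
    return bin(bitmap).count("1")

for uid, attr in [(1, "visited_pricing"), (2, "visited_pricing"),
                  (2, "opened_email"), (3, "opened_email"),
                  (4, "visited_pricing")]:
    ingest(uid, attr)

print(count(segments["visited_pricing"]))                             # 3
print(count(segments["visited_pricing"] & segments["opened_email"]))  # 1

ingest(4, "opened_email")  # a new event arrives
print(count(segments["visited_pricing"] & segments["opened_email"]))  # 2
```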
