Bitmapist: We built an open-source cohorts analytics tool that saved millions
doist.dev
52 points by amix 21 hours ago
User behavior analytics have created some interesting specialized data systems.
It's interesting that the authors chose to use Redis, but how does it scale with a lot of events?
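Part of the answer is simple arithmetic. An uncompressed bitmap indexed by user ID costs memory in proportion to the highest user ID it covers, not the number of users recorded. The sketch below illustrates this. It assumes one dense bitmap per event type per day, and the workload numbers (10 million user IDs, 50 event types, a year of daily buckets) are hypothetical. The function names are illustrative and not part of bitmapist's API.

```python
def dense_bitmap_bytes(max_user_id: int) -> int:
    """Bytes an uncompressed bitmap needs so that bit `max_user_id` fits.

    Roughly mirrors Redis SETBIT, which grows the string to cover the
    highest offset written, however few bits are actually set.
    Ignores per-key and allocator overhead.
    """
    return max_user_id // 8 + 1


def worst_case_total_bytes(max_user_id: int, events: int, buckets_per_event: int) -> int:
    """Memory if every (event, time bucket) pair has its own dense bitmap
    that reaches the highest user ID."""
    return dense_bitmap_bytes(max_user_id) * events * buckets_per_event


# Hypothetical workload: user IDs up to 10 million, 50 event types,
# one bitmap per event per day for a year.
per_bitmap = dense_bitmap_bytes(10_000_000)
total = worst_case_total_bytes(10_000_000, events=50, buckets_per_event=365)
print(per_bitmap)  # 1250001
print(total)       # 22812518250 (about 22.8 GB)
```

Each bitmap is small on its own, but the number of keys multiplies quickly with event types and time buckets. That multiplication is what makes memory the main scaling concern for a Redis-backed setup.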
Here are a few other interesting projects from the past that have to do with either user behavior analytics or bitmaps:
TrailDB: https://github.com/traildb/traildb is old, but still a fascinating project in my opinion. It isn't related to bitmaps, but they did some very clever things at the storage level to compress and query events in a way that fits this particular workload well.
FeatureBase: https://github.com/FeatureBaseDB/featurebase was built on top of bitmaps, but they didn't market it specifically as a behavioral analytics solution, although I'm sure it was used for that.
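The reason bitmaps suit cohort analysis is that a cohort question reduces to bitwise operations. Suppose each event and time period has a bitmap with bit `i` set when user `i` did the event. In that case, "who signed up in week 1 and was active in week 2" is a single AND. The minimal sketch below uses Python integers as bitsets. The helper names and event data are hypothetical, and it is not bitmapist's implementation. Redis exposes the same idea server-side through `BITOP AND` and `BITCOUNT`.

```python
def make_bitmap(user_ids):
    """Build a bitset (as a Python int) with bit `uid` set for each user ID."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits


def members(bits):
    """Return the sorted user IDs whose bits are set."""
    result = []
    uid = 0
    while bits:
        if bits & 1:
            result.append(uid)
        bits >>= 1
        uid += 1
    return result


# Hypothetical event data: users who signed up in week 1, users active in week 2.
signed_up_week1 = make_bitmap([1, 4, 7, 9])
active_week2 = make_bitmap([4, 5, 9, 12])

# Retained cohort = intersection of the two bitmaps.
retained = signed_up_week1 & active_week2
print(members(retained))  # [4, 9]

# Retention rate = popcount(retained) / popcount(cohort).
print(bin(retained).count("1") / bin(signed_up_week1).count("1"))  # 0.5
```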
Of course, there are also the Mixpanels and Amplitudes of the world, which had to build their own specialized storage and query engines for this particular workload.
However fascinating I find these systems as compute engines specialized for one workload, the use case itself doesn't seem lucrative enough to sustain companies built around them. I'm not sure why that is, but it's interesting to see companies build infrastructure for this use case on top of, for example, Spark, which ends up being very painful in the medium to long run.
(Edit: apparently they answered my question while I was writing this!)
Really cool article! We might give it a try at our company. Do you have any advice on common pitfalls when working with Bitmapist?
Thank you! Starting with the Redis server is a solid choice. However, if you're aiming to support a large user base or handle many events (billions, say), I highly recommend setting up bitmapist-server, our Go server: https://github.com/Doist/bitmapist-server. It reduces memory usage by 400x or more.
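The thread doesn't say how bitmapist-server achieves that reduction. What follows therefore only illustrates one well-known compressed-bitmap technique, a simplified version of the container scheme used by Roaring bitmaps. IDs are split into 2^16-wide chunks, and each chunk is stored either as a sorted array of 16-bit values (2 bytes per ID) or as an 8 KiB bitmap, whichever is smaller. The function names and both ID sets are hypothetical, and the estimate ignores container headers and run-length containers.

```python
from collections import Counter

CHUNK_BITS = 16                                  # high bits of an ID pick a chunk
BITMAP_CONTAINER_BYTES = (1 << CHUNK_BITS) // 8  # 8192 bytes per dense chunk
ARRAY_ENTRY_BYTES = 2                            # low 16 bits stored per ID


def dense_bytes(user_ids) -> int:
    """Size of one flat bitmap covering 0..max(user_ids)."""
    return max(user_ids) // 8 + 1 if user_ids else 0


def chunked_bytes(user_ids) -> int:
    """Roaring-style estimate: each 2**16-wide chunk stores either a sorted
    array of 16-bit values or an 8 KiB bitmap, whichever is smaller.
    Ignores container headers and run-length containers."""
    per_chunk = Counter(uid >> CHUNK_BITS for uid in set(user_ids))
    return sum(min(ARRAY_ENTRY_BYTES * n, BITMAP_CONTAINER_BYTES) for n in per_chunk.values())


# Hypothetical sparse case: 1,000 users with IDs spread up to ~100 million.
sparse = list(range(0, 100_000_000, 100_000))
print(dense_bytes(sparse), chunked_bytes(sparse))  # 12487501 2000

# Hypothetical dense case: every ID from 0 to 999,999 is present.
dense = list(range(1_000_000))
print(dense_bytes(dense), chunked_bytes(dense))    # 125000 131072
```

For sparse, high-valued IDs (common when an event touches only a small fraction of users), the chunked layout is thousands of times smaller than a flat bitmap. For fully dense IDs it is slightly larger, which is one reason real Roaring implementations also add run-length containers.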