Settings

Theme

How to write a Bloom filter in C++

blog.michaelschmatz.com

130 points by schmatz 10 years ago · 20 comments

Reader

nly 10 years ago

I feel the use of vector<bool> is an iffy choice.

The Bitcoin codebase has a simple Bloom filter implementation you can take a look at that has been in use for some time

https://github.com/bitcoin/bitcoin/blob/master/src/bloom.h

https://github.com/bitcoin/bitcoin/blob/master/src/bloom.cpp

  • mavam 10 years ago

    What's wrong with vector<bool> (other than its name)? The interface is exactly what you need to implement a bit-level abstraction in a language where this isn't a first-class primitive.

  • bmohlenhoff 10 years ago

    std::bitset is also a good choice if the number of desired bits is known at compile time

    • schmatzOP 10 years ago

      I think this would probably be the best choice after templating the filter

nate_martin 10 years ago

This example works well for raw data but not for complex types. You could make the filter a template, taking the key and a "hasher" function as template args.

  • mavam 10 years ago

    Even better: N3980 [1]

    This proposal decouples the implementation of hash functions from how types get hashed.

    [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n398...

  • schmatzOP 10 years ago

    Great suggestion; I wasn't sure the idiomatic way to template this, thanks for letting me know!

    • nate_martin 10 years ago

      Probably something like this:

      template< class Key, class Hash = std::hash<Key> > class BloomFilter;

      • bradleyjg 10 years ago

        I don't use c++ so I'm not sure how std:hash works or gets implemented, but the way that guava (Google's java library) does it is by passing in a key and a funnel object. The funnel object is essentially responsible for decomposing the object into a byte stream. The advantage of doing it this way rather than making the caller specify his own hash is that you can use murmurhash3 which you thought had the best properties for the bloom filter.

      • schmatzOP 10 years ago

        I updated the blog post with your suggestion; CDN should be updated soon :)

sokoloff 10 years ago

Learned something today. Thanks for the article.

Minor nit: it will save readers time if you call out that "p is the false positive error rate". (You reference the error rate, but don't attach a variable name to it.) I had to go to an external reference to figure that out, which meant I learned something else of course.

barsonme 10 years ago

Or in C[0] or in Go[1]...

[0] - https://github.com/EricLagergren/bloom [1] - https://github.com/EricLagergren/bloom-c

j_s 10 years ago

Is there a standard implementation "everybody uses"? Bloomd seems popular as a bloom filter server.

https://github.com/armon/bloomd

m00dy 10 years ago

I have also experiments about Bloom Filter in python https://github.com/erenyagdiran/BloomFilter

nvcken 10 years ago

How is your laptop spec ? please

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection