Benchmarking BDB, CDB and Tokyo Cabinet on large datasets

dmo.ca

41 points by anuraggoel 17 years ago · 23 comments

vicaya 17 years ago

Sorry, but a 500MB DB is a tiny dataset these days. (Anything < 1GB is tiny; < 4GB is small; < RAM on a single node (~8GB-64GB) is medium; < disk on a single node (~128GB to a few TB) is large; a huge dataset requires multiple nodes and is typically above 128TB.)

  • psadauskas 17 years ago

    I agree, plus TC has a ton of parameters that can be tweaked, and the defaults are pretty small. The one with the most pronounced effect is the bucket number (bnum), the "width" of the hash table: the bigger it is, the lower the chance of collisions, and a collision means following a linked list to find the exact record. He used 11M keys, so a bnum in the range of 40M would be much quicker.
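
    With the TC hash API, that tuning is a one-liner before the open call. A minimal sketch (not the author's benchmark code; the parameter values are illustrative):

      #include <stdio.h>
      #include <tchdb.h>

      int main(void) {
          TCHDB *hdb = tchdbnew();
          /* Tune BEFORE opening: ~40M buckets for ~11M keys, default
             alignment and free-block pool (-1), HDBTLARGE for files > 2GB. */
          tchdbtune(hdb, 40000000, -1, -1, HDBTLARGE);
          if (!tchdbopen(hdb, "bench.tch", HDBOWRITER | HDBOCREAT))
              fprintf(stderr, "open error: %s\n", tchdberrmsg(tchdbecode(hdb)));
          /* ... put/get workload goes here ... */
          tchdbclose(hdb);
          tchdbdel(hdb);
          return 0;
      }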

    I benchmarked TC b+tree on a 1TB db with ~350M keys, and it worked great. I would publish the numbers, but I'm embarrassed that they aren't very rigorous.

    The cdb docs say it has a 4GB limit, which makes it pretty much worthless for anything I'd use it for.

    • rcoder 17 years ago

      Actually, the core CDB data structure only limits keys and values to 4GB each, not in total:

      http://www.unixuser.org/~euske/doc/cdbinternals/index.html

      The hash algorithm used also only produces a 32-bit value, meaning you'll be limited to 2^32 total records. Again, though, unless your records are trivially small, that gives you considerably more room to work with than a hard 4GB limit.

      Edit: doh! can't use double-asterisk for exponent on HN
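
      (For reference, the hash cdb uses is DJB's own 32-bit function; a sketch straight from the format description:)

        #include <stddef.h>
        #include <stdint.h>

        /* cdb's hash: start at 5381, then h = ((h << 5) + h) ^ c for each
           byte. The 32-bit output is where the 2^32 record limit comes from. */
        uint32_t cdb_hash(const unsigned char *buf, size_t len) {
            uint32_t h = 5381;
            while (len--)
                h = ((h << 5) + h) ^ *buf++;
            return h;
        }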

    • jhy 17 years ago

      Do you have a pointer to any more info on tuning TC? (Apart from the API guide)

    • zandorg 17 years ago

      I guess someone could tweak cdb to be >4GB, but these C libraries are always insanely convoluted.

      • rcoder 17 years ago

        > ...these C libraries are always insanely convoluted...

        If you haven't actually looked at the code, you might want to avoid making such an overly-general statement. CDB is a very simple data structure (basically a two-stage hash table) serialized to disk in a format that makes lookup fast. You can check the link I posted above for a simple explanation of the format, and this page for usage examples (from an API-compatible reimplementation):

        http://www.corpit.ru/mjt/tinycdb.html

        Since the core algorithm is so simple, creating a 64-bit version should be similarly easy, at least on a UNIX-like system (trying to run code designed by Dan Bernstein on a non-UNIX system would be...interesting).
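
        To show just how simple: the whole lookup is a handful of pread()s and a key compare. A sketch based on the format notes linked above (error handling and long keys omitted):

          #include <stdint.h>
          #include <string.h>
          #include <unistd.h>

          /* All numbers in a cdb file are little-endian uint32 pairs. */
          static uint32_t u32(const unsigned char *p) {
              return p[0] | (p[1] << 8) | ((uint32_t)p[2] << 16)
                          | ((uint32_t)p[3] << 24);
          }

          /* Stage 1: a 256-entry table at offset 0 maps the low 8 bits of
             the hash to the (position, length) of one of 256 hash tables.
             Stage 2: linear-probe that table's (hash, record-pos) slots. */
          int cdb_lookup(int fd, uint32_t h, const void *key, uint32_t klen) {
              unsigned char buf[8], k[256];
              if (klen > sizeof k) return -1;      /* sketch: short keys only */
              pread(fd, buf, 8, (h & 255) * 8);
              uint32_t tpos = u32(buf), tlen = u32(buf + 4);
              for (uint32_t i = 0; i < tlen; i++) {
                  uint32_t slot = ((h >> 8) + i) % tlen;
                  pread(fd, buf, 8, tpos + slot * 8);
                  uint32_t sh = u32(buf), rpos = u32(buf + 4);
                  if (rpos == 0) break;            /* empty slot: not found */
                  if (sh != h) continue;
                  pread(fd, buf, 8, rpos);         /* record header: klen, dlen */
                  if (u32(buf) != klen) continue;
                  pread(fd, k, klen, rpos + 8);
                  if (memcmp(k, key, klen) == 0)
                      return 1;                    /* found; data follows key */
              }
              return 0;
          }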

        • zandorg 17 years ago

          Sorry, yes, I didn't look at the code. I was thinking of early Berkeley DB code.

asb 17 years ago

It would be nice if DJB would release CDB into the public domain, as he has now done with most of his other software. However, there's a great public-domain implementation by Michael Tokarev that I'd recommend: http://www.corpit.ru/mjt/tinycdb.html

  • zandorg 17 years ago

    A caveat for TinyCDB on Windows:

    Don't forget (on Windows) to specify O_BINARY, e.g.:

      O_RDONLY | O_BINARY

    The same applies to Berkeley DB 1.86.

    Otherwise the file is opened in text mode, where Windows writes \r\n for every \n it read in, and the database dies after just a few records.
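
    A portable way to write the open call (a sketch; O_BINARY doesn't exist on POSIX systems, so it's defined away there):

      #include <fcntl.h>

      /* O_BINARY only exists on Windows; make it a no-op elsewhere so
         the same open() call compiles everywhere. */
      #ifndef O_BINARY
      #define O_BINARY 0
      #endif

      int fd = open("data.cdb", O_RDONLY | O_BINARY);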

    • asb 17 years ago

      I have a Lua binding to tinycdb. I hadn't considered Windows compatibility and this bug exists in my code. I've just committed a fix, thanks.

  • kree10 17 years ago

    The first lines of cdb.c and cdb.h read "/* Public domain */". I would expect DJB to spell this out more explicitly if he releases an updated cdb package. I say "if" because the last release was Feb 2000.

smanek 17 years ago

In my experience, the downside of BDB is simply the license. There's no way for me to use it in my program (and distribute that program) without opening up my source. I can't simply 'query' a BDB server; by using BDB at all I have to link against it, and under the Sleepycat License that means open-sourcing my own code as well (or so I've been told - IANAL).

  • zandorg 17 years ago

    DB versions up to and including 1.86 are under the Berkeley license (basically 'do anything with this but include this notice', with no open-source requirements), but those same versions only allow databases < 2GB. :-(

    I tried to upgrade it to 64-bit integers, but it's such convoluted C code that I can't make head nor tail of it.

    Also try buying BDB ($20,000). I emailed them asking for a discount, but Oracle (who owns them now) never replied. :-(

    • jrockway 17 years ago

      If by convoluted, you mean "half the source code is detailed comments", then sure...

  • jrockway 17 years ago

    You can just talk to BDB (running in a separate process) over an RPC server. Yes, you have to write that server... but it would be trivial to do, and certainly less costly than licensing BDB.
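
    A minimal sketch of that server, inetd-style (stdin/stdout are the socket; the newline-delimited protocol and filename are made up for illustration):

      #include <db.h>
      #include <stdio.h>
      #include <string.h>

      /* Only this process links against BDB, so the Sleepycat license
         stops at the process boundary -- that's the theory, anyway. */
      int main(void) {
          DB *dbp;
          char line[1024];
          db_create(&dbp, NULL, 0);
          dbp->open(dbp, NULL, "data.db", NULL, DB_BTREE, DB_RDONLY, 0);
          while (fgets(line, sizeof line, stdin)) {
              line[strcspn(line, "\n")] = '\0';
              DBT key = {0}, val = {0};
              key.data = line;
              key.size = strlen(line);
              if (dbp->get(dbp, NULL, &key, &val, 0) == 0)
                  printf("%.*s\n", (int)val.size, (char *)val.data);
              else
                  printf("NOT_FOUND\n");
          }
          dbp->close(dbp, 0);
          return 0;
      }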

rcoder 17 years ago

I'm a bit surprised at the poor showing of Tokyo Cabinet in this test, given the amount of buzz it's gathered in the last few months, but the CDB results honestly don't surprise me in the least -- it's a special-purpose data structure wrapped in a thin library that smokes any other DB type if your workload allows for a complete rebuild of the database on each update.
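
The rebuild-on-update pattern is short enough to show; with the tinycdb API linked elsewhere in the thread, it might look like this (a sketch, error checks omitted):

  #include <cdb.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /* The canonical cdb update: rebuild into a temp file, then rename()
     it over the old one, so readers atomically see either the old or
     the new database, never a partial one. */
  int rebuild(const char *tmp, const char *dst) {
      struct cdb_make cdbm;
      int fd = open(tmp, O_RDWR | O_CREAT | O_TRUNC, 0644);
      cdb_make_start(&cdbm, fd);
      cdb_make_add(&cdbm, "key", 3, "value", 5);  /* ...every record... */
      cdb_make_finish(&cdbm);                     /* writes the hash tables */
      close(fd);
      return rename(tmp, dst);
  }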

  • henryl 17 years ago

    As someone mentioned above, the author probably didn't tune the bucket number appropriately.

  • kvs 17 years ago

    I am a bit confused by the table. "bdb_btree 14.3/s " means 14.3 records per second? Is that the correct interpretation?

antirez 17 years ago

It's a shame Redis isn't among the tested KV DBs. I guess the networking layer means it's not an option in this specific use case.
