Benchmarking BDB, CDB and Tokyo Cabinet on large datasets


At my job we have need of a high-performance hash lookup database in our antispam product. It's used to store Bayes tokens for quick lookups on individual scanning systems, and is read-only in the fast path (mail scanning) with updates taking place in another process. For the last few years, we've been using a plain old BerkeleyDB hash database via Perl's DB_File, but with all the hype about Tokyo Cabinet and its benchmark results I figured it was time to take a look.
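For context, the fast path is nothing more than a read-only tie of the BerkeleyDB file through DB_File. A minimal sketch of what that looks like (the database path and token here are made up for illustration):

    use DB_File;
    use Fcntl qw(O_RDONLY);

    # Hypothetical path; the scanner only ever opens the token database read-only.
    tie my %tokens, 'DB_File', '/var/db/bayes_tokens.db', O_RDONLY, 0644, $DB_HASH
        or die "cannot open token database: $!";

    my $counts = $tokens{'sometoken'};  # plain hash lookup, served from the BDB file
    untie %tokens;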

The first thing I did was to run the TC benchmark suite on my system, and wow, it was fast. Just like it says on the box, 100% as-advertised. Next, I hacked up a quick script to import our Roaring Penguin Training Network data -- approximately 11M variable-length string keys, each pointing to a small chunk of data. Unfortunately, TC didn't do well. After about half an hour I was still only halfway through the import, and each successive addition of 100k keys was taking noticeably longer as time went on. A few minutes of poking around on various forums and reading docs led me to some tuning parameters that brought the database creation time down to about 30 minutes. Much better than before, but it still paled in comparison to the roughly five minutes it takes to create the equivalent BerkeleyDB hash database.

Some of those forum posts about tuning mentioned CDB as another alternative. CDB is a constant database (i.e. not updatable once built), so it's not really comparable to TC or BDB in the general case, but it would fit our workload. So... with three databases to choose from, it's time for some comparison benchmarks.
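To give a feel for why a constant database suits a read-only workload: the file is built once through a maker object and then only ever read. A rough sketch using CDB_File (the filenames and the %source hash are placeholders):

    use CDB_File;

    # Build into a temp file; finish() writes the hash tables and renames into place.
    my $maker = CDB_File->new('rptn.cdb', 'rptn.cdb.tmp')
        or die "cannot create rptn.cdb: $!";
    while (my ($token, $counts) = each %source) {
        $maker->insert($token, $counts);
    }
    $maker->finish;

    # Readers just tie the finished file like any other hash.
    tie my %db, 'CDB_File', 'rptn.cdb' or die "tie failed: $!";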

The Setup

The benchmarking tool was written in Perl, using the DB_File, CDB_File, and TokyoCabinet modules. The benchmark data was the uncompressed RPTN dump from April 24: 11,004,950 keys, each pointing to an encoded string containing spam, ham, and generation counts. Writes were performed to a separate ext2 partition to avoid any journalling overhead.
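The harness itself is the usual Benchmark module boilerplate, roughly this shape (the create_* subs, which read the RPTN dump and build each database from scratch, are hypothetical names and omitted here):

    use Benchmark qw(timethese cmpthese);

    # One full database build per engine, timed once each.
    my $results = timethese(1, {
        bdb_hash  => \&create_bdb_hash,
        bdb_btree => \&create_bdb_btree,
        tc_hash   => \&create_tc_hash,
        cdb_hash  => \&create_cdb_hash,
    });
    cmpthese($results);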

The following tuning parameters were used on creation:

  • BerkeleyDB Hash

    cachesize of 96MB (what we currently use in production)
    
  • BerkeleyDB BTree

    cachesize of 96MB (I was lazy and didn't want to tune it, so it's copied from the Hash testcase)
    
  • CDB

    No tuning
    
  • Tokyo Cabinet Hash

    xmsiz value of 750MB (necessary to make it complete in under an hour)
    cache value of 12000000 (same)
    

No tuning was performed for reads.
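For reference, here's roughly how those creation-time parameters get applied from Perl (filenames are placeholders, and the BTree case is the same as the Hash case with $DB_BTREE in place of $DB_HASH):

    use DB_File;
    use TokyoCabinet;
    use Fcntl qw(O_RDWR O_CREAT);

    # BerkeleyDB hash: 96MB cache, set on the HASHINFO object before tying.
    $DB_HASH->{'cachesize'} = 96 * 1024 * 1024;
    tie my %bdb, 'DB_File', 'rptn_bdb.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "bdb tie failed: $!";

    # Tokyo Cabinet hash: extra mapped memory plus a large record cache.
    my $hdb = TokyoCabinet::HDB->new();
    $hdb->setxmsiz(750 * 1024 * 1024);
    $hdb->setcache(12_000_000);
    $hdb->open('rptn_tc.tch', $hdb->OWRITER | $hdb->OCREAT)
        or die 'tc open failed: ' . $hdb->errmsg($hdb->ecode);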

For the read test, 1024 random words were selected from /usr/share/dict/words, and looked up in each hash.
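The read side is equally simple; something like the sketch below, with the actual lookup swapped per engine (the word list is sampled once up front, not per iteration):

    # Pick 1024 random dictionary words once.
    open my $dict, '<', '/usr/share/dict/words' or die $!;
    chomp(my @words = <$dict>);
    my @probes = map { $words[int rand @words] } 1 .. 1024;

    # Each benchmark iteration looks every probe up via a per-engine coderef.
    sub read_test {
        my ($lookup) = @_;   # $lookup->($key) returns the stored value or undef
        my $found = 0;
        for my $w (@probes) {
            $found++ if defined $lookup->($w);
        }
        return $found;
    }

    # e.g. read_test(sub { $bdb{$_[0]} })      for the tied DB_File/CDB_File hashes
    #      read_test(sub { $hdb->get($_[0]) }) for Tokyo Cabinet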

The Results

  • Creation times:

    Benchmark: timing 1 iterations of bdb_btree, bdb_hash, cdb_hash, tc_hash...
     bdb_btree: 3394 wallclock secs (410.94 usr + 322.49 sys = 733.43 CPU) @  0.00/s (n=1)
       tc_hash: 2038 wallclock secs (185.55 usr + 82.70 sys = 268.25 CPU) @  0.00/s (n=1)
      bdb_hash: 289 wallclock secs (258.62 usr + 14.97 sys = 273.59 CPU) @  0.00/s (n=1)
      cdb_hash: 101 wallclock secs (91.68 usr +  5.58 sys = 97.26 CPU) @  0.01/s (n=1)
              s/iter bdb_btree  bdb_hash   tc_hash  cdb_hash
    bdb_btree    733        --      -63%      -63%      -87%
    bdb_hash     274      168%        --       -2%      -64%
    tc_hash      268      173%        2%        --      -64%
    cdb_hash    97.3      654%      181%      176%        --
    
  • File sizes:

    cdb_hash  file is 535773153 bytes
    tc_hash   file is 587887360 bytes
    bdb_btree file is 610369536 bytes
    bdb_hash  file is 671383552 bytes
    
  • Random reads:

    Benchmark: timing 2000 iterations of bdb_btree, bdb_hash, cdb_hash, tc_hash...
     bdb_btree: 156 wallclock secs (38.48 usr + 101.73 sys = 140.21 CPU) @ 14.26/s (n=2000)
      bdb_hash: 125 wallclock secs (31.76 usr + 78.93 sys = 110.69 CPU) @ 18.07/s (n=2000)
       tc_hash: 37 wallclock secs (26.86 usr + 10.40 sys = 37.26 CPU) @ 53.68/s (n=2000)
      cdb_hash: 21 wallclock secs (12.06 usr +  8.78 sys = 20.84 CPU) @ 95.97/s (n=2000)
                Rate bdb_btree  bdb_hash   tc_hash  cdb_hash
    bdb_btree 14.3/s        --      -21%      -73%      -85%
    bdb_hash  18.1/s       27%        --      -66%      -81%
    tc_hash   53.7/s      276%      197%        --      -44%
    cdb_hash  96.0/s      573%      431%       79%        --
    

As can be seen from those results, CDB kills all comers in this simulation of our normal workload. Perhaps there are ways to tune Tokyo Cabinet to perform better on large data sets?

Based on this, I'll be running more realistic tests on CDB to see if we can get those sorts of numbers in production.