TokuMX 1.4: Major improvements to MongoDB sharding and replication

20 points by zardosht 12 years ago · 19 comments

Why doesn't 10gen buy them?

leif 12 years ago

From where I'm standing it appears that 10gen is still iterating fast on features in MongoDB. They're trying to figure out text search, they're expanding the aggregation framework rapidly (which looks really interesting), they're adding security features like auditing, etc. etc.
They have some unfortunate behaviors that are all attributable to their storage implementation, most prominently locking, fragmentation, and slow performance out of memory. While they could "just buy TokuMX" and solve these problems with money, it would then put their engineering team in a position where they would need to relearn a big portion of their codebase, and spend time backporting features they've prototyped to TokuMX. It would basically halt new development for a few months while they learn the new code, too.
The way I see it, MongoDB will continue prototyping interesting features and polishing some of their existing ones, and TokuMX will incorporate the ones with the most promise. But to integrate the codebases would slow down MongoDB considerably, and I don't think they can afford that right now. I'm perfectly happy to sit back and merge the best features from MongoDB as they mature.
Put another way, if you were working on a product and someone came to you and said "here let me fix a bunch of things by replacing some of the fundamental subsystems with code you don't know," would you do it? Maybe if you were in more of a maintenance mode, you'd evaluate it for a while and take the time to learn the code and eventually incorporate it, but not if it was going to distract you from adding features.
nasalgoat 12 years ago

Based on my conversations with their CTO, 10gen (now MongoDB Inc.) is philosophically against what Toku is up to in terms of indexing and optimization.
They're trying to generalize while Toku aims at very specific query optimization.
- esmet 12 years ago
  
  Interesting perspective, but it turns out that the opposite is true. Using better indexing is a general improvement to database performance and manageability.
  - nasalgoat 12 years ago
    
    His argument was that the indexing they do only improves a specific type of query. Personally I think he's wrong but that was his view a year or so ago.
    
    esmet 12 years ago
    
    I see. To clarify, TokuMX's indexing technology doesn't try to improve specific queries or patterns - it simply makes general index maintenance significantly cheaper and less space intensive, so your application can define the indexes it needs, not just the ones it can afford.

Any way to have wget-able download links for the .deb? Using a browser to go to the dl page isn't that easy when on a server...

leif 12 years ago

This isn't a great answer, but I think I owe you an honest one. Our marketing department wants to be able to throw the "please put your email here if you want" form up before a download. We are trying to find a way to reconcile our sales/marketing goals with what we know are our fellow engineers' needs and we hope to make downloads easier in the future. Providing packages at all, over just a single binary tarball, is a step in the right direction, I think.
For now the best I can do for you is tell you that if you email me I can hook you up. Short of that, if you search twitter for "severalnines wget" you can find a wget hack that achieves the result you want.
- fellars 12 years ago
  
  Thank you for your honest answer. Some food for thought for your marketing department: I'm interested in your product, but because I can't easily incorporate it into a puppet script to install into my virtualbox dev environment like I can with MongoDB, I'm probably gonna pass on it for now.

jontobs 12 years ago

Great Stuff! Compression and document level locking are awesome! New features = GRAVY!

ddorian43 12 years ago

Now they only need to set the sharding rethinkdb-style and they win.

leif 12 years ago

It's unclear exactly what you mean by "rethinkdb-style" because that could mean a number of things, but stay tuned for our posts on this next week, I think you'll be pleasantly surprised.
- ddorian43 12 years ago
  
  also what would be really cool for very-big-data + ~bigger latency is ~index compression.
  Basically Hypertable (based on Bigtable) compresses data in blocks, but in the index saves only the ids of the first and last documents in the block. This could be hard for secondary indexes (maybe?).
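
For readers unfamiliar with the scheme described in the comment above, here is a minimal sketch of block compression with a sparse index that stores only the first and last id of each block. It is an illustration only, not how Hypertable, Bigtable, or TokuMX implement it; the names `build_blocks` and `lookup`, the `_id` key, the JSON encoding, and the fixed `block_size` are all assumptions made for the example.

```python
import bisect
import json
import zlib


def build_blocks(docs, block_size=4):
    """Compress id-sorted docs in fixed-size blocks.

    The index keeps only (first_id, last_id) per block, not one entry per doc.
    """
    docs = sorted(docs, key=lambda d: d["_id"])
    index = []   # one (first_id, last_id) pair per block
    blocks = []  # zlib-compressed JSON payloads, same order as index
    for i in range(0, len(docs), block_size):
        chunk = docs[i:i + block_size]
        index.append((chunk[0]["_id"], chunk[-1]["_id"]))
        blocks.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
    return index, blocks


def lookup(index, blocks, doc_id):
    """Pick the single candidate block from the sparse index, then scan it."""
    first_ids = [first for first, _ in index]
    # Last block whose first id is <= doc_id.
    pos = bisect.bisect_right(first_ids, doc_id) - 1
    if pos < 0 or doc_id > index[pos][1]:
        return None  # id falls outside every block's range
    for doc in json.loads(zlib.decompress(blocks[pos]).decode("utf-8")):
        if doc["_id"] == doc_id:
            return doc
    return None  # id falls inside the block's range but is absent


if __name__ == "__main__":
    docs = [{"_id": i, "v": i * i} for i in range(0, 20, 2)]
    index, blocks = build_blocks(docs)
    print(index)
    print(lookup(index, blocks, 8))
```

The trade-off is that the index stays small (one entry per block) while a point lookup pays for decompressing one whole block. A secondary index is harder in this layout because documents sorted by `_id` are not sorted by the secondary key, which is presumably the difficulty the comment alludes to.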
  - leif 12 years ago
    
    All TokuMX data and index storage is block compressed. It's not hard, it's on by default.
- ddorian43 12 years ago
  
  check my reply to zardosht below/above
zardosht (OP) 12 years ago

ddorian, Can you elaborate what that means?
- ddorian43 12 years ago
  
  What I mean is that every node is the same: no mongos, you just connect to one random mongod and it handles the mongos functionality.
  So if you grow, you add 1 node, not a replica set (which could be 3 nodes if you have 3x replication).
  - leif 12 years ago
    
    Unfortunately, this would break compatibility with existing MongoDB applications more than we would probably be willing to do. However, there's no reason RethinkDB couldn't use Fractal Tree indexing instead of B-trees, given some engineering effort.
    
    ddorian43 12 years ago
    
    But RethinkDB doesn't have range sharding (they had it, but they are changing or have changed it to random(id)), and it also has no sharding(custom_field).
