Settings

Theme

Introducing Ark: A Consensus Algorithm For TokuMX and MongoDB

tokutek.com

37 points by zardosht 12 years ago · 18 comments

Reader

justin66 12 years ago

Does Tokutek intend to patent this consensus algorithm? If there's any possibility at all - I'm asking because of your track record - we're legally better off not reading about it at all, which is sad.

  • leif 12 years ago

    It won't be patented. We hope others find it helpful and consider implementing it in similar systems.

fintler 12 years ago

Is there a TLA+ or Coq model available for this?

  • leif 12 years ago

    No, none of us know how to use those. :(

    If you're interested in building one and you have experience with them, get in touch and we can work through it together. I think the biggest challenge would be modeling the semantics of write concern, but I'm not that familiar with proof assistants, maybe that isn't too hard.

    • lumpypua 12 years ago

      It looks like you've read the Call Me Maybe series of posts over at aphyr.com. He tests a number of distributed systems (Mongo, Riak, Cassandra, etc) and their behavior under network partitions and almost all of them fuck up and lose data. A summary of results can be found at [1].

      Amazon has used a TLA+ model for their distributed systems and found a bunch of bugs [2].

      Seriously, everybody fucks this up. Please please learn a model checker and check your algorithm.

      [1] In the "Summary of Jepsen Test Results" section: http://blog.foundationdb.com/call-me-maybe-foundationdb-vs-j...

      [2] https://research.microsoft.com/en-us/um/people/lamport/tla/a...

      • leif 12 years ago

        We are working on several ways to prove to ourselves and the community that what we have is correct. The most important is obviously testing. We have tests that demonstrate the problems we found, and Ark passes those tests. Publishing this explanation and soliciting feedback is another.

        We have run Jepsen and have not been able to get it to show data loss in TokuMX. The problems it found in MongoDB were already fixed in other ways in earlier versions of TokuMX, but we're trying to get Jepsen to demonstrate the other problems we've found.

        Model checking may be another way we can prove correctness, but since Ark is so similar to Raft, I think the Raft model in TLA+[1] is probably sufficient. Anyway, we'd also need a proof that the model is equivalent to the implementation, and I don't know of a way to do that, so I think functional tests are more important.

        In any case, we'll look in to using a model checker, and any help would be greatly appreciated. If you're interested, feel free to email me.

        [1]: https://ramcloud.stanford.edu/~ongaro/raft.tla

        • sseveran 12 years ago

          If you don't know how to use a model checker you have no business creating a distributed algorithm. Either implement someone else's or do it right. The world has enough broken distributed algorithms and we really don't need any more. Either you are using Raft and you can depend on its proofs and just do the work to ensure that the model is implemented correctly or you are using something different and you need to write the proofs and models.

          The casual attitude that most people show when building these algorithms mind blowing.

    • sseveran 12 years ago

      If you have not used a model checker you don't have a proof. Please don't say that you have one. You are just hoping for the best.

      See this to get yourself started: http://research.microsoft.com/en-us/um/people/lamport/tla/by...

      • aurelius 12 years ago

        If you have used a model checker, you probably don't have a proof either. You have a model that might not be accurate, and testing all its inputs may be combinatorially prohibitive.

        • sseveran 12 years ago

          thats not the real issue. The issue with model checkers is verifying that production code actually exactly implements the model.

          • aurelius 12 years ago

            Yes, that is <i>another</i> issue with model checkers. Either way, your original comment is still nonsense.

yid 12 years ago

Interesting -- sounds like once implemented, you could leverage the consensus algorithm to implement atomic multi-document/multi-table transactions too?

  • leif 12 years ago

    Multi-document and multi-collection transactions are already a part of TokuMX[1]. Since commit of the oplog insert is atomic together with the actual operation's changes to documents, atomicity is also guaranteed in replication. Atomicity and MVCC in a sharded system is something we're working on, but it's unrelated to Raft/Ark.

    Ark is just about making replication as a whole trustworthy. The jepsen post on MongoDB[2] shows MongoDB losing data even with majority write concern, which if used properly, is supposed to make MongoDB a CP system. But because of the design flaws in the election algorithm, you can't rely on it perfectly. The changes we made in Ark fix the election algorithm to make majority write concern actually able to guarantee data safety, so you can treat it as a fully CP system.

    [1]: http://docs.tokutek.com/tokumx/tokumx-transactions.html

    [2]: http://aphyr.com/posts/284-call-me-maybe-mongodb

    • lumpypua 12 years ago

      Ark is just about making replication as a whole trustworthy.

      Then you need a formal model.

yourad_io 12 years ago

I believe this is officially one name too many matching /ar[ck]/

http://en.wikipedia.org/wiki/Ark#Technology

http://en.wikipedia.org/wiki/Arc#Computing_and_gaming

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection