Settings

Theme

OT and CRDT trade-offs for Real-Time collaboration

tiny.cloud

178 points by andfrob 6 years ago · 59 comments

Reader

migueloller 6 years ago

I’ve been doing a pretty deep dive on CRDTs and OTs to get the right user experience for our collaborative website builder[1]. I’m still not done with the implementation and subsequent testing (i.e., splitting text nodes, as mentioned in the post). But the core algorithm behind Automerge[2], formalized in the OpSets paper[3] is extremely promising. A good example of its power is the ability to do an atomic tree “move”, whilst preserving strong eventual consistency.

I'm so glad to see that the HN community cares about this problem space. Maybe I should finally write that blog post on CRDTs and how to implement them yourself following the OpSets spec.

[1] https://www.makeswift.com

[2] https://www.github.com/automerge/automerge

[3] https://arxiv.org/abs/1805.04263

  • sipjca 6 years ago

    I think that blog post would be a great idea, would love to read it

    • migueloller 6 years ago

      Really? I’ll get something out this week. My Twitter handle is in my profile. I’ll probably tweet about it when I write it.

phsource 6 years ago

Wanderlog (https://wanderlog.com) is a Google Docs for planning travel, and naturally, we had to figure this out early in the process.

If you're on a Node.js/React stack, we highly recommend using the combination of OT-JSON0 [1] and ShareDB [2], two excellent libraries. OT-JSON0 lets you perform operational transforms on any JSON-serializable structure pretty intuitively, and ShareDB handles synchronizing it between multiple clients and resolving conflicts. Building your own is going to be a pain, so unless your core business is a text editor like TinyMCE, it's probably best to stand on the shoulders of giants.

Another approach that we didn't end up using is Figma's: especially if you're working on anything other than text editing, ask if you actually want to use operational transform. Something simpler might be good enough -- in Figma's case, they use "last write wins" because they were able to break the document into small enough, atomic elements. [3]

[1] https://www.npmjs.com/package/ot-json0

[2] https://www.npmjs.com/package/sharedb

[3] https://www.figma.com/blog/how-figmas-multiplayer-technology...

  • migueloller 6 years ago

    We’re actually using ShareDB at Makeswift right now and for us it has become a horrible burden. There are very subtle bugs on their data persistence layer with concurrency and the API is a bit of a mess to work with in my opinion. I’m glad to see other people are having success with it, though.

    • spyder81 6 years ago

      I've been given advice that ShareDB is "pretty antiquated and not well supported". I'm told it is not difficult to rewrite in modern NodeJS, replacing things like callbacks and event emitters with async/await produces a much simpler and more reliable server.

    • phsource 6 years ago

      Ah that's unfortunate; we did find it a bit weird that you have to think of all changes as `ops`, but otherwise, I'm curious as to what you've had issues with.

      We also have a custom back-end that's MySQL that may have changed how it work a bit

  • alangibson 6 years ago

    I came to the same "LWW on fragments" solution for peraspera.io. In a lot of cases, all OT gets you is being able to watch someone else's cursor move around. The new wore off on that a long time ago.

  • spyder81 6 years ago

    I'll cover this in the second post but we will be using ShareDB for our initial launch (although probably building our own once we hit production release).

  • spyder81 6 years ago

    atomic elements doesn't really extend to arbitrary HTML, which is fine for the simpler editing solutions but not TinyMCE.

    To do something like that in arbitrary HTML you wind up locking at the top-level-block boundaries. Locking sounds great as an engineer but it leads to a garbage user experience.

raphlinus 6 years ago

There's one thing I'd add here, as advice to potential implementers of collaborative editing algorithms. Test the system rigorously, autogenerating concurrent workloads of all the operations (including undo), and also various delays. If this had been done early on, all the broken algorithms (some published with formal proofs of correctness!) would have been caught. A lot of the broken algorithms work under light loads - TP2 requires at least three concurrent edits, two inserts and one delete, to trigger.

  • spyder81 6 years ago

    We're using property-based testing where we can, which I hope to cover in a future post

gvkhna 6 years ago

At Mixcut.com (Building real time collaborative video editing) we use a combination of OTs and a variant/simpler version of CRDTs that provide the most coverage of the various features needed. The usage of low latency websockets to shuttle OTs and building completely around OTs is key.

It’s working great for us in beta, we’ve recently shipped multiplayer to our users and overall it’s very stable. With a lot more in store soon, please check it out!

phedkvist 6 years ago

Thanks for linking to my blog post :)

"This is the exact "split node" scenario I described earlier; applying bold near the start of a text node necessarily has to split the text node into three parts (before, bold, and after)."

Not sure why you split it up into sections. In my CRDT implementation, I would add meta-data to each character, with the boolean property which is either bold or not. It's certainly cumbersome to keep the cursor at the right place when inserts are being made, but it's doable. https://pierrehedkvist.com/posts/1-creating-a-collaborative-...

I personally never understood how OT actually works, clearly, Google Docs and others find it useful. But to me, CRDT has more solid proof and reasoning behind it, and it is easier to comprehend. https://medium.com/@pierrehedkvist/creating-a-collaborative-...

  • spyder81 6 years ago

    I definitely read a few of your posts during my research, thanks for your hard work :)

    The best CRDTs do treat each character separately (as I think I mentioned at one point?). But not all of them - the video and my example is an example of one that doesn't.

    It's also very difficult to support arbitrary nested trees with a per-character data type, which leads to the compromises I talked about.

strictfp 6 years ago

Is the article suggesting to apply a string CRDT to an entire JSON structure? Are people doing that?

At the very least, I would expect different JSON data types to use different CRDTs. For a collaborative model, I would expect the semantics of the data model to be reflected in the choice of CRDTs for different substructures in the JSON.

  • ergl 6 years ago

    No, the state of the art CRDT solution for json is something like Automerge (https://github.com/automerge/automerge), that treats the document like a collection/combination of CRDTs of different types. (the author doesn't mention it explicitly, but it's linked at the end)

    • codebeaker 6 years ago

      I have been using Automerge recently for a project, and I have found it to be very, very user-friendly.

      Our use-case is offline-editing of documents with an eventual sync-with-yourself-online. It's mostly there as a sync tool, not as a p2p colalb editing. Unless you count yourself as a peer, I guess!

      We store a document in local storage which is the result of `Automerge.save(automergedoc) => serializable string`. That can be loaded with `Automerge.load(s) => doc`.

      "Changes" are a first-class concept in automerge (makes sense, I guess) so you can do

         before = Automerge.from({..some doc..})
         after = Automerge.change(before, "optional commit msg", (doc) => doc.hello = "world);
      
      With that done, you can get a diff, `Automerge.getChanges(before, after) => [changes]`.

      Those `[changes]` are what we try and send to a server and keep them in a `Set()` for each document (background sync, service worker, etc make a nice experience.

      This is all wrapped up in Redux, and a middleware that captures the changes, and a reduxStore subscriber that puts the document back into local storage after any changes.

      It is for the application programmer virtually transparent, and so far we've yet to find a fault with it.

      I'm sure I'll come to regret making "changes" my principle data type, as the server implementation just sees serialized blobs of data, and has no concept of the structure of the document. That's solveable if I need it ever, because there's a protcol compatible implementation in Rust, and I could always use the JavaScript one on the server too, just haven't needed yet.

      The serialization is also quite large, in the order of kilobytes for what are actually quite small docs, but they do contain the original commit messages too, and the Automerge team is planning to release an optimized serialization format early this year, which they say will be especially useful for fields with lots of changes, such as text fields, which currently pay a heavy toll in terms of tracking `[changes]`. Fortunately that also doesn't affect me.

      I can also +1 some of the advice to keep your whole document in a CRDT. We use React+Redux, but since React Hooks made stateless functional components a reality, we felt it was appropriate to use React exclusively for the UI state, and track only document states in Redux reducers. It feels light-weight and sustainable, although we're only about a month and 5k LOC into the project. Time will tell I guess.

      • ergl 6 years ago

        Yeah, personally the feature I'm looking forward to is materializing "changes" into a document without losing editing history.

        At some point your set of changes grows too large or you want to trim it to, say, a few weeks or months. Afaik that isn't possible yet without losing the ability to merging later on.

        • eudoxus 6 years ago

          I'm farmilliar with Automerge, but not its specific features on state compression. In general change set compression is possible with CRDTs but there are some "gotchas" you need to look out for. Most notably, all peers interested in a changeset, have to be completely in sync at the moment of compression, otherwise, depending on which underlying CRDT type you use, there can end up with duplicate entries in for example a Set type. Depending on the use case, 100% peer sync is possible, in others its more challenging.

      • spyder81 6 years ago

        I'm a big fan of Automerge, I look forward to where the research takes it!

jiofih 6 years ago

So.. we’re left with no real explanation to why OT is better than CRDT other than a 5s video showing a poorly managed text node update that moves the cursor. I feel tricked.

  • slau 6 years ago

    Really?

    > In OT, every user action is broken down into one or more operations. These operations are transmitted between clients along with their baseline reference; if two users perform actions at the same time, incoming operations must be transformed to include the local operations that have happened since that baseline. They are then applied locally and form the new baseline.

    > This constant transformation of operations turned out to have too many edge cases where clients were found to not produce the same baseline (the "wrong" papers above). When that happens, the clients will never converge on the same result and break the fundamental assumption of collaboration.

    And that’s exactly right. It’s fairly easy to have an OT system that is very vulnerable, because clients can cheaply generate change sets that are extremely computationally expensive on the server side. I’ve seen a system where a single mobile phone could, in a few seconds, lock up the synchronisation server for days.

    • jiofih 6 years ago

      But the article is arguing in favor of OT, not CRDT. A couple examples of those edge cases would go a long way, the one given is not very helpful to make a point.

      • Vinnl 6 years ago

        What's wrong with the one given? If my cursor moves because some other user makes some other text bold, that doesn't sound like a desirable experience?

        • jiofih 6 years ago

          Because it doesn’t highlight a particularly relevant deficiency of the CRDT protocol. Preserving cursor position is not hard in that case. The post even mentions stronger downsides of OT, the example does not support the conclusion. Why is OT better?

          • Vinnl 6 years ago

            Huh, the article gave me the impression that preserving the position was a crucial problem there.

            As for the last question, I think the later posts will get to that.

            • eudoxus 6 years ago

              Preserving the position of the cursor is a crucial feature for collab editors. However, he doesnt go into detail explaining why CRDTs make this feature impossible. Sure the CRDT route splits the text, but I fail to see why that makes cursor pos preservation impossible?

              • spyder81 6 years ago

                It's not impossible, particularly in cases where the model is fairly flat.

                In complex models it is symptomatic of a larger issue with complexity in CRDTs; the post was already quite long and I didn't want to repeat any more of the research I'd linked to.

                • jiofih 6 years ago

                  You mention that example as a turning point for choosing OT, confirming your research. But to an outsider it is not particularly convincing. Especially since you’d expect a CRDT implementation to use the rope/tombstone structure, applying the transform to that sequence of characters and not actually have to split a text node, which looks like a limitation of this particular implementation, or tight coupling between the data model and the DOM.

                  The post is a good introduction to OT and CRDT, but is very light on actual trade-offs and seems to end with a gut-feeling conclusion, when the title promised “trade-offs between”.

                  • spyder81 6 years ago

                    It's hard to offer more detail than that in a blog post. I'm not going to repeat the research papers or write one of my own. The post is already so long and complex, covering so many surface details, I couldn't reasonably push it any further.

                    • jiofih 6 years ago

                      I’d suggest skipping right to the meat of the subject (pros/cons in this case), assuming some previous knowledge, if that’s what the article proposes to share, and simply point readers to learning resources; it’s not possible to start from scratch on every topic. The deeper the subject, the more glaring this problem becomes.

  • spyder81 6 years ago

    I wasn't really trying to say OT is better than CRDT. Nor was I intending to provide a complete detailed explanation (there's enough research already).

    I was driving at CRDT not being capable enough for arbitrary HTML structures, and giving a quick summary of what was an extended multi-week research process.

lxpz 6 years ago

> the prospect of peer-to-peer editing with end-to-end encryption is an exciting one

Have you heard of CryptPad ? Not exactly peer-to-peer but has encryption so that the server sees nothing.

billconan 6 years ago

this is a wonderful writeup! thank you!

I'm researching the same thing.

Any tutorial on OT implementation? Especially, I need to handle structured data, like json.

fyp 6 years ago

The only thing I remember from the CKEditor post from last year was that it took "42 man-years" to implement: https://news.ycombinator.com/item?id=18220020

I wonder if it's better understood now and can cost less R&D time.

marknadal 6 years ago

w00h00, this is my area of expertise.

After 8 years of working on this, I have changed my thoughts:

- The correct algorithm is not always the correct user experience.

- End-to-end encryption is too important to not have.

- Offline support is great, but it behaving consistently is more important than it behaving "intently".

- Biggest pain points can most easily be solved at the editing layer, not data layer.

As a result, OT gets thrown out the door immediately.

I built the most popular CRDT powered database on the market (https://github.com/amark/gun, 11K+ stars, 11M+ monthly installs), but have some harsh words for CRDT obsessed people:

CRDTs are great, but if you create an infinitely complex one to handle "every word editing operation" it'll actually result in an inferior user experience.

As such, for example, GUN supports lock-free concurrent operations on any depth of data in a graph, but keeping text/strings behavior atomic is way more important than having built-in automatic string merges.

Why? It is much simpler to build a correct text/string CRDT on top of a predictable, stable, graph CRDT that is understandable.

Another example is, in our blog editing tools, despite having spent years researching & implementing (+ others in our community) precise character-by-character CRDT resolution schemes, I found I personally had a much better offline & local first user experience with a predictable per-paragraph sync scheme.

This is a stupid simple approach, but what it means is that I as a user, can easily predict whats going to happen, so if I'm offline I know I should just copy a new paragraph and fiddle with things there, cause it isn't gonna get "auto delete regrammar merged".

Generally speaking, me and colleagues don't "fight" over the same paragraph, we're usually concurrently writing different "sections" at the same time, it is pretty rare to grammar fix their edit same paragraph as they're typing it - that is also just kinda, rude looking.

Anyways, finally, is that the majority of text styling and DOM edge cases can be handled by having a deterministic canonical DOM hierarchy that always gets applied at the editing layer, BEFORE any CRDT operations even occur.

So for instance, re-arrange the DOM tree such that <i> is always inside <u> inside <b>, or something like that.

We built this into a library called normalize, and it instantly eliminated so much complexity, especially at the CRDT level. Ping me if you want a demo of this library.

Finally, for everyone else, we also built an INTERACTIVE CARTOON EXPLAINER of an extremely basic text CRDT to help others understand the more detailed concepts:

https://gun.eco/explainers/school/class.html

  • spyder81 6 years ago

    User experience is definitely why we didn't go for an existing CRDT solution.

    The model we've chosen is fairly agnostic to whether OT or CRDT is used to collaborate, as I'll expand on in the second post, so if we see an opportunity to switch to a CRDT-based solution we'll certainly jump on it!

    You only need to normalise things like bold/italic/underline when using a browser DOM, every collaborative rich text model I've seen represents formatting as unordered attributes which is much easier to collaborate on.

  • throwaway739103 6 years ago

    You are a charming and energized speaker and perhaps that is why Tim Draper thinks you a good investment as with others he has made. Acronyms like 'PTSD' and 'PARTY' in the project is likely good at attracting people who think they sound fun and do not question many details.

    It is difficult to get brief clear details about how gun.js works. There are distracting or simple documents, discussions about high level ideas and tangentially-related topics. In some cases 'worse is better' yes but it sets a tone that the project is hiding internal problems of implementation or can not easy explain its own design.

    The transparency of gun.js enterprise organizations also raises questions. Are people on your team related to you? Do you pay yourself and team and open source contributors?

    • marknadal 6 years ago

      Thank you!

      They also like the math & science behind it, some investors do DD, some don't.

      You're tone is pretty assuming, smug, and insulting.

      It is running in production with millions of people, yes there are some hiccups, but surviving & we're improving it to scale even more.

      I rebuild GUN from scratch in 30min on stage, that is how little/simple GUN is:

      https://gun.eco/docs/Porting-GUN

      Do I pay myself? Lol. Yes. No, I do not pay OSS, but yes people contribute regardless, strangers help me, my kids help me, investors help me, family & friends help me.

  • mkl 6 years ago

    With end-to-end encryption in GUN, what role does the server play? I assume the merging type activity happens in the clients?

    BTW, the name makes it almost impossible to search for info about GUN.

    • marknadal 6 years ago

      Yes, merging is per peer, not server transformed.

      "Servers" act as WebRTC bootstrapping signals, storage backups, relay/routing, etc., and you can have as many of them as you want (no need to be centralized).

      True. `gunDB` tag should help a little.

  • josemaenzo 6 years ago

    Do you have any document that explain what resolution algorithm uses in what cases?

    For example, one peer change a property value and the other peer deletes it.

    • marknadal 6 years ago

      Same algorithm for everything.

      This cartoon explains it:

      https://gun.eco/distributed/matters.html

      It is a vector + timestamp + lexical sort.

      A delete is changing a value to `null`, it would lose (I assume you are asking if/when these 2 changes happen at the exact same microsecond time, conflicting?) as is it has a lower lexical rank.

      • josemaenzo 6 years ago

        Can you refer me to a document where you explain the "lexical rank" that you are referring to?

        The cartoon explanation do not tackle those terms.

        • marknadal 6 years ago

          JSON.stringify('a') < JSON.stringify('z')

          https://en.wikipedia.org/wiki/Lexicographical_order

          • josemaenzo 6 years ago

            Cool! Make sense.

            Let's say our state have an array like ['a','b'].

            What about if Peer-A splice the first element (0) which results in ['b'] and Peer-B wants to mutate 'a' to 'A' which results in ['A','b'] and they both make the mutation at the same millisecond?

            • marknadal 6 years ago

              You have to choose which type of Array:

              1. An atomic Array

              2. Fixed index Array

              3. Linked list

              Now rather than not knowing what magic is going to happen, app developers know what to expect in advance.

              So atomic would choose one or the other, not a combined result.

              Fixed array 2nd item to 1st which nulls 2nd and makes 1st b, simultaneously also makes 1st A. Cause lower case b is lexically larger than A, you'd get [b, null].

              Linked list theoretically could handle it as you suggest.

              • josemaenzo 6 years ago

                Does gun have a built-in "Linked list" in the API? If so, where are the docs? Could not find them.

                If not, any paper or doc that explain how to implement it with gun?

                • marknadal 6 years ago

                  Started, but never finished. :(

                  @nmaro has some pretty good stuff.

                  The cartoon explainers gives an overview.

                  But you'd probably want to do RGA algorithm or one of the newer ones.

                  Martin Kleppmann has a pretty good talk with bunch of paper references:

                  https://youtu.be/yCcWpzY8dIA

                  • josemaenzo 6 years ago

                    Oh, I see. Just wondering, How GUN handle a situation like described above when working with Arrays?

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection