Loro 1.0

Zixuan Chen, Liang Zhao

Loro is a Conflict-free Replicated Data Type (CRDT) library that developers can use to implement real-time collaboration and version control in their applications. You can use Loro to create local-first software. Loro 1.0 has a stable data format, excellent performance, and rich features. You can use it in Rust, JS (via WASM), and Swift.

Features of Loro 1.0

High-performance CRDTs

High-performance, general-purpose CRDTs can significantly reduce data synchronization complexity and are crucial for local-first development.

However, large CRDT documents can be slow to load and consume a lot of memory, especially when they have extensive editing histories. Loro 1.0 addresses this with a new storage format that improves loading speed by 10x. In benchmarks using real-world editing data, we’ve reduced the loading time for a document with millions of operations from 16ms to 1ms. With the shallow snapshot format (discussed later), the time drops further to 0.37ms. As a result, Loro will not become a bottleneck for applications dealing with such large documents, which makes CRDTs viable for a wider range of applications.

Rich CRDT types

Loro now supports a rich text CRDT, which makes merged rich text (text with formatting and styling) better match user expectations. Our text/list CRDT is based on the Fugue algorithm, which prevents interleaving when merging concurrent edits. For example, when “123” and “Hi” are inserted concurrently at the same position, it avoids producing an interleaved result like “1H2i3”.
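To make the interleaving problem concrete, here is a small self-contained sketch; it is not Loro’s or Fugue’s implementation. The naive merge assumes each replica places its characters at evenly spaced fractional positions and the merge sorts all characters by position, which reproduces the “1H2i3” anomaly. The second merge is a simplified RGA-style tree in which every character records the character it was typed after (its hypothetical `origin`), so a run typed left to right stays contiguous. Fugue also arranges characters in a tree, with extra rules for cases this sketch ignores, such as backward typing.

```typescript
// Naive merge: each replica spreads its characters evenly over (0, 1),
// and the merge sorts all characters by that fractional position.
interface Placed {
  pos: number;
  ch: string;
}

function spread(text: string): Placed[] {
  const chars = text.split("");
  return chars.map((ch, i) => ({ pos: (i + 1) / (chars.length + 1), ch }));
}

function naiveMerge(a: string, b: string): string {
  return spread(a)
    .concat(spread(b))
    .sort((x, y) => x.pos - y.pos)
    .map((p) => p.ch)
    .join("");
}

// Simplified RGA-style tree (hypothetical shape): each character records the id
// of the character it was typed after; null means the start of the document.
interface CharOp {
  id: string;
  origin: string | null;
  ch: string;
  replica: number;
}

// Simulates one replica typing `text` from left to right.
function typeForward(replica: number, text: string): CharOp[] {
  const ops: CharOp[] = [];
  let origin: string | null = null;
  const chars = text.split("");
  for (let i = 0; i < chars.length; i++) {
    const id = replica + ":" + i;
    ops.push({ id, origin, ch: chars[i], replica });
    origin = id; // the next character is typed right after this one
  }
  return ops;
}

function treeMerge(...replicas: CharOp[][]): string {
  // Group characters by origin; "" stands for the document start.
  const children: Record<string, CharOp[]> = {};
  for (const ops of replicas) {
    for (const op of ops) {
      const key = op.origin === null ? "" : op.origin;
      (children[key] = children[key] || []).push(op);
    }
  }
  const out: string[] = [];
  // Depth-first traversal: a character is followed by its whole subtree,
  // so a run typed left to right is emitted without interruption.
  const visit = (key: string): void => {
    const kids = (children[key] || []).slice().sort((a, b) => a.replica - b.replica);
    for (const kid of kids) {
      out.push(kid.ch);
      visit(kid.id);
    }
  };
  visit("");
  return out.join("");
}

console.log(naiveMerge("123", "Hi")); // 1H2i3
console.log(treeMerge(typeForward(1, "123"), typeForward(2, "Hi"))); // 123Hi
```

Whether the result is “123Hi” or “Hi123” depends only on the tie-break between sibling characters; here the lower replica number wins, which is an arbitrary choice for the sketch.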

We also support:

Movable List: Supports set, insert, delete, and move operations. The algorithm ensures that after merging concurrent moves, each element occupies only one position.
Map: Similar to a JavaScript object.
Movable Tree: Used to model file directories, outliners, and other hierarchical structures that may need moving. It ensures no cyclic dependencies exist in the tree after merging concurrent move operations.
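The cycle guarantee for Movable Tree can be illustrated with a minimal sketch. It assumes a rule similar to the one in Kleppmann et al.’s paper on a highly-available move operation for replicated trees: every replica applies moves in the same agreed order (here a hypothetical `lamport` timestamp), and a move that would make a node its own ancestor is skipped. This is not Loro’s code; `MoveOp` and the other names are hypothetical.

```typescript
// Hypothetical move operation: make `node` a child of `parent` (null = root).
interface MoveOp {
  lamport: number; // hypothetical total order shared by all replicas
  node: string;
  parent: string | null;
}

type ParentMap = Record<string, string | null>;

// Walks up from `candidate`; returns true if `node` is `candidate` itself or
// one of its ancestors, i.e. the move would create a cycle.
function wouldCreateCycle(parents: ParentMap, node: string, candidate: string | null): boolean {
  let cur = candidate;
  while (cur !== null) {
    if (cur === node) return true;
    cur = cur in parents ? parents[cur] : null;
  }
  return false;
}

// Applies all moves (from every replica) in the same order on each replica,
// skipping any move that would introduce a cycle.
function applyMoves(initial: ParentMap, ops: MoveOp[]): ParentMap {
  const parents: ParentMap = { ...initial };
  const sorted = ops.slice().sort((a, b) => a.lamport - b.lamport);
  for (const op of sorted) {
    if (!wouldCreateCycle(parents, op.node, op.parent)) {
      parents[op.node] = op.parent;
    }
  }
  return parents;
}

// A and B are siblings under the root. Replica 1 moves A under B while
// replica 2 concurrently moves B under A.
const initial: ParentMap = { A: null, B: null };
const merged = applyMoves(initial, [
  { lamport: 1, node: "A", parent: "B" }, // from replica 1
  { lamport: 2, node: "B", parent: "A" }, // from replica 2, skipped
]);
console.log(merged); // { A: 'B', B: null }
```

In a real replicated setting, moves can arrive out of order, so an implementation must be able to undo and re-apply moves to respect the shared order; this sketch assumes all operations are already available.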

Loro also supports nesting between types, so you can use them to model edits on JSON documents.
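As a rough illustration of how nesting maps onto JSON, the sketch below models a document as containers that hold plain values or other containers, and materializes the whole tree as a JSON value. The `Container` types are hypothetical and only mirror the idea; they are not Loro’s API and carry no CRDT metadata.

```typescript
// Hypothetical container model: each container kind maps to a JSON shape.
type Value = string | number | boolean | null | Container;

type Container =
  | { kind: "map"; entries: Record<string, Value> }
  | { kind: "list"; items: Value[] }
  | { kind: "text"; content: string };

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isContainer(v: Value): v is Container {
  return typeof v === "object" && v !== null;
}

// Recursively converts nested containers into plain JSON.
function toJson(v: Value): Json {
  if (!isContainer(v)) return v;
  if (v.kind === "text") return v.content;
  if (v.kind === "list") return v.items.map(toJson);
  const out: { [key: string]: Json } = {};
  for (const key of Object.keys(v.entries)) out[key] = toJson(v.entries[key]);
  return out;
}

const doc: Container = {
  kind: "map",
  entries: {
    title: { kind: "text", content: "Hello" },
    tags: { kind: "list", items: ["crdt", "local-first"] },
    done: false,
  },
};
console.log(JSON.stringify(toJson(doc))); // {"title":"Hello","tags":["crdt","local-first"],"done":false}
```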

You can find all the code samples in this blog here

Version control

Like Git, Loro saves the complete edit history as a directed acyclic graph (DAG). In Loro, the DAG represents the dependencies between edits, similar to how Git represents commit history.
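A small sketch of the idea: each change records the ids of the changes it depends on, like a Git commit’s parents, and the lowest common ancestors of two versions can be found by intersecting their ancestor sets. The `Change` shape and ids are hypothetical, and real implementations use far more compact representations than this set-based walk.

```typescript
// Hypothetical change record: an id plus the ids it directly depends on.
interface Change {
  id: string;
  deps: string[];
}

type Dag = Record<string, Change>;

// Collects `id` and everything it transitively depends on.
function ancestors(dag: Dag, id: string): Record<string, true> {
  const seen: Record<string, true> = {};
  const stack = [id];
  while (stack.length > 0) {
    const cur = stack.pop() as string;
    if (seen[cur]) continue;
    seen[cur] = true;
    for (const dep of dag[cur].deps) stack.push(dep);
  }
  return seen;
}

// Common ancestors that are not ancestors of another common ancestor
// (the "lowest" ones; in a DAG there can be more than one).
function lowestCommonAncestors(dag: Dag, a: string, b: string): string[] {
  const inA = ancestors(dag, a);
  const common = Object.keys(ancestors(dag, b)).filter((id) => inA[id]);
  return common.filter(
    (c) => !common.some((other) => other !== c && ancestors(dag, other)[c]),
  );
}

const dag: Dag = {
  c0: { id: "c0", deps: [] },
  c1: { id: "c1", deps: ["c0"] },
  alice: { id: "alice", deps: ["c1"] }, // one peer's edit
  bob: { id: "bob", deps: ["c1"] }, // a concurrent edit by another peer
};
console.log(lowestCommonAncestors(dag, "alice", "bob")); // [ 'c1' ]
```

Roughly speaking, the edits after this ancestor on either side are the concurrent ones a merge has to reconcile.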

Loro supports primitives that allow users to switch between different versions, fork new branches, edit on new branches, and merge branches.

Based on these primitives, applications can build various Git-like capabilities:

You can merge multiple versions without needing to manually resolve conflicts
You can rebase/squash updates from the current branch to the target branch (WIP)

You can also create a separate doc at the current version. It is independent of the current doc and works like a fork.

Leveraging the potential of the Eg-walker

Event Graph Walker (Eg-walker) is a pioneering collaboration algorithm that combines the strengths of Operational Transformation (OT) and CRDT, two widely used algorithms for real-time collaboration.

OT is centralized while CRDTs are decentralized. OT has traditionally had lower document overhead; CRDTs started out with higher overhead, but recent optimizations have significantly narrowed this gap, making CRDTs increasingly competitive. Eg-walker combines the best aspects of both approaches.

Not only have we used the idea of Eg-walker for the Text and List CRDTs in Loro, but Loro’s overall architecture has also been heavily inspired by Eg-walker. As a result, Loro closely resembles Eg-walker in its algorithmic properties.

The Eg-walker paper was released in September 2024. Before its official publication, Joseph Gentle shared an initial version of the algorithm in the Diamond Types repository. Excited by the design, I implemented a similar algorithm in Loro two years ago. A brief introduction to this algorithm can be found here.

The properties of Eg-walker include:

It satisfies the definition of a CRDT, so it has the strong eventual consistency property of CRDTs and can be used in distributed environments
Fast local operations: compared to previous CRDTs, it applies local operations extremely fast because it doesn’t need to consult CRDT data structures to generate the corresponding operations
Fast merging of remote operations: merging n concurrent remote operations takes O(n^2) time in OT, while Eg-walker, like mainstream CRDTs, takes O(n log n), reaching O(n^2) only in extremely rare worst-case scenarios. This means that when the number of concurrent operations reaches 10,000, OT starts to show noticeable lag to users, while CRDTs handle it easily. In most benchmarks based on real-world scenarios, Eg-walker is also faster than other CRDTs.
Lower memory usage: because it doesn’t need to keep CRDT structures in memory persistently, its memory usage is lower than that of typical CRDTs
Faster import speed: CRDT documents often take a long time to load because the CRDT data structures must be rebuilt by parsing the stored structures or operations, and editing cannot continue until this is done. Eg-walker, like OT algorithms, only needs the current document state, so users can start editing right away without building these additional structures, which makes imports much faster
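To put the O(n^2) versus O(n log n) difference into numbers for n = 10,000 concurrent operations (constant factors are ignored, so this only indicates the scale of the gap):

```latex
n = 10^4:\qquad n^2 = 10^8, \qquad n\log_2 n \approx 10^4 \times 13.3 \approx 1.3 \times 10^5, \qquad \frac{n^2}{n\log_2 n} = \frac{n}{\log_2 n} \approx 750
```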

In the past quarter, we have made significant architectural changes so that Loro can take further advantage of the Eg-walker algorithm. Here is what we achieved.

Shallow Snapshot

By default, Loro stores the complete editing history of the document, like Git, because when merging remote edits the Eg-walker algorithm needs to load the edits that are concurrent with them, back to their lowest common ancestor. A Shallow Snapshot is like Git’s shallow clone: it removes old historical operations that users don’t need, greatly reducing document size and speeding up document import and export. This lets you move history that is too old into cold storage and mainly use the shallow doc for collaboration.
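A simplified model of the trade-off described above, under these assumptions: the local history is a linear sequence of numbered operations, and a shallow doc keeps the current state plus only the operations from a hypothetical `shallowStart` index onward. A remote branch that last shared operation `forkPoint` with us can then be merged only if every local operation after `forkPoint` is still retained, because those are the concurrent edits the merge must replay. This is a sketch with hypothetical names, not how Loro implements shallow snapshots.

```typescript
// Hypothetical model of a linear local history with ops numbered 0..opCount-1.
interface ShallowDoc {
  shallowStart: number; // index of the first op still retained
  opCount: number; // total ops ever applied locally
}

// Keeps only the last `depth` ops (depth >= 1), mirroring a depth-based trim.
function makeShallow(opCount: number, depth: number): ShallowDoc {
  return { shallowStart: Math.max(0, opCount - depth), opCount };
}

// A remote branch that last shared op `forkPoint` with us (-1 = nothing shared)
// needs every local op after `forkPoint` to be available during the merge.
function canMerge(doc: ShallowDoc, forkPoint: number): boolean {
  return forkPoint + 1 >= doc.shallowStart;
}

const doc = makeShallow(1000, 1); // retains only op 999
console.log(canMerge(doc, 999)); // true: remote edits build on our latest op
console.log(canMerge(doc, 998)); // true: only op 999 must be replayed
console.log(canMerge(doc, 500)); // false: ops 501..998 were trimmed
```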

For details on how it is implemented, see Shallow Snapshot.

Optimized Document Format

Loro 1.0 imports documents 10x to 100x faster than version 0.16, whose import speed was already fast. This makes it possible to load a large text document with several million operations in less than one frame.

This is because we introduced a new snapshot format. When a LoroDoc is initialized from this format, we don’t parse the document state and history until the user actually needs them.

Without compression, a Loro 1.0 snapshot is twice the size of one in the old format (and of other mainstream CRDTs). The additional size comes mainly from encoding the historical operations and the document state separately in the 1.0 snapshot format, without reusing stored data between the two. The old version instead used the order of historical operations to encode the current document state (an approach borrowed from the Value Column in Automerge’s encoding).

Trading twice the document size for ten times the import speed is worthwhile, because import speed affects performance in many places, and on large documents the import time of CRDT documents is often noticeable to users (> 16ms). The new format also leaves room for further optimizations.

Inspired by the design of key-value databases, we have also split the storage of document state and history into blocks of roughly 4KB each. When users need a piece of history, we only need to decompress and read the corresponding 4KB block instead of parsing the entire document. This has led to a qualitative improvement in import speed, and because the serialization format compresses history and state better, memory usage is also lower than before.
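Block-level lazy loading can be sketched as follows. Assumptions: blocks are stored as encoded strings (JSON here, standing in for Loro’s compressed binary blocks), each block covers a contiguous range of operation indices, and a block is decoded only the first time something inside it is read. The `LazyHistory` class and `encode` helper are hypothetical.

```typescript
// One encoded block covering operations [start, start + count).
interface EncodedBlock {
  start: number;
  count: number;
  bytes: string; // stands in for a compressed ~4KB binary block
}

class LazyHistory {
  private readonly blocks: EncodedBlock[];
  private readonly decoded: Record<number, string[]> = {};
  decodeCount = 0;

  constructor(blocks: EncodedBlock[]) {
    this.blocks = blocks; // nothing is decoded at load time
  }

  // Reads one operation, decoding only the block that contains it.
  // (A real format would locate the block via an index; a linear scan keeps this short.)
  getOp(index: number): string {
    for (let b = 0; b < this.blocks.length; b++) {
      const block = this.blocks[b];
      if (index < block.start || index >= block.start + block.count) continue;
      if (!(b in this.decoded)) {
        this.decoded[b] = JSON.parse(block.bytes) as string[];
        this.decodeCount++;
      }
      return this.decoded[b][index - block.start];
    }
    throw new RangeError("op " + index + " not found");
  }
}

// Builds `numBlocks` blocks of `perBlock` fake operations each.
function encode(numBlocks: number, perBlock: number): EncodedBlock[] {
  const blocks: EncodedBlock[] = [];
  for (let b = 0; b < numBlocks; b++) {
    const ops: string[] = [];
    for (let i = 0; i < perBlock; i++) ops.push("op" + (b * perBlock + i));
    blocks.push({ start: b * perBlock, count: perBlock, bytes: JSON.stringify(ops) });
  }
  return blocks;
}

const lazy = new LazyHistory(encode(1000, 100)); // 100,000 ops, none decoded yet
console.log(lazy.getOp(54321), lazy.decodeCount); // op54321 1
```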

The lazy loading optimization takes advantage of Eg-walker’s property that “it doesn’t need to keep the complete CRDT data structure in memory at all times, and only needs to access historical operations when parallel edits occur”.
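The property quoted above can be illustrated with a minimal sketch. Assumptions: operations arrive in causal order, each carries the ids of the operations it was created on top of, and the doc tracks a hypothetical `frontier` (the ids of its latest operations). If an incoming operation’s dependencies equal the current frontier, nothing concurrent happened and it can be applied to the current state directly; only otherwise is history needed. The slow path here is a placeholder, not Eg-walker’s replay.

```typescript
interface Op {
  id: string;
  deps: string[]; // ids of the ops this op was created on top of
}

interface Received {
  path: "fast" | "replay";
  frontier: string[];
}

// True when the op was created on exactly the version we currently have,
// i.e. there are no concurrent edits to reconcile.
function isFastPath(frontier: string[], op: Op): boolean {
  if (op.deps.length !== frontier.length) return false;
  return op.deps.every((d) => frontier.indexOf(d) !== -1);
}

function receive(frontier: string[], op: Op): Received {
  if (isFastPath(frontier, op)) {
    // Apply to the current state directly; no history needs to be loaded.
    return { path: "fast", frontier: [op.id] };
  }
  // Placeholder: a real implementation would load history back to the common
  // ancestor and replay the concurrent edits. Assuming causal delivery, the new
  // frontier is every current head that `op` does not depend on, plus `op`.
  const kept = frontier.filter((id) => op.deps.indexOf(id) === -1);
  return { path: "replay", frontier: kept.concat(op.id) };
}

const local = receive(["a3"], { id: "a4", deps: ["a3"] });
console.log(local.path, local.frontier); // fast [ 'a4' ]
const remote = receive(local.frontier, { id: "b1", deps: ["a3"] });
console.log(remote.path, remote.frontier); // replay [ 'a4', 'b1' ]
```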

Benchmarks

All benchmark results below were measured on a 2020 MacBook Pro with an M1 chip.

Below is a comparison of Snapshot import and export speeds between Loro versions 1.0.0-beta.1 and 0.16.12. The benchmark is based on document editing history from the real world. Thanks to latch.bio for sharing the document data. The benchmark code is available here. The document contains 1,659,541 operations.

In Loro, a Snapshot stores the document history along with its current state. The Shallow Snapshot format, similar to Git’s shallow clone, can exclude history. In the benchmark below, the Shallow Snapshot has depth=1 (only the most recent operation is retained and all other historical operations are removed).

Here are the key points of this benchmark:

The Shallow Snapshot has a depth of 1, meaning it only contains the document state and a single historical operation, which is why it’s significantly faster
GetAllValue refers to loading the complete state of the document and obtaining the corresponding JSON-like structure. This represents the time spent on CRDT parsing before a user can load a document.
Edit refers to making a local modification. As you can see, it has little impact on the time taken because Loro doesn’t need to load the complete CRDT data structure for local operations.
Export refers to exporting the complete document data again. We expect to further reduce the time spent here in the future, as we can continue to reuse the encoding of unmodified Blocks from the import.

The following shows the performance on a document after applying the editing history from the Automerge Paper 100 times. You can reproduce the results here. The document contains:

18,231,500 single-character insertion operations
7,746,300 single-character deletion operations
25,977,800 operations totally
10,485,200 characters in the final document
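These counts are consistent with one another. Insertions plus deletions give the total; insertions minus deletions give the final length (each single-character deletion removes one previously inserted character); and dividing the total by 100 gives the size of a single pass over the trace:

```latex
\begin{aligned}
18{,}231{,}500 + 7{,}746{,}300 &= 25{,}977{,}800 \text{ operations} \\
18{,}231{,}500 - 7{,}746{,}300 &= 10{,}485{,}200 \text{ characters} \\
25{,}977{,}800 / 100 &= 259{,}778 \text{ operations per pass}
\end{aligned}
```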

The new snapshot data is smaller because the encoder additionally applies simple compression to each block.

Next Steps for Loro

Loro Version Controller

Importing Loro Git Repo into Loro Version Controller

Loro’s performance on a single document is now sufficient to cover the real-time collaboration and version management needs of most documents. So our next step will be to explore real-time collaboration and version control across a collection of documents.

We believe that CRDTs can create a Git for Everyone and Everything:

It’s for Everyone because by leveraging the power of CRDTs, we can make version control much easier to reason about and use for the average person.
It’s (nearly) for Everything because Loro provides a rich set of data synchronization types. We’re no longer limited to synchronizing plain text: Loro can automatically merge JSON-like data while preserving its semantics, which meets most needs of creative and collaborative tools.

We’ve built a demo of the Loro version controller on top of our sub-document implementation (implemented in the application layer), which carries version information. It can import the entire React repository (about 20,000 commits and thousands of collaborators) and supports real-time collaboration on such repositories. However, how to better manage versions and integrate seamlessly with Git remains to be explored.

Loro CRDTs still have significant room for optimization in these scenarios. Currently, the Loro CRDTs library doesn’t involve network or disk I/O, which enhances its ease of use but also constrains its capabilities and potential optimizations. For example, while we’ve implemented block-level storage, documents are still imported and exported as whole units. Adding I/O capabilities to selectively load/save blocks would enable significant performance optimizations.

Conclusion

Loro 1.0 brings major performance improvements, rich CRDT types, and advanced version control features. Our optimized document format has yielded promising results for import speed and memory usage.

Now that Loro CRDTs are stable, we can focus on building a better ecosystem. We’re excited to see Loro applied in various scenarios. If you’re interested in using Loro, you’re welcome to join our Discord community for discussions.