Onyx Straps in For a Jepsening (onyxplatform.org)
I just spent ~5 minutes clicking through on the Onyx site (learn, docs, blog, support, learn/FAQ, github) and still have no idea what it does.
It's a "masterless, cloud scale, fault tolerant, high performance distributed computation system"! So it's EC2?
EC2 is a proprietary service for creating virtual machines on demand in Amazon's infrastructure; it is not a computation system. Much as a relational database system provides a framework for storing and retrieving relational data, a computation system provides a framework for performing (usually large-scale) computations, like transforming text files. It is open-ended, just as an RDBMS imposes no schema. A simple example would be the processing pipeline of some web crawler.
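To make the web crawler example concrete, here is a minimal single-process sketch in Python. It is not Onyx's API; `extract_links`, `normalize`, and `run_pipeline` are hypothetical names made up for illustration. The idea is just that you describe a computation as a chain of small functions, and the system's job is to push data through them. A real computation system would also spread the stages across machines and cope with failures, which this sketch does not attempt.

```python
import re
from urllib.parse import urljoin, urldefrag


def extract_links(page):
    """Stage 1: emit (page_url, raw_href) for every href in a (url, html) page."""
    url, html = page
    for href in re.findall(r'href="([^"]+)"', html):
        yield (url, href)


def normalize(link):
    """Stage 2: resolve the href against its page URL and drop any #fragment."""
    base, href = link
    absolute, _fragment = urldefrag(urljoin(base, href))
    yield absolute


def run_pipeline(segments, stages):
    """Feed every segment through each stage in order (hypothetical runner).

    Each stage takes one segment and yields zero or more output segments.
    """
    for stage in stages:
        segments = [out for seg in segments for out in stage(seg)]
    return segments


# Toy input: one crawled page with two links.
pages = [("http://example.com/a", '<a href="/b#top">b</a> <a href="c">c</a>')]
links = run_pipeline(pages, [extract_links, normalize])
print(links)
```

The stages know nothing about each other or about how they are scheduled, which is roughly what lets a system like this move them onto different nodes.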
Hadoop is one of the better known computation systems, though it does other things too (e.g. persistence with HDFS). Apache Storm is another popular computation system, particularly in the Clojure world. Onyx was created a few years ago, is implemented in Clojure, and competes with Storm (it's the first such system mentioned in the "What is it?" section of https://github.com/onyx-platform/onyx).
Onyx is distributed (meaning there are multiple nodes cooperating, typically over a network), and Jepsen is a rigorous tool for testing fault-tolerance in such systems. Being masterless (no node has central authority) is valuable for fault-tolerance. It means that in the event of a network partition, either side of the partition can continue processing as normal and recover when the partition is resolved. Jepsen simulates these and other fault conditions.
It is a big data processing system.
It is an alternative to Spark (and Spark Streaming), Storm, Hadoop MapReduce, etc.
It was a lot of fun testing Onyx using Jepsen, though it did take some time to get things right. If anyone has questions about testing a distributed system in this way I'd be happy to answer them.
Was bracing for something about Mary Lou Jepsen. No dice. What a drag.