CopyCat: Protocol-agnostic implementation of the Raft consensus algorithm

61 points by kuujo 12 years ago · 16 comments

Reader

I am writing a program which has to durably store large number of small files. My current implementation stores these on the local disk as files in a particular folder structure. I am considering designing some approach to make this program run on a cluster of machines for high-availability, fault-tolerance & scalability reasons.

One approach I can think of is to run a distributed document database and use that for storage. I don't need most features of such database products in the foreseeable future, so I fear that they will add operational overhead for not much benefit.

Another approach I can think of is to run my processing nodes against a network file system, and rely on that to do the replication.

Yet another approach I am considering is to use something like CopyCat to implement the file replication in my application code. Is this a good use case for CopyCat?

ww520 12 years ago

CopyCat looks really good for HA and fault-tolerance. Not so sure about its scalability since all writes go through the single leader node. It's more appropriate for the use of maintaining metadata of a distributed system, rather than maintaining the data themselves.
For your requirement, can you use S3 or something similar?
- mavelikara 12 years ago
  
  Thanks! The software I am writing needs to be (easily) installed at the customer site. So, S3 won't work for me. Are there any open source S3 clones that are easy to setup?
  - michaelmior 12 years ago
    
    If you mean that you need something that supports the S3 API, you could try S3Proxy[0]. It basically presents the S3 API over a variety of different storage backends.
    [0] https://github.com/andrewgaul/s3proxy
  - bochi 12 years ago
    
    Riak CS is S3 compatible.
    I have a very similar problem and end up writing a python script that copies the files to cassandra and Nginx + Lua to serve them using lua-resty-cassandra. The read/write throughput is still not as good as I was expecting though.
  - stonith 12 years ago
    
    You could try OpenStack's Swift project.
- SEJeff 12 years ago
  
  That is a "design feature" of raft, which allows it to lack the unbelievable complexity of Paxos, Multi-Paxos, or something like ZAB (yet another paxos variant).
lmm 12 years ago

Sounds like the kind of problem OrientDB is supposed to solve (I haven't tried it though). I would stay away from network filesystems (they're fiddly, they'll add much more operational overhead than a database product, and there is no mature distributed one. OpenAFS is probably your best bet if you do want to go that route).
- mavelikara 12 years ago
  
  OrientDBs clustering is, IIRC, built on top of Hazelcast. Storing my files in a clustered Hazelcast is another option I am considering, although I forgot to include it in my first comment.
  - lmm 12 years ago
    
    Oh yuck. I didn't know that. I've long found Hazelcast a pain to work with.
    Having seen in a sibling comment that you're talking about very small files, I'd recommend Zookeeper - it's mature and has pretty low admin overhead, IME.
    
    mikojava 12 years ago
    
    that's very interesting. I've heard the exact opposite, hazelcast is easy to work with and Zookeeper more complicated.
enigmo 12 years ago

Have you considered using a pre-existing distributed filesystem like Ceph or even HDFS instead?
- mavelikara 12 years ago
  
  I did consider both. HDFS, from what I read, is designed for storing large files - my files are about 10Kb in size each.
  I am aware of Ceph but have not tried installing it to see how easy/hard it is to setup. Also, although this is not a hard requirement, I'd like to be able to support Windows; from what I have read so far, Ceph does not support Windows.
  - zerebubuth 12 years ago
    
    Ceph's filesystem interface is built on a lower level protocol which is available through the librados [1] API, which might be a possible solution if it builds for (or can be ported to) Cygwin / Windows and all you need is a client on Windows.
    This API is more S3-like, operating directly on "objects" in the Ceph cluster. I wrote a system for storing many small (10-20kB) files using librados & Ceph, but the performance wasn't as good as I had hoped. Possibly I did not configure the Ceph cluster in the optimal manner - the cluster setup is quite complex.
    [1] http://ceph.com/docs/master/rados/api/librados-intro/

michaelmior 12 years ago

This looks fantastic! As a DB researcher, I've found in the past I need some type of consensus algorithm for a system I'm building, but I don't want to spend a lot of time implementing something myself. I can see this being very useful in academia for scenarios like this.

Settings

CopyCat: Protocol-agnostic implementation of the Raft consensus algorithm

Keyboard Shortcuts