CopyCat: Protocol-agnostic implementation of the Raft consensus algorithm

github.com

61 points by kuujo 11 years ago · 16 comments

mavelikara 11 years ago

I am writing a program which has to durably store a large number of small files. My current implementation stores these on the local disk as files in a particular folder structure. I am considering designing some approach to make this program run on a cluster of machines for high availability, fault tolerance and scalability.

One approach I can think of is to run a distributed document database and use that for storage. I don't need most features of such database products in the foreseeable future, so I fear that they will add operational overhead for not much benefit.

Another approach I can think of is to run my processing nodes against a network file system, and rely on that to do the replication.

Yet another approach I am considering is to use something like CopyCat to implement the file replication in my application code. Is this a good use case for CopyCat?
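
For reference, the single-machine baseline described above (small files kept in a folder structure on local disk) can be sketched roughly as follows. This is only an illustration under assumptions: the two-level hashed layout and the names `shard_path` and `durable_write` are hypothetical and not taken from the poster's program. It writes to a temporary file, calls fsync, then renames into place, so a crash leaves either the old file or the new one. It does not fsync the parent directory, which a stricter durability guarantee on POSIX systems would also need.

```python
import hashlib
import os
import tempfile


def shard_path(root: str, name: str) -> str:
    """Map a file name to root/<2 hex>/<2 hex>/<name> (hypothetical layout)."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], name)


def durable_write(root: str, name: str, data: bytes) -> str:
    """Write data to its sharded location via temp file + fsync + rename."""
    final_path = shard_path(root, name)
    directory = os.path.dirname(final_path)
    os.makedirs(directory, exist_ok=True)

    # Create the temp file in the target directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to stable storage
        os.replace(tmp_path, final_path)  # replaces any existing file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

Spreading files over hashed subdirectories keeps any single directory from growing very large. Replicating these writes across machines is the part the approaches in this thread would have to add.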

  • ww520 11 years ago

    CopyCat looks really good for HA and fault tolerance. I'm not so sure about its scalability, since all writes go through the single leader node. It's better suited to maintaining the metadata of a distributed system than to storing the data itself.

    For your requirement, can you use S3 or something similar?
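
To illustrate the scalability point about writes going through a single leader, here is a toy model of leader-based replication. It is not CopyCat's API and not a Raft implementation: there is no election, no terms, and no log repair, and the names `Node` and `ToyCluster` are made up for this sketch. It only shows that every write is handled by one node and counts as committed once a majority of nodes have stored it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, entry) -> bool:
        """Store an entry if this node is reachable; report success."""
        if not self.up:
            return False
        self.log.append(entry)
        return True


class ToyCluster:
    """Hypothetical single-leader cluster; the first node is always leader."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.leader = self.nodes[0]
        self.committed = []

    def write(self, entry) -> bool:
        # Every write funnels through the leader, so write throughput is
        # bounded by what that one node can handle.
        if not self.leader.up:
            return False  # a real system would elect a new leader
        acks = sum(node.append(entry) for node in self.nodes)
        if acks * 2 > len(self.nodes):  # strict majority, leader included
            self.committed.append(entry)
            return True
        # Simplification: uncommitted entries stay in the logs here;
        # a real protocol has rules for reconciling them.
        return False
```

In this model, adding nodes improves fault tolerance but not write capacity, which is why this kind of design is usually suggested for small, critical state such as metadata.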

    • mavelikara 11 years ago

      Thanks! The software I am writing needs to be (easily) installed at the customer site. So, S3 won't work for me. Are there any open source S3 clones that are easy to set up?

      • michaelmior 11 years ago

        If you mean that you need something that supports the S3 API, you could try S3Proxy[0]. It basically presents the S3 API over a variety of different storage backends.

        [0] https://github.com/andrewgaul/s3proxy

      • bochi 11 years ago

        Riak CS is S3 compatible.

        I have a very similar problem and ended up writing a Python script that copies the files to Cassandra, with Nginx + Lua serving them via lua-resty-cassandra. The read/write throughput is still not as good as I was expecting, though.

      • stonith 11 years ago

        You could try OpenStack's Swift project.

    • SEJeff 11 years ago

      That is a "design feature" of Raft, which allows it to lack the unbelievable complexity of Paxos, Multi-Paxos, or something like ZAB (yet another Paxos variant).

  • lmm 11 years ago

    Sounds like the kind of problem OrientDB is supposed to solve (I haven't tried it though). I would stay away from network filesystems: they're fiddly, they'll add much more operational overhead than a database product, and there is no mature distributed one. OpenAFS is probably your best bet if you do want to go that route.

    • mavelikara 11 years ago

      OrientDB's clustering is, IIRC, built on top of Hazelcast. Storing my files in a clustered Hazelcast is another option I am considering, although I forgot to include it in my first comment.

      • lmm 11 years ago

        Oh yuck. I didn't know that. I've long found Hazelcast a pain to work with.

        Having seen in a sibling comment that you're talking about very small files, I'd recommend ZooKeeper - it's mature and has pretty low admin overhead, IME.

        • mikojava 11 years ago

          That's very interesting. I've heard the exact opposite: Hazelcast is easy to work with and ZooKeeper is more complicated.

  • enigmo 11 years ago

    Have you considered using a pre-existing distributed filesystem like Ceph or even HDFS instead?

    • mavelikara 11 years ago

      I did consider both. HDFS, from what I read, is designed for storing large files - my files are about 10 KB in size each.

      I am aware of Ceph but have not tried installing it to see how easy/hard it is to set up. Also, although this is not a hard requirement, I'd like to be able to support Windows; from what I have read so far, Ceph does not support Windows.

      • zerebubuth 11 years ago

        Ceph's filesystem interface is built on a lower level protocol which is available through the librados [1] API, which might be a possible solution if it builds for (or can be ported to) Cygwin / Windows and all you need is a client on Windows.

        This API is more S3-like, operating directly on "objects" in the Ceph cluster. I wrote a system for storing many small (10-20kB) files using librados & Ceph, but the performance wasn't as good as I had hoped. Possibly I did not configure the Ceph cluster in the optimal manner - the cluster setup is quite complex.

        [1] http://ceph.com/docs/master/rados/api/librados-intro/

michaelmior 11 years ago

This looks fantastic! As a DB researcher, I've found in the past I need some type of consensus algorithm for a system I'm building, but I don't want to spend a lot of time implementing something myself. I can see this being very useful in academia for scenarios like this.
