Dynamiq – A simple implementation of a queue on top of Riak 2.0 (github.com)
> You should account for this in your design by either managing your own de-dupe solution (such as using Memcache to hold the keys you've seen from a given queue, expiring with the visibility timeout on the queue) or designing a system which self-defends against duplicate messages.
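That de-dupe suggestion boils down to something like the sketch below - a rough, hypothetical version using the gomemcache client, with the queue's visibility timeout as the cache TTL (queue name, timeout, and addresses are made up):

```go
package main

import (
	"fmt"
	"time"

	"github.com/bradfitz/gomemcache/memcache"
)

// seenBefore records a message ID in Memcache with a TTL equal to the queue's
// visibility timeout and reports whether that ID was already recorded.
func seenBefore(mc *memcache.Client, queue, messageID string, visibilityTimeout time.Duration) (bool, error) {
	err := mc.Add(&memcache.Item{
		Key:        queue + ":" + messageID,
		Value:      []byte{1},
		Expiration: int32(visibilityTimeout / time.Second),
	})
	if err == memcache.ErrNotStored {
		// Add only succeeds if the key is absent, so this means we've seen it.
		return true, nil
	}
	return false, err
}

func main() {
	mc := memcache.New("127.0.0.1:11211")
	// Hypothetical queue name and visibility timeout.
	dup, err := seenBefore(mc, "myqueue", "1234567890", 30*time.Second)
	if err != nil {
		fmt.Println("memcache error:", err)
		return
	}
	fmt.Println("duplicate?", dup)
}
```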
At that point, why not just use beanstalkd or redis? You're already centralizing part of your queue elsewhere, and the point of Riak is decentralization.
However, if you can tolerate duplicate messages, then this seems like a cool system. I didn't get from the README how it actually orders jobs/messages, though. I get that it uses an index range scan, but over what values? Is it an ID, and how does the client generate these?
Hi, thanks for the feedback.
Before I respond, let me say that I'm one of the core developers of the system, that I'm a big fan of Redis, and that I'm conceptually a fan of Beanstalkd (I've never used it, but I know someone who swears by it).
The reason you wouldn't use Redis or Beanstalkd is that they are inherently single-node systems. Tools exist to make them behave in a distributed fashion (for Redis, anyway - not sure about Beanstalkd?), but at their core they are not distributed. You're not going to scale them past a single box / node (although I'll admit I'm not up on my Beanstalkd news, so possibly they've made strides there?).
Dynamiq leverages the amazing work done by Basho to build a rock-solid distributed data store. By providing a light layer of coordination and logic at the edge of the system, it lets you treat Riak (in all of its AP glory) as something like a queue (this dovetails into your question about order, below). Riak is distributed to the core, and Dynamiq uses that to its advantage. Need more capacity? Add more nodes. It'll handle the rest.
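Concretely, "treating Riak like a queue" mostly comes down to secondary-index (2i) range scans over message keys, then fetching and later deleting the winners. Roughly, a 2i range query against Riak's HTTP API looks like this (a simplified sketch, not our actual code - bucket and index names are made up):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// rangeScan runs a Riak secondary-index (2i) range query over HTTP and
// returns the matching object keys. Bucket and index names are illustrative.
func rangeScan(riakURL, bucket, index string, min, max int64) ([]string, error) {
	url := fmt.Sprintf("%s/buckets/%s/index/%s_int/%d/%d", riakURL, bucket, index, min, max)
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var body struct {
		Keys []string `json:"keys"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body.Keys, nil
}

func main() {
	// Scan a slice of the ID space; a consumer would then fetch these keys
	// and delete (ACK) the ones it finishes processing.
	keys, err := rangeScan("http://127.0.0.1:8098", "myqueue", "id", 0, 1<<62)
	if err != nil {
		panic(err)
	}
	fmt.Println("candidate message keys:", keys)
}
```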
On the subject of dupes: they are technically "rare" once the system is running on an even keel. You're only likely to see dupes when nodes enter / leave the cluster or when you alter the configuration of a queue. Otherwise, the only "dupes" you'd see are "out for delivery" messages becoming naturally available again once the visibility timeout expires (and we don't consider those dupes).
The system in no way, shape, or form guarantees, offers, or even implies order. You will likely get a very loose ordering as long as you keep up with the rate of incoming messages, but that's it. Never assume order, and in general you should strive to build systems that are resilient in the face of a lack of order.
The range scan operates over an ID that Dynamiq itself assigns to the message by generating a random int64 with Go's crypto/rand (secure random) library. The client cannot specify the ID. However - and this is what we do internally - we assign the message an internal, application-specific ID prior to publishing it to Dynamiq. So each message ultimately ends up with two IDs: one for Dynamiq's own use, which you use to ACK once you're done, and one application-specific ID which may or may not mean anything to the consuming service.
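Roughly, the ID assignment looks like this (a simplified sketch, not our actual implementation - the exact byte handling may differ):

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// randomMessageID draws 8 bytes from the OS CSPRNG and masks the sign bit,
// yielding a non-negative int64 to use as the broker-assigned message ID.
func randomMessageID() (int64, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, err
	}
	return int64(binary.BigEndian.Uint64(buf[:]) &^ (1 << 63)), nil
}

func main() {
	brokerID, err := randomMessageID()
	if err != nil {
		panic(err)
	}
	// The application-specific ID travels inside the message body and is
	// separate from the broker-assigned ID you ACK with.
	appID := "order-42" // hypothetical application ID
	fmt.Printf("broker id=%d, app id=%s\n", brokerID, appID)
}
```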
Cool, thanks for the answers, and great work. I understand there's a trade-off between distribution and strictness; it sounds like Dynamiq actively chooses distribution, which is a very cool direction to see a queue take (most or all of the queues I know of tend to favor consistency).
To answer your question about beanstalkd: the way to scale past a single node is sharding, much like a traditional SQL database (and there is no replication). There really is no HA option. Redis is similar, although it does support replication, so you have more failover options.