A proposal for more reliable locks using Redis
The classic "Leases: an efficient fault-tolerant mechanism for distributed file cache consistency" (http://portal.acm.org/citation.cfm?id=74870), dating back to 1989, is a good read about these kinds of systems. It makes some interesting observations about the approach, and introduces the need for bounded drift.
I think antirez is saying "skew" here when "drift" would be more appropriate. The safety property appears to refer to the difference in rates between clocks, rather than the difference in absolute values. That's a much more reasonable assumption, and is likely to be true even with very bad clock hardware over short periods of time.
Obviously the bounded drift assumption still has to hold for the algorithm to be safe.
> The safety property appears to refer to the difference in rates between clocks, rather than the difference in absolute values.
Exactly that, thanks for the correction. Indeed the word I used is wrong; even if from the context it was probably understandable that I was referring to drift, I'm going to replace the term.
Is this new? I feel like using Redis for locks is something that's been going around for a while. I've used Redis to make locks, and also used it to make counting semaphores. It's a fairly interesting use because it's frequently the simplest, lowest-overhead means to solve a problem that's reliable enough without actually being reliable.
The most obvious issue is that if Redis goes down you could end up with problems if the processes using the locks continue, particularly depending on when, and in what state, Redis restarts.
Another is that, if you can't guarantee strict time constraints, you have to keep re-acquiring your lock so as not to let it expire. Once you do this, you run the risk of something never finishing but extending its lock indefinitely.
A final issue is that you can end up with a situation where there's no guarantee that a waiting task (or whatever you call something that wants a lock or a slot in a semaphore) will ever run.
I don't really buy the claims that this is an insane violation of zero-state/share-nothing principles. When I've needed these kinds of primitives it rarely has to do with the application state itself - for example, I've used the counting semaphores to control how many worker processes can be active. Likewise, I've used the plain locks (and lock-like structures) to do things like ensure atomic/ordered writes for user sessions (I suppose a session is stateful, but it's also not really shared application state).
In any case, there are some issues, but at the cost of a minimal amount of nursing, the ease of implementation and integration often makes Redis a go-to choice for these kinds of things, particularly in resource (hardware) constrained environments. On the other hand, if you're operating at a scale where you've got multiple datacenters and such, it's a different ballgame.
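A minimal sketch of the kind of counting semaphore mentioned above, in Python with redis-py; the key name, slot limit, and timeout are made up for illustration, and it shares the timeout caveats discussed earlier in the thread:

    import time
    import uuid
    import redis

    def acquire_semaphore(conn, name, limit, timeout=10):
        """Try to take one of `limit` slots; returns a holder id or None."""
        holder = str(uuid.uuid4())
        now = time.time()
        pipe = conn.pipeline(True)
        # Drop holders whose slot has timed out, then try to take a slot.
        pipe.zremrangebyscore(name, '-inf', now - timeout)
        pipe.zadd(name, {holder: now})
        pipe.zrank(name, holder)
        if pipe.execute()[-1] < limit:
            return holder
        conn.zrem(name, holder)      # over the limit: give the slot back
        return None

    def release_semaphore(conn, name, holder):
        conn.zrem(name, holder)

Note that the scores come from each client's own clock, so badly skewed clocks make the timeout (and fairness) fuzzy - the same class of problem the post is trying to bound.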
It is not new at all, that's exactly why I wrote the blog post, since it is an actual use case. But it is a good idea to formalize a canonical model to do it with Redis in a safe way, and to show what the failure modes are when we resort to simpler designs (like failover to slaves).
As a related aside, I'd be mildly interested in Redis having the ability to produce and set UUIDs. That sort of thing might also have some utility for locking structures/synchronization primitives, though it's not the original reason I thought having something like that might be nice.
I wrote a very similar, Redis-based lock in Python a while ago, here it is:
https://gist.github.com/adewes/6103220
It uses Redis pipelines and WATCH to make sure that no race conditions occur between two processes acquiring the same lock, and uses key expiry to avoid deadlocks.
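For readers who haven't seen that pattern, here is a rough sketch (not the gist's exact code) of acquiring such a lock with a pipeline, WATCH, and a key TTL using redis-py; the key name and TTL are illustrative:

    import uuid
    import redis

    def try_acquire(conn, name, ttl=10):
        """Attempt to take the lock `name`; returns a token on success, None otherwise."""
        token = str(uuid.uuid4())
        with conn.pipeline() as pipe:
            try:
                pipe.watch(name)                  # abort if someone else touches the key
                if pipe.get(name) is not None:
                    return None                   # already held
                pipe.multi()
                pipe.set(name, token, ex=ttl)     # the TTL avoids deadlocks if the holder dies
                pipe.execute()
                return token
            except redis.WatchError:
                return None                       # lost the race to another client

On a recent Redis the same effect can be had with a single SET name token NX EX ttl, which is the primitive the post builds on.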
I prefer a Lua-scripting based approach: https://github.com/bbangert/retools/blob/master/retools/lock..., but I had to implement something similar to your code to support older Redis versions.
You should just use redis-py's Lock (https://github.com/andymccurdy/redis-py/blob/master/redis/cl...)
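For reference, basic usage looks roughly like this (the key name and timeout are made up); note it is a single-instance lock, not the multi-instance algorithm from the post:

    import redis

    r = redis.Redis()

    # `timeout` is the TTL put on the lock key so it can't be held forever.
    with r.lock('resource-name', timeout=10):
        do_protected_work()   # hypothetical placeholder for the critical section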
> Step 2) It tries to acquire the lock in all the N instances sequentially, using the same key name and random value in all the instances.
> so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.
I am confused. Are the locks requested sequentially, or at the same time? It seems like if they are requested sequentially, the random backoff time would need to be a large multiple of the combined latency.
> I am confused. Are the locks requested sequentially, or at the same time? It seems like if they are requested sequentially, the random backoff time would need to be a large multiple of the combined latency.
As stated in the post, ideally the client uses multiplexing to send the SET to all the instances at the same time, but this does not change much the gap between the chosen lock validity and the latency needed to set the lock: with 5 instances it still requires something in the "few milliseconds" range on average to set the lock, even with not very optimized clients, so 1 second is already three orders of magnitude more.
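To make those numbers concrete, here is a rough sketch of the acquisition step in Python with redis-py; the instance addresses, TTL, and quorum handling are illustrative, and it leaves out the retry/backoff and the unlock-on-failure described in the post:

    import time
    import uuid
    import redis

    # Illustrative set of N = 5 independent masters.
    instances = [redis.Redis(port=p) for p in (6379, 6380, 6381, 6382, 6383)]

    def acquire(resource, ttl_ms=1000):
        """Try to lock `resource` on a majority; return (token, validity_ms) or None."""
        token = str(uuid.uuid4())
        start = time.monotonic()
        locked = 0
        for r in instances:                    # sequential; multiplexing would shrink the elapsed time
            try:
                if r.set(resource, token, nx=True, px=ttl_ms):
                    locked += 1
            except redis.RedisError:
                pass                           # an unreachable instance simply doesn't count
        elapsed_ms = (time.monotonic() - start) * 1000
        validity_ms = ttl_ms - elapsed_ms      # the post also subtracts a clock drift allowance
        if locked >= len(instances) // 2 + 1 and validity_ms > 0:
            return token, validity_ms
        return None                            # the full algorithm unlocks everything and retries

Even done sequentially, the elapsed time is a few milliseconds against a 1000 ms TTL, which is the three-orders-of-magnitude point above.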
Thanks, I understand. The sequential locking isn't a requirement per se, and the assumption is that the N masters are in the same region. It seems like with clever multiplexing, you could relax the requirement that the N masters are in the same region.
Definitely, even without proper multiplexing, if the connections are non-blocking it is possible to lower the latency to MAX(latencies) by sending the commands in a loop to all the instances and reading the replies later.
Redis clients should be capable of doing that, by allowing (on demand) the delivery of commands to be separated from the reading of the replies.
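Where a client can't interleave sends and reads, a small thread pool gives a similar effect, bringing the wall-clock cost close to MAX(latencies); a rough sketch, reusing the hypothetical `instances` list from the earlier sketch:

    from concurrent.futures import ThreadPoolExecutor
    import redis

    def set_on_all(instances, resource, token, ttl_ms=1000):
        """Send the SET to every instance concurrently and collect the replies."""
        def try_one(r):
            try:
                return bool(r.set(resource, token, nx=True, px=ttl_ms))
            except redis.RedisError:
                return False
        with ThreadPoolExecutor(max_workers=len(instances)) as pool:
            return list(pool.map(try_one, instances))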
It seems like a lock should be able to autorelease in a distributed environment if the acquirer is no longer available. Would this not be considered "safe?"
What about broadcasting acquire/release messages?
The proposed algorithm provides a safety guarantee which is time bound: once the lock is acquired it has a specified validity time; after this time, it is possible for another client to acquire it.
In practical terms this forces the protected code path to be "real time", that is, guaranteed to terminate (or to abort) within the specified time.
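In code, that "real time" constraint just means the work has to fit inside the validity window returned at acquire time; a minimal sketch with made-up names:

    import time

    def run_within_validity(validity_ms, steps):
        """Run each step only while the lock's validity window is still open."""
        deadline = time.monotonic() + validity_ms / 1000.0
        for step in steps:
            if time.monotonic() >= deadline:
                raise TimeoutError("lock validity expired; aborting rather than touching the resource")
            step()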
> In practical terms this forces the protected code path to be "real time", that is, guaranteed to terminate (or to abort) within the specified time.
You could keep re-acquiring the lock when it gets close to expiring, and stop your process if the lock becomes un-acquirable.
Yes, this is a good strategy. We are even guaranteed to be able to re-acquire the lock if we send the re-acquire request in time and there are no new partitions, because in order to re-acquire the lock it is possible to send a script that checks whether the value matches and, if so, extends the expiry of the key. Basically it is possible for the lock holder to re-acquire by extending the duration of the previous lock before it expires.
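The check-and-extend script described here would look something like this (key name and arguments are illustrative); it only pushes the expiry out if the stored value is still our own random token, and in the distributed case it would be sent to all N instances with the usual majority check:

    import redis

    r = redis.Redis()

    # KEYS[1] = lock key, ARGV[1] = our random token, ARGV[2] = new TTL in milliseconds.
    EXTEND_SCRIPT = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('pexpire', KEYS[1], ARGV[2])
    else
        return 0
    end
    """

    def extend(lock_name, token, ttl_ms):
        """Returns 1 if we still held the lock and extended it, 0 otherwise."""
        return r.eval(EXTEND_SCRIPT, 1, lock_name, token, ttl_ms)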
It seems like a timeout is less reliable/safe than some broadcast/ping mechanism that can check availability perpetually and, if a node has disappeared, change the validity of the lock.
Trying to remember which distributed system model it is that sort of does this. Ring? Mesh?
For such a model to work I believe you need a distributed replicated state machine, and the clients need to be an active part of the distributed system (not just participating by making requests), able to reply to pings. Yes, there is a safety advantage in the model you describe, since if the time taken to finish an operation is larger than expected, the other clients may want to wait longer, but in practice:
1) What do you do if the client replies to pings but takes an apparently never-ending time to perform the operation on the shared resource?
2) What if the client is correctly operating on the shared resource but the only component which is failing is the system you use to check its availability?
1) I think a mix of the two approaches would work here - an actual timeout on the lock, but not as the only way of keeping it
2) I suppose that's always possible, but then what would happen is the lock would be released. Not ideal behavior but also not one that presents a data reliability issue.
That could increase throughput if you have a lot of crashing nodes but how does it improve safety or reliability?
Safety? It doesn't. Reliability? Because it would prevent another node from acquiring a held lock while a server is available, and release the lock if a server goes down.
There would never be a dirty read possible.
So the locks in a distributed version work something like the NRW concept in Riak?
The algorithm described more closely resembles a CP system (in its need to successfully lock the majority of instances), since it requires a majority of instances to be available in order to lock a resource. Also it is not a tunable algorithm: at least a majority of the nodes must be locked in order for safety to be guaranteed.
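Concretely, the quorum is a fixed strict majority rather than a tunable W; with the post's example numbers:

    N = 5                # number of independent Redis masters
    quorum = N // 2 + 1  # 3: the lock counts as held only if at least this many SETs succeeded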
I stopped reading after the first paragraph... When will people learn that using locks and shared memory does not work? It is just wrong. Things should be immutable; you should share nothing. And by nothing I mean nothing at all. It is just WRONG.
I don't think you can simplify things that far. Shared-nothing is certainly a good architectural goal, and there are many types of distributed systems where it is both highly desirable and achievable. On the other hand, there are many distributed systems (and parallel systems) where coordination is strictly necessary.
Think about implementing something like the TPC-C load on a distributed database. Many parts of that load can be done sharded, with a shared-nothing model. Other parts need some level of coordination, and some need true serializability (which requires significant coordination). The pieces that can be built on shared-nothing primitives should be built on shared-nothing primitives. However, it's not useful to pretend that the pieces that require coordination simply don't exist.
Similarly, immutability is a good goal. An append-only log of immutable data items is often a very good mental model, and frequently a good implementation model, for distributed databases. It isn't universally applicable, though, and may put constraints on the possible supportable operations that add complexity to other parts of the system.
Go create such a system, make it easy to use and widely available, and then you might have room to say such things with some credibility.