Settings

Theme

Time, Clocks, and the Ordering of Events in a Distributed System – Paper Review

percisely.xyz

85 points by ekez 2 years ago · 23 comments

Reader

Phelinofist 2 years ago

Thanks for the link, really interesting!

We have a distributed system (clouds & cars) that message events which need to be processed in order. However, the clock of some system participants (cars) are drifting quite often. We plan to use a logical clock for the ordering instead soon.

  • AdieuToLogic 2 years ago

    > We have a distributed system (clouds & cars) that message events which need to be processed in order.

    Just out of curiosity, would a vector clock[0] be applicable for your problem domain?

    0 - https://en.wikipedia.org/wiki/Vector_clock

    • Phelinofist 2 years ago

      AFAIU Vector clocks are hybrid clocks and we plan to use a hybrid one, should've mentioned that from the start.

  • michaelt 2 years ago

    No way of accessing your cars' GPS receivers' 100-nanosecond-precise time signal, huh?

    • econner 2 years ago

      Yea I always liked the way Spanner solved this. Let's not bother with any of the distributed system theory and just use that Google money to build in to the system extremely precise clocks :-D.

      (Obv I know it's more complicated than that, but where else can you be reductionist if not the internet)

    • willis936 2 years ago

      It probably won't be 100 ns accurate in motion, but still quite accurate. Practically speaking you won't get better than ~1 s unless you connect the PPS pin to an interrupt or PLL.

      With a bare minimum implementation you should be able to get the GPS time from the NMEA strings. You just need to guarantee you'll have GPS lock at least intermittently, which is probably true for a fleet of monitored cars in nearly all situations.

      • labcomputer 2 years ago

        Urban canyons aside, the GNSS receiver itself should almost always have better than 100 ns accuracy, even when moving. After all, 100 ns at the speed of light is ~30 meters. At 200 ns you're on the wrong city block.

        As for bringing that into a computer, I'm having a hard time imagining a hardware setup where a software PLL won't give you millisecond-level accuracy (assuming you have an internal clock with at least that much precision). 1 second accuracy implies just using the timestamp from the last NEMA string with no further processing...

        • willis936 2 years ago

          Yes I am referring to NMEA string usage because that's as good as you'll get if the PPS output is not connected. You don't strictly need a hardware disciplined oscillator solution, which is why I mentioned connecting to an interrupt pin. This has worked pretty well for me when making raspberry pis into stratum 0 NTP servers.

          • labcomputer 2 years ago

            > Yes I am referring to NMEA string usage because that's as good as you'll get if the PPS output is not connected.

            1. That's only true with a super, super naive approach. You can easily estimate the average latency as part of your prototyping, and subtract that (while "live") from when you receive the full NEMA string. That will knock down the (in)accuracy by at least an order of magnitude, probably two, depending on latency variance. NEMA messages are (almost[1]) always sent synchronously with the PPS.

            2. Again I'm having trouble imagining this naive hardware setup where you don't have a better timestamp than 1 second. NEMA usually comes over a UART, so just timestamping the first byte of the NEMA string will give you ~millisecond accuracy. Even a cheap USB 1.1 UART dongle that only polls at 125Hz should give you ~10 millisecond accuracy. As above, you can estimate average latency during development to improve accuracy during usage.

            You'll need to do a little bit of software filtering to remove outliers, of course, but that requires no extra hardware.

            The point is: The technical bar to getting 10ms accuracy with no special hardware and no PPS is basically zero.

            [1] I'm hedging with almost, but I've never seen a counterexample.

    • Phelinofist 2 years ago

      That's what I asked them as well and apparently... no.

      • AlotOfReading 2 years ago

        As someone who's worked on some very similar problems, fix it properly on the vehicle side. Lamport/vector clocks are not a silver bullet here. They can be very difficult for end users to handle properly, and don't necessarily scale well in large fleets with unbounded drift. The vehicle network has a synchronized notion of time already. If the vehicles were designed properly, that's hooked into GNSS already. Even if they aren't (e.g. because it would impose unacceptably high uptime requirements), you have another system to bound drift already because various systems are required to stay within a small margin of "true" time (certificate validation comes to mind).

        This is a fundamental system for the company, so it's worth cracking a few heads over the issue.

  • myk9001 2 years ago

    You might also find this paper interesting. To be fair, so far I only had a chance to give it a very superficial look.

    https://dl.acm.org/doi/pdf/10.1145/112600.112601

  • YetAnotherNick 2 years ago

    Is it P2P connection or something? You could easily order events using kafka, redis or any alternative in cloud if they are less than million per second range.

  • myk9001 2 years ago

    Disclaimer: I'm not a distributed systems expert by any stretch of the imagination. Just picked up an interest in the subject some time ago. So, the things I'm about to say may sound silly.

    Do the cars participating in the system broadcast/multicast messages to each other? If so, logical clocks like Lamport clocks or vector clocks can be of great use. Logical clocks help capture the order of events in a distributed system (sending or receiving a message is one kind of event, but not the only).

    To give an example, let's say we have cars A, B, and C, they broadcast messages with Lamport timestamps. B broadcasts (lts: 33, msg: y), A broadcasts (lts: 18, msg: x). No matter in what order C receives the two messages, it knows A could not have sent "x" in response to "y", as 18 < 33. However, the converse isn't true -- i.e., 18 < 33 doesn't imply B sent "y" in response to "x", the two messages could've been broadcasted concurrently.

    If you do want to be able to tell if one event might've caused another based on their logical timestamps, a vector clock is a great choice.

    All that said, if this isn't as much about cars broadcasting messages to each other, as it's about cars sending messages to the server, pure logical clocks are not a tool meant for this job. This scenario calls for real-time ordering -- i.e., an imaginary omnipotent observer who could assigns a timestamp based on their own watch to each message could solve this. In the unfortunate absence of such an observer, the approaches to deal with this I'm aware of are:

    - A central timestamp server. While this comes with all the obvious downsides, it's also a relatively simple and straightforward solution.

    - Keeping the drift as small as possible and establishing an upper bound on it. Then waiting out the uncertainty interval before acquiring a timestamp. Basically, something akin to what Spanner is doing with their atomic clocks. A caveat here is if the cars are competing for offers with these messages, they might not be exactly happy with this strategy.

    - Similar to the previous point, but without atomic clocks? Maybe you can adapt some ideas from CockroachDB[^1]?

    - Using hybrid logical/physical clocks might be an option[^2][^3]

    ---

    [^1]: https://www.cockroachlabs.com/blog/living-without-atomic-clo...

    [^2]: http://users.ece.utexas.edu/~garg/pdslab/david/hybrid-time-t...

    [^3]: https://cse.buffalo.edu/tech-reports/2014-04.pdf

    • Phelinofist 2 years ago

      Thanks for your comment, very interesting and great links.

      Broadcasts are always done via the Cloud, not P2P. We plan to use a hybrid clock, should have mentioned that.

      • myk9001 2 years ago

        Looks like you have a truly exciting task ahead. Hybrid clocks are a very interesting technology. Good luck!

andsoitis 2 years ago

The paper: https://amturing.acm.org/p558-lamport.pdf

reality_inspctr 2 years ago

this is great! very clever logic model.

I wrote something I think is highly relevant about a distributed L1 for time, that you might enjoy, OP?

[oc] https://www.seanmcdonald.xyz/p/the-clockchain-protocol-the-l...

jpgvm 2 years ago

Possibly the most important distributed systems paper to read, along with the rest of Lamport naturally.

  • tincholio 2 years ago

    This is the first Lamport paper I read (over 20 years ago now), and it has stuck with me since then, it's so clearly explained...

frenchLeaf 2 years ago

What about ticsync?

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection