Settings

Theme

Bug story: Sorting by timestamp

adam-p.ca

68 points by adam-p 2 years ago · 48 comments

Reader

aljarry 2 years ago

Interesting note on the NOW() (or CURRENT_TIMESTAMP), they are equivalent to transaction_timestamp(), which means - start time of the current transaction.

So, if you'd insert multiple items in a single transaction, all of them would end up with the same value in the "created" column.

jupp0r 2 years ago

I found it to be a generally useful rule to never "ORDER BY created" but instead "ORDER BY created,id" instead to achieve stable sorting.

I recently added some indices to a few tables to speed up a complicated query with lots of subqueries and joins and ran into many unit test failures because usage of the new indices changed the order in which items with the same "created" values were returned.

  • adam-pOP 2 years ago

    Nice. That's a good general rule to follow.

  • bufferoverflow 2 years ago

    If there's an increasing ID, just sort by ID.

    • zoky 2 years ago

      Won’t work if your ID is a UUID. Also, this is more generally applicable to any date, not just created.

      • keep_reading 2 years ago

        UUIDv7 is sortable by time

        • zoky 2 years ago

          And I’m sure that will be incredibly useful in 20 years when UUIDv7 has entirely supplanted UUIDv4 in all legacy systems (and you only need to sort by date created), but for now let’s just go with the best practice for the foreseeable future and sort in a way that not only ensures consistency but future-proofs us indefinitely.

dotancohen 2 years ago

Another common reason that time goes backwards is DST. The amount of DST-related bugs that I've fixed over the years amazes me, because every single developer has moved clocks back and forth twice a year for their entire lives, bar a few years in the beginning. And even this fine article mentions the time-has-gone-back possibility yet ignores DST.

  • gwbas1c 2 years ago

    > The amount of DST-related bugs that I've fixed over the years amazes me

    > yet ignores DST

    Uhm, you do know that you're supposed to use a neutral timezone (Such as UTC) internally in your code / schema, and convert to local time in the UI?

    The bugs that I've hit generally have to do with using local time accidentally when UTC is expected.

    > because every single developer has moved clocks back and forth twice a year for their entire lives

    WTF? Every single developer knows that their database schema should be in UTC and thus immune to DST issues.

    • e28eta 2 years ago

      It sounds like you’re ignoring the fact that sometimes programs want to do things based on the user’s current time, and there _are_ reasons for developers to need to be aware of DST and handle edge cases correctly.

      I inherited some code that was sending push notifications to users with a daily summary. I don’t remember why it was tripping over DST, since the notifications were at a “reasonable” hour (9am?) and afaik the time change happens during the hours most people are asleep, but the ruby API being used had errors for “there are zero / two times that match the time you asked for”.

      I had to scratch my head for a little on why we were seeing both the zero and two errors in September / October, despite being intellectually well aware that the southern hemisphere seasons are the opposite of the northern.

      • takinola 2 years ago

        But using UTC solves that problem, no? The time only changes in DST but not in UTC. If the code checks the time against UST, it will never trip up.

        Anyways, the only real solution for managing time is to use a library like Luxon so you can stop thinking about it. Time is like cryptographic encryption - Don't roll your own solution if there is a battle tested library available.

        • recursive 2 years ago

          It does not. I've got a specification that all inputs and outputs are in local time. Sometimes the timezone isn't known yet. Or at all.

          • out_of_protocol 2 years ago

            Well, when it's not time, just some numbers. The most you can do is to store as is, hoping to figure out what that means later.

            Also, even if all inputs and outputs are in local time, it still makes sense to process everything in UTC. There are a lot of weird corner cases making math on datetime really complicated

            • growse 2 years ago

              This doesn't work reliably for future timestamps. If a user submits a local timestamp and you then convert it to UTC, you're using the current definition of what we think that offset will be at that future time.

              However, timezone definitions change. By doing the conversion, you've got to remember to check that the offset between what you stored and the user's local hasn't changed since the time you did the conversion. How often and when you do this checking is ... not obvious.

              • out_of_protocol 2 years ago

                UTC or unix timestamp gets you moment of time. You know exactly moment event happened. Definitions can change, but usually not retroactively. TZdata is a database of such changes, and stores definition changes, basically all *nix OSes depend on it to cast moment of time into timezone. You can always convert timestamp in the past into desired timezone. If you work with calendars and want you know what "04:00 tomorrow" means - good luck, you'll need it. as example, "04:00 tomorrow" may not exist at all

        • latchkey 2 years ago

          I worked in a system a long time ago where someone came up with the bright idea of running batch jobs in such a way that the job would complete by the time we were in the users timezone morning hours. You know, so they could get the data when they were awake.

          As time went on, the jobs started taking longer and longer and eventually took longer than the whole local morning, which resulted in them being awake AND being able to complain about it. The other issue was DST... users who were used to waking up and getting their data right away... would either be delayed an hour, or be done an hour early. Sometimes, the jobs would also run 2x or not run at all.

          It was an utter mess of unintended consequences.

        • e28eta 2 years ago

          It was checking for something like “is it 9am in PST / PDT right now? What about EST / EDT?”

          Or maybe, “what’s the current time in this list of time zones and then I’ll send notifications to users where it’s 9am”

          Or something, I don’t specifically remember. The point is that notifications aren’t generated until they’re supposed to be sent, and we didn’t want that to vary by 1 hour with daylight savings.

          And this specific use case aside, I think it’s true that there’s a class of use cases where developers do need to consider DST.

          • takinola 2 years ago

            I have no doubt that this may have been some crazy complicated case (as dealing with time can be) but I can't help the armchair quarterback debugging. Could this not have been addressed by periodically running a script that generated when (in UTC) a notification should be sent and then another script to send the notifications when that time comes around ie

            Script 1: send notification to User 123 at 9:00 UTC

            Script 2: It is 9:00 UTC, which users are due to have notifications that have not been sent?

            This way, the timezones don't ever really come into play. The axiom I follow for dealing with time is collect and display time in user's local time but store and process in UTC.

            • e28eta 2 years ago

              If you choose to divide it like that, script 1 is the one that needs to know about DST, because user 123 might be 9 UTC today, but what is it tomorrow?

              Might change due to a DST change (which varies by time zone, and time zone DST date can change year-to-year), or the user might change time zones. Either way, your Script 1 is the logic that had the bug I inherited.

    • arp242 2 years ago

      As soon as a time refers to "a time in a specific location" you need to store the location or timezone and can't just rely on UTC. There's a bunch of ways to do that, but "store as UTC, convert to local time in UI" is very simplistic that doesn't always work.

      For things like, say, HN posts or whatnot storing as UTC is usually the right answer, but even here there's some nuance; for example email Date: headers tend to include the TZ of the person who sent them, and if you just store all of that as UTC you lose that information.

  • WirelessGigabit 2 years ago

    Oh the joys of living in Arizona! Not having to deal with confusion twice per year! Not having to deal with these sort issues!

    Or so I thought.

    All calendar invites I own are set in Arizona time, so for everybody else they shift. Queue the rescheduling requests.

    For the ones I don't own it actually is better for me as they all shift an hour later.

  • aljarry 2 years ago

    > time goes backwards is DST

    Only if you deal with timestamps that don't contain timezone information. With TIMESTAMPTZ in postgres it's transparent, you don't have to do anything specifically to manage DST.

  • jonathanlydall 2 years ago

    > because every single developer has moved clocks back and forth twice a year for their entire lives

    Where I live, South Africa, we have a single timezone which never has DST.

    It is very easy for a developer here to be oblivious of time zones/DST as in practice it will never bite them in the local only market.

    Well, I lie, it’s only almost never, a common smell of incorrect time handling is when the deployment instructions have a note to ensure the server is set to the correct time zone.

    Tangental pet peeve of mine, in the early 2000s I received an abuse report with quoted logs with the timestamps being qualified as merely EST (or some other US time zone), which I found super annoying as I had to look up what the offset for such time zone was and then manually apply conversions before I could do any investigating of my own.

  • loloquwowndueo 2 years ago

    Every single US developer maybe.

    In Mexico DST started in 1994 so developers who started before that (hello!) did need some adjustment and operating system support was not a given.

    There are also countries where DST is not observed or differs from the US yet developers from those countries might be working remotely for a US company, with US developers or needing to accommodate a significant US clientele.

    • gwbas1c 2 years ago

      DST is common throughout the world: The days that it starts and stops vary, both by locale, and by year.

      The operating system keeps track of every locale's DST dates, all the way as long as DST has been a thing. When governments change the date, via law changes, the change usually gets passed along in an OS update.

      • o11c 2 years ago

        Note that Windows has major bugs here - from memory, it only keeps track of the 2 most recent DST rules, so historical timestamps will be wrong after the rules change twice.

      • mook 2 years ago

        Looking at the Wikipedia article, it looks like it's not that popular in Asia and Africa. Lots of places seems to have observed it at some point in the past, though a lot of them seem to have been last century.

        https://en.m.wikipedia.org/wiki/Daylight_saving_time_by_coun...

      • pritambaral 2 years ago

        Reporting from India here. No DST ever. I still know what DST is and how it works, but I've never had to change clocks due to DST.

    • dotancohen 2 years ago

      I'm not in the US.

  • _kst_ 2 years ago

    That should only matter if times are stored and sorted as local time. Storing time as UTC (e.g. seconds since 1970) and computing local time from that and the current time zone should avoid DST-related bugs.

    • andreareina 2 years ago

      Until you deal with the future. If I have a recurring 2 pm meeting, I'd be a little put out if some of those suddenly became 3 pm because of a DST change.

WirelessGigabit 2 years ago

Sidenote on showing (or not showing) timestamps.

Consider a page with 10 rows, and in each row you show the relative date.

2 issues with that:

1) If I see a whole page and I see 1 week ago the range is 7 days. If I see 1 year ago the range is 365 days. That is too much for most of the things. 2) If I am on a page without visual indication of sort order and I'm on a page that shows 10 entries with '1 year ago' I have no clue about the sort order.

I hate relative dates.

  • Terr_ 2 years ago

    To be pedantic, that's not a problem of relatives times as much as the data being lossy vague approximations.

    Still, "1 year, 7 months, 2 days, 5 hours, 3 minutes ago" isn't always ideal either, not even when it's done in a lexically sortable way.

    Ultimately it boils down to a choice which doesn't match problem the user has, and in different circumstances someone might want relative or absolute.

    • WirelessGigabit 2 years ago

      It can be solved by putting in an absolute date. I still don't understand what kind of problem relative dates solved.

      • Terr_ 2 years ago

        They work when relative-times are part of the question or mental model the user has when approaching the system.

        Then the user doesn't have to mentally cross-convert between relative measures and absolute timestamps, which is less error-prone and annoying.

senderista 2 years ago

Timestamp resolution doesn't just depend on the storage granularity of the timestamp type; it also depends on the actual accuracy of whatever system call is used to populate the timestamp.

neonsunset 2 years ago

This title is instant PTSD flashback - at one place I worked there was a system that would order events by their timestamps and multiple downstream systems would rely on that.

One day, I fixed an issue in the message producer that was causing the routine to take unreasonable amount of time and resources so that latency went from 1s to ~50ms.

Three hours later P1 is raised and the entire architecture had to be refactored. I still have the screenshot of the latency graph saved somewhere :)

vivzkestrel 2 years ago

CREATE INDEX IF NOT EXISTS feed_items_pubdate_id ON public.feed_items USING btree (pubdate DESC NULLS FIRST, id DESC NULLS FIRST) TABLESPACE pg_default;

create an index on both the pubdate (timestamptz column with potential duplicates like your post mentions) and a uuid primary key column (id in my case) problem solved

  • adam-pOP 2 years ago

    There are a couple of shortcomings that I think I see:

    1. If the clock moves backwards (and an "older" record gets created), it doesn't help.

    2. If a duplicate timestamp is created with an id that sorts earlier than one of the other dups (and a user has already synced to the first dup).

    In both of those cases, the user won't receive the new record. (Yes, neither is probable, but neither is impossible.)

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection