<et cetera>
As mentioned in I had a brain hiccup with __builtin_parity:
Two things:
- I work with bits often on a daily basis.
- I’m pretty stupid and my brain occasionally has a hiccup… like it did today.
Only the second point is relevant in today’s hella face-palmy ramble.
I was working on some sockets code and I needed to log some stuff where nanoseconds mattered. Although I wanted to go via the RDTSC route and multiply that with TSC increment frequency, I eventually decided against it for reasons that don’t matter here. It was deemed fine to use the clock_gettime() syscall. This syscall takes two arguments: clockid and a reference to a timespec structure variable.
I set the clockid to CLOCK_MONOTONIC_RAW and passed a blank timespec structure variable, whose definition looks like:
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};
I needed to get the time as a string, so I made this quick function that was repeatedly called:
/* Get the current time, sourced from monotonic raw, as a string */
int gettime_str(char *time_str)
{
int ret = 0;
struct timespec ts;
ret = clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
if (ret != 0) {
perror("clock_gettime failed\n");
return ret;
}
sprintf(time_str, "%ld.%ld", (long) ts.tv_sec, ts.tv_nsec);
return 0;
}
Apart from the blatant lack of checking whether time_str is NULL, some of
you may have already identified the problem. For those of you that haven’t, take
a look again.
While doing some python3 post processing, I was calculating the difference in timestamps between log entries. However, I noticed that occasionally the program would spit out negative time differences, i.e., a later entry had an earlier timestamp than a previous entry. This was bizarre because I made sure to flush the buffer to the file and ensured that no potential re-ordering was occurring.
I revisited the clock_gettime() man page to see if I missed something because the programmer can NeVeR be wrong (sarcasm, just in case). After some analysis, I realized my mistake — I didn’t format the nanoseconds properly. Here’s an example:
log 0: 216.819
log 1: 216.2102
>>> ...
>>> # after some processing
>>> ...
>>>
>>> timediff = log[1] - log[0]
>>>
>>> # this is essentially:
>>> 216.2102 - 216.819
-0.6088000000000022
Do you see the mistake now? This is when I wanted to jump off a cliff. Log 0 was recorded at (216 seconds, 819 nanoseconds) whereas Log 1 was recorded at (216 seconds, 2102 nanoseconds). OF COURSE WHEN PYTHON3 TRIES TO READ THE NUMBERS FROM NON-ZERO-PADDED STRINGS, IT’S GOING TO THINK THAT 216.819 IS GREATER THAN 216.2102.
Instead, it should have been 216.0819 and 216.2102.
I’m SUCH a fool. To cut me some slack, the logs weren’t as minimal as they are written here. Nanoseconds involve 9 digits, and the logs were staggered, making it hard to notice that a few entries had 8 digits while most had 9 digits. That being said, I’m still a fool.
To fix it, you pad the ts.tv_nsec part of the format specifier with zeros:
sprintf(time_str, "%ld.%09ld", (long) ts.tv_sec, ts.tv_nsec);
</et cetera>