How setting the TZ environment variable avoids thousands of system calls (blog.packagecloud.io)

> In other words: your system supports calling the time system call via the Linux kernel’s vDSO to avoid the cost of switching to the kernel. But, as soon as your program calls time, it calls localtime immediately after, which invokes a system call anyway.
This reminds me of an article by Ted Unangst[1], in which he flattens the various libraries and abstractions to show how xterm (to cite one of many culprits) in one place is effectively doing:
if (poll() || poll())
while (poll()) {
/* ... */
}
In other words, if you don't know what your library/abstraction is doing, you can end up accidentally duplicating its work. Reminds me of some aphorism, "Those who do not learn from history..." ;)
[1] http://www.tedunangst.com/flak/post/accidentally-nonblocking
Those who quote George Santayana are condemned to repeat him.
Those who don't know George Santayana are condemned to repeat him. FTFY.
You seem to have missed the joke...
You seem to have missed the joke... wait a second...
Totally got it. It's a refinement of the line. As the person below understands.
Apparently, I understand.
With the layering of clusters, containers, and microservices, I bet you probably have 10x worse than that. There is always a cost to abstraction. On the surface it might make things simpler but if you were to peel it apart, you would reveal a hidden layer of complexity. Hopefully, it's done well enough that there will never be a need to peel it apart.
> On the surface it might make things simpler but if you were to peel it apart, you would reveal a hidden layer of complexity.
Well yes, this is the very definition and goal of abstraction.
This is why I always liked the idea of Unikernels, they let us reset our abstractions without giving up all we've learned in the last couple decades.
Xe just did mplayer as well. It calls non-blocking select(), then non-blocking poll(), then nanosleep(), in a loop.
System calls in Linux are really fast. So saving "thousands" of system calls when /etc/localtime is in cache doesn't actually save that much actual CPU time.
I ran an experiment where I timed the runtime of the sample program provided in the OP, except I changed the number of calls to localtime() from ten times to a million. I then timed the difference with and without export TZ=:/etc/localhost. The net savings was .6 seconds. So for a single call to localtime(3), the net savings is 0.6 microseconds.
That's non-zero, but it's likely in the noise compared to everything else that your program might be doing.
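A rough sketch of the kind of micro-benchmark described above (my own reconstruction, not the exact program from the OP; the file name and loop count are arbitrary):

    /* bench.c: call localtime() many times so the per-call cost is measurable.
       With glibc and TZ unset, each call stat()s /etc/localtime. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t t = time(NULL);
        struct tm *tm = NULL;

        for (long i = 0; i < 1000000; i++)
            tm = localtime(&t);

        printf("last result: %04d-%02d-%02d\n",
               tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday);
        return 0;
    }

Timing it twice, once with TZ unset and once after exporting TZ=:/etc/localtime, gives the per-call difference.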
> System calls in Linux are really fast. So saving "thousands" of system calls when /etc/localtime is in cache doesn't actually save that much actual CPU time.
"fast" is a relative term, and is somewhat orthogonal to "efficient".
There's a reason why certain functions use a vDSO. If you're just going to use a syscall anyway, there's kind of no point.
You're assuming that all cases where the vDSO call is made get paired with a real syscall; that's simply not the case. There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹). Look at the examples the article's author offers:
> formatting dates and times
This shouldn't require a call to localtime; more explanation on the part of the article is required here. Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access. Something else is amiss here.
> for everything from log messages
You're about to hit disk; a cache'd stat() isn't going to matter.
> to SQL queries.
You're about to hit the network; a cache'd stat() isn't going to matter.
(Now, I'm not saying you shouldn't set TZ; if it saves some syscalls, fine, and it might be the only sane value anyways.)
¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.
> You're assuming that all cases where the vDSO call is made get paired with a real syscall; that's simply not the case.
I don't believe I was. I was merely assuming that there are a lot of cases (as in, potentially thousands of times a second) where code that needs the system time also wants the localtime.
> There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹)
As the article demonstrates, whatever we might believe about best practice, actual practice seems to include a lot of cases where it is called.
Given that a given epoch time value can map to different dates & times, depending on timezone... I'm not sure why you think formatting dates & times wouldn't require considering the desired timezone.
You're similarly mistaken that logging a message involves hitting disk. It's a very common configuration for high throughput logs to buffer writing to disk across multiple messages and/or forward to a remote server.
Similarly SQL queries don't necessarily involve hitting the network (some don't even involve crossing an IPC boundary). Even if you do hit the network, once again, it is very common for multiple network requests to be buffered in user space before making a syscall, and of course a single SQL statement could involve more than one localized timestamp value (though I'd like to think in that case the local timezone would be cached).
> ¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.
Well, if you are writing in Python, then worrying about the syscall overhead of reading the local timezone would seem odd (and for that matter, Python does some odd things with timezones, so I'm not even sure this would reliably trigger the syscall).
>Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access.
To do it simply yes, but not correctly. See the "Falsehoods programmers believe about time" series.
http://infiniteundo.com/post/25326999628/falsehoods-programm... http://infiniteundo.com/post/25509354022/more-falsehoods-pro...
> To do it simply yes, but not correctly.
No, doing it correctly doesn't require filesystem access either. I've read both articles in the past: neither refutes the point I made above. If I were incorrect, linking to an article that enumerates tens of things (some of them arguably incorrect) isn't useful.
If you're trying to imply that you need to take timezones into account, yes, you do. Yes, typically those definitions are stored on disk, but the context here is requiring filesystem access each and every time; most libraries (including glibc) will load the timezone definitions once, and keep them in memory. Thus, you can break a seconds-since-epoch out into year/mo/day/etc. with "simple" math, and it doesn't require a filesystem access. (Beyond the amortized one time load, but given the point and purposes of the article, I'm not considering that.)
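To make the "simple math" point concrete, here's a small illustrative sketch (mine, not from the thread or from glibc) that breaks a Unix timestamp into a UTC calendar date using only integer arithmetic; applying a timezone offset on top of this is where the (cached) tzdata comes in:

    #include <stdio.h>
    #include <stdint.h>

    /* Convert seconds-since-epoch to a UTC civil date (no filesystem access).
       Follows the well-known "civil_from_days" style of calculation. */
    static void civil_from_unix(int64_t secs, int *y, int *m, int *d)
    {
        int64_t days = secs / 86400 - (secs % 86400 < 0 ? 1 : 0); /* floor */
        int64_t z    = days + 719468;
        int64_t era  = (z >= 0 ? z : z - 146096) / 146097;
        int64_t doe  = z - era * 146097;                                /* [0, 146096] */
        int64_t yoe  = (doe - doe/1460 + doe/36524 - doe/146096) / 365; /* [0, 399]    */
        int64_t doy  = doe - (365*yoe + yoe/4 - yoe/100);               /* [0, 365]    */
        int64_t mp   = (5*doy + 2) / 153;                               /* [0, 11]     */

        *d = (int)(doy - (153*mp + 2)/5 + 1);
        *m = (int)(mp < 10 ? mp + 3 : mp - 9);
        *y = (int)(yoe + era * 400 + (*m <= 2));
    }

    int main(void)
    {
        int y, m, d;
        civil_from_unix(0, &y, &m, &d);       /* expect 1970-01-01 */
        printf("%04d-%02d-%02d\n", y, m, d);
        return 0;
    }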
Read the damn article. It explains how it's localtime (the function you need to format time in the user's time zone) that makes the stat call - to check if the configured time zone changed.
> Read the damn article.
> Please don't insinuate that someone hasn't read an article
I read the article. Yes, localtime requires the call; that wasn't my point. My point was that for plenty of common, server-side code, either this isn't required, or is inconsequential.
The former case I was considering is the formatting of timestamps into TZs in the context of a request being served by a server. Most server-side TZ conversions I've ever needed can't call localtime, b/c localtime is wired not to the user's timezone, but to the TZ of the machine the server's code is running on, which is typically either nonsense, UTC, or whatever the devs like. Server side code needs (of course, YMMV) to use the user's TZ, whatever that may be, so I'm making calls to a library built for that, e.g., pytz, which doesn't need to stat() the machine's TZ file as there is no point in doing so.
The other instances the author lists that do require localtime are instances where localtime's stat call is the least of your worries, as you're about to perform other operations that are much more expensive.
Timezones don't exclusively belong to users... Most syslogs (up until systemd) are configured to write out logs in machine localized time. Same goes for web servers. Really, there are a ton of cases where servers need to consider their timezone. I don't much like it, but it nevertheless is true.
> Most syslogs ... servers need
You have been using poor logging softwares. For the past decade and a half (or more) some of us have been using logging softwares that write out logs without converting timestamps to a local format or a local time, relying rather upon log post-processing tools to convert them to different (sometimes multiple) timezones of our choosing and at whim when we want to read our logs. Our servers haven't needed to consider timezones for all of those years, and our log-writing softwares don't call a localtime() function of any stripe. Please do not tar us with your brush.
* http://unix.stackexchange.com/a/326166/5132
* http://jdebp.eu./Softwares/nosh/guide/log-post-processing.ht...
* http://jdebp.eu./Softwares/nosh/guide/timestamps.html
* https://sawmill.net/formats/qmail_tai64_n.html
* https://www.elastic.co/guide/en/logstash/2.4/plugins-filters...
* http://docs.projectcalico.org/v1.6/usage/troubleshooting/log...
I haven't been using them; I hate the whole approach and defer any rendering of timestamps in logs. I'm just pointing out that it is very commonplace. Set up any distro's server distribution with the default settings and then track how often the localtime file gets touched...
Sounds like a misunderstanding of "simple" :)
Hence the scare quotes around simple. The math is in no way straightforward, but it's nonetheless math, esp. once you have the TZ information (if required) in front of you. The point was that there are plenty of operations within a typical server-side codebase that either involve little-to-no syscalls (tagging a record with the current UTC time, or converting a UTC timestamp to an ISO formatted date and time for serialization on the wire, e.g., JSON) or are forced to hit really expensive syscalls, rendering a quite-likely-cached-in-RAM stat() moot (logging, SQL queries).
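As a concrete (purely illustrative) sketch of that first category: tagging "now" with a UTC ISO 8601 string needs no knowledge of the machine's timezone at all, and, at least with glibc, gmtime_r shouldn't need to keep re-checking /etc/localtime the way localtime does:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char buf[sizeof "1970-01-01T00:00:00Z"];
        time_t now = time(NULL);   /* vDSO-backed on typical x86-64 Linux */
        struct tm utc;

        gmtime_r(&now, &utc);      /* break out UTC fields, no local TZ involved */
        strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", &utc);
        printf("%s\n", buf);       /* e.g. for a JSON payload or log record */
        return 0;
    }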
On your base system, yes. Lots of things can hook random syscalls, or environments might have syscall monitoring.
One example: the folks over at Slack record every syscall for security auditing. https://slack.engineering/syscall-auditing-at-scale-e6a3ca8a...
Slack uses the Linux audit subsystem which is also certainly faster than you think it is. Consider how many system calls your typical application is issuing --- especially ones that are likely to be calling localtime() all the time, such as a web server. If system call auditing had that high of an overhead, everything would be horrifically slow --- but it isn't, because Linux audit sends its records out asynchronously and in batches.
https://www.redhat.com/archives/linux-audit/2015-January/msg...
Of course this is RHEL 2.6.32 and it's open/close, but 200,000 sc/s vs 3,000 sc/s shows it has some overhead. Maybe someone can rerun that test code on a current git kernel and see what the overhead is.
This might be true for your system and libc, where calls like gettimeofday make use of the vDSO to go fast, but in general this isn't guaranteed at all. Even on x64, for certain libc implementations, like musl, if I recall correctly, syscalls are made the old-fashioned way by trapping through int 0x80, which would mean you would see a much bigger effect by reducing the number of syscalls.
There is no vDSO for calls to stat(2). The claim in the article was that by setting the TZ environment variable to ":/etc/localtime", one could save "thousands" of stat system calls. Even for old-fashioned system calls where you use trap 0x80, Linux is still amazingly fast.
This can actually be a problem, since there are applications like git which assume stat is fast, and so it aggressively stat's all of the working files in the repository to check the mod times to see if anything has changed. That's fine on Linux, but it's a disaster on Windows, where the stat system call is dog-slow. Still, I'd call that a Windows bug, not a git bug.
Does Windows have a stat() call? It is probably a function from some POSIX emulation layer, and maybe that is why it is not fast.
It's also a disaster on NFS.
Not quite. On x86_32, for complicated and ultimately ridiculous but nevertheless valid reasons, lots of syscalls on musl use int $0x80. I have a patch to make this fixable but Linus shot it down. Maybe I should try again.
On x86_64, syscalls only use SYSCALL. It's very fast if audit and such are off and reasonably fast otherwise. (I extensively rewrote this code recently. Older teardowns of the syscall path are dated.)
System calls on x86 are fast. Other archs behave differently. And the syscall time is not the only thing that matters; there's also the potential cost of yielding execution.
I thought they were fast because x86 has multiple register files, enough for kernel space and user space to have their own, so that entry/exit to system calls doesn't require flushing registers to L1 (in the common case).
If that's true, then one test where you have a single process spinning into and out of a single syscall will have very different performance characteristics than a test where you have more processes than processor cores, because context switches flush the TLB.
Somebody who knows actual things about x86 and so forth please tell me if I'm spouting 90s-era comp sci architecture textbook stuff that no longer applies.
They're fast because x86 has a decently fast privilege change mechanism for system calls and Linux works fairly hard to avoid doing unnecessary work to handle them. In the simplest case, registers are saved, a function is called, regs are restored, and the kernel switches back to user mode.
The asm code is fairly straightforward in Linux these days. I'm proud of it. :)
Check out the post linked from the article: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g... to learn more about how system calls work on x86 Linux.
I did the same experiment on a Raspberry Pi 2. The net saving was 5.803 seconds, so 5.803 microseconds per call.
Obviously if you care about performance then you wouldn't be running your program on a Raspberry Pi in the first place. But for everything else there's this free speed up.
I build a bunch of home automation stuff (as a hobby) using Pis and other microcontrollers. Performance in those things translates almost directly to power savings, and is very desirable.
OTOH, I've never encountered an issue like this on those systems.. (yet)
System calls in Linux are not faster than not doing them.
Yeah, this is a perfect example of micro-optimization being unnecessary. Not only will you not see performance issues from this in the real world, it might cause problems down the road: since it isn't set by default this way, some apps may not expect it and behave erroneously.
But it's neat information to have in the back of your head.
Unnecessary? I had a really bad experience with an ancient Skype version on a modern Ubuntu desktop, and the fix for this was to set the TZ environment variable to speed up the first login/history fetch. The Skype process was spending so much time doing useless work it was noticeable.
That's just not possible to authoritatively state. The best you can do is "this shouldn't normally cause a noticeable impact on most systems".
As just one example, what if you're stat()ing over NFS with a busy, flaky and/or distant server? A bit of thought and you'll come up with a bunch of other times it suddenly starts to matter.
I did the same, but with 10M iterations:
    $ time ./tz
    ./tz  2,24s user 6,28s system 98% cpu 8,612 total
    $ export TZ=:/etc/localtime
    $ time ./tz
    ./tz  1,35s user 0,00s system 98% cpu 1,364 total

So 0.7 microseconds on my machine.

> TZ=:/etc/localhost
Hope this is just a typo in your comment, not the actual test ;)
This isn't a typo, but is part of the syntax used by the TZ variable. (The same format appears in the article itself.)
See `man timezone` on a Linux system[1]. Specifically, see the passage that I've quoted below. Note that this is the third of three different formats that the man page describes that you can use in TZ:
> The second format specifies that the timezone information should be read from a file:
> :[filespec]

> If the file specification filespec is omitted, or its value cannot be interpreted, then Coordinated Universal Time (UTC) is used. If filespec is given, it specifies another tzfile(5)-format file to read the timezone information from. If filespec does not begin with a '/', the file specification is relative to the system timezone directory. If the colon is omitted each of the above TZ formats will be tried.

Sure, but at least on my Linux systems there is no such file /etc/localhost. I think the parent was referring to /etc/localtime. Not sure what the behaviour is if a non-existent file is specified - perhaps the "value cannot be interpreted" case applies, but it's not perfectly clear, since it could be argued that the value is valid, it just refers to a non-existent file.
Ah, correct you are! :-) I had missed that myself, and the : syntax is so rarely seen I naturally assumed that was what was intended.
Good blog post explaining the behavior of glibc, I also saw this first hand when profiling Apache awhile back too:
http://mail-archives.apache.org/mod_mbox/httpd-dev/201111.mb...
https://github.com/apache/httpd/blob/trunk/server/util_time....
The internals of glibc can often be pretty surprising; I'd really encourage people to go spelunking into the glibc source when they are profiling applications.
Please quantify the speedup (I've found this before, but it's never been a significant issue). Eliminating unnecessary work is great, but what are we really talking about here? Use a CPU flamegraph, Ctrl-F and search for stat functions. It'll quantify the total on the bottom right.
Oh, and another page that recommends strace without warning about overheads. Dangerous.
Honestly, the primary reason I support this is to get developers out of the habit of demanding a localized server timezone. As an infra' person, I want system time in UTC. If developers get in the habit of setting TZ, then I can have this!
It feels like any code that needs to know the timezone of the server is inherently wrong. If timezone ever comes up in any context, it's either the timezone of the client from whom the request originates - in which case it should come as part of the request - or else the timezone somehow associated with the business process (e.g. "warehouse open 8-5 Eastern time"), in which case it should be part of the configuration for that one service.
Author of the post here: greetings.
If you enjoyed this post, you may also enjoy our deep dive explaining exactly how system calls work on Linux[1].
[1]: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g...
Is there a reason why the path to the timezone file is prefixed with a colon?
TZ=:/etc/localtime
I've set TZ sometimes without the colon and it seems to work. I did a quick online search and didn't find anything relevant.
:<whatever> means "read it from the <whatever> file". See the last part of the relevant glibc documentation: https://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_...
However the reason it works without : is that the implementation is being lazy and just ignores the : delimiter and falls back to parsing out a filename either way:
https://sourceware.org/git/?p=glibc.git;a=blob;f=time/tzset....
You beat me to it. I was answering my own question when one of my users came in with a problem. Stupid users...
Here is the answer:
https://www.gnu.org/software/libc/manual/html_node/TZ-Variab...
The third format looks like this:

    :characters

Each operating system interprets this format differently; in the GNU C Library, characters is the name of a file which describes the time zone.

The other formats specify the timezone directly, such as EST+5EDT. Interestingly, it seems to work okay without the colon. Perhaps the leading slash implies a filename?

See https://news.ycombinator.com/item?id=13704054. The colon forces the file to be loaded; without it, the other formats are tried first.
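A quick illustrative sketch (my own, not from the thread; the zone names are just examples and it assumes a typical Linux with tzdata under /usr/share/zoneinfo) showing the same instant rendered under the different TZ formats being discussed:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void show(const char *tz)
    {
        char buf[64];
        time_t t = 1234567890;            /* an arbitrary fixed instant */

        setenv("TZ", tz, 1);
        tzset();
        strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %Z", localtime(&t));
        printf("%-24s -> %s\n", tz, buf);
    }

    int main(void)
    {
        show("EST5EDT,M3.2.0,M11.1.0");   /* rule string, no tzfile needed    */
        show(":Europe/Copenhagen");       /* tzfile relative to zoneinfo dir  */
        show(":/etc/localtime");          /* tzfile via an absolute path      */
        return 0;
    }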
Brendan Gregg wrote about this a few years ago [1].
My favorite part:
> WTF?? Why is ls(1) running stat() on /etc/localtime for every line of output?
[1] http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-...
What is missing in this post is:
- Why does glibc check /etc/localtime every time localtime is called? Wild guess: so that new values of /etc/localtime are picked up at runtime without restarting programs.
- Corollary: why does glibc not check /etc/localtime every time localtime is called, when TZ is set to :/etc/localtime? Arguably the reason above should still apply when TZ is set to a file name, shouldn't it?
For the second question: There doesn't seem to be an explicit reason for the difference of treatment. The code that does it has been there since 1996, and hasn't changed since. The only reason given is "Caching happens based on the contents of the environment variable TZ.".
https://sourceware.org/git/?p=glibc.git;a=commit;h=68dbb3a69...
I'd argue it should cache the same when both old_tz and tz are NULL (but start with an old_tz that is not NULL).
I was about to file an upstream bug, but found https://sourceware.org/bugzilla/show_bug.cgi?id=5184 and https://sourceware.org/bugzilla/show_bug.cgi?id=5186
The latter actually implies the opposite should be happening: files given in TZ should be stat()ed just as much as /etc/localtime.
This is due to the multiuser nature of Unix-like systems.
/etc/localtime is set by the administrator. It may change without notice to the user.
TZ is part of the user's environment and the user sets it. All applications run by the user should honor the user's wishes if the user's not falling back to system defaults.
If you're setting TZ for yourself, your libc can update things when you update the variable and restart any applications you're running under the old value. It can therefore save cycles. If you're falling back to the system default that's not under the same user's control, then it must be ready to deal with unexpected changes.
Hi, both are answered in the article:
First:
> What’s going on here is that the first call to localtime in glibc opens and reads the contents of /etc/localtime. All subsequent calls to localtime internally call stat, but they do this to ensure that the timezone file has not changed.
and second: read the section titled "Preventing extraneous system calls" for the answer to your second question.
Those "answers" are more about how than why.
Thanks for reading and I'm glad to hear you loved my post!
If this has a real-world/measurable/etc. impact why isn't this set by default? Are there potential side-effects? Is it set in some distros but not others?
Portability, compatibility. The system should not set environment variables if there is a reasonable default action. Env is intended to be set by the user.
In general, the timezone is set during OS setup, and the system is left in a state where it's up to the applications to figure out what to do. For example, you might configure Apache (yes, I am old, leave me alone) to use a particular timezone. But if Apache senses an env var it may choose to override the configured value with what's in the env var. Or SSH might be configured to pass along all env vars, including TZ, which in all honesty it probably won't even if you tried, but it could, and then the destination server's application has the wrong timezone.
Point is, it's safer not to mess with env vars unless you need to.
The default behavior is done so programs don't need to be restarted if the timezone is changed, which also has real world impact.
Probably because while there may be tens of thousands of additional syscalls, the total amount of added latency and resources consumed are more likely to be on a scale of micro/nano/milli seconds.
In the trace you can see that the syscall takes less than a tenth of a millisecond. I don't think this is a big penalty to check if I have changed my timezone or not, as unlikely as that is during normal operation.
This seems to be a simple RTFM issue to me: POSIX specifies that gmtime() uses UTC and localtime() uses current timezone. Using gmtime() would implement the desired behaviour without any need to hardcode environment variables.
...which fixes all the code you wrote, but of course, you may have legacy binaries whose source you can't change... hence a simple setting of an environment variable, hardcoded though it may be, fixes the situation for all.
Though PeterWillis makes a good point akin to yours, and your (plural) point does make sense.
(Edit: added mention of comment with additional background on why to avoid hardcoding the variable)
Great post. I remember when vDSOs were added we noticed a nice speedup in our code. We tuned for realtime and a few microseconds here and there add up. Most importantly, fewer system calls means more predictability.
This reminds me of a very similar behavior in Solaris over 20 years ago. Our C application was having odd performance problems on some client systems, and eventually we saw via truss that there were hundreds of fopen() calls every second to get the timezone. Setting the right environment variable solved the problem.
I really enjoy when people dig into things like this and report their findings. Having said that, I question the wisdom of "bothering" with this sort of thing. Everything you do that's non-standard or works against a system's default behavior incurs a cost. It's yet another thing you have to replicate when you migrate to a new version, change provisioning systems, etc.
And for what benefit? A few hundred syscalls per second? Linux syscalls are fast enough that something of that magnitude shouldn't matter much. Given that /etc/localtime will certainly be in cache with that frequency of access, a stat() should do little work in the kernel to return, so that won't be slow either.
It's good that they did some benchmarking to look at the differences, but this feels like a premature optimization to me. I can't imagine that this did anything but make their application a tiny fraction of a percent faster. Was it worth the time to dig into that for this increase? Was it worth the maintenance cost I mention in my first paragraph? I wouldn't think so.
I'm really trying not to take a crap on what they did; as I said, it's really cool to dig into these sorts of abstractions and find out where they're inefficient or leak (or just great as a learning exercise; we all depend on a mountain of code that most people don't understand at all). But, when looked at from a holistic systems approach, a grab bag of little "tweaks" like this can become harmful in the long run.
You should be able to get the maintenance cost for something like this pretty low. Add a line with a nice comment to the config template for running your apps that adds an extra env var. Any machine that runs any app now has the line...
I would definitely like to see a before/after real-world metric on impact here though.
The embedded Linux system I'm working on right now takes about 17 us per stat() call due to this but time will always be kept internally as UTC, so taking advantage of this is worth considering for me.
Since most embedded systems can directly translate the amount of processing needed to achieve the product goal into a real dollar cost of the hardware, any savings, even a small one, is worth investigating. Since the hardware and system are generally well understood, implementing something like this is much more reasonable.
But I agree, on a general purpose OS doing general purpose thing, an optimization like what's proposed by the article may not be worth the other tradeoffs.
There's another easy way to avoid this: use localtime_r instead of localtime. From the glibc source:
/* Update internal database according to current TZ setting.
POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
This is a good idea since this allows at least a bit more parallelism. */
tzset_internal (tp == &_tmbuf && use_localtime, 1);
mktime also does the tzset call every time, though:

time_t
mktime (struct tm *tp)
{
#ifdef _LIBC
/* POSIX.1 8.1.1 requires that whenever mktime() is called, the
time zone names contained in the external variable 'tzname' shall
be set as if the tzset() function had been called. */
__tzset ();
#endif

and I don't see any way around that other than setting TZ=: or some such.

It is perhaps out of scope of the article, but it sure would have been helpful to show how to set the TZ environment variable and what to set it to.
It's right there in the middle of the article (under the "Preventing extraneous system calls" section):
    $ TZ=:/etc/localtime strace -ttT ./test

I can't believe I missed that! Thanks.
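For what it's worth, the same effect can be had from inside a program rather than from the shell; a minimal sketch (mine, not from the article), assuming glibc's caching behaviour when TZ is set:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        /* Set TZ before the first localtime() call so glibc caches the zone
           data instead of re-checking /etc/localtime; the final 0 means we
           don't override a TZ the user already provided. */
        setenv("TZ", ":/etc/localtime", 0);
        tzset();

        for (int i = 0; i < 5; i++) {
            time_t t = time(NULL);
            printf("%s", asctime(localtime(&t)));
        }
        return 0;
    }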
Why did this post start with a tl;dr, then a Summary, and _still_ buried the important takeaway in the ultimate paragraph?
BTW, packagecloud.io is a great hosting service for RPM/DEB packages. We've been using it for the last couple of years. The GitHub + Travis CI + PackageCloud combination allows us to build and publish packages for EVERY git commit in 30+ repositories targeting 15 different Linux distributions [1]. There is no more need to hire a special devops guy for that.
Interesting article but I can't reproduce the behavior on Ubuntu 16.04 LTS. I don't have TZ set (or anything locale-related for that matter). Here are the library dependencies:
$ ldd test
linux-vdso.so.1 => (0x00007ffd80baf000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8844bf7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8844fbc000)

Any thoughts why the behavior would be different?

There's also a surprising difference in behavior between tm = localtime() and localtime_r(..., &tm).
The former is the "traditional" function which returns a pointer to a statically allocated, global "struct tm". The latter is the thread-safe version receiving a pointer to a user-supplied "struct tm" as its second argument.
With TZ set to Europe/Berlin, set to :/etc/localtime, or unset, I never get a stat on anything:

    do {
        t = time(NULL);
        localtime_r(&t, &tm);
        printf("The time is now %02d:%02d:%02d.\n",
               tm.tm_hour, tm.tm_min, tm.tm_sec);
        sleep(1);
    } while(--N);

    write(1, "The time is now 07:23:33.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0
    write(1, "The time is now 07:23:34.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0
    write(1, "The time is now 07:23:35.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0

If I change it to tm = localtime()...

    stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
    write(1, "The time is now 07:30:56.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc868c3010) = 0

One more reason to switch to the reentrant/thread-safe versions of those ugly library functions :-).

Note, this is using glibc 2.24 under Arch:

    $ /lib/libc.so.6
    GNU C Library (GNU libc) stable release version 2.24, by Roland McGrath et al.
    (...)
    Compiled by GNU CC version 6.1.1 20160802.
This reminds me of setting noatime for disk mounts (http://askubuntu.com/questions/2099/is-it-worth-to-tune-ext4...)
Now I want to know the number of other configs to reduce the number of system calls. This all adds up to being significant the greater the number of hosts in your environment.
While trying to find the cause of slowness in Rails requests, I was running strace on a unicorn process when I encountered the same thing mentioned in the article.
Rails instrumentation code calls current time before and after any instrumentation block. So, when I looked at the trace there were a lot of `stat` calls coming for `/etc/localtime`, and as stat is an IO operation, I thought I had discovered the cause of the slowness (which I attributed to a high number of IO ops), but surprisingly, when I saw the strace method summary, while the call count was high, the time taken by the calls in total was not significant (<1% if I remember correctly). So I decided to set TZ with the next AMI update 15 months back but forgot about it totally. I guess I should add it to my Trello list this time.
Also, I think he should have printed the aggregate summary of just CPU clock time (`-c`) as well, as that is usually very low.
...while the call count was high, the time taken by the calls in total was not significant(<1% if I remember correctly).
Yes, on ordinary filesystems if you run stat() over and over again on the same file then it's just copying from the in-memory inode into your struct stat, there's no IO.
I do this for "not get annoyed while stracing" reasons, not perf!
Really interesting - thanks for sharing the findings. I haven't seen it mentioned here, but those of us using `timedatectl` via systemd, with the default setting of `UTC`, are taking advantage[1] of the recommendation in the article.
[1] https://github.com/systemd/systemd/blob/master/src/timedate/...
This reads to me like a glibc bug. Glibc should just be watching "/etc/localtime" for changes, rather than calling out to the kernel hundreds of times a second.
It's hard to do that without any system calls, though.
The way you'd do this is to open an inotify (or platform equivalent) file descriptor on the first call to localtime(), and on future calls, only bother statting /etc/localtime again if that file descriptor reports something has changed. But checking if an FD has data requires a system call; you'd do a non-blocking read on the FD (or a non-blocking select or poll, or an ioctl(FIONREAD)). It's possible that system call is faster than stat, but it's still a system call.
You could do it with a thread that does a blocking read, but that's a mistake under the current UNIX architecture: lots of stuff (signal delivery and masks, for instance) gets weird as soon as you have threads, so unconditionally sticking a thread into every process using libc is a bad plan.
You could do it with fcntl(F_SETSIG), which would send you a signal (SIGIO by default) when the inotify file descriptor is readable, but you couldn't actually use SIGIO, since the user's process might have a handler. You'd need to steal one of the real-time signals and set SIGRTMIN = old SIGRTMIN + 1. (See https://github.com/geofft/enviable for a totally-non-production-suitable example of this approach.) This would probably mostly work, except changing SIGRTMIN is technically an ABI break (since programs communicate with each other with numerical signal values). You could maybe use one of the real-time signals that pthread already steals. Also, using signals is sort of a questionable idea in general; user programs now risk getting EINTR when /etc/localtime changes, which they're likely not to be prepared for. A single-threaded glibc program that sets no signal handlers never gets any EINTRs, which is nice.
In an ideal world, every program would have a standard event loop / mechanism for waiting on FDs, and libc could just register the file descriptor with that event loop. Then you'll get notified of /etc/localtime changing after the next run of the event loop, which is good enough, and it would take zero additional system calls. But unfortunately, UNIX wasn't designed that way, so something at libc's level can't assume the existence of any message loop. A higher-level library like GLib or Qt or libuv could probably do this, though.
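To make the inotify idea above concrete, here's a minimal sketch (mine, not glibc's code; the tz_changed/tz_watch_fd names are invented) of the mechanism a libc could use. Note that the check is still one syscall per lookup, that calling glibc's localtime() here would of course still do its own stat unless TZ is set, and that a watch on the file itself misses the common "symlink replaced in /etc" case:

    #include <stdio.h>
    #include <sys/inotify.h>
    #include <time.h>
    #include <unistd.h>

    static int tz_watch_fd = -1;

    /* Returns nonzero if the timezone data should be (re)loaded. */
    static int tz_changed(void)
    {
        char buf[4096];

        if (tz_watch_fd < 0) {
            tz_watch_fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
            if (tz_watch_fd >= 0)
                inotify_add_watch(tz_watch_fd, "/etc/localtime",
                                  IN_MODIFY | IN_ATTRIB |
                                  IN_DELETE_SELF | IN_MOVE_SELF);
            return 1;                   /* first call: load the data */
        }
        /* Non-blocking read: returns -1/EAGAIN when nothing has changed. */
        return read(tz_watch_fd, buf, sizeof buf) > 0;
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++) {
            time_t t = time(NULL);
            if (tz_changed())
                tzset();                /* reload zone rules only on change */
            printf("%s", asctime(localtime(&t)));
            sleep(1);
        }
        return 0;
    }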
What about calling mmap to map /etc/localtime into memory? The file would still have to be parsed for every call to localtime(), but the system call is avoided.
(Better yet, if it were possible to memory-map the directory entry for /etc/localtime, parsing could be avoided as well).
I agree the best solution is to provide a more sensible API. Why should an application be limited to only _one_ timezone?
mmapping doesn't solve the problem unless the file was edited in place (rather than overwritten as is commonly done). You'll just have an mmap on an inode that is no longer visible in the directory.
I think I like that solution! It doesn't work if someone replaces /etc/localtime with a new file, but it works if they update it in place. It's low-overhead as long as /etc/localtime remains in cache; the process that updates it will write to the same pages. It's pretty high-overhead if /etc/localtime ever gets flushed from cache, though, since you have to go out to disk.
There are no usable mechanisms glibc can use for watching /etc/localtime for changes that would not mess up the program if it also decides to use any file watching features.
At least on key platforms, it's pretty easy to use an event driven model and watch for updates. Hell, if the vDSO handled TZ it'd be no problem.
I'm not saying it's impossible, but if you actually sit down and attempt to add this to glibc, you will come to the realization that it is not easy at all. If you want to e.g. use the inotify mechanism you have a few key decision points to make
* There's no place where glibc can run an event loop to watch for the changes, the application might not have an event loop.
* If you need to integrate with an event loop the application runs, it'll work fine as long as the user remembers to hook up the events and deal with the corner cases. You can use this approach without any special support for glibc, and set TZ env. variable yourself when /etc/localtime changes - though at the moment that will only work for single threaded programs (setenv()/putenv() is not thread safe)
* If you decide to run the event loop in a separate thread, you force every binary to be multi threaded and have quite a lot of corner cases to handle when fork()'ing.
* If you don't run an event loop, you're back to polling for changes, and hardly anything is gained.
* If you use the signal driven I/O notification mechanism, you interfere with the application's use of signals and I/O notification, and also have a host of fork() corner cases to consider.
vDSO is not a magic silver bullet that can solve this; you would at least have to have the kernel manage timezone support, or perhaps better, have the ability to transparently manage arbitrary data, and then expose that through shared memory that the vDSO can use. This will not happen anytime soon.
I don't think it is quite so bad. The fork/event loop/thread problems already get dealt with for a variety of other cases. It's handy when you can coordinate with the runtime itself and exploit the level of indirection that the rest of the runtime has.
And how does it watch it? With the stat() syscall.
Polling is not watching.
Side note - why do some sites completely hide information about who is behind them? I couldn't find a single thing about that on their blog or main site.
It's not in any way hidden, just run a whois query:
    $ whois packagecloud.io | grep Owner
    Owner Name    : JOSEPH DAMATO
    Owner OrgName : COMPUTOLOGY, LLC
    Owner Addr    : 359 FILLMORE ST!12
    Owner Addr    : SAN FRANCISCO
    Owner Addr    : CA
    Owner Addr    : US

Maybe it's just that I use packagecloud and follow people in their circle on Twitter, but Joe Damato is the CEO and founder: https://twitter.com/joedamato
I'm under the impression that he may also write a lot of these himself? Not entirely certain, though.
Pretty sure he writes all of the linux internals posts. He also has a bunch of great ones on his blog http://timetobleed.com/
All the links are in the footer of the page.
For what it's worth, they disappear if your browser isn't wide enough.
Oops! I'll fix this shortly, thanks for pointing it out.
It would be highly inconvenient to have to set this variable if you live in a country where you change the timezone twice a year due to summertime.
If TZ is set properly then you don't need to change it twice a year.
(man tzset for more info and examples of different values TZ can be set to. Mine is set to "Europe/London" which handles the DST switches automatically.)
Hm…, now that I read TZSET(3): Wouldn't that be

    TZ=:Europe/London

? It doesn't seem tzset() will accept the format

    TZ=Europe/London

Yes, but "If the colon is omitted each of the above TZ formats will be tried.", so it'll be figured out even if the colon is missing.
Ah yes, thanks for pointing that out!
Are you talking about DST? You don't change your timezone at DST switch, the timezone data knows how to calculate the correct time based on your location (roughly) and the time of year. Mountain standard time and mountain daylight time are both part of mountain time, for example.
What I meant was: Currently my timezone is CET. In a month's time it will be CEST. If I have to set TZ to explicitly mirror that it will be a burden.
TZ may be set like this:
    TZ=:Europe/Copenhagen

Replace Europe/Copenhagen with the appropriate entry from the list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

Usually, /etc/localtime is a symlink, as on my laptop:

    /etc/localtime -> /usr/share/zoneinfo/Europe/Copenhagen

so TZ=:/etc/localtime has the same result.

You can demonstrate that it takes account of changes to timezones:

Normal time for London:

    $ TZ=:Europe/London date -d '1995-12-30 12:00 UTC' -R
    Sat, 30 Dec 1995 12:00:00 +0000

British Summer Time:

    $ TZ=:Europe/London date -d '1995-06-30 12:00 UTC' -R
    Fri, 30 Jun 1995 13:00:00 +0100

'Double British Summer Time', used for a period during World War 2:

    $ TZ=:Europe/London date -d '1945-06-30 12:00 UTC' -R
    Sat, 30 Jun 1945 14:00:00 +0200

Before London's time was standardized to Greenwich:

    $ TZ=:Europe/London date -d '1845-06-30 12:00 UTC' -R
    Mon, 30 Jun 1845 11:58:45 -0001

wait, so when I did `echo $TZ` to check, and got `Europe/Amsterdam`, does that mean it's already set properly and I didn't need to set it to `:/etc/localtime`?
this is not a part of Linux I'm very familiar with
Thanks!
Do all your work in UTC. Seriously, you'll be glad later.
Convert to local time only on the edges, and only for end users.
All my VPSes run in UTC, no exception. What I'm talking about here is my desktop machine. I strongly prefer to run that in localtime.
If you set your machine to the DST timezone for your political unit, then the displayed time will automatically jump back and forth on the correct dates, as long as your time database is reasonably up-to-date; you don't need to manually change from non-DST to DST & back.
The non-DST timezones are for those who are lucky enough to live in political units which tell DST to get bent, and just stay on real time all year long.
DST delenda est.
Thanks, I didn't know that.
I believe I prefer the

    TZ=:Continent/City

notation in this case, though.
Starting or stopping Daylight Saving Time is not a timezone change. A timezone is — roughly — a set of timekeeping rules that some set of people use. DST is just part of those timekeeping rules. That is, your timezone is not the same thing as your UTC offset.
For example, the timezone America/New_York contains information not only about the UTC offset, but the offset during and not during DST, and when DST starts and ends. (And historical starts and ends too, so that UTC → local conversions (and vice versa) use the rules that were present at that time, not the rules that are present now, which may be different.)
E.g., my home desktop runs in the timezone America/Los_Angeles all year around. Most of my servers run in the timezone UTC all year. Both always have the appropriately correct time.
TZs should be defined in the form "Europe/London", "Asia/Beirut", "Pacific/Auckland", etc.
Although the hour offset can change at extremely short notice (e.g. discontinuing Daylight Savings during Ramadan [1]), the timezone declaration (e.g. "Africa/Casablanca"), shouldn't need to change, just the underlying timezone database.
[1] https://en.wikipedia.org/wiki/Daylight_saving_time_in_Morocc...
That's not how I read TZSET(3). According to TZSET(3) you can either use (example for Copenhagen shown)
    TZ=CET

or

    TZ=:Europe/Copenhagen

but not

    TZ=Europe/Copenhagen

Apologies, I didn't explain my response very well.
You're totally right to say that in the context of TZSET, to load the timezone specification from a TZ-formatted file it needs to be prefixed with ':'.
Rather than saying "Technically, you should set the value of TZ to 'Europe/London'", I was trying to say that my philosophical opinion of timezone recording - whether in a CMS, on an O/S level, in a compiled app, etc - is that it should start with the standard of geographical location.
There might be occasions to augment that with other data, maybe including the UTC offset for that TZ at a particular time, but the TZ specification can be subject to frequent change, whilst a description such as "Europe/London" changes infrequently and seems to have the least ambiguity.
Ok, we agree then.
I'm not an expert, but the first thing that comes to mind is: 1) TFA does not quantify the performance gain in time; 2) I wonder if environment variables like TZ are a security risk/vector, in that they might let attackers stealthily skew/screw time within the current user process... no root required.
OpenRC users can:
echo 'TZ=:/etc/localtime' > /etc/env.d/00localtime

For Debian and derivatives:
echo 'TZ=:/etc/localtime' >> /etc/environment
Overhead of localtime() is well-known, just RTFM. Anyway, this article provides very good explanation.
Does anyone have any evidence of this actually having a resource usage impact on any common programs?
I see one reference to Apache below, but not whether it actually made a measurable difference.
This was a thoroughly fascinating read. Highly recommend reading the previous part in the series as well.
Does this affect FreeBSD as well?
Experimentally, no. The example program calls localtime(3) 10 times but only accesses the file once, per truss:
    write(1,"Greetings!\n",11) = 11 (0xb)
    access("/etc/localtime",R_OK) = 0 (0x0)
    open("/etc/localtime",O_RDONLY,037777777600) = 3 (0x3)
    fstat(3,{ mode=-r--r--r-- ,inode=11316113,size=2819,blksize=32768 }) = 0 (0x0)
    read(3,"TZif20000000000000"...,41448) = 2819 (0xb03)
    close(3) = 0 (0x0)
    issetugid() = 0 (0x0)
    open("/usr/share/zoneinfo/posixrules",O_RDONLY,00) = 3 (0x3)
    fstat(3,{ mode=-r--r--r-- ,inode=327579,size=3519,blksize=32768 }) = 0 (0x0)
    read(3,"TZif20000000000000"...,41448) = 3519 (0xdbf)
    close(3) = 0 (0x0)
    write(1,"Godspeed, dear friend!\n",23) = 23 (0x17)

(FreeBSD caches the database on the first call: https://svnweb.freebsd.org/base/head/contrib/tzcode/stdtime/... )

It is unlikely; they usually don't use the GNU C library.
It seems to me that this is something the Linux distributions should already be doing.
Does anyone know if this impacts Docker images as well?
Yes, at least the Docker images that use glibc as their libc. (eg, most Debian/Ubuntu images)
It looks like musl, which is used on Alpine Linux images for example, will only read it once, and then cache it:
https://github.com/esmil/musl/blob/master/src/time/__tz.c#L1...
It has a mutex/lock around the use of the TZ info, but avoids re-stat'ing the localtime file.
This is the best part of HN. Not only did you answer GP's question in 6 minutes, but you link to the exact line of the source code.
If you haven't set the TZ variable in the image, then yes.
I would be surprised if it doesn't. There's nothing specific about Docker when running this C code versus any other process running on a Linux host.