How setting the TZ environment variable avoids thousands of system calls (blog.packagecloud.io)

> In other words: your system supports calling the time system call via the Linux kernel’s vDSO to avoid the cost of switching to the kernel. But, as soon as your program calls time, it calls localtime immediately after, which invokes a system call anyway.
This reminds me of an article by Ted Unangst[1], in which he flattens the various libraries and abstractions to show how xterm (to cite one of many culprits) in one place is effectively doing:
if (poll() || poll())
while (poll()) {
/* ... */
}
In other words, if you don't know what your library/abstraction is doing, you can end up accidentally duplicating its work. Reminds me of some aphorism, "Those who do not learn from history..." ;)
[1] http://www.tedunangst.com/flak/post/accidentally-nonblocking
Those who quote George Santayana are condemned to repeat him.
Those who don't know George Santayana are condemned to repeat him. FTFY.
You seem to have missed the joke...
You seem to have missed the joke... wait a second...
Totally got it. It's a refinement of the line. As the person below understands.
Apparently, I understand.
With the layering of clusters, containers, and microservices, I bet you probably have 10x worse than that. There is always a cost to abstraction. On the surface it might make things simpler but if you were to peel it apart, you would reveal a hidden layer of complexity. Hopefully, it's done well enough that there will never be a need to peel it apart.
> On the surface it might make things simpler but if you were to peel it apart, you would reveal a hidden layer of complexity.
Well yes, this is the very definition and goal of abstraction.
This is why I always liked the idea of Unikernels, they let us reset our abstractions without giving up all we've learned in the last couple decades.
Xe just did mplayer as well. It calls non-blocking select(), then non-blocking poll(), then nanosleep(), in a loop.
System calls in Linux are really fast. So saving "thousands" of system calls when /etc/localtime is in cache doesn't actually save that much actual CPU time.
I ran an experiment where I timed the runtime of the sample program provided in the OP, except I changed the number of calls to localtime() from ten times to a million. I then timed the difference with and without export TZ=:/etc/localhost. The net savings was .6 seconds. So for a single call to localtime(3), the net savings is 0.6 microseconds.
That's non-zero, but it's likely in the noise compared to everything else that your program might be doing.
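A rough sketch of the kind of micro-benchmark described above (my own reconstruction, not the exact program from the OP; the file name and loop count are arbitrary):

    /* bench.c: call localtime() many times so the per-call cost is measurable.
       With glibc and TZ unset, each call stat()s /etc/localtime. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t t = time(NULL);
        struct tm *tm = NULL;

        for (long i = 0; i < 1000000; i++)
            tm = localtime(&t);

        printf("last result: %04d-%02d-%02d\n",
               tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday);
        return 0;
    }

Timing it twice, once with TZ unset and once after exporting TZ=:/etc/localtime, gives the per-call difference.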
> System calls in Linux are really fast. So saving "thousands" of system calls when /etc/localtime is in cache doesn't actually save that much actual CPU time.
"fast" is a relative term, and is somewhat orthogonal to "efficient".
There's a reason why certain functions use a vDSO. If you're just going to use a syscall anyway, there's kind of no point.
You're assuming that all cases where the vDSO call is made get paired with a real syscall; that's simply not the case. There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹). Look at the examples the article's author offers:
> formatting dates and times
This shouldn't require a call to localtime; more explanation on the part of the article is required here. Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access. Something else is amiss here.
> for everything from log messages
You're about to hit disk; a cache'd stat() isn't going to matter.
> to SQL queries.
You're about to hit the network; a cache'd stat() isn't going to matter.
(Now, I'm not saying you shouldn't set TZ; if it saves some syscalls, fine, and it might be the only sane value anyways.)
¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.
> You're assuming that all cases where the vDSO call is made get paired with a real syscall; that's simply not the case.
I don't believe I was. I was merely assuming that there are a lot of cases (as in, potentially thousands of times a second) where code that needs the system time also wants the localtime.
> There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹)
As the article demonstrates, whatever we might believe about best practice, actual practice seems to include a lot of cases where it is called.
Given that a given epoch time value can map to different dates & times, depending on timezone... I'm not sure why you think formatting dates & times wouldn't require considering the desired timezone.
You're similarly mistaken that logging a message involves hitting disk. It's a very common configuration for high throughput logs to buffer writing to disk across multiple messages and/or forward to a remote server.
Similarly SQL queries don't necessarily involve hitting the network (some don't even involve crossing an IPC boundary). Even if you do hit the network, once again, it is very common for multiple network requests to be buffered in user space before making a syscall, and of course a single SQL statement could involve more than one localized timestamp value (though I'd like to think in that case the local timezone would be cached).
> ¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.
Well, if you are writing in Python, then worrying about the syscall overhead of reading the local timezone would seem odd (and for that matter, Python does some odd things with timezones, so I'm not even sure this would reliably trigger the syscall).
>Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access.
To do it simply yes, but not correctly. See the "Falsehoods programmers believe about time" series.
http://infiniteundo.com/post/25326999628/falsehoods-programm... http://infiniteundo.com/post/25509354022/more-falsehoods-pro...
> To do it simply yes, but not correctly.
No, doing it correctly doesn't require filesystem access either. I've read both articles in the past: neither refutes the point I made above. If I were incorrect, linking to an article that enumerates tens of things (some of them arguably incorrect) isn't useful.
If you're trying to imply that you need to take timezones into account, yes, you do. Yes, typically those definitions are stored on disk, but the context here is requiring filesystem access each and every time; most libraries (including glibc) will load the timezone definitions once, and keep them in memory. Thus, you can break a seconds-since-epoch out into year/mo/day/etc. with "simple" math, and it doesn't require a filesystem access. (Beyond the amortized one time load, but given the point and purposes of the article, I'm not considering that.)
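To make the "simple math" point concrete, here's a small illustrative sketch (mine, not from the thread or from glibc) that breaks a Unix timestamp into a UTC calendar date using only integer arithmetic; applying a timezone offset on top of this is where the (cached) tzdata comes in:

    #include <stdio.h>
    #include <stdint.h>

    /* Convert seconds-since-epoch to a UTC civil date (no filesystem access).
       Follows the well-known "civil_from_days" style of calculation. */
    static void civil_from_unix(int64_t secs, int *y, int *m, int *d)
    {
        int64_t days = secs / 86400 - (secs % 86400 < 0 ? 1 : 0); /* floor */
        int64_t z    = days + 719468;
        int64_t era  = (z >= 0 ? z : z - 146096) / 146097;
        int64_t doe  = z - era * 146097;                                /* [0, 146096] */
        int64_t yoe  = (doe - doe/1460 + doe/36524 - doe/146096) / 365; /* [0, 399]    */
        int64_t doy  = doe - (365*yoe + yoe/4 - yoe/100);               /* [0, 365]    */
        int64_t mp   = (5*doy + 2) / 153;                               /* [0, 11]     */

        *d = (int)(doy - (153*mp + 2)/5 + 1);
        *m = (int)(mp < 10 ? mp + 3 : mp - 9);
        *y = (int)(yoe + era * 400 + (*m <= 2));
    }

    int main(void)
    {
        int y, m, d;
        civil_from_unix(0, &y, &m, &d);       /* expect 1970-01-01 */
        printf("%04d-%02d-%02d\n", y, m, d);
        return 0;
    }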
Read the damn article. It explains how it's localtime (the function you need to format time in the user's time zone) that makes the stat call - to check if the configured time zone changed.
> Read the damn article.
> Please don't insinuate that someone hasn't read an article
I read the article. Yes, localtime requires the call; that wasn't my point. My point was that for plenty of common, server-side code, either this isn't required, or is inconsequential.
The former case I was considering is the formatting of timestamps into TZs in the context of a request being served by a server. Most server-side TZ conversions I've ever needed can't call localtime, b/c localtime is wired not to the user's timezone, but to the TZ of the machine the server's code is running on, which is typically either nonsense, UTC, or whatever the devs like. Server side code needs (of course, YMMV) to use the user's TZ, whatever that may be, so I'm making calls to a library built for that, e.g., pytz, which doesn't need to stat() the machine's TZ file as there is no point in doing so.
The other instances the author lists that do require localtime are instances where localtime's stat call is the least of your worries, as you're about to perform other operations that are much more expensive.
Timezones don't exclusively belong to users... Most syslogs (up until systemd) are configured to write out logs in machine localized time. Same goes for web servers. Really, there are a ton of cases where servers need to consider their timezone. I don't much like it, but it nevertheless is true.
> Most syslogs ... servers need
You have been using poor logging softwares. For the past decade and a half (or more) some of us have been using logging softwares that write out logs without converting timestamps to a local format or a local time, relying rather upon log post-processing tools to convert them to different (sometimes multiple) timezones of our choosing and at whim when we want to read our logs. Our servers haven't needed to consider timezones for all of those years, and our log-writing softwares don't call a localtime() function of any stripe. Please do not tar us with your brush.
* http://unix.stackexchange.com/a/326166/5132
* http://jdebp.eu./Softwares/nosh/guide/log-post-processing.ht...
* http://jdebp.eu./Softwares/nosh/guide/timestamps.html
* https://sawmill.net/formats/qmail_tai64_n.html
* https://www.elastic.co/guide/en/logstash/2.4/plugins-filters...
* http://docs.projectcalico.org/v1.6/usage/troubleshooting/log...
I haven't been using them; I hate the whole approach and defer any rendering of timestamps in logs. I'm just pointing out that it is very commonplace. Set up any distro's server distribution with the default settings and then track how often the localtime file gets touched...
Sounds like a misunderstanding of "simple" :)
Hence the scare quotes around simple. The math is in no way straightforward, but it's nonetheless math, esp. once you have the TZ information (if required) in front of you. The point was that there are plenty of operations within a typical server-side codebase that either involve little-to-no syscalls (tagging a record with the current UTC time, or converting a UTC timestamp to an ISO formatted date and time for serialization on the wire, e.g., JSON) or are forced to hit really expensive syscalls, rendering a quite-likely-cached-in-RAM stat() moot (logging, SQL queries).
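As a concrete (purely illustrative) sketch of that first category: tagging "now" with a UTC ISO 8601 string needs no knowledge of the machine's timezone at all, and, at least with glibc, gmtime_r shouldn't need to keep re-checking /etc/localtime the way localtime does:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char buf[sizeof "1970-01-01T00:00:00Z"];
        time_t now = time(NULL);   /* vDSO-backed on typical x86-64 Linux */
        struct tm utc;

        gmtime_r(&now, &utc);      /* break out UTC fields, no local TZ involved */
        strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", &utc);
        printf("%s\n", buf);       /* e.g. for a JSON payload or log record */
        return 0;
    }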
On your base system, yes. Lots of things can hook random syscalls, or environments might have syscall monitoring.
One example: the folks over at Slack record every syscall for security auditing. https://slack.engineering/syscall-auditing-at-scale-e6a3ca8a...
Slack uses the Linux audit subsystem which is also certainly faster than you think it is. Consider how many system calls your typical application is issuing --- especially ones that are likely to be calling localtime() all the time, such as a web server. If system call auditing had that high of an overhead, everything would be horrifically slow --- but it isn't, because Linux audit sends its records out asynchronously and in batches.
https://www.redhat.com/archives/linux-audit/2015-January/msg...
Of course this is RHEL 2.6.32 and it's open/close, but 200,000 sc/s vs 3,000 sc/s shows it has some overhead. Maybe someone can rerun that test code on a current git kernel and see what the overhead is.
This might be true for your system and libc, where calls like gettimeofday make use of the vDSO to go fast, but in general this isn't guaranteed at all. Even on x64, for certain libc implementations, like musl, if I recall correctly, syscalls are made the old-fashioned way by trapping through int 0x80, which would mean you would see a much bigger effect by reducing the number of syscalls.
There is no vDSO for calls to stat(2). The claim in the article was that by setting the TZ environment variable to ":/etc/localtime", one could save "thousands" of stat system calls. Even for old-fashioned system calls where you use trap 0x80, Linux is still amazingly fast.
This can actually be a problem, since there are applications like git which assume stat is fast, and so it aggressively stat's all of the working files in the repository to check the mod times to see if anything has changed. That's fine on Linux, but it's a disaster on Windows, where the stat system call is dog-slow. Still, I'd call that a Windows bug, not a git bug.
Does Windows have a stat() call? It is probably a function from some POSIX emulation layer, and maybe that is why it is not fast.
It's also a disaster on NFS.
Not quite. On x86_32, for complicated and ultimately ridiculous but nevertheless valid reasons, lots of syscalls on musl use int $0x80. I have a patch to make this fixable but Linus shot it down. Maybe I should try again.
On x86_64, syscalls only use SYSCALL. It's very fast if audit and such are off and reasonably fast otherwise. (I extensively rewrote this code recently. Older teardowns of the syscall path are dated.)
System calls on x86 are fast. Other archs behave differently. And the syscall time is not the only thing that matters; there's also the potential cost of yielding execution.
I thought they were fast because x86 has multiple register files, enough for kernel space and user space to have their own, so that entry/exit to system calls doesn't require flushing registers to L1 (in the common case).
If that's true, then one test where you have a single process spinning into and out of a single syscall will have very different performance characteristics than a test where you have more processes than processor cores, because context switches flush the TLB.
Somebody who knows actual things about x86 and so forth please tell me if I'm spouting 90s-era comp sci architecture textbook stuff that no longer applies.
They're fast because x86 has a decently fast privilege change mechanism for system calls and Linux works fairly hard to avoid doing unnecessary work to handle them. In the simplest case, registers are saved, a function is called, regs are restored, and the kernel switches back to user mode.
The asm code is fairly straightforward in Linux these days. I'm proud of it. :)
Check out the post linked from the article: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g... to learn more about how system calls work on x86 Linux.
I did the same experiment on a Raspberry Pi 2. The net saving was 5.803 seconds, so 5.803 microseconds per call.
Obviously if you care about performance then you wouldn't be running your program on a Raspberry Pi in the first place. But for everything else there's this free speed up.
I build a bunch of home automation stuff (as a hobby) using Pis and other microcontrollers. Performance in those things translates almost directly to power savings, and is very desirable.
OTOH, I've never encountered an issue like this on those systems.. (yet)
System calls in Linux are not faster than not doing them.
Yeah, this is a perfect example of micro-optimization being unnecessary. Not only will you not see performance issues from this in the real world, it might cause problems down the road: since it isn't set by default this way, some apps may not expect it and behave erroneously.
But it's neat information to have in the back of your head.
Unnecessary? I had a really bad experience with an ancient Skype version on a modern Ubuntu desktop, and the fix for this was to set the TZ environment variable to speed up the first login/history fetch. The Skype process was spending so much time doing useless work it was noticeable.
That's just not possible to authoritatively state. The best you can do is "this shouldn't normally cause a noticeable impact on most systems".
As just one example, what if you're stat()ing over NFS with a busy, flaky and/or distant server? A bit of thought and you'll come up with a bunch of other times it suddenly starts to matter.
I did the same, but with 10M iterations:
    $ time ./tz
    ./tz  2,24s user 6,28s system 98% cpu 8,612 total
    $ export TZ=:/etc/localtime
    $ time ./tz
    ./tz  1,35s user 0,00s system 98% cpu 1,364 total

So 0.7 microseconds on my machine.

> TZ=:/etc/localhost
Hope this is just a typo in your comment, not the actual test ;)
This isn't a typo, but is part of the syntax used by the TZ variable. (The same format appears in the article itself.)
See `man timezone` on a Linux system[1]. Specifically, see the passage that I've quoted below. Note that this is the third of three different formats that the man page describes that you can use in TZ:
> The second format specifies that the timezone information should be read from a file:
> :[filespec]

> If the file specification filespec is omitted, or its value cannot be interpreted, then Coordinated Universal Time (UTC) is used. If filespec is given, it specifies another tzfile(5)-format file to read the timezone information from. If filespec does not begin with a '/', the file specification is relative to the system timezone directory. If the colon is omitted each of the above TZ formats will be tried.

Sure, but at least on my Linux systems there is no such file /etc/localhost. I think the parent was referring to /etc/localtime. Not sure what the behaviour is if a non-existent file is specified - perhaps the "value cannot be interpreted" case applies, but it's not perfectly clear, since it could be argued that the value is valid, it just refers to a non-existent file.
Ah, correct you are! :-) I had missed that myself, and the : syntax is so rarely seen I naturally assumed that was what was intended.
Good blog post explaining the behavior of glibc, I also saw this first hand when profiling Apache awhile back too:
http://mail-archives.apache.org/mod_mbox/httpd-dev/201111.mb...
https://github.com/apache/httpd/blob/trunk/server/util_time....
The internals of glibc can often be pretty surprising; I'd really encourage people to go spelunking into the glibc source when they are profiling applications.
Please quantify the speedup (I've found this before, but it's never been a significant issue). Eliminating unnecessary work is great, but what are we really talking about here? Use a CPU flamegraph, Ctrl-F and search for stat functions. It'll quantify the total on the bottom right.
Oh, and another page that recommends strace without warning about overheads. Dangerous.
Honestly, the primary reason I support this is to get developers out of the habit of demanding a localized server timezone. As an infra' person, I want system time in UTC. If developers get in the habit of setting TZ, then I can have this!
It feels like any code that needs to know the timezone of the server is inherently wrong. If timezone ever comes up in any context, it's either the timezone of the client from whom the request originates - in which case it should come as part of the request - or else the timezone somehow associated with the business process (e.g. "warehouse open 8-5 Eastern time"), in which case it should be part of the configuration for that one service.
Author of the post here: greetings.
If you enjoyed this post, you may also enjoy our deep dive explaining exactly how system calls work on Linux[1].
[1]: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g...
Is there a reason why the path to the timezone file is prefixed with a colon?
TZ=:/etc/localtime
I've set TZ sometimes without the colon and it seems to work. I did a quick online search and didn't find anything relevant.
:<whatever> means "read it from the <whatever> file". See the last part of the relevant glibc documentation: https://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_...
However the reason it works without : is that the implementation is being lazy and just ignores the : delimiter and falls back to parsing out a filename either way:
https://sourceware.org/git/?p=glibc.git;a=blob;f=time/tzset....
You beat me to it. I was answering my own question when one of my users came in with a problem. Stupid users...
Here is the answer:
https://www.gnu.org/software/libc/manual/html_node/TZ-Variab...
The third format looks like this:

    :characters

Each operating system interprets this format differently; in the GNU C Library, characters is the name of a file which describes the time zone.

The other formats specify the timezone directly, such as EST+5EDT. Interestingly, it seems to work okay without the colon. Perhaps the leading slash implies a filename?

See https://news.ycombinator.com/item?id=13704054. The colon forces the file to be loaded; without it, the other formats are tried first.
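A quick illustrative sketch (my own, not from the thread; the zone names are just examples and it assumes a typical Linux with tzdata under /usr/share/zoneinfo) showing the same instant rendered under the different TZ formats being discussed:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void show(const char *tz)
    {
        char buf[64];
        time_t t = 1234567890;            /* an arbitrary fixed instant */

        setenv("TZ", tz, 1);
        tzset();
        strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %Z", localtime(&t));
        printf("%-24s -> %s\n", tz, buf);
    }

    int main(void)
    {
        show("EST5EDT,M3.2.0,M11.1.0");   /* rule string, no tzfile needed    */
        show(":Europe/Copenhagen");       /* tzfile relative to zoneinfo dir  */
        show(":/etc/localtime");          /* tzfile via an absolute path      */
        return 0;
    }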
Brendan Gregg wrote about this a few years ago [1].
My favorite part:
> WTF?? Why is ls(1) running stat() on /etc/localtime for every line of output?
[1] http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-...
What is missing in this post is:
- Why does glibc check /etc/localtime every time localtime is called? Wild guess: so that new values of /etc/localtime are picked up at runtime without restarting programs.
- Corollary: why does glibc not check /etc/localtime every time localtime is called, when TZ is set to :/etc/localtime? Arguably the reason above should still apply when TZ is set to a file name, shouldn't it?
For the second question: There doesn't seem to be an explicit reason for the difference of treatment. The code that does it has been there since 1996, and hasn't changed since. The only reason given is "Caching happens based on the contents of the environment variable TZ.".
https://sourceware.org/git/?p=glibc.git;a=commit;h=68dbb3a69...
I'd argue it should cache the same when both old_tz and tz are NULL (but start with an old_tz that is not NULL).
I was about to file an upstream bug, but found https://sourceware.org/bugzilla/show_bug.cgi?id=5184 and https://sourceware.org/bugzilla/show_bug.cgi?id=5186
The latter actually implies the opposite should be happening: files given in TZ should be stat()ed just as much as /etc/localtime.
This is due to the multiuser nature of Unix-like systems.
/etc/localtime is set by the administrator. It may change without notice to the user.
TZ is part of the user's environment and the user sets it. All applications run by the user should honor the user's wishes if the user's not falling back to system defaults.
If you're setting TZ for yourself, your libc can update things when you update the variable and restart any applications you're running under the old value. It can therefore save cycles. If you're falling back to the system default that's not under the same user's control, then it must be ready to deal with unexpected changes.
Hi, both are answered in the article:
First:
> What’s going on here is that the first call to localtime in glibc opens and reads the contents of /etc/localtime. All subsequent calls to localtime internally call stat, but they do this to ensure that the timezone file has not changed.
and second: read the section titled "Preventing extraneous system calls" for the answer to your second question.
Those "answers" are more about how than why.
Thanks for reading and I'm glad to hear you loved my post!
If this has a real-world/measurable/etc. impact why isn't this set by default? Are there potential side-effects? Is it set in some distros but not others?
Portability, compatibility. The system should not set environment variables if there is a reasonable default action. Env is intended to be set by the user.
In general, the timezone is set during OS setup, and the system is left in a state where it's up to the applications to figure out what to do. For example, you might configure Apache (yes, I am old, leave me alone) to use a particular timezone. But if Apache senses an env var it may choose to override the configured value with what's in the env var. Or SSH might be configured to pass along all env vars, including TZ, which in all honesty it probably won't even if you tried, but it could, and then the destination server's application has the wrong timezone.
Point is, it's safer not to mess with env vars unless you need to.
The default behavior is done so programs don't need to be restarted if the timezone is changed, which also has real world impact.
Probably because while there may be tens of thousands of additional syscalls, the total amount of added latency and resources consumed are more likely to be on a scale of micro/nano/milli seconds.
In the trace you can see that the syscall takes less than a tenth of a millisecond. I don't think this is a big penalty to check if I have changed my timezone or not, as unlikely as that is during normal operation.
This seems to be a simple RTFM issue to me: POSIX specifies that gmtime() uses UTC and localtime() uses current timezone. Using gmtime() would implement the desired behaviour without any need to hardcode environment variables.
...which fixes all the code you wrote, but of course, you may have legacy binaries whose source you can't change... hence a simple setting of an environment variable, hardcoded though it may be, fixes the situation for all.
Though PeterWillis makes a good point akin to yours, and your (plural) point does make sense.
(Edit: added mention of comment with additional background on why to avoid hardcoding the variable)
Great post. I remember when vDSOs were added we noticed a nice speedup in our code. We tuned for realtime and a few microseconds here and there add up. Most importantly, fewer system calls means more predictability.
This reminds me of a very similar behavior in Solaris over 20 years ago. Our C application was having odd performance problems on some client systems, and eventually we saw via truss that there were hundreds of fopen() calls every second to get the timezone. Setting the right environment variable solved the problem.
I really enjoy when people dig into things like this and report their findings. Having said that, I question the wisdom of "bothering" with this sort of thing. Everything you do that's non-standard or works against a system's default behavior incurs a cost. It's yet another thing you have to replicate when you migrate to a new version, change provisioning systems, etc.
And for what benefit? A few hundred syscalls per second? Linux syscalls are fast enough that something of that magnitude shouldn't matter much. Given that /etc/localtime will certainly be in cache with that frequency of access, a stat() should do little work in the kernel to return, so that won't be slow either.
It's good that they did some benchmarking to look at the differences, but this feels like a premature optimization to me. I can't imagine that this did anything but make their application a tiny fraction of a percent faster. Was it worth the time to dig into that for this increase? Was it worth the maintenance cost I mention in my first paragraph? I wouldn't think so.
I'm really trying not to take a crap on what they did; as I said, it's really cool to dig into these sorts of abstractions and find out where they're inefficient or leak (or just great as a learning exercise; we all depend on a mountain of code that most people don't understand at all). But, when looked at from a holistic systems approach, a grab bag of little "tweaks" like this can become harmful in the long run.
You should be able to get the maintenance cost for something like this pretty low. Add a line with a nice comment to the config template for running your apps that adds an extra env var. Any machine that runs any app now has the line...
I would definitely like to see a before/after real-world metric on impact here though.
The embedded Linux system I'm working on right now takes about 17 us per stat() call due to this but time will always be kept internally as UTC, so taking advantage of this is worth considering for me.
Since most embedded systems can directly translate the amount of processing needed to achieve the product goal into a real dollar cost of the hardware, any savings, even a small one, is worth investigating. Since the hardware and system are generally well understood, implementing something like this is much more reasonable.
But I agree, on a general purpose OS doing general purpose thing, an optimization like what's proposed by the article may not be worth the other tradeoffs.
There's another easy way to avoid this: use localtime_r instead of localtime. From the glibc source:
/* Update internal database according to current TZ setting.
POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
This is a good idea since this allows at least a bit more parallelism. */
tzset_internal (tp == &_tmbuf && use_localtime, 1);
mktime also does the tzset call every time, though:

time_t
mktime (struct tm *tp)
{
#ifdef _LIBC
/* POSIX.1 8.1.1 requires that whenever mktime() is called, the
time zone names contained in the external variable 'tzname' shall
be set as if the tzset() function had been called. */
__tzset ();
#endif

and I don't see any way around that other than setting TZ=: or some such.

It is perhaps out of scope of the article, but it sure would have been helpful to show how to set the TZ environment variable and what to set it to.
It's right there in the middle of the article (under the "Preventing extraneous system calls" section):
    $ TZ=:/etc/localtime strace -ttT ./test

I can't believe I missed that! Thanks.
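For what it's worth, the same effect can be had from inside a program rather than from the shell; a minimal sketch (mine, not from the article), assuming glibc's caching behaviour when TZ is set:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        /* Set TZ before the first localtime() call so glibc caches the zone
           data instead of re-checking /etc/localtime; the final 0 means we
           don't override a TZ the user already provided. */
        setenv("TZ", ":/etc/localtime", 0);
        tzset();

        for (int i = 0; i < 5; i++) {
            time_t t = time(NULL);
            printf("%s", asctime(localtime(&t)));
        }
        return 0;
    }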
Why did this post start with a tl;dr, then a Summary, and _still_ buried the important takeaway in the ultimate paragraph?
BTW, packagecloud.io is a great hosting service for RPM/DEB packages. We've been using it for the last couple of years. The GitHub + Travis CI + PackageCloud combination allows us to build and publish packages for EVERY git commit in 30+ repositories targeting 15 different Linux distributions [1]. There is no more need to hire a special devops guy for that.
Interesting article but I can't reproduce the behavior on Ubuntu 16.04 LTS. I don't have TZ set (or anything locale-related for that matter). Here are the library dependencies:
$ ldd test
linux-vdso.so.1 => (0x00007ffd80baf000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8844bf7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8844fbc000)

Any thoughts why the behavior would be different?

There's also a surprising difference in behavior between tm = localtime() and localtime_r(..., &tm).
The former is the "traditional" function which returns a pointer to a statically allocated, global "struct tm". The latter is the thread-safe version receiving a pointer to a user-supplied "struct tm" as its second argument.
With TZ set to Europe/Berlin, set to :/etc/localtime, or unset, I never get a stat on anything:

    do {
        t = time(NULL);
        localtime_r(&t, &tm);
        printf("The time is now %02d:%02d:%02d.\n",
               tm.tm_hour, tm.tm_min, tm.tm_sec);
        sleep(1);
    } while(--N);

    write(1, "The time is now 07:23:33.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0
    write(1, "The time is now 07:23:34.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0
    write(1, "The time is now 07:23:35.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0

If I change it to tm = localtime()...

    stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
    write(1, "The time is now 07:30:56.\n", 26) = 26
    nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc868c3010) = 0

One more reason to switch to the reentrant/thread-safe versions of those ugly library functions :-).

Note, this is using glibc 2.24 under Arch:

    $ /lib/libc.so.6
    GNU C Library (GNU libc) stable release version 2.24, by Roland McGrath et al.
    (...)
    Compiled by GNU CC version 6.1.1 20160802.
This reminds me of setting noatime for disk mounts (http://askubuntu.com/questions/2099/is-it-worth-to-tune-ext4...)
Now I want to know the number of other configs to reduce the number of system calls. This all adds up to being significant the greater the number of hosts in your environment.
While trying to find the cause of slowness in Rails requests, I was running strace on a unicorn process when I encountered the same thing mentioned in the article.
Rails instrumentation code calls current time before and after any instrumentation block. So, when I looked at the trace there were a lot of `stat` calls coming for `/etc/localtime`, and as stat is an IO operation, I thought I had discovered the cause of the slowness (which I attributed to a high number of IO ops), but surprisingly, when I saw the strace method summary, while the call count was high, the time taken by the calls in total was not significant (<1% if I remember correctly). So I decided to set TZ with the next AMI update 15 months back but forgot about it totally. I guess I should add it to my Trello list this time.
Also, I think he should have printed the aggregate summary of just CPU clock time (`-c`) as well, as that is usually very low.
...while the call count was high, the time taken by the calls in total was not significant(<1% if I remember correctly).
Yes, on ordinary filesystems if you run stat() over and over again on the same file then it's just copying from the in-memory inode into your struct stat, there's no IO.
I do this for "not get annoyed while stracing" reasons, not perf!
Really interesting - thanks for sharing the findings. I haven't seen it mentioned here, but those of us using `timedatectl` via systemd, with the default setting of `UTC`, are taking advantage[1] of the recommendation in the article.
[1] https://github.com/systemd/systemd/blob/master/src/timedate/...
This reads to me like a glibc bug. Glibc should just be watching "/etc/localtime" for changes, rather than calling out to the kernel hundreds of times a second.
It's hard to do that without any system calls, though.
The way you'd do this is to open an inotify (or platform equivalent) file descriptor on the first call to localtime(), and on future calls, only bother statting /etc/localtime again if that file descriptor reports something has changed. But checking if an FD has data requires a system call; you'd do a non-blocking read on the FD (or a non-blocking select or poll, or an ioctl(FIONREAD)). It's possible that system call is faster than stat, but it's still a system call.
You could do it with a thread that does a blocking read, but that's a mistake under the current UNIX architecture: lots of stuff (signal delivery and masks, for instance) gets weird as soon as you have threads, so unconditionally sticking a thread into every process using libc is a bad plan.
You could do it with fcntl(F_SETSIG), which would send you a signal (SIGIO by default) when the inotify file descriptor is readable, but you couldn't actually use SIGIO, since the user's process might have a handler. You'd need to steal one of the real-time signals and set SIGRTMIN = old SIGRTMIN + 1. (See https://github.com/geofft/enviable for a totally-non-production-suitable example of this approach.) This would probably mostly work, except changing SIGRTMIN is technically an ABI break (since programs communicate with each other with numerical signal values). You could maybe use one of the real-time signals that pthread already steals. Also, using signals is sort of a questionable idea in general; user programs now risk getting EINTR when /etc/localtime changes, which they're likely not to be prepared for. A single-threaded glibc program that sets no signal handlers never gets any EINTRs, which is nice.
In an ideal world, every program would have a standard event loop / mechanism for waiting on FDs, and libc could just register the file descriptor with that event loop. Then you'll get notified of /etc/localtime changing after the next run of the event loop, which is good enough, and it would take zero additional system calls. But unfortunately, UNIX wasn't designed that way, so something at libc's level can't assume the existence of any message loop. A higher-level library like GLib or Qt or libuv could probably do this, though.
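To make the inotify idea above concrete, here's a minimal sketch (mine, not glibc's code; the tz_changed/tz_watch_fd names are invented) of the mechanism a libc could use. Note that the check is still one syscall per lookup, that calling glibc's localtime() here would of course still do its own stat unless TZ is set, and that a watch on the file itself misses the common "symlink replaced in /etc" case:

    #include <stdio.h>
    #include <sys/inotify.h>
    #include <time.h>
    #include <unistd.h>

    static int tz_watch_fd = -1;

    /* Returns nonzero if the timezone data should be (re)loaded. */
    static int tz_changed(void)
    {
        char buf[4096];

        if (tz_watch_fd < 0) {
            tz_watch_fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
            if (tz_watch_fd >= 0)
                inotify_add_watch(tz_watch_fd, "/etc/localtime",
                                  IN_MODIFY | IN_ATTRIB |
                                  IN_DELETE_SELF | IN_MOVE_SELF);
            return 1;                   /* first call: load the data */
        }
        /* Non-blocking read: returns -1/EAGAIN when nothing has changed. */
        return read(tz_watch_fd, buf, sizeof buf) > 0;
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++) {
            time_t t = time(NULL);
            if (tz_changed())
                tzset();                /* reload zone rules only on change */
            printf("%s", asctime(localtime(&t)));
            sleep(1);
        }
        return 0;
    }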
What about calling mmap to map /etc/localtime into memory? The file would still have to be parsed for every call to localtime(), but the system call is avoided.
(Better yet, if it were possible to memory-map the directory entry for /etc/localtime, parsing could be avoided as well).
I agree the best solution is to provide a more sensible API. Why should an application be limited to only _one_ timezone?
mmapping doesn't solve the problem unless the file was edited in place (rather than overwritten as is commonly done). You'll just have an mmap on an inode that is no longer visible in the directory.
I think I like that solution! It doesn't work if someone replaces /etc/localtime with a new file, but it works if they update it in place. It's low-overhead as long as /etc/localtime remains in cache; the process that updates it will write to the same pages. It's pretty high-overhead if /etc/localtime ever gets flushed from cache, though, since you have to go out to disk.
There are no usable mechanisms glibc can use for watching /etc/localtime for changes that would not mess up the program if it also decides to use any file watching features.
At least on key platforms, it's pretty easy to use an event driven model and watch for updates. Hell, if the vDSO handled TZ it'd be no problem.
I'm not saying it's impossible, but if you actually sit down and attempt to add this to glibc, you will come to the realization that it is not easy at all. If you want to e.g. use the inotify mechanism you have a few key decision points to make
* There's no place where glibc can run an event loop to watch for the changes, the application might not have an event loop.
* If you need to integrate with an event loop the application runs, it'll work fine as long as the user remembers to hook up the events and deal with the corner cases. You can use this approach without any special support for glibc, and set TZ env. variable yourself when /etc/localtime changes - though at the moment that will only work for single threaded programs (setenv()/putenv() is not thread safe)
* If you decide to run the event loop in a separate thread, you force every binary to be multi threaded and have quite a lot of corner cases to handle when fork()'ing.
* If you don't run an event loop, you're back to polling for changes, and hardly anything is gained.
* If you use the signal driven I/O notification mechanism, you interfere with the application's use of signals and I/O notification, and also have a host of fork() corner cases to consider.
vDSO is not a magic silver bullet that can solve this; you would at least have to have the kernel manage timezone support, or perhaps better, have the ability to transparently manage arbitrary data, and then expose that through shared memory that the vDSO can use. This will not happen anytime soon.
I don't think it is quite so bad. The fork/event loop/thread problems already get dealt with for a variety of other cases. It's handy when you can coordinate with the runtime itself and exploit the level of indirection that the rest of the runtime has.
And how does it watch it? With the stat() syscall.
Polling is not watching.
Side note - why do some sites completely hide information about who is behind them? I couldn't find a single thing about that on their blog or main site.
It's not in any way hidden, just run a whois query:
    $ whois packagecloud.io | grep Owner
    Owner Name    : JOSEPH DAMATO
    Owner OrgName : COMPUTOLOGY, LLC
    Owner Addr    : 359 FILLMORE ST!12
    Owner Addr    : SAN FRANCISCO
    Owner Addr    : CA
    Owner Addr    : US

Maybe it's just that I use packagecloud and follow people in their circle on Twitter, but Joe Damato is the CEO and founder: https://twitter.com/joedamato
I'm under the impression that he may also write a lot of these himself? Not entirely certain, though.
Pretty sure he writes all of the linux internals posts. He also has a bunch of great ones on his blog http://timetobleed.com/
All the links are in the footer of the page.
For what it's worth, they disappear if your browser isn't wide enough.
Oops! I'll fix this shortly, thanks for pointing it out.
It would be highly inconvenient to have to set this variable if you live in a country where you change the timezone twice a year due to summertime.
If TZ is set properly then you don't need to change it twice a year.
(man tzset for more info and examples of different values TZ can be set to. Mine is set to "Europe/London" which handles the DST switches automatically.)
Hm…, now that I read TZSET(3): Wouldn't that be

    TZ=:Europe/London

? It doesn't seem tzset() will accept the format

    TZ=Europe/London

Yes, but "If the colon is omitted each of the above TZ formats will be tried.", so it'll be figured out even if the colon is missing.
Ah yes, thanks for pointing that out!
Are you talking about DST? You don't change your timezone at DST switch, the timezone data knows how to calculate the correct time based on your location (roughly) and the time of year. Mountain standard time and mountain daylight time are both part of mountain time, for example.
What I meant was: Currently my timezone is CET. In a month's time it will be CEST. If I have to set TZ to explicitly mirror that it will be a burden.
TZ may be set like this:
    TZ=:Europe/Copenhagen

Replace Europe/Copenhagen with the appropriate entry from the list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

Usually, /etc/localtime is a symlink, as on my laptop:

    /etc/localtime -> /usr/share/zoneinfo/Europe/Copenhagen

so TZ=:/etc/localtime has the same result.

You can demonstrate that it takes account of changes to timezones:

Normal time for London:

    $ TZ=:Europe/London date -d '1995-12-30 12:00 UTC' -R
    Sat, 30 Dec 1995 12:00:00 +0000

British Summer Time:

    $ TZ=:Europe/London date -d '1995-06-30 12:00 UTC' -R
    Fri, 30 Jun 1995 13:00:00 +0100

'Double British Summer Time', used for a period during World War 2:

    $ TZ=:Europe/London date -d '1945-06-30 12:00 UTC' -R
    Sat, 30 Jun 1945 14:00:00 +0200

Before London's time was standardized to Greenwich:

    $ TZ=:Europe/London date -d '1845-06-30 12:00 UTC' -R
    Mon, 30 Jun 1845 11:58:45 -0001

wait, so when I did `echo $TZ` to check, and got `Europe/Amsterdam`, does that mean it's already set properly and I didn't need to set it to `:/etc/localtime`?
this is not a part of Linux I'm very familiar with
Thanks!
Do all your work in UTC. Seriously, you'll be glad later.
Convert to local time only on the edges, and only for end users.
All my VPSes run in UTC, no exception. What I'm talking about here is my desktop machine. I strongly prefer to run that in localtime.
If you set your machine to the DST timezone for your political unit, then the displayed time will automatically jump back and forth on the correct dates, as long as your time database is reasonably up-to-date; you don't need to manually change from non-DST to DST & back.
The non-DST timezones are for those who are lucky enough to live in political units which tell DST to get bent, and just stay on real time all year long.
DST delenda est.
Thanks, I didn't know that.
I believe I prefer the

    TZ=:Continent/City

notation in this case, though.
Starting or stopping Daylight Saving Time is not a timezone change. A timezone is — roughly — a set of timekeeping rules that some set of people use. DST is just part of those timekeeping rules. That is, your timezone is not the same thing as your UTC offset.
For example, the timezone America/New_York contains information not only about the UTC offset, but the offset during and not during DST, and when DST starts and ends. (And historical starts and ends too, so that UTC → local conversions (and vice versa) use the rules that were present at that time, not the rules that are present now, which may be different.)
E.g., my home desktop runs in the timezone America/Los_Angeles all year around. Most of my servers run in the timezone UTC all year. Both always have the appropriately correct time.
TZs should be defined in the form "Europe/London", "Asia/Beirut", "Pacific/Auckland", etc.
Although the hour offset can change at extremely short notice (e.g. discontinuing Daylight Savings during Ramadan [1]), the timezone declaration (e.g. "Africa/Casablanca"), shouldn't need to change, just the underlying timezone database.
[1] https://en.wikipedia.org/wiki/Daylight_saving_time_in_Morocc...
That's not how I read TZSET(3). According to TZSET(3) you can either use (example for Copenhagen shown)
    TZ=CET

or

    TZ=:Europe/Copenhagen

but not

    TZ=Europe/Copenhagen

Apologies, I didn't explain my response very well.
You're totally right to say that in the context of TZSET, to load the timezone specification from a TZ-formatted file it needs to be prefixed with ':'.
Rather than saying "Technically, you should set the value of TZ to 'Europe/London'", I was trying to say that my philosophical opinion of timezone recording - whether in a CMS, on an O/S level, in a compiled app, etc - is that it should start with the standard of geographical location.
There might be occasions to augment that with other data, maybe including the UTC offset for that TZ at a particular time, but the TZ specification can be subject to frequent change, whilst a description such as "Europe/London" changes infrequently and seems to have the least ambiguity.
Ok, we agree then.
I'm not an expert, but the first thing that comes to mind is: 1) TFA does not quantify the performance gain in time; 2) I wonder if environment variables like TZ are a security risk/vector, in that they might let attackers stealthily skew/screw time within the current user process... no root required.
OpenRC users can:
echo 'TZ=:/etc/localtime' > /etc/env.d/00localtime

For Debian and derivatives:
echo 'TZ=:/etc/localtime' >> /etc/environment
Overhead of localtime() is well-known, just RTFM. Anyway, this article provides very good explanation.
Does anyone have any evidence of this actually having a resource usage impact on any common programs?
I see one reference to Apache below, but not whether it actually made a measurable difference.
This was a thoroughly fascinating read. Highly recommend reading the previous part in the series as well.
Does this affect FreeBSD as well?
Experimentally, no. The example program calls localtime(3) 10 times but only accesses the file once, per truss:
    write(1,"Greetings!\n",11) = 11 (0xb)
    access("/etc/localtime",R_OK) = 0 (0x0)
    open("/etc/localtime",O_RDONLY,037777777600) = 3 (0x3)
    fstat(3,{ mode=-r--r--r-- ,inode=11316113,size=2819,blksize=32768 }) = 0 (0x0)
    read(3,"TZif20000000000000"...,41448) = 2819 (0xb03)
    close(3) = 0 (0x0)
    issetugid() = 0 (0x0)
    open("/usr/share/zoneinfo/posixrules",O_RDONLY,00) = 3 (0x3)
    fstat(3,{ mode=-r--r--r-- ,inode=327579,size=3519,blksize=32768 }) = 0 (0x0)
    read(3,"TZif20000000000000"...,41448) = 3519 (0xdbf)
    close(3) = 0 (0x0)
    write(1,"Godspeed, dear friend!\n",23) = 23 (0x17)

(FreeBSD caches the database on the first call: https://svnweb.freebsd.org/base/head/contrib/tzcode/stdtime/... )

It is unlikely; they usually don't use the GNU C library.
It seems to me that this is something the Linux distributions should already be doing.
Does anyone know if this impacts Docker images as well?
Yes, at least the Docker images that use glibc as their libc. (eg, most Debian/Ubuntu images)
It looks like musl, which is used on Alpine Linux images for example, will only read it once, and then cache it:
https://github.com/esmil/musl/blob/master/src/time/__tz.c#L1...
It has a mutex/lock around the use of the TZ info, but avoids re-stat'ing the localtime file.
This is the best part of HN. Not only did you answer GP's question in 6 minutes, but you link to the exact line of the source code.
If you haven't set the TZ variable in the image, then yes.
I would be surprised if it doesn't. There's nothing specific about Docker when running this C code versus any other process running on a Linux host.