Beyond Ctrl-C: The dark corners of Unix signal handling

sunshowers.io

165 points by PuercoPop 2 years ago · 75 comments

chrsig 2 years ago

My favorite signal surprise was running nginx and/or httpd in the foreground and wondering why on earth it quit whenever I resized the window.

Turns out, they use SIGWINCH (which is sent on WINdow CHange) for graceful shutdown.

It's a silly, silly problem.

  • eadmund 2 years ago

    > Turns out, they use SIGWINCH (which is sent on WINdow CHange) for graceful shutdown.

    That’s … that’s even worse than people who send errors with an HTTP 200 response code.

    • aunderscored 2 years ago

      Disagree. Annoyingly, there is a reasonable case for a 200 that carries an error: if HTTP is your transport but not your application, then 200 says "yes, the message was transferred and understood correctly, here is your response", and that response may be an error from the application.

      • eadmund 2 years ago

        If you’re using HTTP for something other than transferring hypertext — i.e., if your application is not a hypermedia application — then you are doing something just as wrong as encoding IP in DNS packets or email messages. Don’t do that. It’s wrong, even if it is technically interesting.

        If, OTOH, your application is a hypermedia application, then returning a success status for errors is just wrong.

        • aunderscored 2 years ago

          Every JSON API under the sun disagrees, but I do agree in principle. People very much like using HTTP as a JSON (or XML) transfer protocol

        • sunshowers 2 years ago

          This ship sailed the day the first HTTP proxy was installed, and likely well before that.

        • andreyvit 2 years ago

          Sorry, what? HTTP is perfectly fine for APIs which are not hypermedia.

      • Izkata 2 years ago

        For example: Apache (httpd) replaces the 4xx and 5xx response body with its own content instead of whatever you'd returned from an external handler like wsgi. You have to use a 2xx (except for 204) to get a relevant error message back out.

        • AdieuToLogic 2 years ago

          > For example: Apache (httpd) replaces the 4xx and 5xx response body with its own content instead of whatever you'd returned from an external handler like wsgi.

          This is the default behavior. Apache httpd can be configured to produce different responses by way of ErrorDocument[0]. From the documentation:

            Customized error responses can be defined for any HTTP
            status code designated as an error condition - that is,
            any 4xx or 5xx status.
          
          HTH

          0 - https://httpd.apache.org/docs/trunk/custom-error.html

          • jjnoakes 2 years ago

            Even with custom error documents configured in the web server, you still lose the application-specific (and probably request- and error-specific) message generated by the application itself.

            • Izkata 2 years ago

              Yeah, this is how we ran across it - whoever originally wrote a particular feature was trying to do the right thing by using an HTTP error code, but with a message that would be presented to the user about why that operation failed. A generic response wouldn't work, there were multiple possible reasons all fixable by the user, and tying a whole error code to one specific feature would've probably been a bad idea anyway.

      • Groxx 2 years ago

        Which is why "you resized the terminal window, clearly you meant to shut down this web server" is even crazier, yes

    • thezilch 2 years ago

      That's ... not what most people are doing. People send _application_ errors on HTTP 200 response codes, because HTTP response codes are for HTTP and not applications. Most "REST" libraries and webdev get this wrong, building ever more fragile web services.

      • ChocolateGod 2 years ago

        Applications using status codes is useful because it can tell browsers and load balancers to not cache the page in a uniform way.

      • sunshowers 2 years ago

        I don't think the distinction is as clear-cut as you're making it out to be.

        For example, HTTP 409 Conflict generally means an application-level conflict (e.g. an optimistic concurrency mechanism detected a conflict).

        HTTP 422 Unprocessable Entity is also usually an application-level error (e.g. hash validation failure, or identifier not recognized by the server).

      • LoganDark 2 years ago

        Task failed successfully

    • chrsig 2 years ago

      y'know...what really is an error, anyway?

  • thayne 2 years ago

    Why? That's what SIGTERM is for.

    • chrsig 2 years ago

      No clue what the decision making process was.

      There's a bug report for httpd dating back to 2011[0]. The nginx mailing list also has a grumpy person contemporary with that[1].

      My guess is someone thought "httpd is a server running somewhere without a monitor attached, why on earth would it get a SIGWINCH!? surely it's available to use for something completely different", not considering users running it in the foreground during development. Nginx probably followed suit for convention, but that's pure speculation on my part.

      Also that was before docker really took off (I'm not sure if it was around in 2011 yet; still in its infancy maybe). Running it in the foreground didn't happen as much yet. People were still using wamp or installing it via apt and restarting via sudo.

      [0] https://bz.apache.org/bugzilla/show_bug.cgi?id=50669

      [1] https://mailman.nginx.org/pipermail/nginx/2011-August/028640...

      • hulitu 2 years ago

        > why on earth would it get a SIGWINCH!?

        Reminds me of those "/* not reached */" stories.

    • lolinder 2 years ago

      They use SIGWINCH for gracefully shutting down workers but not the main process [0]. SIGQUIT is used for a graceful shutdown and SIGTERM for a sort of graceful shutdown (with timeouts).

      SIGWINCH is apparently used for an online upgrade [1]. Because it only shuts the workers down you can quickly transition back to the old binary and old configuration if there's a problem, even after upgrading the binary or config stored on the hard drive.

      I'm sure there are other ways to get a similar capability, but this set of signals is apparently what they came up with.

      [0] http://nginx.org/en/docs/dev/development_guide.html#processe...

      [1] https://www.digitalocean.com/community/tutorials/how-to-upgr...

    • ibash 2 years ago

      I tried to find out why.

      Unfortunately the change that introduces it predates the official release by a few months. And predates the mailing list by about a year:

      https://trac.nginx.org/nginx/changeset/5238e93961a189c13eeff...

  • ykonstant 2 years ago

    I don't know whether to laugh or cry.

    • chrsig 2 years ago

      definitely laugh! life's too short, you'll never get out alive :)

layer8 2 years ago

> Another common extension is to use what is sometimes called a double Ctrl-C pattern. The first time the user hits Ctrl-C, you attempt to shut down the database cleanly, but the second time you encounter it, you give up and exit immediately.

This is a terrible behavior, because users tend to hit Ctrl-C multiple times without intending anything different than on a single hit (not to mention bouncing key mechanics and short key repeat delays). Unclean exits should be reserved for SIGQUIT (Ctrl-\) and SIGKILL (by definition).

  • tripdout 2 years ago

    If you don't know about it, sure, but I find it's kind of convenient to get a safe shutdown and then be able to easily say "I don't care, just stop this program" without needing a separate kill -9 command or something.

    • wombatpm 2 years ago

      Kids these days. Try resetting server windows on an SGI.

      Subject: -42- How can I restart the X server? Date: 10 Sep 1995 00:00:01 EST

        To restart the X server (Xsgi) once, do any one of the following
        (in increasing order of brutality):
      
        - killall -TERM Xsgi
        - hold down the left-Control, left-Shift, F12 and keypad slash keys
          (this is fondly known as the "Vulcan Death Grip")
        - /usr/gfx/stopgfx; /usr/gfx/startgfx
        - reboot
      
        To restart the X server every time someone logs out of the console,
        edit /var/X11/xdm/xdm-config, change the setting of
        "DisplayManager._0.terminateServer" from "False" to "True" and do
        'killall -HUP xdm'.

    • layer8 2 years ago

      As I wrote, Ctrl-\ should do the trick. And it’s just not practical having to know which program applies the double pattern, and having to train yourself to not accidentally hit Ctrl-C twice.

      • __MatrixMan__ 2 years ago

        My brush with the double-ctrl-C pattern was in a place that wrote a lot of Java. It was generally frowned on to write any code that relied on signals which windows users can't send, and if I recall, Java made it quite difficult anyhow.

        Windows does have a tradition of using ctrl-c to quit though, so SIGINT ends up being one of the few that you can use in both places. It's not pretty, but giving it a different meaning based on how many times you've ordered it seems like a somewhat natural next step, if a hacky one.

  • bonzini 2 years ago

    In the Meson build system's test harness, a single Ctrl-C terminates the longest running test with a SIGTERM; while three Ctrl-C in a second interrupt the whole run as if you sent the harness a SIGTERM. This was done because it's not uncommon that there are hundreds of tests left to run and you have seen what you want, and it's useful to have an intuitive shortcut for that case.

    However, in both cases it's a clean shutdown: all running tests are terminated and the test report is printed.

  • jcelerier 2 years ago

    > Unclean exits should be reserved for SIGQUIT (Ctrl-\) and SIGKILL (by definition).

    I don't know how it works on your keyboard but on french layout, Ctrl-\ is a two-hands, three-fingers, very unpleasant on the wrist, keyboard shortcut. Not a chance I'd use that for such a common operation.

    • mananaysiempre 2 years ago

      The byte that sends SIGQUIT is very much configurable with stty quit ^X , but unfortunately X has to be a-z or one of \]^_ (that is, 0x41 through 0x5F except 0x5B = [ which would conflict with other uses of ESC = ^[ = 0x1B) because of how the Ctrl modifier traditionally works. Looking at a map of AZERTY, I don’t see any good options, but you may still want to experiment.
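
      To make the "how the Ctrl modifier traditionally works" part concrete, here is a small sketch. It assumes the traditional ASCII convention where Ctrl-<key> sends the key's code with bit 0x40 flipped; `ctrl_byte` is a hypothetical helper name, not part of any library.

```python
def ctrl_byte(key):
    """Return the byte a terminal traditionally sends for Ctrl-<key>.

    Only keys in the ASCII range '@'..'_' (0x40..0x5F) are accepted;
    lowercase letters are folded to uppercase first.
    """
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no traditional control byte for {key!r}")
    # Flipping bit 0x40 maps 0x40..0x5F onto the control range 0x00..0x1F.
    return code ^ 0x40
```

      Under this mapping, Ctrl-\ is byte 0x1C, Ctrl-[ is 0x1B (ESC), and the `stty quit ^]` suggestion corresponds to byte 0x1D.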

      • jks 2 years ago

        Curiously, on many terminal emulators the following work:

        Ctrl-2 = Ctrl-@ = NUL byte

        Ctrl-3 = Ctrl-[ = ESC

        Ctrl-4 = Ctrl-\ = default for SIGQUIT

        Ctrl-5 = Ctrl-] = jump to definition in vim

        Ctrl-6 = Ctrl-^ = mosh escape key

        Ctrl-7 = Ctrl-_ = undo in Emacs

        I think these probably originate in xterm.

      • cperciva 2 years ago

        I map SIGQUIT to ^Q because that's the easiest to remember.

    • remram 2 years ago

      I think the point is that it is not to be a common operation.

      • jcelerier 2 years ago

        well I don't know, it feels like I must mash ctrl-c twenty times per day on average at least

    • Sophira 2 years ago

      While on UK keyboards it's the opposite "problem" - the left Ctrl key and the \ key are right next to each other (making it potentially a one-finger operation), which is the opposite of how a US keyboard is laid out (where Ctrl-\ was presumably intended to need to be a two-handed, two-finger operation).

      • Izkata 2 years ago

        > which is the opposite of how a US keyboard is laid out (where Ctrl-\ was presumably intended to need to be a two-handed, two-finger operation).

        We have a right Ctrl, so one-hand two-finger.

      • LtWorf 2 years ago

        two handed operations shouldn't exist.

        • Sophira 2 years ago

          I completely agree - they're very inaccessible. That's why I quoted the word "problem"; it's not actually a problem at all.

      • mananaysiempre 2 years ago

        stty quit ^] ?

  • marcosdumay 2 years ago

    It's worse, because there are languages that encode interruption into the error handling functionality, so it's common that people mismanage their errors and programs require several Ctrl-C presses to actually reach the interruption handler.

    Which means that you have to memorize a list of "oh, this program needs Ctrl-C 3 times; oh, this program must only receive Ctrl-C once!"... I don't know of any "oh, this program needs Ctrl-C exactly 2 times", but it's an annoying possibility.

    • wongarsu 2 years ago

      Any software I've come across that uses intentional double ctrl-c shows a message after the first ctrl-c. Something to the effect of "shutting down gracefully, press ctrl-c again for immediate shutdown".

      Hence you can just press it once and wait half a second, if no message to this effect appears you can spam ctrl-c.
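
      The behavior described above could be sketched roughly like this in Python. This is an illustration, not taken from any particular program: the `DoubleCtrlC` class, its message text, and the injectable `force_exit` parameter are all assumptions made for the example.

```python
import os
import signal
import sys


class DoubleCtrlC:
    """First SIGINT requests a graceful stop; a second one exits immediately."""

    def __init__(self, force_exit=os._exit, out=sys.stderr):
        self.stop_requested = False
        self._force_exit = force_exit  # injectable so the handler can be tested
        self._out = out

    def handle(self, signum, frame):
        if self.stop_requested:
            # Second Ctrl-C: skip cleanup. 128 + signal number is the
            # conventional shell exit status for "killed by a signal".
            self._force_exit(128 + signum)
            return
        self.stop_requested = True
        print(
            "Shutting down gracefully, press Ctrl-C again to exit immediately.",
            file=self._out,
            flush=True,
        )

    def install(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGINT, self.handle)
```

      A main loop would call `install()` once and then check `stop_requested` between units of work, finishing its cleanup when the flag flips.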

  • bcrl 2 years ago

    That shouldn't matter. Your database should be consistent in the face of an unclean exit. ACID has been around for a long time.

  • Levitating 2 years ago

    They can print a message that states that it is attempting to quit cleanly but can be forced to quit by pressing Ctrl+C another time(s). Unison does this.

  • sunshowers 2 years ago

    While I agree in spirit, I also want to meet users where they are.

cperciva 2 years ago

The article doesn't mention the most useful of all signals: SIGINFO, aka "please print to stderr your current status". Very useful for tools like dd and tar.

Probably because Linux doesn't implement it. Worst mistake Linus ever made.

Also, it talks about self-pipe but doesn't mention that self-socket is much better since you can't select on a pipe.

  • epcoa 2 years ago

    > self-socket is much better since you can't select on a pipe.

    This needs further explanation. Why can’t you select on a pipe? You certainly can use select/poll on pipes in general and I’m not sure of any reason in particular they won’t work for the self pipe notification.

    It's even right in the original: https://cr.yp.to/docs/selfpipe.html

    • cperciva 2 years ago

      Oops, brainfart. Sadly it's too late for me to edit that comment.

      Yes, you can select just fine on pipes. What I was thinking of is that recv and send don't work on pipes, and asynchronous I/O frameworks typically want to use send/recv rather than write/read because the latter don't have a flags parameter.

  • sunshowers 2 years ago

    Thanks for the feedback! As the talk and the post both mentioned, I was focusing on signals that work on all Unix platforms. Within the constraints of a 30 minute talk there must be material left on the cutting room floor. (If I started talking about the specifics of various Unix lineages I could fill up a whole day...)

    For most users in the real world, self-pipes are sufficient. This includes mio (Tokio's underlying library)'s portable Unix implementation of wakers (how parts of the system tell other parts to wake up).
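
    For readers who haven't seen the self-pipe trick, here is a minimal Python sketch, assuming a single-threaded Unix program. The function names `make_self_pipe` and `wait_for_signals` are made up for this example, and the code is not taken from mio or any other library.

```python
import os
import select
import signal


def make_self_pipe(signums):
    """Create a non-blocking pipe and route the given signals into it."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)

    def handler(signum, frame):
        try:
            # One byte per signal; the byte value is the signal number.
            os.write(write_fd, bytes([signum]))
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    for signum in signums:
        signal.signal(signum, handler)
    return read_fd, write_fd


def wait_for_signals(read_fd, timeout=None):
    """Wait in select() for signal bytes; return the signal numbers read."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return []
    return list(os.read(read_fd, 64))
```

    The read end can sit in the same select/poll set as the program's sockets, so signals become just another readable file descriptor in the event loop.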

  • avidiax 2 years ago

    SIGSTOP and SIGCONT are very useful as well.

    SIGSTOP is the equivalent of Ctrl-Z in a shell, but you can address it to any process. If you have a server being bogged down, you can stop the offending process temporarily.

    SIGCONT undoes SIGSTOP.

    The cpulimit tool does this in an automated way so that a process can be limited to use x% of CPU. Nice/renice doesn't keep your CPU from hitting 100% even with an idle priority process, which may be undesirable if it drains battery quickly or makes the cooling fan loud.
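
    As a rough illustration of the stop/continue idea, here is a fixed duty-cycle sketch in Python. It is an assumption-laden simplification: it does not measure actual CPU usage, so it should not be read as how cpulimit is implemented, and `throttle` and its parameters are hypothetical names.

```python
import os
import signal
import time


def throttle(pid, fraction, period=0.1, cycles=None):
    """Let `pid` run for roughly `fraction` of each `period` by alternating
    SIGCONT and SIGSTOP. Runs for `cycles` periods (forever if None) and
    tries to leave the process running when it returns."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    run_time = period * fraction
    stop_time = period - run_time
    done = 0
    try:
        while cycles is None or done < cycles:
            os.kill(pid, signal.SIGCONT)
            time.sleep(run_time)
            if stop_time > 0:
                os.kill(pid, signal.SIGSTOP)
                time.sleep(stop_time)
            done += 1
    finally:
        try:
            # Never leave the target stopped, even on Ctrl-C or an error.
            os.kill(pid, signal.SIGCONT)
        except ProcessLookupError:
            pass  # the process already exited
```

    For example, `throttle(pid, 0.25)` would let the target run for about a quarter of every 100 ms window until interrupted.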

    • sunshowers 2 years ago

      Note Ctrl-Z is actually SIGTSTP, which is basically "SIGSTOP except the process can install a signal handler for it".

      I have a very exciting blog post about debugging a nasty bug with how SIGTSTP works, coming very soon.

  • fragmede 2 years ago

    dd prints out status when sent SIGUSR1, but yeah that would be cool if other utilities did that as well off SIGINFO.

    • cperciva 2 years ago

      And does ^T map to SIGUSR1? That's the other thing which makes it so useful in BSD.

      • saagarjha 2 years ago

        You wouldn’t want it to, because the default behavior for SIGUSR1 is to terminate.

        • cperciva 2 years ago

          Exactly. Whereas on BSD hitting ^T is (a) very likely to print useful information, and (b) if it doesn't do that, won't do anything at all.

efxhoy 2 years ago

I recently wrote a little data transfer service in python that runs in ECS. When developing it locally it was easy to handle SIGINT: try write a batch, except KeyboardInterrupt, if caught mark the transfer as incomplete and finally commit the change and shut down.

But there’s no exception in python to catch for a SIGTERM, which is what ECS and other service managers send when it’s time to shut down. So I had to add a signal handler. Would have been neat if SIGTERM could be caught like SIGINT with a “native” exception.

  • mananaysiempre 2 years ago

      from signal import SIGTERM, raise_signal, signal
      import sys # for excepthook
      class Terminate(BaseException):
          pass
      def _excepthook(type, value, traceback):
          if not issubclass(type, Terminate):
              return _prevhook(type, value, traceback)
          # If a Terminate went unhandled, make sure we are killed
          # by SIGTERM as far as wait(2) and friends are concerned.
          signal(SIGTERM, _prevterm)
          raise_signal(SIGTERM)
      _prevhook, sys.excepthook = sys.excepthook, _excepthook
      def terminate(signo=SIGTERM, frame=None):
          signal(SIGTERM, _prevterm)
          raise Terminate
      _prevterm = signal(SIGTERM, terminate)

  • Spivak 2 years ago

    I mean you can just have the signal handler throw StopRequested in your Python boilerplate and never think about it again.

    One common pattern is raising KeyboardInterrupt from your handler so it's all handled the same.
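
    A minimal sketch of that second pattern, assuming the handler is installed from the main thread; the function name `route_sigterm_to_keyboard_interrupt` is hypothetical.

```python
import signal


def route_sigterm_to_keyboard_interrupt():
    """Make SIGTERM raise KeyboardInterrupt, so existing
    `except KeyboardInterrupt` cleanup paths also cover SIGTERM.
    Returns the previously installed SIGTERM handler."""

    def handler(signum, frame):
        # Python runs signal handlers in the main thread, so the exception
        # surfaces in whatever main-thread code is executing at that point.
        raise KeyboardInterrupt

    return signal.signal(signal.SIGTERM, handler)
```

    One trade-off worth noting: after this, the code can no longer tell Ctrl-C apart from a service manager's SIGTERM, which is fine when both should trigger the same cleanup.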
