Writing systemd units that stop gracefully before shutdown

228 points by dghubble 3 years ago · 56 comments (54 loaded)

A much more challenging task is writing a systemd unit that starts gracefully before shutdown. I wanted to write a unit that could issue an API call to delete the instance rather than doing a normal power off. Putting it at a reasonable place in the sequence took a lot of trial and error! The trick is actually to have the unit be started at some point in normal boot up (i.e. “armed”) and then do the actual task when the unit is stopped.

Here’s the unit I ended up with: https://github.com/CGamesPlay/infra/blob/master/private-serv...

hinkley 3 years ago

I tend to push the metaphor of software being meant to be read and only incidentally to be run by a computer as far as I can. We are telling a story to future us or our successors. Stories have rules and it's jarring when you violate them.
There's an idea in software that's a bit like the corollary of Chekhov's gun. Chekhov's gun is about not presaging jarring story elements that will never come to pass. But it's nearly as jarring to leave important story arcs as complete surprises until the end. Producing the gun moments before the curtain goes down would be quite a WTF. We didn't know that was a possibility. That's a niche that some people occupy, but it is a niche.
Introducing things early fights with Locality of Reference, but when we're talking about things of deep, dramatic importance (like an attempted murder, or a reaper process) it's important to introduce that "character" early in the story so that people know that it exists. Failing to do so is a form of deus ex machina and we only appreciate that in very small doses.
So framed that way, I don't see a problem with having to start a killswitch while you're spinning everything up. It's there, people can see it, and know to ask questions about it.
- dec0dedab0de 3 years ago
  
  If software is meant to be read, and only incidentally run, then having it in the config to only run at shutdown is still introducing it early on. As a matter of fact, I think it would be like introducing the reaper character in the very beginning and keeping them around in every scene sitting silently in the background until it was time for their one line at the very end. Many people might not notice, but to someone who does, it would be very odd indeed.
vngzs 3 years ago
I have the following unit file saved for that purpose:
```
    [Unit]
    # https://stackoverflow.com/questions/36729207/trigger-event-on-aws-ec2-instance-stop-terminate
    Description=unlink agent from remote server
    Before=shutdown.target
    [Service]
    Type=oneshot
    EnvironmentFile=-/etc/environment
    KillMode=none
    ExecStart=/bin/true
    ExecStop=/opt/service-name/shutdown-unlink
    RemainAfterExit=yes
    User=root
    [Install]
    WantedBy=multi-user.target
```
If I recall correctly, the KillMode=none is important as it causes the shutdown-unlink binary to escape systemd process supervision. Without it, you may deal with systemd immediately halting your shutdown unit (and killing the process) when it hits the shutdown target.
- AnssiH 3 years ago
  
  The "Before=shutdown.target" is superfluous, that is a default dependency (as regular services obviously need to be shut down before system shutdown).
  "KillMode=none" shouldn't be necessary, unless your command leaves processes running that need to stay running (and in that case you have other problems, since those processes may not actually finish before the system is shut down). Systemd waits for your service to stop before reaching shutdown.target (per the default Before= and Conflicts= dependencies).
  Since your command probably needs networking, you also want "After=network.target" to ensure your command runs before network is torn down, as covered in sibling comments.
  - vngzs 3 years ago
    
    Thanks. I ripped it from a colleague, and now I have some bugs to report.
- CGamesPlay 3 years ago
  
  Your unit races with the network being brought down, since it isn't listed as a "Before" target, FYI.
  - AnssiH 3 years ago
    
    They need After=network.target, not Before=network.target.
    The shutdown order is the reverse of startup order, and they execute the payload on shutdown (ExecStop).
  - boring_twenties 3 years ago
    
    I could be mistaken, but I don't think even that would be sufficient? Before would make your command execute before the network is brought down, but it wouldn't have the latter wait for your command to actually complete.
    
    dghubbleOP 3 years ago
    
    The post shows long-running stop scripts / containers and demonstrates them delaying shutdown (not with KillMode none though)
    @CGamesPlay network.target's "primary purpose is for ordering things properly at shutdown: since the shutdown ordering of units in systemd is the reverse of the startup ordering, any unit that is ordered After=network.target can be sure that it is stopped before the network is shut down if the system is powered off."
    https://www.freedesktop.org/wiki/Software/systemd/NetworkTar...
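
The ordering rule quoted above can be illustrated with a toy model. This is only a sketch of the idea that stop order is the reverse of start order, not systemd's actual job scheduler, and the two service names are hypothetical.

```python
from graphlib import TopologicalSorter

# Toy model: each unit maps to the set of units it is ordered After=.
# "unlink-agent.service" and "metrics-agent.service" are hypothetical names.
after = {
    "network.target": set(),
    "unlink-agent.service": {"network.target"},
    "metrics-agent.service": {"unlink-agent.service"},
}

# Startup: a unit is started only once everything it is After= has started.
start_order = list(TopologicalSorter(after).static_order())

# Shutdown: the reverse, so a unit ordered After=network.target is stopped
# (and its ExecStop= runs) while the network is still considered up.
stop_order = list(reversed(start_order))

print("start:", start_order)
print("stop: ", stop_order)
```

In this model the unlink service stops before network.target, which is why After= (not Before=) is the directive that matters for an ExecStop= payload.
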
dghubbleOP 3 years ago

That's what this post builds to solving at the end and in the next post - having a unit that deletes the instance from a cluster before shutdown.
SrslyJosh 3 years ago

Sadly, I think this would be child's play with SysV init.

SoftTalker 3 years ago

I fundamentally disagree with the idea that software should require or even expect a graceful shutdown. You can never stop the user from yanking the power cord out of the socket, which is what they will do if you force a bunch of housekeeping to happen before shutdown.

You have to deal with crash/power failure recovery anyway. So do your housekeeping on startup. Shutdown should be a quick and simple termination.
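
A minimal sketch of what "housekeeping on startup" could look like, assuming a hypothetical service that keeps its state in a single JSON file. The function names and file layout are made up for illustration, and a production version would likely also fsync the containing directory.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state so that a crash leaves either the old file or the new one."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # data reaches the disk before the rename
    os.replace(tmp, path)     # atomic swap on POSIX filesystems


def recover_on_startup(path: Path) -> dict:
    """Startup housekeeping: drop a half-written temp file, load the last good state."""
    tmp = path.with_name(path.name + ".tmp")
    if tmp.exists():
        tmp.unlink()          # leftover from a crash in the middle of a save
    if path.exists():
        return json.loads(path.read_text())
    return {}


with tempfile.TemporaryDirectory() as d:
    state_file = Path(d) / "state.json"
    save_state(state_file, {"jobs_done": 3})
    # Simulate power loss in the middle of the next save.
    (Path(d) / "state.json.tmp").write_text('{"jobs_do')
    recovered = recover_on_startup(state_file)
    leftover = (Path(d) / "state.json.tmp").exists()

print(recovered, leftover)
```

Nothing here depends on a stop hook having run, which is the property being argued for.
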

empthought 3 years ago

This is a weird take; most systems in data centers don’t have people walking from rack to rack yanking power cords, and most consumer systems don’t even have a power cord to yank.
- bravetraveler 3 years ago
  
  While I agree it's a bit of a weird take, for example -- there may be performance tradeoffs made in any given workload to make the disk consistent, inconsistently
  The 'most' there is doing some effort
  It is actually quite a common practice for those being audited for disaster recovery to do exactly that -- yank cables. More realistically, flip some switches
  We do it once a year, set aside a region and time... then test our processes
  It serves a few purposes, most importantly -- are our services fault tolerant, and can we bring them back?
  I think it's reasonable to trap the signals and make a best effort basis, knowing that PID 1 (or the environment) will eventually have to SIGKILL you -- ready or not
  Just because we can't save all of the state doesn't mean we shouldn't try
  - empthought 3 years ago
    
    Right, there are failure modes that have to be tested and accounted for, and one of them is the state being inconsistent after a shutdown.
    The previous poster seemed to advocate for not thinking of this as a failure mode at all but rather normal operation, which I just don’t see as true.
    
    tokenrove 3 years ago
    
    This paper was influential with regard to this idea: https://www.usenix.org/conference/hotos-ix/crash-only-softwa...
    I don't think it's that unusual, but obviously there are tradeoffs.
    
    bravetraveler 3 years ago
    
    Totally, it's certifiably untrue!
    Take the InnoDB storage engine in MySQL/MariaDB for example.
    For performance (and likely other) reasons, its data file only grows. It never shrinks... it will only go to 0 or grow.
    The DB (or individual tables, depending on config) have to be truncated/emptied to reclaim those blocks.
    Stop it uncleanly and there's a good chance you'll have to sacrifice a considerable amount of the data just to get the engine to start
    This and countless other things have to make consistency trade-offs. While everything could be written to only operate atomically, it will also slow to a crawl.
- comex 3 years ago
  
  > and most consumer systems don’t even have a power cord to yank.
  Some do. And the rest occasionally forcibly reboot (kernel panic or hardware failure), need to be manually forcibly rebooted (due to frozen UI), or unexpectedly lose battery, all leading to the same outcome. At least, that’s been my experience with just about every computer, phone, tablet, smartwatch, game console, and smart TV I’ve ever owned. Plus a number of routers. Is your experience different?
- kentonv 3 years ago
  
  Weird take?
  It is a table stakes expectation for most servers that they will not lose data when the power goes out, or when the kernel panics, or when the server itself crashes or runs out of memory. If your software requires graceful shutdown, that seems to imply that it will lose data in all those cases.
  You can perhaps use graceful shutdown to perform some optimization that allows subsequent startup to go faster, e.g. put things in a clean state that avoids the need for a recovery pass on the next startup... but these days with good journaling techniques "recovery" is generally very fast. When that's the case, it's arguably better to always perform non-graceful shutdown to make sure you are actually testing your recovery code, otherwise it might turn out not to work when you need it.
  So yeah, I agree with SoftTalker. Assume all shutdowns will be sudden and unexpected, and design your code to cope with that.
  - empthought 3 years ago
    
    That software should not “even expect” a graceful shutdown is the weird take.
    Servers can and do “lose data” all the time when they’re shut down unexpectedly. I don’t know why you’d think they don’t. If the data has been read from somewhere (a socket maybe) and not fsynced, it’ll be lost. I agree that the system needs to be designed in such a way that this is a recoverable state, but I disagree with the ideas that applications should not have a mode where buffers can accumulate for some period of time without being fsynced, and that there should be no attention paid to the common case, which is planned process stops (aka SIGTERMs) for a variety of reasons. System shutdown just being one of those reasons.
    
    kentonv 3 years ago
    
    By "lose data" I mean losing a confirmed write. That is, the server got a request to modify some state, and it responded to the request indicating success, but the change is later lost. Generally it's expected that databases will not lose confirmed writes, unless the application has explicitly made the decision that this is acceptable and opted into possible data loss to improve performance.
    
    lmz 3 years ago
    
    I think in that situation they expect you to not ack the data before you called fsync (which is the same expectation most people have of their SQL database). Then the remote end can retry the operation.
- aequitas 3 years ago
  
  They do. Reminds me of a previous job long ago where a datacenter tech was checking where a network cable went by tugging on it, resulting in a network switch blade being yanked out of a chassis, bringing down half of the production environment.
  These things did happen, can happen and will happen.
  Even in modern cloud environments. AWS might consider the hardware your EC2 VM is running on unstable, prompting you to replace/move the VM within 24 hours (if it has not already been brought down by hardware failure).
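
The "confirmed write" point raised by kentonv and lmz above can be sketched as follows. This is a simplified illustration, assuming a hypothetical append-only log file and a made-up `append_confirmed` helper; it ignores details such as syncing the containing directory.

```python
import os
import tempfile


def append_confirmed(log_path: str, record: bytes) -> str:
    """Append one record and acknowledge only after it is on stable storage."""
    with open(log_path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to flush to the device
    # If fsync raised, we never get here, so the client never sees an ack
    # for a write that might be lost and is free to retry.
    return "OK"


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "writes.log")
    ack = append_confirmed(log, b"set x=1")
    with open(log, "rb") as f:
        on_disk = f.read()

print(ack, on_disk)
```

Writes that were buffered but never acknowledged can still vanish on power loss, which matches empthought's description, but no acknowledged write is lost.
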
thfuran 3 years ago

If I'm turning on a computer, it's because I want to use it right now. If I'm turning off a computer, it's because I don't need to be using it right now. I guess you do need to make sure shutdown is fast enough that a laptop won't start cooking itself if someone tells it to shut down and then immediately sticks it in a bag, but it seems far more useful in general to optimize for startup time. Providing the happy path of a clean shutdown is useful for that, even if you do still occasionally need to handle power failure recovery.
nerdponx 3 years ago

Hope for the best, plan for the worst, right?
Here's a contrived analogy: modern airplanes are designed to stay in the air even if an engine burns out, but we would still rather fly with both engines at full power whenever possible.
jefftk 3 years ago

Just because you need to be able to handle an employee being hit by a bus doesn't mean employees should ghost their companies, or that companies shouldn't have systems for when someone gives their two weeks notice.
The article gives the examples that "A load balancer might stop accepting new connections and disable its readiness endpoint. A database might flush to disk. An agent might inform a cluster it’s leaving the group." All of these seem like they're worth doing, and improve expected case shutdown behavior, though you should also write and test the abrupt shutdown case.
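
As a rough sketch of the expected-case path described above (stop taking new work, flush, exit), assuming a POSIX system and a hypothetical service whose buffered work is just a Python list:

```python
import os
import signal
import threading

stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: only record that a stop was requested.
    stop_requested.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def serve(pending: list, flushed: list) -> None:
    """Idle until asked to stop, then flush whatever is still buffered."""
    while not stop_requested.is_set():
        stop_requested.wait(timeout=0.1)  # stand-in for accepting requests
    # Graceful path: no new work is accepted past this point.
    flushed.extend(pending)
    pending.clear()


pending = ["write-1", "write-2"]
flushed = []
os.kill(os.getpid(), signal.SIGTERM)  # simulate the init system asking us to stop
serve(pending, flushed)
print(flushed)
```

No handler survives SIGKILL or a pulled cable, so this complements the abrupt-shutdown recovery path rather than replacing it.
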
code_biologist 3 years ago

100%
https://en.wikipedia.org/wiki/Crash-only_software
mixmastamyk 3 years ago

The twenty years of laptops I've had wouldn't even flinch at a power cord disconnect.
maw 3 years ago

I'm with you.
It's easier said than done, of course, but crash-only software is a worthwhile goal IMO.

akeck 3 years ago

I love this. Lots of details I didn't know.

MarkusWandel 3 years ago

Quoting from the article:

   TimeoutStopSec=0

That's cost me more than one hard power button powerdown (on a desktop machine with the system partition on SSD - unnerving).

One of the innumerable things that systemd stops on shutdown gets stuck - permanently - and the machine goes into a state from which, to my knowledge, the only way out is a powerdown or reset.

I ended up searching for the above and replacing them with a reasonable timeout (several minutes).

dghubbleOP 3 years ago

That's fair, that example would be better with a modest timeout (e.g. if podman had a bug that caused it to hang), without taking away from the main points. Updated.
Someone asked about the opposite on Twitter: https://twitter.com/DannoHung/status/1585350836074446869

rfmoz 3 years ago

The macOS init manager, LaunchD, doesn't offer an easy way to execute a script at shutdown.

By definition, it sends a SIGTERM signal to all of the daemons that it started. But as the script isn't started before, and doesn't keep a running PID, you don't have a clean way to do it.

I don't understand why they only implemented the SIGTERM call without any alternative.

SrslyJosh 3 years ago

It shouldn't be this hard to stop a service gracefully. This is far, far more complicated than SysV init, where you just need to drop a script into /etc/init.d and symlink it from the appropriate rc directories. (For shutdown/reboot, you'd create symlinks in rc5.d and rc6.d named KNNwhatever, where NN is an integer that specifies the order the script will be run in. The "K" stands for "kill".)

Edit: Note that my example runlevels are for Solaris, other UNIX/Linux OSes will vary.

LukeShu 3 years ago

> It shouldn't be this hard to stop a service gracefully.
It's not.
> you just need to drop a script into /etc/init.d and symlink it from the appropriate rc directories.
You just need to drop a unit file into /etc/systemd/system/ and symlink it from the appropriate /etc/systemd/system/${target}.wants/ directories.
Don't tell me that "shutdown.target.wants" and "reboot.target.wants" are harder than "rc0.d" and "rc6.d".
A lot of the article is about ordering of dependencies (don't stop a dependency until after the dependent has stopped). Don't tell me that adding `Before=` and `After=` lines in the unit file is harder than having to remember all of the dependencies and manually figure out the correct "NN" for it all to work correctly.
A lot of the article is about either having your daemon handle SIGTERM, or coming up with the appropriate `ExecStop=` command. The same command you'd be writing in your rc script (the "handle SIGTERM" stuff being for if your rc script simply says `kill $PID`).
That is: The complex parts of the article are things that were complex with sysvinit too.
- ivan23178 3 years ago
  
  There's also a middle ground. OpenRC doesn't need those weird numbers while still being a simple system based on sysvinit/bash scripts (it has a simple dependency system too). I'm not completely sure about `shutdown` level scripts having access to network though (too lazy to check), but it's still worth mentioning here.
  - LukeShu 3 years ago
    
    OpenRC's great! But it doesn't detract from the point that systemd isn't nearly as complicated as some people act like it is.
    PS: Who's still using OpenRC with heavy ol' sysvinit instead of the lighter openrc-init?

kzrdude 3 years ago

Halt is apparently not the same as poweroff.

andrewaylett 3 years ago

Kids of today, etc.
AT power supplies didn't have any mechanism for the system to tell the power supply it wasn't needed any more. So when you shut down the computer, it would wind up at a screen with a message approximating "it is now safe to switch off your computer", at which point the system would halt.
ATX power supplies added the ability for the OS to trigger an actual power off. But that's a different end-state to halting, and if you halt the system then it stays on. You may wonder why anyone would want to halt when power off is an option, and to be honest I'm not entirely sure -- possibly because you have a hardware watchdog which will trigger a reboot of a halted machine but not of a powered off machine?
chasil 3 years ago

A "halt -fp" just unmounts file systems and immediately shuts down.
I find that CentOS systems that I've used for a while seem to hang on shutdowns; halt -fp is a way to get them down quickly. It is important to terminate any sensitive processes beforehand.
- SoftTalker 3 years ago
  For systems that hang or take intolerably long to shut down, I typically do:
  systemctl --force [poweroff|reboot]
  From the man page, this means that "shutdown of all running services is skipped, however all processes are killed and all file systems are unmounted or mounted read-only, immediately followed by the powering off."

exikyut 3 years ago

Meta: I can't reach this website!

Chrome is giving me an instant NXDOMAIN error.

Dig shows that

  $ dig psdn.io @1.1.1.1
  ...
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19283
  ...
  ;; QUESTION SECTION:
  ;psdn.io.                       IN      A

so then I prefix "www." like in the URL...

  $ dig www.psdn.io @1.1.1.1
  ...
  ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 64024
  ...
  ;; QUESTION SECTION:
  ;www.psdn.io.                   IN      A
  
  ;; ANSWER SECTION:
  www.psdn.io.            300     IN      CNAME   poseidon-www.pages.dev.

Okay, fine:

  $ dig poseidon-www.pages.dev @1.1.1.1
  ...
  ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 55471
  ...
  ;; QUESTION SECTION:
  ;poseidon-www.pages.dev.                IN      A

...wat??

(Where there's no ANSWER section, none was returned, just an AUTHORITY section)

This is reproducible for me with 1.1.1.1, 8.8.8.8 and 9.9.9.9.

dghubbleOP 3 years ago

Hmm, sorry you're not seeing it. It's just a CNAME to Cloudflare Pages, nothing fancy

  dig www.psdn.io @1.1.1.1
  
  ; <<>> DiG 9.16.33-RH <<>> www.psdn.io @1.1.1.1
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37362
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0,   ADDITIONAL: 1
  ;; QUESTION SECTION:
  ;www.psdn.io.                   IN      A
  
  ;; ANSWER SECTION:
  www.psdn.io.            126     IN      CNAME   poseidon-www.pages.dev.
  poseidon-www.pages.dev. 126     IN      A       172.66.45.44
  poseidon-www.pages.dev. 126     IN      A       172.66.46.212
