The systemd project is preparing for a new release. Version 256-rc1 was released on April 25 with a large number of changes and new features. Most of the changes relate to security, easier configuration, unprivileged access to system resources, or all three of these. Users of systemd will find setting up containers — even without root access — much simpler and more secure.
Lennart Poettering chose to experiment with a new format for announcing features this year: posting a series of Mastodon threads that cover features that he's excited about in more detail. Poettering said that he found it easier to get ideas out on Mastodon than in a more official venue, and invited anyone who wished to consolidate his thoughts as a long-form article to do so. One thread — on systemd's new run0 tool — has already generated substantial commentary.
The first thread describes the new way that systemd finds configuration files. Currently, many tools, systemd included, support reading multiple configuration files from a directory (whose name typically ends in .d) and combining them to produce the final configuration. As Poettering points out in his thread, this approach is useful for package managers, because it lets individual packages add to the configuration while keeping those contributions separate.
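As a rough illustration of the existing .d behavior, the Python sketch below merges drop-in file names from several directories. It assumes systemd's usual rules in simplified form: a file in a higher-priority directory (such as /etc) hides a same-named file in a lower-priority one (such as /usr/lib), and the surviving files are applied in lexical order of their names, regardless of which directory they came from. The function name and the foo.conf.d directories are hypothetical, and details such as masking a file with a symlink to /dev/null are left out.

```python
def merge_dropins(dirs: list[tuple[str, list[str]]]) -> list[str]:
    """Merge drop-in files from (directory, filenames) pairs.

    `dirs` must be ordered from highest to lowest priority.
    """
    chosen: dict[str, str] = {}
    for directory, names in dirs:
        for name in names:
            # The first (highest-priority) directory to provide a name wins;
            # same-named files in later directories are ignored.
            chosen.setdefault(name, f"{directory}/{name}")
    # Surviving files are applied in lexical order of their file names.
    return [chosen[name] for name in sorted(chosen)]


if __name__ == "__main__":
    print(merge_dropins([
        ("/etc/foo.conf.d", ["10-local.conf"]),
        ("/usr/lib/foo.conf.d", ["10-local.conf", "50-pkg.conf"]),
    ]))
```

Here the administrator's 10-local.conf in /etc replaces the packaged file of the same name, while the package's 50-pkg.conf is still picked up.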
There are some situations where it's less important to have files from many packages than from different versions of the same package: for example, a container runtime needing to deal with versioned images. Ideally, existing containers could continue using an older version, while new containers would seamlessly use the newest version. Systemd now supports this use case by reading files from a directory whose name ends in ".v". When a systemd tool goes looking for a particular file (say, example.ext), it will now accept a directory called example.ext.v/ with files example_[version].ext inside. Of the available files, the tool will pick the one with the highest semantic version number.
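The lookup can be sketched in Python. This is only an approximation: the function names are hypothetical, and version_key() compares nothing but the numeric runs in each version string (so that 1.10 sorts after 1.9), whereas systemd uses its own version-comparison rules that also handle letters and other separators.

```python
from __future__ import annotations

import re
from pathlib import Path


def version_key(version: str) -> tuple[int, ...]:
    # Simplification: compare only the numeric runs, e.g. "1.10" -> (1, 10).
    return tuple(int(part) for part in re.findall(r"\d+", version))


def pick_newest(names: list[str], target: str) -> str | None:
    """Pick the highest-versioned "stem_VERSION.ext" entry for `target`."""
    stem, dot, ext = target.rpartition(".")
    if not dot:
        stem, ext = target, ""
    prefix = stem + "_"
    suffix = "." + ext if ext else ""
    candidates = [
        n for n in names
        if n.startswith(prefix) and n.endswith(suffix)
        and len(n) > len(prefix) + len(suffix)
    ]
    if not candidates:
        return None
    # Strip prefix and suffix to get the bare version string for comparison.
    return max(candidates,
               key=lambda n: version_key(n[len(prefix):len(n) - len(suffix)]))


def resolve_versioned(path: Path) -> Path | None:
    """Look for <path>.v/ next to `path` and return its newest entry."""
    vdir = path.with_name(path.name + ".v")
    if not vdir.is_dir():
        return None
    chosen = pick_newest([p.name for p in vdir.iterdir()], path.name)
    return vdir / chosen if chosen else None
```

For example, given a directory example.ext.v/ containing example_1.9.ext, example_1.10.ext, and a stray README, pick_newest() returns example_1.10.ext.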
The rest of the changes Poettering has chosen to highlight are a bit larger. Systemd has had support for encrypted credentials for some time. In systemd terms, a credential is a named blob that an application may interpret however it likes. Credentials are locked to a computer's trusted platform module (TPM), or stored on an encrypted disk if no TPM is available. These credentials have only been usable by system services, however, not by per-user services. Poettering shared that systemd version 256 would support making credentials available to user services. This is useful in its own right, but other improvements make this feature more useful than it might initially appear.
The release also includes support for working with discoverable disk images (DDIs) in an unprivileged context. DDIs are disk images with embedded metadata that systemd uses for various purposes. DDIs are often used as filesystem images for systemd-nspawn containers. Letting unprivileged users work with DDIs was the last step required to permit unprivileged systemd-nspawn containers.
Finally, systemd also supports configuring some settings by adding encrypted credentials, even if these things are not traditional "credentials" but rather just a useful way to pass configuration parameters into a service using an interface that already existed. For example, systemd-firstboot looks for a credential called firstboot.locale and uses its value as the system's locale. On a physical computer or a virtual machine, those credentials can be passed in via the BIOS or UEFI ESP. In a container, they can be passed in via a mount under /run/host. The number of settings that can be configured this way has been greatly expanded in the new release:
Thus, a regular systemd system will now allow you to configure via credentials: keymap, locale, timezone, issue file, motd file, hosts file, .link files, .network files, .netdev files, DNS servers, DNS search domains, root passwords, root shell, SSH key of root, additional SSH address/port to listen on, sysuser.d/ additions, tmpfiles.d/ additions, sysctl.d/ additions, fstab additions, console font, additional TTYs to spawn gettys on, socket to forward journal data to, socket for sd_notify() messages from the system, machine ID, hostname, systemd-homed users to create, cryptsetup passwords and pins, additional unit files and drop-ins for unit files, udev rules, and more.
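From a service's point of view, a credential that systemd has decrypted and handed over shows up as a plain file, named after the credential, in the directory given by the $CREDENTIALS_DIRECTORY environment variable (assuming the unit has been set up to receive it, for example with LoadCredential= or ImportCredential=). A minimal Python reader might look like the sketch below; the read_credential() helper is hypothetical, and it returns raw bytes because systemd leaves the interpretation of the blob to the application.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_credential(name: str) -> bytes | None:
    """Return the contents of credential `name`, or None if it is absent."""
    directory = os.environ.get("CREDENTIALS_DIRECTORY")
    if directory is None:
        # Not started by systemd, or no credentials were passed to this unit.
        return None
    try:
        return (Path(directory) / name).read_bytes()
    except FileNotFoundError:
        # The directory exists but this particular credential was not passed.
        return None


if __name__ == "__main__":
    locale = read_credential("firstboot.locale")
    if locale is not None:
        print("locale:", locale.decode().strip())
```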
The combination of these features means that it is now possible for an unprivileged user to configure their own systemd-nspawn containers — or even entire hierarchies of such containers — using encrypted credentials that are protected from other users on the host system.
That isn't the only feature designed to make interacting with containers or virtual machines more pleasant, however. Many readers may be aware of the sd_notify() protocol that systemd uses to get information from system services about their status. Less well-publicized is the fact that systemd actually sends sd_notify() messages to whatever started it. This is useful for running systemd under another init system, but it also means that systemd can signal the host of a container this way. Since version 253, systemd has also supported the AF_VSOCK option for sending sd_notify() messages, letting it send messages to the virtual machine manager responsible for more traditional virtual machines.
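The protocol itself is simple: the supervisor puts a socket address in the NOTIFY_SOCKET environment variable, and the service sends it datagrams made of newline-separated KEY=VALUE assignments such as READY=1. The stripped-down sender below is a sketch for illustration only; real services would normally call libsystemd's sd_notify(). It handles only AF_UNIX addresses (including abstract-namespace ones written with a leading @) and makes no attempt at the AF_VSOCK case.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to $NOTIFY_SOCKET.

    Returns False when no notification socket was provided, which is
    the normal case for a process that was not started by a supervisor.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # "@name" means a socket in the Linux abstract namespace, which the
        # kernel expects to be spelled with a leading NUL byte.
        target = b"\0" + os.fsencode(address[1:])
    elif address.startswith("/"):
        target = os.fsencode(address)
    else:
        # vsock and other address forms are deliberately not handled here.
        raise ValueError(f"unsupported NOTIFY_SOCKET address: {address!r}")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), target)
    return True
```

A service would call, for instance, sd_notify("READY=1\nSTATUS=Listening") once it is ready to accept work; several assignments can share one datagram.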
Version 256 adds a new message that systemd will send when a given target is fully activated: X_SYSTEMD_UNIT_ACTIVE=[unit name]. Poettering calls this "both a progress notification and a feature notification". One example use is letting the host system of a virtual machine know when the SSH socket (which systemd sets up before starting SSH, and then hands over when the service is up in socket-activated configurations) is bound, so that the host can connect without errors or retries. Other uses include discovering what services are running on a virtual machine, or providing a more granular view of how far along the machine is in starting up.
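On the receiving side, a host only needs to split each datagram into its assignments and watch for the new field. In the sketch below, the socket is assumed to be already set up and connected to the guest by whatever virtual machine or container manager is in use; the function names are hypothetical, and the ssh-access.target unit name in the usage note is an assumption for illustration.

```python
import socket


def parse_notification(datagram: bytes) -> dict[str, str]:
    """Split an sd_notify() datagram into its KEY=VALUE assignments."""
    fields: dict[str, str] = {}
    for line in datagram.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore malformed lines without an "="
            fields[key] = value
    return fields


def wait_for_unit(sock: socket.socket, unit: str) -> None:
    """Block until the guest reports that `unit` has become active."""
    while True:
        fields = parse_notification(sock.recv(4096))
        if fields.get("X_SYSTEMD_UNIT_ACTIVE") == unit:
            return
```

A host could call wait_for_unit(sock, "ssh-access.target") before trying to connect over SSH; calling sock.settimeout() first makes the wait raise socket.timeout instead of blocking forever.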
Another feature that existed previously in a smaller form, but which is now available to the whole system, is a configuration option called ProtectSystem. Services with this option run in a separate mount namespace where important system directories — particularly /usr — are mounted read-only. Since few programs need to write to /usr, this is a fairly seamless way to make the system more secure.
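A service can observe the effect of ProtectSystem for itself by asking the kernel for the mount flags of the filesystem that holds a directory. The Python sketch below uses os.statvfs() and the ST_RDONLY flag; run inside a unit with ProtectSystem enabled, it should report /usr as read-only, while the same check on an ordinary host would normally not. The helper name is hypothetical.

```python
import os


def is_read_only(path: str) -> bool:
    """Return True if the mount containing `path` is read-only."""
    # f_flag carries the mount flags; a read-only bind mount such as the
    # ones ProtectSystem sets up shows up here as ST_RDONLY.
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


if __name__ == "__main__":
    for directory in ("/usr", "/etc", "/var"):
        state = "read-only" if is_read_only(directory) else "writable"
        print(f"{directory}: {state}")
```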
With version 256, this option can now be applied to the entire system instead of on a service-by-service basis. While this is not practical for most systems, since tools like package managers do still need to write to /usr on occasion, there is one place where enabling the option by default makes sense: the system's initial ramdisk.
When a Linux system starts up, it begins by creating a temporary, in-memory filesystem and unpacking the initial ramdisk into it. Then it starts the init process from that ramdisk, and leaves the task of actually setting up all the expected filesystem mounts and so on to user space. Often, this setup involves talking to the network, receiving encryption secrets to unlock the hard disk, or both. Exposing trusted code to the network is always risky, and when the code handling those tasks can also write to the temporary filesystem, the attack surface grows even larger. With the new version, however, ProtectSystem becomes the default for systemd on a ramdisk, causing it to remount the temporary filesystem as read-only before proceeding with the rest of the boot. Early tests revealed few problems with this change, Poettering said. The only distribution to have a serious problem with it was Fedora; dracut (the tool Fedora uses to create an initial ramdisk) had problems writing hook files with the new protection in place, but it has since been fixed.
The final feature that Poettering has discussed at the time of writing (although more threads seem sure to follow) is a quality-of-life improvement for users of systemd-homed, a service that keeps users' home directories encrypted until they log in. Unfortunately, encrypted home directories don't work with SSH because it doesn't include a mechanism to ask for encryption secrets before trying to start a shell (systemd-homed loads SSH authorized keys from outside the home directory, so that is not a barrier to SSH logins). Currently, users must log in locally at least once (in order to be prompted to unlock their home directory) in order for SSH logins to work correctly. With the new update, systemd has added a shim that will intercept SSH logins for a user with an encrypted home directory and prompt them to enter encryption credentials over the network.
New systemd versions don't just bring new features, however. They also bring the deprecation of old features. In this case, the most noticeable deprecation is that systemd is finally dropping support for version 1 control groups (cgroups) in favor of the newer version 2 cgroups. Booting a system with version 1 control groups will cause systemd to fail loudly with an error although, for now, version 1 cgroups can still be turned on with an option on the kernel's command line.
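To check which layout a machine currently uses before upgrading, one can look at what is mounted at /sys/fs/cgroup. The Python sketch below classifies the contents of /proc/self/mounts: a cgroup2 filesystem directly at /sys/fs/cgroup is the pure version 2 ("unified") layout, while the "hybrid" and "legacy" layouts both involve version 1 hierarchies. The function name and its return labels are hypothetical, and the check is a heuristic rather than the logic systemd itself uses.

```python
def cgroup_mode(mounts: str) -> str:
    """Classify the cgroup layout from the text of /proc/self/mounts."""
    fstypes: dict[str, str] = {}
    for line in mounts.splitlines():
        fields = line.split()  # device, mount point, fs type, options, ...
        if len(fields) >= 3:
            # Later lines win, matching the topmost mount at a mount point.
            fstypes[fields[1]] = fields[2]
    if fstypes.get("/sys/fs/cgroup") == "cgroup2":
        return "unified"  # version 2 only
    if fstypes.get("/sys/fs/cgroup/unified") == "cgroup2":
        return "hybrid"   # version 1 controllers plus a version 2 tree
    if any(fstype == "cgroup" for fstype in fstypes.values()):
        return "legacy"   # version 1 only
    return "unknown"


if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        print(cgroup_mode(f.read()))
```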
There are other, less notable additions and deprecations with the release as well, including changes to nscd caching, configuration file locations, and many others. Interested readers can find the full list in the project's NEWS file. Systemd releases usually have three or four release candidates approximately a week apart, so it is reasonable to expect that systemd version 256 will be fully released in approximately a month, and make its way into distributions from there.