Running systemd without systemd-journald
PSA: systemd-journald uses shared file-backed mappings via mmap() for its journal IO.
You must subtract its shared memory use from its resident memory use before judging how much memory it's consuming. The file-backed shared mappings are reclaimable, because they are file-backed. The kernel will just evict the mapped journal pages at will, since they can always be faulted back in from the filesystem.
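To make the measurement point concrete, here is a small sketch (not journald code; it assumes Linux's /proc/&lt;pid&gt;/smaps_rollup, available since kernel 4.14, with "self" standing in for journald's PID) that splits shared file-backed pages out of RSS:

```python
# Sketch: separate resident (Rss), shared, and proportional (Pss) memory
# for a process by parsing /proc/<pid>/smaps_rollup. "self" is a stand-in
# for the PID you actually want to inspect (e.g. journald's).
def mem_breakdown(pid="self"):
    fields = {}
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            parts = line.split()
            # Data lines look like "Rss:  1234 kB"; skip the header line.
            if len(parts) == 3 and parts[2] == "kB":
                fields[parts[0].rstrip(":")] = int(parts[1])
    rss = fields.get("Rss", 0)
    shared = fields.get("Shared_Clean", 0) + fields.get("Shared_Dirty", 0)
    return {"rss_kb": rss, "shared_kb": shared,
            "private_kb": rss - shared, "pss_kb": fields.get("Pss", 0)}

print(mem_breakdown())
```

The `private_kb` figure is the fairer number to judge: it excludes the reclaimable file-backed pages the comment above describes.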
TFA is much ado about nothing; learn to measure memory use properly before breaking out the pitchforks.
Full disclosure: I've hacked a bunch on journald upstream.
This is true, but the author seems to be running their services on several Raspberry Pi-like devices whose flash storage may be unstable or quick to wear out. Eliminating unnecessary writes and swap space matters there (depending on the application); those megabytes of extra memory may be just what tricks the system into committing memory to swap.
You can run quite a lot in 512MB of RAM if you use the right languages to write code in. I was surprised by how little RAM my moderately complex daemon written in Rust uses, for example; I expected to have to allocate a gigabyte of RAM to the VM running it (based on what other tools similar to mine needed), but the entire system turned out to be quite comfortable with just a quarter of that. I didn't even try to optimise for memory usage, which is what made this so surprising. I still had to give it some more RAM because unattended upgrades tended to get stuck, but I learned a lesson that day.
Ever since, I've been meaning to mess with Firecracker + bare-bones daemons to run services as virtual machines with absolutely minimal overhead. I like the virtualisation boundaries from a security standpoint much more than container boundaries, and now I wonder how much I can shrink my overhead by.
> This is true, but the author seems to be running their services on several Raspberry Pi-like devices whose flash storage may be unstable or quick to wear out. Eliminating unnecessary writes and swap space matters there (depending on the application); those megabytes of extra memory may be just what tricks the system into committing memory to swap.
Well the author seems to want text logs instead, which seems much much worse for this.
If you're concerned about storage wear you'd just run journald without /var/log/journal so it's volatile (tmpfs) only. At least that way you still have journals for your current boot and functionality like `systemctl status $service` can still tell you some journal information.
Yeah this is a lot of work to avoid reading journald.conf, switching the storage to volatile, and capping the memory usage to whatever you want.
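For reference, a minimal journald.conf sketch of that setup (the values are illustrative; see journald.conf(5) for the full option list):

```ini
# /etc/systemd/journald.conf — volatile (tmpfs-only) journal with a cap.
[Journal]
Storage=volatile
RuntimeMaxUse=16M
```

With `Storage=volatile`, journald writes only under /run/log/journal, so nothing hits flash, and `systemctl status` still shows recent log lines for the current boot.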
>You can run quite a lot in 512MB of RAM if you use the right languages to write code in.
I recently delivered a production-ready embedded system running Armbian with 512 MB of RAM, and indeed disabled systemd-journald for our uses as well... but even with it enabled, our Lua-based app (science/data analysis on a sensor network) was running in the best environment it has ever run in, so I can confirm: 512MB is enough for a lot of things.
512MB is absolute overkill for the application that you built; it is the choice of OS + tooling that resulted in that requirement. Not all that long ago 32 MB served a whole bank, and embedded systems used kilobytes of RAM, not megabytes. We've gotten so used to slapping a full unix server into stuff that we hardly even think about it any more and just take that kind of power completely for granted. I'm not saying you made any wrong choices; it's just that most of the embedded stuff I come across would be just as feasible on a fraction of the CPU (and power) budget of what we typically choose, because, for instance, Lua is such a convenient choice for a platform like that.
Windows 95 ran an entire OS with decent UI in 8 MB of RAM. One really has to wonder, where is all the RAM going these days? I think the knowledge of doing anything with only 8 MB of RAM has gone away, we don't know how to do it anymore.
Your comment brings the following Monty Python sketch to mind:
“What did the Romans ever do for us?
… All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health”
Replace “Romans” with “Increased RAM usage”.
It's not the knowledge - it's the increased complexity of the entire stack, all the way down to the hardware. A modern linux kernel image is easily bigger than 8MB, and that needs to be in memory at all times. Why? Because of all the functionality it has these days, to fit all the possible usecases people need. Windows 95 didn't have Swap, didn't support many filesystems, didn't have central logging, didn't have ASLR, let alone support for containers, and many other features I'm forgetting along the way.
Sure you could strip away a lot of that functionality, even at the distribution level (by for example not using an init system at all, instead just one shell script to initialize things), but then you'd end up with an operating system that's not general purpose for today's standards anymore.
Don't forget how much higher screen resolutions are these days. Color depth also. Those 8 MiB systems were driving single-buffered displays with perhaps 800x600 resolution at eight bits per pixel, with a color palette and dithering, which requires about 480 KB to hold the framebuffer image. Most applications would render directly into the framebuffer. A full HD (1080p) screen at 32 bits per pixel requires 8 MiB just to store the framebuffer (16 MiB with double-buffering), and that's not counting any of the input data or code needed for rendering. Figure on two or three times that to hold separate textures for each window (depending on the window sizes and how much they overlap) so that they can be composited live with desktop effects.
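The arithmetic above is easy to sanity-check; a quick sketch of the framebuffer sizes mentioned:

```python
# Framebuffer size in bytes for a given resolution and color depth.
def fb_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

vga_era = fb_bytes(800, 600, 8)     # 480,000 bytes ≈ 469 KiB
full_hd = fb_bytes(1920, 1080, 32)  # 8,294,400 bytes ≈ 7.9 MiB
print(vga_era, full_hd)
```

So a single 1080p/32-bit framebuffer alone already equals the whole 8 MiB those Windows 95 machines had, before counting double-buffering or per-window compositing textures.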
> Windows 95 didn't have Swap
It did have virtual memory and swap.
Can confirm. I remember it swapping heaps on my 32MB machine.
A huge amount of it is going to graphics. A 4K screen is ~31 MB just for the framebuffer. In comparison, 640x480x16 colors is 150K of memory.
Windows 95 also didn't do things the modern way. It didn't keep an image of every application's windows in RAM. It kept track of what covered up what, and then asked applications to redraw themselves when needed.
Another huge amount is going to features like internationalization. Unicode is a beast that takes a good amount of code to implement, and Arial Unicode is a ~20 MB TTF file.
Modern luxuries like being able to tweet in Japanese are quite expensive.
It's not quite that simple: while clean file-backed pages are cheaper than, say, private dirty pages (which the kernel must preserve as long as anyone references them), they're not free: you're still paying an opportunity cost. That is, the kernel is, at least for a time, keeping each clean file-backed page resident when it could be keeping some other page, perhaps a more useful one, in RAM instead. If systemd-journald is append-mostly, it'd be useful to MADV_FREE (after msync) any pages behind the current write pointer so as to give the kernel a hint that it can get rid of those clean file-backed pages early. I'd actually suggest getting rid of the use of memory mapping entirely, but doing so would likely be a bigger ask.
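A minimal Python sketch of that hint, using the stdlib mmap wrapper over the same syscalls (Linux, CPython 3.8+ for `mmap.madvise`). One caveat: Linux defines MADV_FREE for anonymous memory, so for a shared file-backed mapping MADV_DONTNEED, which simply drops the clean page-cache copies, is the closer fit; the temp path and sizes are illustrative.

```python
import mmap
import os

PAGE = mmap.PAGESIZE
fd = os.open("/tmp/journal-demo", os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
os.ftruncate(fd, 4 * PAGE)

m = mmap.mmap(fd, 4 * PAGE)             # MAP_SHARED is the default
m[:PAGE] = b"x" * PAGE                  # "append" one page
m.flush(0, PAGE)                        # msync: page is now clean on disk
m.madvise(mmap.MADV_DONTNEED, 0, PAGE)  # hint: drop it from RAM early
assert m[0:1] == b"x"                   # faulted back in from the file

m.close()
os.close(fd)
os.unlink("/tmp/journal-demo")
```

The last assert is the point: after the hint the data is still recoverable, because the clean page can always be re-faulted from the filesystem; the only cost was freeing RAM sooner.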
> the kernel is, at least for a time, keeping each clean file-backed page resident when it could be keeping some other page
It's almost the same result for the standard page cache when you're reading a file, isn't it?
Did the Linux kernel ever fix the thing where it evicts code pages at the same priority as files mapped read/write?
I haven't checked in the last 4 years or so, but, before that, every time I've worked with a Linux-based storage system that used mmap to write to files, I've ended up rewriting it to use pread/pwrite.
Each time, there was no perceptible CPU hit, but there was a massive page cache / memory pressure win. It turns out that aggressively evicting warm code pages then faulting them back in is bad for system performance, even with a fast SSD.
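A sketch of that kind of rewrite, using Python's stand-ins for pwrite/fdatasync/posix_fadvise (the path is illustrative): write the record durably, then tell the kernel the flushed range can be dropped from the page cache rather than letting it compete with warm code pages.

```python
import os

fd = os.open("/tmp/pwrite-demo", os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)

rec = b"one log record\n"
off = 0
assert os.pwrite(fd, rec, off) == len(rec)  # positioned write, no mmap
os.fdatasync(fd)                            # record is durable on disk
# The flushed range is safe to evict; hint the kernel to drop it now
# instead of aging out someone else's hot pages later.
os.posix_fadvise(fd, off, len(rec), os.POSIX_FADV_DONTNEED)

os.close(fd)
os.unlink("/tmp/pwrite-demo")
```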
There's nothing to "fix" here, in some cases what you want is not optimal. It is perfectly reasonable for the kernel to prioritize data pages you touched more recently than code pages by default. It's essentially a big LRU, always has been.
If you don't like that, you can always use mlock(). You can also tune things like writeback sysctls and readahead behavior. But I disagree it's "broken" because it doesn't do what you want by default.
In a post-spectre/meltdown world syscalls are a bit more expensive, you'd be hard-pressed to compete with the journal's mmap windows especially for a warm page cache, using pread/pwrite. Especially if you just went naively about it and tried turning every little object access into its own little island of buffered IO. The objects in the journal are quite small, so you'd likely end up having to implement your own page cache/buffer manager in userspace to coalesce the syscalls.
It'd be far more interesting to explore an io_uring based implementation IMNSHO.
I was wondering about this exact thing when the article didn't break down the ram usage. Thanks!
I was waiting for the author to check the memory usage of rsyslog and become enlightened... but it didn't happen. Reminder: check your assumptions/results after changes. He would have learned that rsyslog uses way more shared memory than journald (>900M on my system) and that it doesn't matter.
Memory use has always been a mystery to me and I can easily miss some things. Thanks for pointing it out. Anyway, the right solution for me is in the tl;dr; all the rest is shit.
Yeah, the rest just shows that OP likes to do things the way they've always done them. How you can prefer poking around syslog and ps output to determine the state of a service instead of just doing systemctl status is beyond me, for example.
Because systemctl status is called by the monitoring tool. It's a habit, yes. If monitoring shows the service is down, there's no point in running systemctl manually, and syslog with ps become best friends. And sometimes the date command as well. Pis don't have hardware clocks, and a wrong date may lead to errors that look mysterious.
That (syslog,ps) method was likely formed by habit during the many years before systemctl existed.
Would doing something like this work around the "journald drops the most important error messages" issue that has been known/outstanding for ten years (bug moved to GitHub six years ago), or is that more of a fundamental design mistake in systemd itself?
It's not accurate to say it "drops" error messages. The bug causes these messages to not be attributed to a particular unit - you can still see them with `journalctl` but not with `journalctl -u foo`. Still pretty annoying and should absolutely be fixed (although I'm not sure if systemd is the right place to do it).
> although I'm not sure if systemd is the right place to do it
It's truly impressive how systemd turns out to be the right place to do absolutely anything and everything from bootloader to init to dhcp to ntp to network shares... except fix systemd bugs. systemd doesn't seem to be the right place to do that ever. Someone else needs to do it. For another example of this phenomenon, see the nohup bug.
This is actually extremely useful for me to know, so thank you so much for pointing this out!
Weird; the systemd journal is the feature I want most! It would be the last thing I would ever consider disabling.
I don't know if this is still an issue but the last time I used journald the logs would occasionally become corrupted and journalctl would refuse to read them. The fix was to just delete the logs. I have no idea how logging got so screwed up that corruption in part of the file could make the rest of the log file unreadable. I mean, it's a journal, it's right in the name.
Ever since then I switched to rsyslogd and the like. Rock solid.
Keep in mind that rsyslog doesn't even attempt to verify logs. An alternative explanation is: my system is corrupting logs, I changed to a logging daemon which doesn't tell me about it.
I mean, there could definitely be a bug in journald, but I haven't seen any fixes mentioned in changelog for the last 5 years and if it was happening in standard usage, people would notice.
For recovering corrupted logs - you can still "less" them as usual. They have some extra markers, but the text is available as text. Journalctl has some special options for that too.
IIRC when I experienced this the logs were all in some binary format that couldn't easily be less'd. tbh I didn't do much investigation other than to see the "delete all your logs" resolution suggestion. There could have been a better option.
The fact that it can't seem to recover from a few bad records and gives up on the whole file demonstrates what terrible software it is.
Have you tried it? Journalctl does skip bad entries and prints out the rest automatically. If you've found a case where it doesn't, you should report that as a bug.
It may do that nowadays but it definitely didn't back when I experienced this issue. This matches my experience:
https://www.reddit.com/r/linux/comments/1y6q0l/systemds_bina...
Yes, 8 years ago things had more bugs than today / were less mature. It's silly to call something terrible software today because of that. Everyone can find a pet bug they ran into years ago. shrug
Not really a bug. A basic design flaw, since corrected it seems.
Lots of hardware problems on display, especially suspend and resume, which is notoriously buggy (broken ACPI tables that happen to work in Windows, so the hardware manufacturer never noticed they were busted, etc). I recommend spending extra to get ECC RAM, and running ZFS filesystems. Both can catch a number of types of errors before they corrupt your data. With those precautions I haven't lost any data in many years.
Though one time at work we had a few thousand hard drives from a particular vendor that had an interesting firmware bug. Very, very occasionally they would write a sector with incorrect data. No individual drive did it very often, but after a few thousand full-drive writes we noticed it half a dozen times. We also discovered that the garbage data was always the same across all of the drives. Crazy. Sadly we weren't running ZFS on those systems, which would have caught the problem and corrected it from redundancy. Thankfully we were able to get a refund. Never put your trust in a hard drive.
To get back on topic, I’ve always assumed that journald was reasonably robust against minor corruption, but honestly I’ve never had a reason to test it. At the end of the day no one component of the system is solely responsible for data integrity; every level of the hardware and software must cooperate to prevent corruption else there will be cracks for the data to slip through.
Honestly, that sounds more like a disk problem than a systemd problem.
systemd is incredibly useful and powerful.
If you are a developer on Linux then you really owe it to yourself to learn as much as you can about systemd.
If you really understand systemd then you'll find yourself architecting your software around its capabilities.
systemd, if you understand it, can mean you can completely avoid large chunks of development you might otherwise have assumed you need to do.
socket activated services, nspawn, traffic accounting, the list of juicy goodness goes on and on....
Ignore the haters, they wouldn't hate if they dedicated their energy to understanding systemd instead of hating on it.
In 2022 you are a really an incomplete full stack developer if systemd is not one of the technologies you know very well.
> systemd is incredibly useful and powerful.
In that one sentence I think you've highlighted what garners hate from the haters. It's powerful, and it's great when that power is used for good. Those times when it's not is where people get agitated.
Do you have an example of when systemd is not used for good?
It's quite annoying to not be able to use common tools for DNS queries like `dig`, when using `systemd-resolved` for DNS. I think you might have to sometimes flush the caching feature of `systemd-resolved` as well.
It's fine, I guess. But it did take a while to learn the new ways.
I think this is the main complaint from users with decades of experience. Their scripts and old knowledge stops working.
It kills production daemons holding long-running stateful data whenever the netdev goes offline. systemd-networkd, that is.
Only if you configure it to do that. You can use Wants= and After= for your service to start it after the network, but not restart otherwise.
The behaviour you describe is not unreasonable to want: If your network device goes away, what exactly are you binding the socket to? How are you restoring the listening when it comes back up? But it's up to you to say which one you want.
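A unit-file sketch of that distinction (`mydaemon` and its path are placeholder names): order the service after the network without binding its lifetime to a device unit, so a dropped link doesn't stop it.

```ini
[Unit]
# Start after the network is up, but don't make it a hard dependency:
# no Requires=/BindsTo= on a .device or network unit, so losing the
# link does not stop the service.
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/bin/mydaemon
```

With `BindsTo=` on the device unit instead, systemd would stop the service when the device disappears, which is the behaviour being complained about here.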
Its never a good systemd-networkd default to be killing daemon whenever a netdev goes “cricket”.
It's not a networkd default. Networkd doesn't kill or even care about other services. The setting exists on the service's unit side. You have to explicitly specify how you want to depend on networking.
Try it. Pull the Ethernet cable.
I do that every day moving between WiFi and plugged in Ethernet. It seems you're misunderstanding what exactly is killing processes in your case. Try to collect the logs and maybe ask on serverfault. I promise networkd is not killing your processes and if you don't have strict dependencies in your unit file, systemd is unlikely to do that either.
An ethernet interface is not a netdev. Netdevs are always virtual interfaces.
It is a good thing you have no state to care for.
I don't hate systemd, it simply makes me upset from time to time. If you're a developer, I'd suggest looking around and trying to write portable software. Investing your precious time in systemd is pointless IMHO unless you're a systemd developer. It's like Windows: if you're a developer on Windows, learn its incredibly powerful service subsystem... The people who made systemd definitely needed it, but I gave it a try and became convinced I don't need it. So why push it on me in my favorite distro without any alternative? To force me to stop using that distro? To force me to look around and choose another one? Is that the ultimate goal of higher forces? Well, I have to agree that's not bad, but it distracts me from more important things.
systemd was the first init/supervision system I used. I was able to understand how to use it pretty well.
I still hate it.
Sorry, but understanding systemd does not preclude hating it.
In fact, it's because of the "haters" that I decided to dive deep into what init/supervision systems are and can be. Without that deep dive, I would have always thought that systemd was great. Afterward, however, I know just how wrong that is.
If a developer's only interaction with init systems was SysV init, yes, of course, systemd is the greatest thing since sliced bread.
But systemd could have been so much better...
Disclaimer: I'm writing an init/supervision system to be so much better. And simpler. Orders of magnitude simpler. Oh, and one that doesn't reach its fingers into every part of your OS.
tl;dr: systemd is better than SysV init, but it's not the end-all-be-all of init/supervision systems.
I have a similar experience with PulseAudio. First look: it's unstable. Second look: it's fairly well thought out. Third look (using it as a programmer to implement some audio routing thing): it's kind of garbage. I am now using Pipewire, and Pipewire is pretty great and very flexible because it has the right foundations, so it can do anything.
I have but one word for this guy.
Devuan
I see no reason not to consider it.
He is a Debian user who hates the way systemd works.
Devuan is for you.
He doesn't even know how to read man pages (journald.conf(5) describes the setup he is looking for), and he doesn't know how memory usage on Linux works (shared/mmap()ed memory is counted for each process). Do you really think his issues go away when he switches to Devuan? He'll probably just yell at different clouds.
Other than that, Devuan is a solid choice for people who want to get rid of systemd. It comes with the Debian-typical rather old versions of most programs, but I guess for a file server it doesn't matter much if you run kernel 4.19 or 5.15
The systemd man pages are a book. And, just for avoidance of doubt, that's not a good thing. I am never surprised when someone can't find out how to configure systemd to do what they want. It's just too enterprise grade.
journald.conf(5) is ~2500 words and ~230 lines on a terminal 130 chars wide. Not exactly a book. systemd-journald(8) is ~220 lines and systemd(1) is ~750 lines. Big? yes. But nothing compared to some other man-pages (ever tried `man gcc` or `man bash`?)
People complain about how bad or non-existent Linux's man pages are compared to the BSDs, and then systemd comes along with a really extensive and well-written set of man pages and people complain that it's too much.
Can't make everyone happy I guess...
I recently got a new work laptop and decided to take the opportunity to switch from using windows 10 with a linux VM (please don't try to sell me WSL2, I'm not interested) to plain linux. I decided to pick devuan stable since the base machine needs to be... stable. I have no interest in using systemd on anything which needs to be stable. I feel happy for anyone who has never encountered severe stability issues with systemd, but I am not that person. Devuan comes with sysvinit, which is also trash, but it offers the opportunity to use other inits. At this point I have switched it from sysvinit to runit and eventually ripped out the entire runit infrastructure that comes with devuan and replaced it with something heavily inspired by void's runit. This isn't great; it would be nice if devuan took things other than sysvinit+initscripts more seriously. Maybe even just switch to OpenRC since, while it is still not great (please stop using pid files), it's a hell of a lot more sane than the mess of init scripts devuan ships with.
It depends what you're doing, but for my "workstation", Devuan is indeed working flawlessly. I've been running it for six months or so on my main PC (a little Ryzen 3700X / 32 GB of RAM) and I'm very happy with it.
Now if you're a sysadmin and have come to rely on systemd and are now locked in, Devuan is obviously not for you. But if you're running Linux on a desktop or on a laptop and aren't a fan of systemd, Devuan is great.
Three cheers for Devuan! :) Completely agree, this would've been an easy distro switch-a-rooni...
I switched all my Debian to Devuan. Haven't looked back.
if you run a Debian-based production server, then Devuan is for you.
Devuan can allow your precious daemon to stay up despite netdev going offline, unlike systemd-networkd which would kill the daemon.
This is quite important if your large long-running daemon has stateful data.
As someone pointed out elsewhere in the thread, so can systemd, your service did what it was configured to do. If you want the service to stay up when netdev goes down, then don't tell systemd that netdev is a hard requirement for your service.
Its never a good systemd-networkd default to be killing daemon whenever a netdev goes “cricket”.
But that's just it, to say this just demonstrates you don't understand the systemd service model.
There is no "system default" in the loop here at all. The service told systemd that it cannot operate without netdev, and systemd behaved accordingly. If the config was written more appropriately, it would have behaved accordingly. The "default" is what you had written in the unit file.
"when I needed systemd binary logs? - and I realized I never needed them."
that is my sentiment.
Linus was hesitant about binary logs.
I think they are just not unix. They are doing the wrong thing correctly (I would prefer doing the right thing poorly - or better yet doing it well)
Can we just admit journald is a thorn in the side of people taking systems seriously? The default behaviour even on polished distros is just bad, and it's mainly because of the mindsets behind it...
Spam omelette with spam sausage and spam does not have too much spam in it!