Running systemd without systemd-journald

declassed.art

78 points by axy 4 years ago · 78 comments

pengaru 4 years ago

PSA: systemd-journald uses shared file-backed mappings via mmap() for its journal IO.

You must subtract its shared memory use from its resident memory use before judging how much memory it's consuming. The file-backed shared mappings are reclaimable, because they are file-backed. The kernel will just evict the mapped journal pages at will, since they can always be faulted back in from the filesystem.

TFA is much ado about nothing; learn to measure memory use properly before breaking out the pitchforks.

Full disclosure: I've hacked a bunch on journald upstream.
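
To make the distinction concrete, here is a rough sketch (not journald's own tooling; the helper name is made up) that reads the kernel's per-process summary from /proc/&lt;pid&gt;/smaps_rollup (Linux 4.14+). It inspects the current process by default; point it at systemd-journald's PID in practice:

```python
# Estimate how much of a process's RSS is shared file-backed memory
# (reclaimable) vs. its "own" private memory, per the comment above.
def rss_breakdown_kb(pid="self"):
    fields = {"Rss": 0, "Shared_Clean": 0, "Shared_Dirty": 0}
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            key = line.split(":")[0]
            if key in fields:
                fields[key] = int(line.split()[1])  # values are in kB
    shared = fields["Shared_Clean"] + fields["Shared_Dirty"]
    return fields["Rss"], shared, fields["Rss"] - shared

rss, shared, private = rss_breakdown_kb()  # use journald's PID instead of "self"
print(f"rss={rss} kB, shared={shared} kB, private~={private} kB")
```

The "private" figure is the number worth judging; the shared file-backed pages can be evicted and faulted back in from the journal files at any time.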

  • jeroenhd 4 years ago

    This is true, but the author seems to be running their services on several Raspberry Pi-like devices whose flash storage may be unstable or quick to wear out. Setting aside eliminating unnecessary writes and swap use (depending on the application), those megabytes of extra memory may be just enough to trick the system into committing memory to swap.

    You can run quite a lot in 512MB of RAM if you use the right languages to write code in. I was surprised about how little RAM my moderately complex daemon written in Rust uses, for example; I expected to have to allocate a gigabyte of RAM to the VM running it (based on what other tools similar to what I was doing needed) but the entire system turned out to be quite comfortable with just a quarter of that. I didn't even try to optimise for memory usage, which is what made this so surprising. I still had to give it some more RAM because unattended upgrades tended to get stuck, but I learned a lesson that day.

    Ever since, I've been meaning to mess with Firecracker + bare-bones daemons to run services in virtual machines with absolutely minimal overhead. I like the virtualisation boundaries from a security standpoint much more than container boundaries, and now I wonder how much I can shrink my overhead by.

    • Redoubts 4 years ago

      > This is true, but the author seems to be running their services on several Raspberry Pi-like devices whose flash storage may be unstable or quick to wear out. Setting aside eliminating unnecessary writes and swap use (depending on the application), those megabytes of extra memory may be just enough to trick the system into committing memory to swap.

      Well the author seems to want text logs instead, which seems much much worse for this.

    • pengaru 4 years ago

      If you're concerned about storage wear you'd just run journald without /var/log/journal so it's volatile (tmpfs) only. At least that way you still have journals for your current boot and functionality like `systemctl status $service` can still tell you some journal information.

      • Spivak 4 years ago

        Yeah this is a lot of work to avoid reading journald.conf, switching the storage to volatile, and capping the memory usage to whatever you want.
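
        For reference, the journald.conf knobs being alluded to look roughly like this (a sketch; the values are illustrative, see journald.conf(5) for the full list):

```ini
# /etc/systemd/journald.conf (values illustrative)
[Journal]
Storage=volatile     # keep journals in /run (tmpfs) only; never touch /var/log/journal
RuntimeMaxUse=16M    # cap how much of the tmpfs the journal may use
```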

    • seclorum_wien 4 years ago

      > You can run quite a lot in 512MB of RAM if you use the right languages to write code in.

      I recently delivered a production-ready embedded system running Armbian with 512 MB of RAM, and indeed disabled systemd-journald for our uses too. But even with it enabled, our Lua-based app (science/data analysis on a sensor network) was running in the best environment it has ever had, so I can confirm: 512MB is enough for a lot of things.

      • jacquesm 4 years ago

        512MB is absolute overkill for the application that you built; it is the choice of OS + the tooling used that resulted in that requirement. Not all that long ago 32 MB served a whole bank, and embedded systems used kilobytes of RAM, not megabytes. We've gotten so used to slapping a full unix server into stuff that we hardly even think about it any more and just take that kind of power completely for granted. I'm not saying you made any wrong choices, it's just that most of the embedded stuff that I come across would be just as feasible on a fraction of the CPU (and power) budget of what we typically choose, because for instance Lua is such a convenient choice for a platform like that.

      • RedShift1 4 years ago

        Windows 95 ran an entire OS with a decent UI in 8 MB of RAM. One really has to wonder: where is all the RAM going these days? I think the knowledge of doing anything with only 8 MB of RAM has gone away; we don't know how to do it anymore.

        • HL33tibCe7 4 years ago

          Your comment brings the following Monty Python sketch to mind:

          “What did the Romans ever do for us?

          … All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health”

          Replace “Romans” with “Increased RAM usage”.

        • mid-kid 4 years ago

          It's not the knowledge - it's the increased complexity of the entire stack, all the way down to the hardware. A modern linux kernel image is easily bigger than 8MB, and that needs to be in memory at all times. Why? Because of all the functionality it has these days, to fit all the possible use cases people need. Windows 95 didn't have Swap, didn't support many filesystems, didn't have central logging, didn't have ASLR, let alone support for containers, and many other features I'm forgetting along the way.

          Sure you could strip away a lot of that functionality, even at the distribution level (by for example not using an init system at all, instead just one shell script to initialize things), but then you'd end up with an operating system that's not general purpose for today's standards anymore.

          • nybble41 4 years ago

            Don't forget how much higher screen resolutions are these days. Color depth also. Those 8 MiB systems were driving single-buffered displays with perhaps 800x600 resolution at eight bits per pixel, with a color palette and dithering, which requires about 480 KB to hold the framebuffer image. Most applications would render directly into the framebuffer. A full HD (1080p) screen at 32 bits per pixel requires 8 MiB just to store the framebuffer (16 MiB with double-buffering), and that's not counting any of the input data or code needed for rendering. Figure on two or three times that to hold separate textures for each window (depending on the window sizes and how much they overlap) so that they can be composited live with desktop effects.

          • rasz 4 years ago

            > Windows 95 didn't have Swap

            It did have virtual memory and swap.

        • dale_glass 4 years ago

          A huge amount of it is going to graphics. A 4K screen is ~31 MB just for the framebuffer. In comparison, 640x480x16 colors is 150K of memory.

          Windows 95 also didn't do things the modern way. It didn't keep an image of every application's windows in RAM. It kept track of what covered up what, and then asked applications to redraw themselves when needed.

          Another huge amount is going to features like internationalization. Unicode is a beast that takes a good amount of code to implement, and Arial Unicode is a ~20 MB TTF file.

          Modern luxuries like being able to tweet in Japanese are quite expensive.

  • quotemstr 4 years ago

    It's not quite that simple: while clean file-backed pages are cheaper than, say, private dirty pages (which the kernel must preserve as long as anyone references them), they're not free: you're still paying an opportunity cost. That is, the kernel is, at least for a time, keeping each clean file-backed page resident when it could be keeping some other page, perhaps a more useful one, in RAM instead. If systemd-journald is append-mostly, it'd be useful to MADV_FREE (after msync) any pages behind the current write pointer so as to give the kernel a hint that it can get rid of those clean file-backed pages early. I'd actually suggest getting rid of the use of memory mapping entirely, but doing so would likely be a bigger ask.
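
    A small sketch of that hint, with one caveat: on Linux, MADV_FREE is only accepted for private anonymous mappings, so for clean file-backed pages the applicable hint is MADV_DONTNEED. This toy example (the temp file and sizes are arbitrary) syncs a written page and then tells the kernel it may drop it:

```python
import mmap, os, tempfile

PAGE = mmap.PAGESIZE
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, 4 * PAGE)
    mm = mmap.mmap(fd, 4 * PAGE)             # shared, file-backed mapping
    mm[:PAGE] = b"x" * PAGE                  # a "journal append" behind the write pointer
    mm.flush(0, PAGE)                        # msync first: make the page clean on disk
    mm.madvise(mmap.MADV_DONTNEED, 0, PAGE)  # then hint the kernel it may evict it early
    assert mm[:4] == b"xxxx"                 # data simply faults back in from the file
    mm.close()
finally:
    os.close(fd)
    os.unlink(path)
```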

    • viraptor 4 years ago

      > the kernel is, at least for a time, keeping each clean file-backed page resident when it could be keeping some other page

      It's almost the same result for the standard page cache when you're reading a file, isn't it?

  • hedora 4 years ago

    Did the Linux kernel ever fix the thing where it evicts code pages at the same priority as files mapped read/write?

    I haven't checked in the last 4 years or so, but, before that, every time I've worked with a Linux-based storage system that used mmap to write to files, I've ended up rewriting it to use pread/pwrite.

    Each time, there was no perceptible CPU hit, but there was a massive page cache / memory pressure win. It turns out that aggressively evicting warm code pages then faulting them back in is bad for system performance, even with a fast SSD.

    • jcalvinowens 4 years ago

      There's nothing to "fix" here, in some cases what you want is not optimal. It is perfectly reasonable for the kernel to prioritize data pages you touched more recently than code pages by default. It's essentially a big LRU, always has been.

      If you don't like that, you can always use mlock(). You can also tune things like writeback sysctls and readahead behavior. But I disagree it's "broken" because it doesn't do what you want by default.

    • pengaru 4 years ago

      In a post-spectre/meltdown world syscalls are a bit more expensive, you'd be hard-pressed to compete with the journal's mmap windows especially for a warm page cache, using pread/pwrite. Especially if you just went naively about it and tried turning every little object access into its own little island of buffered IO. The objects in the journal are quite small, so you'd likely end up having to implement your own page cache/buffer manager in userspace to coalesce the syscalls.

      It'd be far more interesting to explore an io_uring based implementation IMNSHO.

  • bragr 4 years ago

    I was wondering about this exact thing when the article didn't break down the ram usage. Thanks!

    • viraptor 4 years ago

      I was waiting for the author to check the memory usage of rsyslog and become enlightened... but it didn't happen. Reminder: check your assumptions/results after changes. He could learn that rsyslog uses way more shared memory than journald (>900M on my system) and it doesn't matter.

  • axyOP 4 years ago

    Memory use has always been a mystery to me and I can easily miss some things. Thanks for pointing it out. Anyway, the right solution for me is in the tl;dr: all the rest is shit.

    • iforgotpassword 4 years ago

      Yeah, the rest just shows that op likes to do things like they've always done them. How you can prefer poking around syslog and ps output to determine the state of a service instead of just doing systemctl status is beyond me, for example.

      • axyOP 4 years ago

        Because systemctl status is called by the monitoring tool. It's a habit, yes. If monitoring shows the service is down, there's no point in manually using systemctl, and syslog with ps become best friends. And sometimes the date command as well. Pis don't have hardware clocks, and a wrong date may lead to errors that look mysterious.

      • djbusby 4 years ago

        That (syslog, ps) method was likely formed by habit during the many years before systemctl existed.

saurik 4 years ago

Would doing something like this work around the "journald drops the most important error messages" issue that has been known/outstanding for ten years (bug moved to GitHub six years ago), or is that more of a fundamental design mistake in systemd itself?

https://github.com/systemd/systemd/issues/2913

https://bugs.freedesktop.org/show_bug.cgi?id=50184

  • frankjr 4 years ago

    It's not accurate to say it "drops" error messages. The bug causes these messages to not be attributed to a particular unit - you can still see them with `journalctl` but not with `journalctl -u foo`. Still pretty annoying and should absolutely be fixed (although I'm not sure if systemd is the right place to do it).

    • puffoflogic 4 years ago

      > although I'm not sure if systemd is the right place to do it

      It's truly impressive how systemd turns out to be the right place to do absolutely anything and everything from bootloader to init to dhcp to ntp to network shares... except fix systemd bugs. systemd doesn't seem to be the right place to do that, ever. Someone else needs to do it. For another example of this phenomenon, see the nohup bug.

    • saurik 4 years ago

      This is actually extremely useful for me to know, so thank you so much for pointing this out!

db48x 4 years ago

Weird; the systemd journal is the feature I want most! It would be the last thing I would ever consider disabling.

  • heretogetout 4 years ago

    I don't know if this is still an issue but the last time I used journald the logs would occasionally become corrupted and journalctl would refuse to read them. The fix was to just delete the logs. I have no idea how logging got so screwed up that corruption in part of the file could make the rest of the log file unreadable. I mean, it's a journal, it's right in the name.

    Ever since then I switched to rsyslogd and the like. Rock solid.

    • viraptor 4 years ago

      Keep in mind that rsyslog doesn't even attempt to verify logs. An alternative explanation is: my system is corrupting logs, I changed to a logging daemon which doesn't tell me about it.

      I mean, there could definitely be a bug in journald, but I haven't seen any fixes mentioned in changelog for the last 5 years and if it was happening in standard usage, people would notice.

      For recovering corrupted logs - you can still "less" them as usual. They have some extra markers, but the text is available as text. Journalctl has some special options for that too.

      • heretogetout 4 years ago

        IIRC when I experienced this the logs were all in some binary format that couldn't easily be less'd. tbh I didn't do much investigation other than to see the "delete all your logs" resolution suggestion. There could have been a better option.

      • lokar 4 years ago

        The fact that it can't seem to recover from a few bad records and gives up on the whole file demonstrates what terrible software it is.

        • viraptor 4 years ago

          Have you tried it? Journalctl does skip bad entries and prints out the rest automatically. If you've found a case where it doesn't, you should report that as a bug.

          • heretogetout 4 years ago

            It may do that nowadays but it definitely didn't back when I experienced this issue. This matches my experience:

            https://www.reddit.com/r/linux/comments/1y6q0l/systemds_bina...

            • viraptor 4 years ago

              Yes, 8 years ago things had more bugs than today / were less mature. It's silly to call something terrible software today because of that. Everyone can find a pet bug they ran into years ago. shrug

              • lokar 4 years ago

                Not really a bug. A basic design flaw, since corrected it seems.

            • db48x 4 years ago

              Lots of hardware problems on display, especially suspend and resume, which is notoriously buggy (broken ACPI tables that happen to work in Windows so the hardware manufacturer never noticed they were busted, etc). I recommend spending extra to get ECC ram, and running ZFS filesystems. Both can catch a number of types of errors before they corrupt your data. With those precautions I haven’t lost any data in many years.

              Though one time at work we had a few thousand hard drives from a particular vendor that had an interesting firmware bug. Very, very occasionally they would write a sector with incorrect data. No individual drive did it very often, but after a few thousand full drive writes we noticed it half a dozen times. We also discovered that the garbage data was always the same across all of the drives. Crazy. Sadly we weren’t running ZFS on those systems, which would have caught the problem and corrected it from redundancy. Thankfully we were able to get a refund. Never put your trust in a hard drive.

              To get back on topic, I’ve always assumed that journald was reasonably robust against minor corruption, but honestly I’ve never had a reason to test it. At the end of the day no one component of the system is solely responsible for data integrity; every level of the hardware and software must cooperate to prevent corruption else there will be cracks for the data to slip through.

    • db48x 4 years ago

      Honestly, that sounds more like a disk problem than a systemd problem.

andrewstuart 4 years ago

systemd is incredibly useful and powerful.

If you are a developer on Linux then you really owe it to yourself to learn as much as you can about systemd.

If you really understand systemd then you'll find yourself architecting your software around its capabilities.

systemd, if you understand it, can mean you can completely avoid large chunks of development you might otherwise have assumed you need to do.

Socket-activated services, nspawn, traffic accounting; the list of juicy goodness goes on and on....

Ignore the haters, they wouldn't hate if they dedicated their energy to understanding systemd instead of hating on it.

In 2022 you are really an incomplete full stack developer if systemd is not one of the technologies you know very well.

  • coredog64 4 years ago

    > systemd is incredibly useful and powerful.

    In that one sentence I think you've highlighted what garners hate from the haters. It's powerful, and it's great when that power is used for good. The times when it's not are when people get agitated.

    • qbasic_forever 4 years ago

      Do you have an example of when systemd is not used for good?

      • houzi 4 years ago

        It's quite annoying not to be able to use common tools for DNS queries, like `dig`, when using `systemd-resolved` for DNS. I think you sometimes have to flush `systemd-resolved`'s cache as well.

        It's fine, I guess. But it did take a while to learn the new ways.

        I think this is the main complaint from users with decades of experience. Their scripts and old knowledge stops working.

      • egberts1 4 years ago

        It kills production daemons holding long-running stateful data whenever the netdev goes offline. systemd-networkd, that is.

        • viraptor 4 years ago

          Only if you configure it to do that. You can use Wants= and After= for your service to start it after the network, but not restart otherwise.

          The behaviour you describe is not unreasonable to want: If your network device goes away, what exactly are you binding the socket to? How are you restoring the listening when it comes back up? But it's up to you to say which one you want.
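
          The soft dependency being described looks roughly like this in a unit file (a sketch, not a complete unit; the target names are the standard ones):

```ini
# Soft ordering: start after the network is up, but do NOT stop
# when it goes away. Wants=/After= never tear the service down.
[Unit]
Wants=network-online.target
After=network-online.target
```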

          • egberts1 4 years ago

            It's never a good systemd-networkd default to kill a daemon whenever a netdev goes “cricket”.

            • viraptor 4 years ago

              It's not a networkd default. Networkd doesn't kill or even care about other services. The setting exists on the service's unit side. You have to explicitly specify how you want to depend on networking.

              • egberts1 4 years ago

                Try it. Pull the Ethernet cable.

                • viraptor 4 years ago

                  I do that every day moving between WiFi and plugged in Ethernet. It seems you're misunderstanding what exactly is killing processes in your case. Try to collect the logs and maybe ask on serverfault. I promise networkd is not killing your processes and if you don't have strict dependencies in your unit file, systemd is unlikely to do that either.

                • crazy_hombre 4 years ago

                  An Ethernet interface is not a netdev. Netdevs are always virtual interfaces.

                • egberts1 4 years ago

                  It is a good thing you have no state to care for.

  • axyOP 4 years ago

    I don't hate systemd, it simply makes me upset from time to time. If you're a developer, I'd suggest looking around and trying to write portable software. Investing your precious time in systemd is pointless IMHO unless you're a systemd developer. It's like Windows: "if you're a developer on Windows, learn its incredibly powerful service subsystem..." If the people who made systemd needed it, fine, but I gave it a try and became convinced I don't need it. So why push it on me in my favorite distro without any alternative? To force me to stop using that distro? To force me to look around and choose another one? Is that the ultimate goal of higher forces? Well, I have to agree, that's not bad, but it distracts me from more important things.

  • gavinhoward 4 years ago

    systemd was the first init/supervision system I used. I was able to understand how to use it pretty well.

    I still hate it.

    Sorry, but understanding systemd does not preclude hating it.

    In fact, it's because of the "haters" that I decided to dive deep into what init/supervision systems are and can be. Without that deep dive, I would have always thought that systemd was great. Afterward, however, I know just how wrong that is.

    If a developer's only interaction with init systems was SysV init, yes, of course, systemd is the greatest thing since sliced bread.

    But systemd could have been so much better...

    Disclaimer: I'm writing an init/supervision system to be so much better. And simpler. Orders of magnitude simpler. Oh, and one that doesn't reach its fingers into every part of your OS.

    tl;dr: systemd is better than SysV init, but it's not the end-all-be-all of init/supervision systems.

    • ahartmetz 4 years ago

      I have a similar experience with PulseAudio. First look: it's unstable. Second look: it's fairly well thought out. Third look (using it as a programmer to implement some audio routing thing): it's kind of garbage. I am now using Pipewire, and Pipewire is pretty great and very flexible because it has the right foundations, so it can do anything.

frankharv 4 years ago

I have but one word for this guy.

Devuan

I see no reason not to consider it.

He is a Debian user who hates the way systemd works.

Devuan is for you.

  • dark-star 4 years ago

    He doesn't even know how to read man pages (journald.conf(5) describes the setup he is looking for), and he doesn't know how memory usage on Linux works (shared/mmap()ed memory is counted for each process). Do you really think his issues go away when he switches to Devuan? He'll probably just yell at different clouds.

    Other than that, Devuan is a solid choice for people who want to get rid of systemd. It comes with the Debian-typical rather old versions of most programs, but I guess for a file server it doesn't matter much if you run kernel 4.19 or 5.15

    • Arch-TK 4 years ago

      The systemd man pages are a book. And, just for avoidance of doubt, that's not a good thing. I am never surprised when someone can't find out how to configure systemd to do what they want. It's just too enterprise grade.

      • dark-star 4 years ago

        journald.conf(5) is ~2500 words and ~230 lines on a terminal 130 chars wide. Not exactly a book. systemd-journald(8) is ~220 lines and systemd(1) is ~750 lines. Big? yes. But nothing compared to some other man-pages (ever tried `man gcc` or `man bash`?)

        People complain about how bad or non-existent Linux's man pages are compared to the BSDs, and then systemd comes along with a really extensive and well-written set of manpages and people complain that it's too much.

        Can't make everyone happy I guess...

  • Arch-TK 4 years ago

    I recently got a new work laptop and decided to take the opportunity to switch from using windows 10 with a linux VM (please don't try to sell me WSL2, I'm not interested) to plain linux. I decided to pick devuan stable since the base machine needs to be... stable. I have no interest in using systemd on anything which needs to be stable. I feel happy for anyone who has never encountered severe stability issues with systemd but I am not that person. Devuan comes with sysvinit which is also trash but it offers the opportunity to use other inits. At this point in time I have switched it from sysvinit to runit and eventually ripped out the entire runit infrastructure that comes with devuan and replaced it with something heavily inspired by void's runit. This isn't great, it would be nice if devuan took things other than sysvinit+initscripts more seriously. Maybe even just switch to OpenRC since, while it is still not great (please stop using pid files) it's a hell of a lot more sane than the mess of init scripts devuan ships with.

  • TacticalCoder 4 years ago

    It depends what you're doing, but for my "workstation", Devuan is indeed working flawlessly. I've been running it for six months or so on my main PC (a little Ryzen 3700X / 32 GB of RAM) and I'm very happy with it.

    Now if you're a sysadmin and have come to rely on systemd and are now locked in, Devuan is obviously not for you. But if you're running Linux on a desktop or on a laptop and aren't a fan of systemd, Devuan is great.

  • seclorum_wien 4 years ago

    Three cheers for Devuan! :). Completely agree, this would've been an easy distro switch-a-rooni ..

  • egberts1 4 years ago

    If you run a Debian-based production server, then Devuan is for you.

    Devuan can allow your precious daemon to stay up despite the netdev going offline, unlike systemd-networkd, which would kill the daemon.

    This is quite important if your large long-running daemon has stateful data.

    • dralley 4 years ago

      As someone pointed out elsewhere in the thread, so can systemd; your service did what it was configured to do. If you want the service to stay up when the netdev goes down, then don't tell systemd that the netdev is a hard requirement for your service.

      • egberts1 4 years ago

        It's never a good systemd-networkd default to kill a daemon whenever a netdev goes “cricket”.

        • dralley 4 years ago

          But that's just it, to say this just demonstrates you don't understand the systemd service model.

          There is no "system default" in the loop here at all. The service told systemd that it cannot operate without the netdev, and systemd behaved accordingly. If the config were written more appropriately, it would have behaved differently. The "default" is what you had written in the unit file.
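
          For contrast, a hard dependency of the kind being described would look roughly like this (the device unit name is illustrative):

```ini
# Hard dependency: BindsTo= stops this unit whenever the bound
# device unit goes away, e.g. when the NIC disappears.
[Unit]
BindsTo=sys-subsystem-net-devices-eth0.device
After=sys-subsystem-net-devices-eth0.device
```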

m463 4 years ago

"when I needed systemd binary logs? - and I realized I never needed them."

that is my sentiment.

Linus was hesitant about binary logs.

I think they are just not unix. They are doing the wrong thing correctly (I would prefer doing the right thing poorly, or better yet, doing it well).

rob_c 4 years ago

Can we just admit journald is a thorn in the side of people taking systems seriously? The default behaviour even on polished distros is just bad, and it's mainly because of the mindsets behind it...

enriquto 4 years ago

Spam omelette with spam sausage and spam does not have too much spam in it!
