When you deleted /lib on Linux while still connected via SSH (2022)

191 points by todsacerdoti 2 months ago

One time I was flipping back and forth between directories, compiling some code, then checking it, then rm -rf'ing it. I accidentally hit up and cd ..'d one too many times. Suddenly the rm command hung and I was confused because it should be nearly instant as it was a few files. I stared in horror as I was accidentally deleting everything in my home directory. Luckily back then I had a .pr0n directory with a significant amount of content. A few things were lost but that .pr0n folder was luckily early enough in the list and big enough to slow down the deletion of my photos and documents. That's why I always recommend having a big "buffer" of video content for such situations, ya know, for data integrity ;)

winwang - 2 months ago

That's wild and inspirational. Like the scene in Rush Hour when a stack of bills saves Tucker from a bullet.
Though I'm moreso tempted to just create a `.1111aaaa-antidumb` directory and store my caches and backups there.
This has also un-inspired me from creating a fast `rm`-esque utility.
- woleium - 2 months ago
  
  You could of course alias rm -rf to rm -rf -i
  - marc_abonce - 2 months ago
    
    I always use trash-cli and alias rm to 'echo NO! #'.
    Only if the file is too big to fit into the garbage bin, I can unalias rm, rm the thing and then reset the alias immediately after.
    
    tarxvf - 2 months ago
    
    fwiw you can usually bypass aliases ad-hoc.
    In Bash I believe instead of `ls` you can `\ls` to get the unaliased version.
    
    pests - 2 months ago
    
    You can also use the full path to the program, enclose it in quotes, or call "unalias ls" to disassociate it for that shell.
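
To make those bypasses concrete, here is a minimal sketch. The `blocked:` alias is a hypothetical stand-in for a protective alias like the one above, and `command rm` is one more option that the thread did not mention. Bash only expands aliases in scripts after `shopt -s expand_aliases`, so the sketch turns that on first.

```shell
# Bash ignores aliases in non-interactive shells unless this is set.
shopt -s expand_aliases 2>/dev/null || true

# Hypothetical protective alias: prints instead of deleting.
alias rm='echo blocked:'

demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c" "$demo_dir/d"

rm "$demo_dir/a"            # alias fires: prints "blocked: ...", file survives
\rm "$demo_dir/b"           # leading backslash: alias skipped, file deleted
"rm" "$demo_dir/c"          # quoting the command name has the same effect
command rm "$demo_dir/d"    # 'command' runs the real utility, not the alias

ls "$demo_dir"              # a should be the only file left
```

The full-path variant (for example `/bin/rm`, wherever rm lives on your system) works the same way, since alias expansion only looks at the literal first word of the command.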
  - bardan - a month ago
    
    There is also safe-rm https://adamheins.com/blog/a-safer-rm
  - immibis - a month ago
    
    I thought -f cancels out -i, and when you alias "rm=rm -i", and then run "rm -rf" you get "rm -i -rf" and -f is last so it still wins.
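
As far as I can tell that matches POSIX: whichever of -i and -f comes last on the command line wins. A quick sketch to check it on your own system, touching only a scratch directory (the file names are made up for the demo):

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/kept" "$demo_dir/gone"

# -i is last, so it wins: rm prompts, and end-of-file on stdin counts as "no".
rm -f -i "$demo_dir/kept" </dev/null 2>/dev/null

# -f is last, so it wins: no prompt, the file is removed.
rm -i -f "$demo_dir/gone" </dev/null

ls "$demo_dir"    # "kept" should still be there, "gone" should not
```

So an alias of `rm -i` offers no protection against `rm -rf`, because the alias text always ends up in front of whatever you type.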
LinuxBender - 2 months ago

> That's why I always recommend having a big "buffer" of video content for such situations, ya know, for data integrity
Good idea, for safety of course. Just for completeness' sake I would add that one should not rely on that, especially if they are using XFS and especially if using a hardware RAID controller. The recursive rm may complete faster than the inodes are removed, due to the nature of how XFS, among a few other filesystems, operates in the background. I doubt many are using hardware RAID controllers on their workstations, but on a server there is a chance there may be one, and those will also perform some transactions in the background, so one may get their prompt back sooner than the inodes have actually been removed. This is all edge case of course. People should have a massive .pr0n folder regardless.
OKRainbowKid - 2 months ago

Great idea, I shall see to it right away.
contingencies - 2 months ago

Userspace pronfs provides high latency unlinking?
- satiric - 2 months ago
  
  Turns out that pronfs uses fsck in the background to provide the delay
hinkley - 2 months ago

Unless you were using tcsh you had to rewrite your shell configuration though.
JoshTriplett - 2 months ago

rm has a --preserve-root option that's enabled by default, which prevents you from removing / . I wish it had a --preserve-home option too.
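
Until something like that exists, a wrapper can approximate it. This is only a sketch: `targets_are_safe` is a hypothetical helper, not an rm feature, and it expects plain paths rather than options.

```shell
# Hypothetical guard: fails if any argument resolves to / or to $HOME.
# Variables are prefixed because POSIX sh has no "local".
targets_are_safe() {
    tas_home=$(cd -- "$HOME" 2>/dev/null && pwd -P) || return 1
    for tas_target in "$@"; do
        # Only directories can resolve to / or $HOME; other paths pass through.
        [ -d "$tas_target" ] || continue
        tas_resolved=$(cd -- "$tas_target" 2>/dev/null && pwd -P) || return 1
        if [ "$tas_resolved" = "/" ] || [ "$tas_resolved" = "$tas_home" ]; then
            printf 'refusing: %s resolves to %s\n' "$tas_target" "$tas_resolved" >&2
            return 1
        fi
    done
    return 0
}
```

Usage would look like `targets_are_safe "$dir" && rm -rf -- "$dir"`. It resolves `..` and symlinks first, so `~/src/..` is caught as well as `~` itself, but it does nothing for a path that merely contains your home directory's parent.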
mekster - 2 months ago

Seriously, start using trash-cli. Even Windows from 30 years ago had a recycle bin.
I can’t grasp how “power users” like Linux users are stuck working in primitive environments.
- accelbred - 2 months ago
  
  Nah, recycle bin is the wrong model. Automatic filesystem snapshoting is the way to go. Snapshots are cheap and let you recover full state from your history.
  Just use btrfs, and set up btrbk to snapshot your home directory every 5 minutes, and have its gc keep every snapshot from past few hours, an hourly snapshot for past few days, and weekly for past few months.
  - miki123211 - 2 months ago
    
    I wish filesystems had a first-in, first-out "garbage heap" model.
    The problem with snapshots is that they take up space. The trash has the same issue, with the added drawback that you have to think about emptying it, and if you empty it before realizing you made a mistake, it won't help you.
    With a "garbage heap" model, all deleted files would automatically end up on the "heap" when deleted. The heap would be exactly as large as the amount of free space you have left, and shrink when necessary by deleting the oldest files, perhaps with a minimum size (expressed in days) that would require manual action to shrink beyond.
    
    E39M5S62 - 2 months ago
    
    Snapshots on ZFS only take up space when files change. It's easy to use one of the many snapshot managers to accomplish what you want - I do it for my workstation and laptop. I have zero fear of deleting files, upgrading the OS, making sweeping changes to my system config, etc.
    
    accelbred - a month ago
    
    On a CoW filesystem like btrfs/bcachefs/zfs, a snapshot only has the overhead of the delta, so are cheap.
    Btrbk can also take disk space into account for garbage collecting snapshots? Not sure.
    
    VoidWhisperer - 2 months ago
    
    There would still have to be some buffer between the amount of space you have left and the size of this heap - even if the OS were changed to treat this specific case as not being out of disk space, it would significantly impact write performance on the disk once it is 'full', because any disk operation would have potentially double the number of operations required compared to being able to write it to empty space on the disk
  - WhyNotHugo - a month ago
    
    Snapshots of home directories grow to a ridiculous size due to the trash that piles up inside them. Off the top of my head:
    - ~/.local/share/docker is 29G of data that continuously mutates.
    - Rust/Cargo produce "target" directories in $PWD whenever they run, which is gigabytes of crap littered all over the place.
    Keeping snapshots of home would take up a huge amount of disk space, especially if you ever want to keep more than one.
    That said, keeping TWO snapshots for 24hs for "un-deleting" files might be an interesting idea.
    
    accelbred - a month ago
    
    I personally only snapshot Documents, Pictures, my git repo dir, etc.
    Stuff like .cache and podman stuff stays unsnapshotted.
    I just recommend to others to snapshot home since most people don't want to bother categorizing their data.
  - wruza - 2 months ago
    
    Recycle bin is just a different model. I have full disk weekly and homedir daily backups and could use live snapshots too, but it’s just easier to dig into the bin sorted by deletion date.
- Ferret7446 - 2 months ago
  
  If you delete things from a file manager in Linux, they all generally go into the Trash too.
  And if you rm/del/Remove-Item on Windows, it will also delete without sending to the recycle bin.
- 3836293648 - 2 months ago
  
  You're way too trusting of trash-cli here. I switched to it a while back, and the only time I actually needed to recover something, half the files in the folder weren't in the rubbish bin, they were just gone.
- LoganDark - 2 months ago
  
  Fuck recycle bins. Make proper backups.
  When I delete something I want it deleted. If I make a mistake I grab it from a backup or just re-obtain it.
jamesy0ung - 2 months ago

This is the best comment I've read on Hacker News in a long time

eitland - 2 months ago

Something similar happened at a place I worked somewhere between 10 and 20 years ago.

Service technician did not see the . in the command

  rm -rf ./bin

so he proceeded to run

  rm -rf /bin

When that didn't work he did what everyone who knows a little bit of Linux does and added sudo in front.

I was on a terminal from the other side of the globe when the server suddenly started acting weird.

We were able to use scp or rsync (one of them was in sbin or something) to get back the bits from an identical server, which saved me from three days of tedious work :-)

In hindsight of course we should have written the docs in a way that would prevent this exact situation but in the beginning it was just the output of the history command after I had done it, dumped into a document with some explanations.

ryao - 2 months ago

It is not the same thing, but recently, my pfsense router started acting weird where dns stopped working. I was busy so I rebooted it and it failed to boot. I ended up bringing a monitor and keyboard to see what went wrong and it turned out that suricata’s logs had used 1.4TB on my 250GB SSD (ZFS zstd compression is awesome), causing it to run out of space a few years after I enabled the feature out of curiosity. I wiped the logs, rebooted and things worked.
My lesson from this is that when a machine starts acting strangely, do not reboot and instead troubleshoot right away. Had I done that, I could have found the problem and had only minimal partial downtime (I would have had to restart dns afterward). Since I was too busy to do things the right way, my internet was out for a half hour. Your story reminded me of this, since you would have had a bigger headache if you had rebooted.
- hinkley - 2 months ago
  
  This is the origin story for anyone who has ever set up alerts for 90% disk utilization on a machine.
  These are especially bad with services that generally grow their logs very slowly but when something like a net split happens or a server is down they generate as much log data in an hour as they typically do all week. So you get close to full and then an incident happens, and now you have two incidents because the screaming machine goes down with a full disk right after you lose your internal DNS server or what have you.
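
For a box with no monitoring at all, even a cron job gets you most of the way. A rough sketch, assuming a 90% threshold as above and a filesystem name without spaces in the `df` output; `disk_usage_pct` is a made-up helper name:

```shell
# Print the use% of the filesystem holding the given path as a bare integer.
# Relies on the POSIX "df -P" layout, where column 5 is the capacity.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

threshold=90
usage=$(disk_usage_pct /)

if [ "$usage" -ge "$threshold" ]; then
    printf 'WARNING: / is at %s%% (threshold %s%%)\n' "$usage" "$threshold" >&2
fi
```

This only reports the current level. It would not catch the "slow for months, then a week of logs in an hour" case in time unless it runs often, which is the real argument for proper alerting.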
  - 3eb7988a1663 - 2 months ago
    
    One trick I learned is to create a big random file (must be random to ensure no funny compression filesystem tricks outsmart you) called something like BIG_DUMMY_DATA_SAFE_TO_DELETE. When you find yourself in a catastrophic space situation, you can delete the file and have a less panicky recovery process as the immediate problem is gone.
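
Creating such a file is a one-liner. A small sketch, with a 1 MiB size and a scratch directory chosen only so the demo is harmless; `make_ballast` is a hypothetical name:

```shell
# $1 = path of the ballast file, $2 = size in bytes.
# /dev/urandom output is incompressible, so a compressing filesystem
# cannot shrink it away.
make_ballast() {
    head -c "$2" /dev/urandom > "$1"
}

ballast_dir=$(mktemp -d)
make_ballast "$ballast_dir/BIG_DUMMY_DATA_SAFE_TO_DELETE" 1048576
ls -l "$ballast_dir"
```

On a real machine you would pick a size in the gigabytes and put the file on the filesystem you are worried about.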
    
    nneonneo - 2 months ago
    
    Works right up until you have another incident, go to delete the file, and realize you already deleted it the first time…
- PhilipRoman - 2 months ago
  
  The funniest consequence of full disk was that I could not log in to a server, because the login process required writing some tiny temporary file in the user's home. Did not find a way to recover it, other than plugging a keyboard in it.
hinkley - 2 months ago

My second boss was a Sun Microsystems enjoyer and he always pronounced “superuser” as “stupid user”.
After a raised eyebrow he went on to explain, “because when the machine is broken it’s always because some stupid user did something.”
It took me a couple more stupiduser incidents of my own before I instituted a rule of counting to five before hitting enter on any `rm -rf` command.
- Ferret7446 - 2 months ago
  
  > I instituted a rule of counting to five before hitting enter on any `rm -rf` command.
  That's just (should be) standard practice. Another risky command is `sudo dd`/`sudo cat` for writing disk images, always chant the disk device against an fdisk -l listing like a magical spell, lest you nuke your main drive.
  - cafeinux - 2 months ago
    
    When using `dd` I always verbally say "`if` is input file, because I want to take data from <whatever I wrote there> and `of` is output file, because I want that data to overwrite <whatever I wrote here>". Never have I ever had a `dd` incident, but that's such a fear I have that I developed this habit.
    
    ryan-c - 2 months ago
    
    of stands for obliterated file
    
    bear8642 - 2 months ago
    
    and dd is data destroyer
    
    thedrexster - 2 months ago
    
    This is the way!
  - hinkley - 2 months ago
    
    But it’s surprising how often you see the wheels turn when you explain it to someone else. It’s not obvious to everyone.
- jwrallie - 2 months ago
  
  I always prepend a # when writing possibly dangerous commands, mostly rm -rf and dd. Started doing it after I wanted to dd an .iso of a distro to my usb drive but accidentally wrote it to my main drive.
  It can help if you press enter instinctively after finishing writing the command, and also against fat fingering an incomplete but valid command.
  - mook - 2 months ago
    
    I like to prefix it with `echo` so I can see what it's trying to run, especially when wildcards are involved. Or things like `find`.
    
    hinkley - 2 months ago
    
    Echo is also very good for chaining with xargs to make sure you didn’t fuck up.
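
A small sketch of both habits, run against a scratch directory with made-up file names. It assumes file names without whitespace; with arbitrary names you would want `find -print0 | xargs -0` instead.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a.o" "$demo_dir/b.o" "$demo_dir/keep.c"

# Dry run 1: echo shows what the wildcard expands to; nothing is deleted.
echo rm -f "$demo_dir"/*.o

# Dry run 2: same idea for find + xargs (sort only makes the order stable).
dry_run=$(find "$demo_dir" -name '*.o' | sort | xargs echo rm -f)
printf '%s\n' "$dry_run"

# Once the printed command looks right, drop the echo.
find "$demo_dir" -name '*.o' | xargs rm -f
```

The dry run prints the command as xargs would assemble it, so a stray match such as a directory you did not expect shows up before anything is gone.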
  - eitland - 2 months ago
    
    I've done it too, but probably not as consistently as you.
mekster - 2 months ago

First, you don’t let people read and type commands by hand. Second, hire a better guy than someone who blindly does sudo because the command didn’t work.
- eitland - a month ago
  
  First, there is a difference between rolling out 5 such systems a year and modern server farms, second, replacing a brilliant electronics guy because he once messed up when he helped us out on site isn't something I'd want to have on my cv. Third this is well over a decade ago.
kees99 - 2 months ago
You were lucky scp uses a binary that typically lives outside /bin - in /usr/lib/openssh/ or some such.
Many years ago, I had to recover a remote server where /usr was nuked (and /bin, /sbin, and /lib were all symlinks into now-empty /usr). Ended up writing a one-liner perl to convert /bin/busybox-static from my local machine into a series of:
```
  echo -ne "\x7f\x45..." >>~/busybox-static
```
...and copy-pasting that, chunk-by-chunk, into a single surviving ssh/bash connection, and then used that busybox binary to pull in from a backup.
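
The perl one-liner itself is not shown, so the following is not it. It is a rough shell equivalent of the same idea, with `emit_echo_chunks` as a hypothetical name and a tiny made-up file standing in for the busybox binary. The destination path must not contain `|` or `&`, since it is spliced into a sed expression.

```shell
# Turn a binary into "echo -ne '\x..' >>dest" lines for pasting into bash.
# $1 = local file, $2 = path to build on the far side, $3 = bytes per line.
emit_echo_chunks() {
    od -An -v -tx1 "$1" |
        tr -d ' \t\n' |
        fold -w "$(( ${3:-64} * 2 ))" |
        sed -e 's/../\\x&/g' -e "s|.*|echo -ne '&' >>$2|"
}

demo_dir=$(mktemp -d)
# Made-up test payload with awkward bytes: NUL, 0xff, a percent sign.
printf '\177ELF\001\000\377%%tail\n' > "$demo_dir/original"

emit_echo_chunks "$demo_dir/original" "$demo_dir/rebuilt" 4 > "$demo_dir/paste.sh"

# Stands in for pasting the lines into the one surviving bash session.
bash "$demo_dir/paste.sh"

cmp "$demo_dir/original" "$demo_dir/rebuilt" && echo "rebuilt file is identical"
```

Since `echo` is a bash builtin, replaying the lines needs nothing from /bin or /lib on the broken machine. The rebuilt file still has to be made executable somehow, which is its own problem when chmod is gone.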
- hinkley - 2 months ago
  
  I have learned through hard experience that if you ever use sudo to edit the sudoers file, you should create two shell windows logged in as root before doing so.
  Use visudo to edit the file of course, because not doing so can blow everything up by rendering the sudoers file unparseable and then everyone is gonna have a bad time.
  But also the temptation when altering sudo is to immediately log out as super user and try using sudo to do the new command. If you’ve fucked up the file you might not be able to sudo anymore. So now use your second shell window to undo whatever you just did in the first window.
  - ryao - 2 months ago
    
    I have setup ssh remote forwards to a jump host in the past to allow remote access through a firewall. daemontools executes scripts exec’ing ssh. ExitOnForwardFailure and ServerAliveInterval are set client side with ClientAliveInterval and ClientAliveCountMax set server side to enable rapid recovery if something goes wrong.
    Whenever one of the daemon tools scripts doing remote forwards needs to be modified, a second reverse forward script is added and the reverse forward from that is used for ssh, before changing the original script. The second script is removed only after confirming the first still works after the edit. This procedure prevents fatfingering from locking out remote access, since if something goes wrong, you just need to redo the previous step(s) until you get things working.
    If anyone wants to replicate that, I suggest setting -nNT as arguments to ssh and restricting what the user login can do via sshd_config.
    
    hinkley - 2 months ago
    
    SSH is probably where I picked up this trick. It’s the same problem. Now you have to find someone with more privs or physical access to fix your fuckup. Meanwhile people are waiting for whatever you were trying to do.
trelane - 2 months ago

> In hindsight of course we should have written the docs in a way that would prevent this exact situation
I would say the bigger failure is relying on a human typing things into a terminal rather than automating the tasks or changing the system so the task is no longer needed.
- hinkley - 2 months ago
  
  As if there haven’t been outages caused by incorrect directory interpolation in scripts.
  - trelane - 2 months ago
    
    Sure, bugs happen. They (usually) happen reliably, and tests can help prevent/detect them.
    It is impossible to test for human errors in advance, though.
rav - 2 months ago

I often run rm -rf as part of operations (for reasons), and my habit is to first run "sudo du -csh" on the paths to be deleted, check that the total size makes sense, and then up-arrow and replace "du -csh" with "rm -rf".
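
The same habit can be wrapped in a function so the paths are typed only once. A sketch under the assumption that a y/N prompt is acceptable; `preview_then_rm` is a made-up name:

```shell
# Show the total size of the targets, then delete only after a "y".
preview_then_rm() {
    du -csh -- "$@" || return 1
    printf 'Delete the above? [y/N] ' >&2
    read -r ptr_answer
    case $ptr_answer in
        y|Y) rm -rf -- "$@" ;;
        *)   echo "aborted" >&2; return 1 ;;
    esac
}
```

If `du` fails, for instance because a path does not exist, the function stops before asking, which catches typos in the path as a side effect.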
- mekster - 2 months ago
  
  Use trash-cli and additionally git commit the target if you’re nervous before deletion.
micw - 2 months ago

It's always a good idea to allow sudo to untrained people on critical systems...
malkia - 2 months ago
What would be safer alternative?
```
    pwd # Then check something?
    pushd bin #
    rm -rf .
```
Probably still with pitfalls
- cortesoft - 2 months ago
  
  Write a script that does the steps required?
  If the problem is defined enough to create an exact series of commands for an operator to execute, it is defined enough to create a script to do it for you
- pphysch - 2 months ago
  
  Use the full path of the bin dir in your rm rf
  - hinkley - 2 months ago
    
    rm -rf ~/bin could have some nasty consequences if you fat finger a return key anywhere in the middle.
    Run enough commands enough times and you will find Murphy is waiting for you.
    If you’re just removing a bin directory one time, odds are low but not zero. If you’re writing a run book for people to use, odds are 100% that you will have to help someone rebuild at least once.
- 01HNNWZ0MV43FF - 2 months ago
  
  `rm -rf bin`
  - bigstrat2003 - 2 months ago
    
    And not just for bin. There's probably an edge case where you would need to give the ./ prefix to rm, but I've never come across it. The vast, vast majority of the time just entering the name of the thing is easier and less error-prone.
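
One edge case where the prefix does matter, and possibly the one meant here, is a name that starts with a dash. A sketch in a scratch directory, with file names picked to be as awkward as possible:

```shell
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    touch ./-rf ./--help     # create files whose names look like options
    rm ./-rf                 # the ./ prefix keeps rm from parsing "-rf" as flags
    rm -- --help             # "--" ends option parsing and works just as well
)
ls -A "$demo_dir"            # prints nothing: both files are gone
```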
    
  - OJFord - 2 months ago
    
    I think the misguided belief that `./` means 'execute script' or something (program that isn't 'installed'?) is single-handedly to blame for so much script spaghetti.

inejge - 2 months ago

The heroic version of the story is now almost 40 years old[1]. (One HN mention with the link to a HTMLized version is here[2].) In both cases, the upshot is that as long as you have a running shell with root privileges, at least one existing executable file on the filesystem, and the means to overwrite that file with arbitrary binary content, you can write a small program which can recreate a skeleton system structure and dig yourself out of the hole.

The reason why this keeps happening is that in regular UNIX root is omnipotent and the filesystem is ultimately unprotected. Immutable systems and restricted execution environments may make this a thing of the past.

[1] https://www.wolczko.com/rm.txt [2] https://news.ycombinator.com/item?id=7892471

ryao - 2 months ago

The technical term is unlinked. The files in use by running processes are still anonymous files in the filesystem. They will not be garbage collected until the last program that mmap’ed them is gone. If you could find the inode number from /proc (possibly from /proc/$PID/maps), you should be able to use a filesystem specific tool to retrieve them, such as debugfs or zdb.

abound - 2 months ago

I think the tricky piece is that you'd need to find that inode number without using any dynamically linked libraries, including ls and cat in the author's case. And debugfs is likely dynamically linked too (I just checked and it is on my machine)
- ryao - 2 months ago
  
  Presumably, the read shell builtin can be used to read files in /proc. As per the original article, you can get a static version over the network using only bash builtins and overwrite a file with execute permission to be able to execute it.
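
A sketch of the first half of that, finding the inode numbers with nothing but shell builtins. `deleted_mappings` is a hypothetical helper. It relies on the Linux maps format, where the fifth column is the inode and an unlinked file has " (deleted)" appended to its path.

```shell
# List "inode path" for every mapping whose backing file has been unlinked.
# Uses only builtins (read, case, printf), so it needs no working /lib.
# $1 = maps file to read; defaults to the current shell's own maps.
deleted_mappings() {
    while read -r dm_range dm_perms dm_offset dm_dev dm_inode dm_path; do
        case $dm_path in
            *' (deleted)')
                printf '%s %s\n' "$dm_inode" "${dm_path% (deleted)}"
                ;;
        esac
    done < "${1:-/proc/$$/maps}"
}
```

A library usually has several mappings, so the same inode can be printed more than once. Passing `/proc/<pid>/maps` for another process works too, provided you are allowed to read it.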

lloeki - 2 months ago

A long long time ago the team I was part of managed old unix systems.

A coworker telnet'd (or rsh, can't recall) into such a machine to do some maintenance and after a while fat fingered:

    umount /

Would you believe it, back then being root meant this was absolutely unprotected and the minicomputer OS (some ancient AIX) dutifully complied.

The chaos that ensued is but a blur.

teaearlgraycold - 2 months ago

Did you just restart?
- WesolyKubeczek - 2 months ago
  
  Wasn’t restarting those AIX dinosaurs a nontrivial thing?
  - teaearlgraycold - 2 months ago
    
    I could believe it. I’m just hoping OP can provide more info.
- dingaling - 2 months ago
  
  Using what command, though?
  - teaearlgraycold - 2 months ago
    
    Perhaps there's a restart button?

vessenes - 2 months ago

I “rm -rf /“ ed my Linux box in 1996 and realized what I had done about 30 seconds in. It was through like a-d in /bin.

‘Recovery’ wasn’t a strong concept in Linux os installers then, and I wasn’t sure my home directory would survive a reinstall, (not on a separate partition) and I didn’t think I’d be able to reboot successfully in any event.

My brother ran the same distribution but at a school across the country; I was able to recover by carefully pulling what I needed down with ftp to his dorm computer’s IP. I’m not sure scp even existed then on Linux; I guess maybe we could have used netcat in a pinch. Well, this was already a pinch. In a real pickle.

This was one of the first moments where high bandwidth connections to endpoints really saved me / impacted me, and I haven’t forgotten. For many years after university this kind of direct high bandwidth connection got much harder to achieve, first because we were back in a low bandwidth residential world, then because ipv4 was mostly denied to consumers, then because of nat.

Today this would again be achievable, but with vastly more complexity. For a home computer rescue you’d want Tailscale on both sides. And it’s extremely unlikely you’d be using the same distribution as your sib, much less have the same libc linking. And god help you if you had to restore your systemd directory by hand.

nasretdinov - 2 months ago

I think I now know why on macOS "Applications" (with capital A) comes first in the list of root-level directories. I was showing my colleague that on modern systems "sudo rm -rf /" does nothing, but that turned out to be a GNU-specific thing, and macOS of course uses BSD tools. I noticed very quickly that something was off and stopped the command. On HFS+ (probably on APFS too) the file list is always sorted, so it only deleted some of my Adobe apps and stuff like Addressbook, so I didn't even notice anything at first :)

nullorempty - 2 months ago

At the start of my career I removed the `x` attribute from all files :)

ryao - 2 months ago

Of all of the stories I have read here, this is the first to make me laugh. Congratulations. :)
- nullorempty - 2 months ago
  
  ... we had an amazing sysadmin. He had a shell open on that box when I came to tell the news. He started to type quickly and thoughtfully, trying utilities I haven't even heard of. Then, he echoed a small C program that was supposed to set `x` on the chmod. Quickly he typed `cc x.c` to compile... and then it dawned on him.
  - ryao - 2 months ago
    
    Perhaps scp -p or sshfs would have worked. This is just a guess. I am not certain.
ivanjermakov - 2 months ago

chmod -R is too convenient lol

zavec - 2 months ago

I actually started a short blog series about a similar problem where a friend had blown away /bin and a bunch of other stuff, but /lib was still there. Unfortunately it didn't end up getting anywhere because even though I was able to drop executables on the machine with echo and make them executable with a .so from lib I wasn't able to get back to root permissions as sudo and everything had been blown away and I didn't think I'd have great luck trying to find a zero-day in the kernel. It was still a lot of fun though.

Dwedit - 2 months ago

If you deleted /lib, you'd probably be better off reinstalling packages while booting off of USB or something. You're gonna have downtime because programs won't work correctly.

LorenDB - 2 months ago

I also had to wonder why not just liveboot from USB or attach the affected boot medium to another system, then use the recovery system's fully working tools to just relink the /lib folders?
- ideasphere - 2 months ago
  
  “This is not a very important machine, and I could have just reimaged the MicroSD card and be done with it, but I was curious if I could recover from the error.”

nurple - 2 months ago

My workstation seems fine:

  $ ls -R /{lib,usr,bin,sbin}
  ls: cannot access '/sbin': No such file or directory
  /bin:
  sh

  /lib:
  ld-linux.so.2

  /usr:
  bin

  /usr/bin:
  env

Oh right...

  $ ls -l /usr/bin/env
  lrwxrwxrwx 1 root root 65 Mar 21 23:39 /usr/bin/env -> 
  /nix/store/9m68vvhnsq5cpkskphgw84ikl9m6wjwp-coreutils-9.5/bin/env

  $ ldd /usr/bin/env 
        linux-vdso.so.1 (0x00007ffff7fc4000)
        libacl.so.1 => /nix/store/dyizbk50iglbibrbwbgw2mhgskwb6ham-acl-2.3.2/lib/libacl.so.1 (0x00007ffff7fb3000)
        libattr.so.1 => /nix/store/vlgwyb076hkz7yv96sjnj9msb1jn1ggz-attr-2.5.2/lib/libattr.so.1 (0x00007ffff7fab000)
        libgmp.so.10 => /nix/store/dsxb6qvi21bzy21c98kb71wfbdj4lmz7-gmp-with-cxx-6.3.0/lib/libgmp.so.10 (0x00007ffff7f06000)
        libc.so.6 => /nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/libc.so.6 (0x00007ffff7d0e000)
        /nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/ld-linux-x86-64.so.2 => 
        /nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fc6000)

MadnessASAP - 2 months ago

Don't have to worry about trashing FHS if your OS doesn't use FHS :-P

ofalkaed - 2 months ago

Many years ago I wrote a backup script where I did "rm -rf /etc/" instead of "cp -r /etc/ /mnt/whatever," not sure how I managed that. Took ages to figure out what was causing /etc/ to disappear since /etc/ going missing often went unnoticed for a while and I was running Arch back in the days when running pacman -Syu always caused exciting things to happen. I even did a complete backup and reinstall trying to figure that one out, and I was extra cautious about making a backup of that backup script which I had spent so much time on and was fairly proud of since it was my first non-trivial bash script.

I also once did "rm -rf /", was deleting a dir which started with a "[" and accidentally hit "enter" instead of "\." That one taught me the dangers of absolute paths.

Edit: That last one is not quite right, would not have been an absolute path issue, that dir must have ended up in root somehow, can't quite remember the details, been too long.

Vilian - 2 months ago

Isn't absolute path better practice than relative paths?
- ofalkaed - 2 months ago
  
  In scripts, yes, but they are not without their dangers and can be especially troublesome when working in the terminal since almost no one types out and checks the full paths constantly, they just let tab completion take care of it and assume it worked.
  In scripts things like ../../../../file are a pain to read and assume everything in the script before all those previous dirs worked as it should have and everything is where it should be. Cd to an incorrect absolute path produces an error code so we can be sure we are in the proper dir, cd ../../../ will never produce an error and always succeeds even if you are at root.
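
The difference is easy to see in a few lines. A sketch using a scratch directory; `clean_build` is a hypothetical cleanup helper and `build` a made-up directory name:

```shell
demo_dir=$(mktemp -d)

# An absolute path that does not exist makes cd fail with a non-zero status...
if ( cd "$demo_dir/no/such/dir" ) 2>/dev/null; then abs_status=0; else abs_status=$?; fi

# ...while climbing past the root succeeds silently and leaves you in /.
if ( cd / && cd ../../../.. ); then rel_status=0; else rel_status=$?; fi

echo "missing absolute path: $abs_status, ../ past root: $rel_status"

# Check cd before deleting anything relative to it. The body runs in a
# subshell, so the caller's working directory is untouched.
clean_build() (
    cd "$1" || exit 1
    rm -rf ./build
)
```

With the `|| exit 1` in place, a wrong path stops the function instead of letting the `rm -rf` run in whatever directory the script happened to be in.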
  - 3np - 2 months ago
    
    > almost no one types out and checks the full paths constantly, they just let tab completion take care of it and assume it worked.
    Are you a bootcamp coach having watched countless of people at the terminal? What makes you assume this is a general habit?

throwanem - 2 months ago

"When you discover yourself to be in a hole, the best first thing to do is stop digging."

Some of the best early professional advice I ever received was, in moments like these, to keep my hands off the keyboard for at least a timed minute.

mastax - 2 months ago

    chattr +i /lib/ld-linux.so.2

Sounds tempting.

scripturial - a month ago

‘rm -rf’ should have default behavior to prompt “are you sure” when deleting top level folders. People are literally talking about secret video files saving their files from being accidentally destroyed.

How in 2025 do we not have these types of innovations? Are people afraid of breaking old scripts?

(macOS saves us from a lot of this stupidity, Linux should have something similar. I would love a way to mark folders as “permanent” or as “restore on reboot”)

conceptme - 2 months ago

A long time ago I started working at this company and I fat fingered 'rm -rf /' instead of with a dot.

When realizing what I had done and it was taking too long I powered off the machine. When I told the sysops he looked horrified and asked me how far it got.

The fun thing was all websites and data were mounted from the server to each workstation to make it easy to update source code.

chatmasta - 2 months ago

I’ve seen this same story in exploit writeups - it’s funny to see “living off the land” techniques used non-maliciously.

smw - 2 months ago

I had someone do this on an important production Solaris machine once, many years ago. Luckily they just moved /lib instead of deleting it -- and on Solaris (some of?) the binaries in /sbin were statically linked, including ln. Hard linking /lib back to the correct path was enough to recover.

ryao - 2 months ago

Do you mean moving it? You cannot make a hard link for a directory. That is disallowed on Unix to prevent the creation of orphaned subtrees (i.e. ln a directory into itself and then rmdir the original link).
- yjftsjthsd-h - 2 months ago
  GNU ln has this ( https://man.archlinux.org/man/ln.1#d ):
  -d, -F, --directory allow the superuser to attempt to hard link directories (this will probably fail due to system restrictions, even for the superuser)
  which implies that that's not quite an absolute limit. I don't see any comment either way on https://illumos.org/man/1/ln , but it's plausible that some version of Solaris had wiggle room; it's a terrible idea for obvious reasons, but there's really no hard technical reason why a system couldn't allow you to create hard links to directories.
  - ryao - 2 months ago
    
    The Solaris VFS is flexible enough for a filesystem to support this, but any filesystem that does violates POSIX according to a comment in ZFS. A check of the illumos UFS driver source code reveals that it violates POSIX by permitting root to do this. I wrote more about this with links to relevant illumos source code here:
    https://news.ycombinator.com/item?id=43447288
- wizzwizz4 - 2 months ago
  
  There are two checks prohibiting it:
  • Non-root users aren't allowed to make directory hard links.
  • Many versions of the userspace program `ln` don't let you do it.
  But the `link` system call can, at least on Solaris, if called by user 0. (Not sure about Linux: I tried it once and it didn't work, but I was doing weird things with FUSE and also trying to name the link `..`, so I don't know why it failed.)
  - ryao - 2 months ago
    
    It is more nuanced than that. First, the Solaris 11.4 documentation says that this is not allowed:
    https://docs.oracle.com/cd/E88353_01/html/E72487/link-8.html
    The illumos man page is less clear:
    https://illumos.org/man/8/link
    The illumos ZFS driver makes it very clear that this is not allowed under POSIX in a comment and explicitly disallows it in zfs_link():
    https://github.com/illumos/illumos-gate/blob/master/usr/src/...
    However, It appears that the illumos UFS driver supports this:
    https://github.com/illumos/illumos-gate/blob/master/usr/src/...
    Presumably, the Solaris 10 UFS driver also supports it (or supported it in older versions of Solaris 10). Given that someone at Oracle likely modified the Solaris man page to differ from the older OpenSolaris man page in illumos, I would expect recent versions of Solaris to disallow this on UFS, but someone would need to check.
    That said, I have to recant my previous comment. smw likely linked the directory, which is insane, but would have worked on older Solaris versions if we assume the modern illumos UFS driver is unchanged in this regard.

asah - 2 months ago

This happened to me once. I caught my breath, found an identical server and copied the files over using uuencode and cat, until I had enough working to create a second terminal window and could use better tools to verify everything.

5 minutes that felt a lot longer.

doubled112 - 2 months ago

Sounds similar to a time I accidentally removed a bunch of system libraries with pip. I wish I would have had a similar server to recover from.
Yum being a Python program made recovery tedious, since some of the libraries it depended on were installed via RPM and gone. Not a difficult recovery, but tedious.

udev4096 - 2 months ago

Reminds me of this: https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/issue...

jmclnx - 2 months ago

Interesting, at one time ages ago I thought Linux came with a static linked bash in /bin or maybe /sbin for this type of problem.

Just checked Slackware and no more, I wonder if that is a casualty of the /bin /usr/bin merge ?

ryao - 2 months ago

Gentoo used to include a static busybox binary for recovery purposes, but it was removed from the system set years ago. Now you are expected to use the initramfs for recovery and can install busybox yourself if you are concerned. In fairness, this particular issue would be made worse by rebooting since the unlinked anonymous library files would be garbage collected, which is bad unless you have a filesystem that has snapshots like ZFS and had used them prior to this.
The article author did not try to recover libraries from the anonymous files, which is probably good considering that only a subset would have been in use and thus only that subset would be recoverable from the anonymous files (unless there are filesystem snapshots).
pengaru - 2 months ago

once upon a time everything in /sbin was statically linked
then only `sln` was statically linked (a variant of `ln`)
today most distros statically link nothing and you're up shit's creek in this situation

Fizzadar - 2 months ago

I once nuked the entire OS partition on an openvz host. Vz data was still good so we ended up copying the root fs from another similar box, manually updated the network config and it ran for another 4 years until retired.

userbinator - 2 months ago

This could be considered a variant of the "bootstrapping" problem.

nunez - 2 months ago

Super interesting, though it feels like a response to an interview question!

o11c - 2 months ago

> Please note that busybox can’t function with a name that is not a busybox applet name.

This is somewhat wrong. From ksh/bash/zsh, you can run:

  (exec -a someappname /arbitraryexecutablepath args...)

This won't work on most ash derivatives (including /bin/sh on Debian, FreeBSD, or NetBSD), but does work on busybox ash.

The parentheses prevent the `exec` from actually replacing your current shell, which might be less important for emergency rescues, but which otherwise is often what you want with `exec -a`.
