Isolating operating system processes 🐧📦

An operating system process is a single execution of a task. This execution depends on an environment that contains the resources needed for a successful run. Isolating such a process takes several steps: isolating the process’s view of the outside world, then its dependencies, then the resources it needs to run, then its access to different features of the underlying system, and then the communication between isolated processes. That list is already getting a little out of hand.

We are going to start with the first step of process isolation: isolating the view of a process.

Containerization at its core is process isolation. At any given point, a process consists of the program that is running, the memory allocated to it, the CPU state, a list of open files, and other resources such as I/O devices. To isolate a process we can use tools provided by the operating system kernel.
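
To make that list a little more concrete, here is a minimal sketch that reads a few of those pieces for a running process straight from /proc. The helper name describe_process is made up for illustration, and the sketch assumes a Linux system with /proc mounted.

```shell
# describe_process PID: print a few of the resources the kernel tracks
# for a process. (describe_process is a hypothetical helper name.)
describe_process() {
    pid=$1
    # The program being executed is exposed as the "exe" symlink.
    echo "program:    $(readlink "/proc/$pid/exe")"
    # Resident memory is one of the fields in the status file.
    echo "memory:     $(awk '/^VmRSS:/ {print $2, $3}' "/proc/$pid/status")"
    # Every open file descriptor is an entry in the fd directory.
    echo "open files: $(ls "/proc/$pid/fd" | wc -l)"
}

# Describe the shell that is running this snippet.
describe_process "$$"
```

Each line corresponds to one item from the paragraph above: the program, the memory, and the open files.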

15th November 2024

Process isolation in Linux relies on certain Linux kernel features to isolate the views of different processes from one another.
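
The kernel feature we will lean on most is namespaces. As a small sketch, the namespaces a process currently belongs to can be listed from /proc/<pid>/ns. The helper name list_namespaces is an assumption of mine, not a standard tool.

```shell
# list_namespaces [PID]: print each namespace type and its identifier
# for a process (defaults to the current shell).
# (list_namespaces is a hypothetical helper name.)
list_namespaces() {
    pid=${1:-$$}
    for ns in /proc/"$pid"/ns/*; do
        # Each entry is a symlink such as pid:[4026531836].
        printf '%-20s %s\n' "$(basename "$ns")" "$(readlink "$ns")"
    done
}

list_namespaces
```

Two processes that show the same identifier for a namespace type share that namespace.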

16th November 2024

Let’s create a new root directory for our isolated process. From now on we will also refer to this process as a container for the sake of brevity. Since we are starting from scratch in a new environment, we will make sure we have at least the basics. After creating the directory structure for the container, we copy the bash and ls commands for some initial navigation through the environment.

mkdir new_root                                                                           

mkdir -p new_root/{bin,lib,lib64}
# copy the commands you want in your container                                                 
cp /bin/{bash,ls} new_root/bin/

Let’s remove the access from all our current resources in our current root directory and then jump into the new root directory where we have a view of nothing except the commands ls and bash.

Let’s use chroot to change our root to a new directory called new_root.

sudo chroot ./new_root /bin/bash
chroot: failed to run command ‘/bin/bash’: No such file or directory

Here /bin/bash fails to run because the new root does not contain the dependencies it needs. The “No such file or directory” is not about /bin/bash itself, which we just copied, but about the dynamic linker that bash names as its interpreter. The error does not tell us which dependencies are missing, and it would be great if it did. So we first copy the shared object for the dynamic linker, because it is what resolves a program’s dependencies at runtime.

cp /lib/ld-linux-aarch64.so.1 new_root/lib/

sudo chroot ./new_root /bin/bash

/bin/bash: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory

Now, after running /bin/bash in the new environment again, we get an error about an unavailable dependency. Let’s check which dependencies our two programs, bash and ls, have. The command ldd helps us do that.

ldd /bin/{bash,ls}
/bin/bash:
        linux-vdso.so.1 (0x0000ffff8f630000)
        libtinfo.so.6 => /lib/aarch64-linux-gnu/libtinfo.so.6 (0x0000ffff8f430000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff8f280000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffff8f5f7000)
/bin/ls:
        linux-vdso.so.1 (0x0000ffff7ff02000)
        libselinux.so.1 => /lib/aarch64-linux-gnu/libselinux.so.1 (0x0000ffff7fe50000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff7fca0000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffff7fec9000)
        libpcre2-8.so.0 => /lib/aarch64-linux-gnu/libpcre2-8.so.0 (0x0000ffff7fc00000)

Copy these dependencies to the to-be-isolated root.

cp /lib/aarch64-linux-gnu/libtinfo.so.6 /lib/aarch64-linux-gnu/libc.so.6 /lib/ld-linux-aarch64.so.1 /lib/aarch64-linux-gnu/libselinux.so.1 /lib/aarch64-linux-gnu/libpcre2-8.so.0 new_root/lib/

Now that the necessary dependencies are copied, let’s change root into the new root again.

sudo chroot ./new_root /bin/bash

bash-5.1# ls
bin  lib  lib64

We are able to run the commands we copied there, but not the ones we didn’t, e.g. ps. We want ps so that we can see which processes are visible from inside.

bash-5.1# ps -aux
bash: ps: command not found

Let’s exit the container, copy the command and its dependencies, and then try the same again.

bash-5.1# exit

Copy the command.

cp /bin/ps ./new_root/bin/

Use the following command to print the locations of the dependencies only so that we can cycle through them.

ldd /bin/ps | awk '{print $3}' | grep -v '^$'

/lib/aarch64-linux-gnu/libprocps.so.8
/lib/aarch64-linux-gnu/libc.so.6
/lib/aarch64-linux-gnu/libsystemd.so.0
/lib/aarch64-linux-gnu/liblzma.so.5
/lib/aarch64-linux-gnu/libzstd.so.1
/lib/aarch64-linux-gnu/liblz4.so.1
/lib/aarch64-linux-gnu/libcap.so.2
/lib/aarch64-linux-gnu/libgcrypt.so.20
/lib/aarch64-linux-gnu/libgpg-error.so.0

Now, let’s copy the above to our new root using the following command.

for dep in `ldd /bin/ps | awk '{print $3}' | grep -v '^$' `; do cp --parents "$dep" ./new_root; done;
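
The loop above only picks up lines of the form "name => /path", so it skips the dynamic linker, which ldd prints as a bare path (we copied that one by hand earlier). A small helper could handle both cases in one go. This is a sketch under assumptions: copy_with_deps is a name I made up, it relies on GNU cp for --parents, and it expects glibc-style ldd output like the one shown earlier.

```shell
# copy_with_deps BINARY NEW_ROOT: copy a binary and the shared objects
# reported by ldd into NEW_ROOT, keeping their original paths.
# (copy_with_deps is a hypothetical helper name.)
copy_with_deps() {
    bin=$1
    root=$2
    mkdir -p "$root"
    cp --parents "$bin" "$root"
    # "name => /path (addr)" lines carry the path in field 3;
    # the dynamic linker appears as "/path (addr)" with the path in field 1.
    ldd "$bin" |
        awk '$2 == "=>" && $3 ~ /^\// {print $3} $1 ~ /^\// {print $1}' |
        while read -r dep; do
            cp --parents "$dep" "$root"
        done
}
```

It would be called as copy_with_deps /bin/ps ./new_root. Entries without a path on disk, such as linux-vdso.so.1, match neither pattern and are skipped.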

Now, let’s run our ps command to see what happens next.

sudo chroot ./new_root /bin/bash

bash-5.1# ps
Error, do this: mount -t proc proc /proc

bash-5.1# exit
exit

OK, it seems the command `ps` knows how to help us solve this problem. Let’s copy the mount command into our container along with its dependencies.

cp /bin/mount ./new_root/bin/

for dep in `ldd /bin/mount | awk '{print $3}' | grep -v '^$' `; do cp --parents "$dep" ./new_root; done;

Now let’s run the mount command in the container, since we copied it to do what ps told us to do.

sudo chroot ./new_root /bin/bash

bash-5.1# mount
mount: failed to read mtab: No such file or directory
bash-5.1# exit

Now it looks like we need an mtab. /etc/mtab is a symlink, and we can see the chain below.

ll /etc/mtab 

lrwxrwxrwx 1 root root 19 Oct  2 07:43 /etc/mtab -> ../proc/self/mounts

Even /proc/mounts is a symlink.

ll /proc/mounts 
lrwxrwxrwx 1 root root 11 Nov 15 21:04 /proc/mounts -> self/mounts
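
To follow such a chain hop by hop without running ll on every link, a small loop around readlink is enough. This is only a sketch; follow_links is a hypothetical helper name.

```shell
# follow_links PATH: print every hop of a symlink chain, then the final path.
# (follow_links is a hypothetical helper name.)
follow_links() {
    path=$1
    while [ -L "$path" ]; do
        target=$(readlink "$path")
        echo "$path -> $target"
        case $target in
            /*) path=$target ;;                      # absolute target
            *)  path=$(dirname "$path")/$target ;;   # relative to the link's directory
        esac
    done
    echo "$path"
}

follow_links /etc/mtab
follow_links /proc/mounts
```

Both chains end in the mounts file of the calling process under /proc, which is why mount cannot work until a procfs is available inside the new root.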

If we mount a procfs at new_root/proc, both the ps command and the mount command start working, which finally gets ps running. The mount point has to exist first, so create it with mkdir new_root/proc if it is not there yet.

sudo mount -t proc wavey ./new_root/proc

Run the mount command inside the chroot env. The command works properly after mounting the /proc.

bash-5.1# mount                                                                                           
wavey on /proc type proc (rw,relatime)

Run the ps command inside the chroot env. The command works, but we still have a view of all the other processes running on the system. When isolating a process, it is important that the process has no lens into the environment outside it.

ps -aux

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
0              1  0.0  0.2 167900 11144 ?        Ss   Nov16   0:11 /sbin/init
...
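
The reason ps can see everything is that it simply reads whatever the mounted /proc exposes: one numeric directory per process. As a rough stand-in for ps, the sketch below lists those directories itself. The name list_procs is made up for illustration.

```shell
# list_procs [PROC_DIR]: print "PID<TAB>command name" for every process
# visible under a procfs mount (defaults to /proc).
# (list_procs is a hypothetical helper name.)
list_procs() {
    procroot=${1:-/proc}
    for dir in "$procroot"/[0-9]*; do
        # A process may exit between the glob and the read, so be lenient.
        [ -r "$dir/comm" ] || continue
        printf '%s\t%s\n' "${dir##*/}" "$(cat "$dir/comm" 2>/dev/null)"
    done
}

list_procs
```

Whatever shows up here is what the process can see, so isolating the view comes down to controlling which procfs it gets.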

To ensure that a process has access only to certain files and only has a view into its own functioning as a process, we are going to use namespaces.

(At this point I was having a hard time figuring out the usage of the unshare command. Eric Chiang’s blog on containers from scratch really helped me get a clearer idea.)

In the above run of ps, we get process information for all the other processes running on the underlying system. Let’s put this process in a namespace and ensure that this doesn’t happen. The command below lets us run a command in a new namespace.

sudo unshare --pid --fork --mount-proc=$PWD/new_root/proc chroot ./new_root /bin/bash

As you can see above, we are using the --fork, --pid, and --mount-proc flags.

With --fork, unshare forks a child process and runs our command in it, so that the command becomes PID 1 in its own namespace. This matters because creating a PID namespace with --pid does not move the calling process into the new namespace or change its identifier. Only the children it creates afterwards live in the new namespace, and the first of them becomes PID 1. When we create a PID namespace with --pid and leave out --fork, the shell runs one command and is then unable to fork further children, as shown below.

bash-5.1# ps
    PID TTY          TIME CMD
   2060 ?        00:00:00 sudo
   2061 ?        00:00:00 bash
   2062 ?        00:00:00 ps

bash-5.1# ps
bash: fork: Cannot allocate memory

bash-5.1# ls
bash: fork: Cannot allocate memory

The shell is able to run only one command, and every command after it fails because the shell can no longer fork. The first command became PID 1 of the new namespace, and once it exited the namespace could not take in any new processes.

Being able to fork new processes, each with its own process identifier (PID), is essential, so forking is an important part of creating a PID namespace.
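
One way to see why the shell without --fork gets stuck is to compare two links under /proc/<pid>/ns: pid is the PID namespace the process itself lives in, and pid_for_children (available since Linux 4.12) is the one its children will be created in. The sketch below compares them for the current shell. The name pidns_status is hypothetical.

```shell
# pidns_status: report whether the current shell and its future children
# share a PID namespace. (pidns_status is a hypothetical helper name.)
pidns_status() {
    # Read both links in a single command substitution, so the check
    # needs only one child of this shell.
    set -- $( readlink "/proc/$$/ns/pid"
              readlink "/proc/$$/ns/pid_for_children" 2>/dev/null )
    if [ -z "$2" ]; then
        echo unknown      # kernel does not expose pid_for_children
    elif [ "$1" = "$2" ]; then
        echo same
    else
        echo different
    fi
}

pidns_status
```

In an ordinary shell this should print same. I would expect it to print different when the script is started through sudo unshare --pid without --fork, since in that case only the children end up in the new namespace.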

Now that we have isolated the PID, it’s time to isolate the view of the processes.

Sadly, the view of the other processes comes along when we bring in the proc mount to get a view of our own namespace. Let’s mount a new proc virtual filesystem, named wavey, onto /proc of our new root.

sudo mount -t proc wavey ./new_root/proc

With that, we have a proc virtual filesystem mounted at new_root/proc. Because we mounted it from the host, it still reflects the host’s processes, but we will use it as the base for the new procfs that belongs to our namespace.

After doing the above, entering the process jail, and running mount to list the mounts in the container, we get the output below.

bash-5.1# mount
wavey on /proc type proc (rw,relatime)

Let’s create a procfs specifically for the namespace we are creating with unshare, using the --mount-proc flag. The full command is below.

sudo unshare --pid --fork --mount-proc=$PWD/new_root/proc chroot ./new_root /bin/bash

After giving a new location to the procfs which doesn’t have any information on the other processes, let’s run the mount command again.

bash-5.1# mount
wavey on /proc type proc (rw,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)

The second entry above is the procfs mount we just did. When we run the ps command in the container we will see the following.

bash-5.1# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
0              1  0.0  0.0   3984  3200 ?        S    19:18   0:00 /bin/bash
0              2  0.0  0.0   6408  2380 ?        R+   19:21   0:00 ps aux

No processes can be seen apart from the ones running in the context of this container.

25th November 2024

With that, our process does not have any lens into the underlying system; it is isolated from the other running processes.

Now that the process has its own PID namespace, we can namespace other aspects of it.

We would also like to have a virtualized view of time for our process.

If we go to the process and run uptime we get the following output.

bash-5.1# uptime -p
up 3 hours, 30 minutes

If we would like to start our process about 9.5 years into the future, we can offset its boot-time clock by 300000000 seconds with the following command.

sudo unshare --fork --pid --time --boottime 300000000 --mount-proc=$PWD/new_root/proc chroot ./new_root /bin/bash

Running uptime afterwards gives us the following output.

bash-5.1# uptime -p 
up 9 years, 28 weeks, 8 hours, 56 minutes
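
The figures in that output can be checked with a bit of shell arithmetic. The sketch below splits the offset into the same units, assuming 365-day years and weeks counted modulo 52, which is my reading of how uptime -p arrives at its numbers. The name breakdown is hypothetical.

```shell
# breakdown SECONDS: split a number of seconds into the units that
# uptime -p prints (assumption: 365-day years, weeks modulo 52).
# (breakdown is a hypothetical helper name.)
breakdown() {
    s=$1
    printf '%d years, %d weeks, %d days, %d hours, %d minutes\n' \
        $(( s / 31536000 )) \
        $(( s / 604800 % 52 )) \
        $(( s / 86400 % 7 )) \
        $(( s / 3600 % 24 )) \
        $(( s / 60 % 60 ))
}

breakdown 300000000
# prints: 9 years, 28 weeks, 0 days, 5 hours, 20 minutes
```

Adding the host’s own uptime of about 3 hours 30 minutes to the 5 hours 20 minutes from the offset lands close to the 8 hours 56 minutes shown above, with the remaining minutes being time that passed between the two runs.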

If you would like to learn more about Time Namespaces, check out the article below.

With that we have gotten a glimpse into creating a new root and a namespaced view for the process. In the next iteration, we are going to explore namespaces in Linux further and see what it means to isolate a process in terms of resource usage.