Tracking Shell Scripts (and Python, Perl, etc.) with eBPF is Hard


If you want to track anything on a machine, eBPF is almost always the answer (only if you are running on Linux, but who doesn’t run Linux?).

This is even more true for tracking processes, which is quite simple with eBPF: you just hook into execve, and you are good to go.

But if you want precise information, such as the exact paths of the binaries that were started, you will need to hook into something like bprm_check_security, an LSM hook that BPF-LSM programs can attach to and that gives you a little more information about a process.

This is what we do at Bomfather, and it worked perfectly… Until it did not.

When we ran regular executables, we got log output that looked something like this (we were logging whenever a process accessed certain protected files):

But when we ran a shell script, we got

Where did the shell script go?

Well, this happened because of how the kernel runs files that start with #! (a shebang), which includes shell scripts and scripts in any interpreted language (Python, Perl, Ruby, etc.).

When a shell script (or any interpreted script) is executed, the kernel adds an extra step. First execve() is called, and that call hits our eBPF hook. The data we get looks something like this.

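But then the kernel checks the first two bytes of the file. If it sees #!, it does not start a brand-new execve() system call. Inside the same call, it swaps in the interpreter named on that line and runs the exec checks, including bprm_check_security, a second time. From our hook’s point of view, that looks exactly like the kernel calling execve() again!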

The shebang tells the kernel not to execute the file directly; instead, it should hand the file to an interpreter, which then does the actual work. The key here is that the shell script doesn’t do anything by itself at all.
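To make the shebang check concrete, here is a rough userspace sketch in Python of the decision the kernel makes. This is not kernel code, and the function name parse_shebang is made up for illustration. It assumes the Linux behavior where everything after the interpreter path is passed along as a single argument.

```python
from typing import Optional, Tuple


def parse_shebang(head: bytes) -> Optional[Tuple[str, Optional[str]]]:
    """Return (interpreter, optional_arg) if `head` starts with '#!', else None.

    `head` is the beginning of a file, e.g. the first few hundred bytes.
    """
    if not head.startswith(b"#!"):
        return None  # not a script: the file would be treated as a regular binary
    # Only the first line matters; drop the '#!' marker and surrounding blanks.
    line = head[2:].split(b"\n", 1)[0].strip(b" \t")
    if not line:
        return None  # '#!' with nothing after it names no interpreter
    # Split off the interpreter path; the remainder stays together as one argument.
    parts = line.split(None, 1)
    interpreter = parts[0].decode()
    arg = parts[1].strip().decode() if len(parts) > 1 else None
    return interpreter, arg


print(parse_shebang(b"#!/bin/sh\necho hi\n"))       # ('/bin/sh', None)
print(parse_shebang(b"#!/usr/bin/env python3\n"))   # ('/usr/bin/env', 'python3')
print(parse_shebang(b"\x7fELF\x02\x01"))            # None
```

Whenever this returns an interpreter, the thing that actually gets executed is that interpreter, with the script path passed to it as an argument.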

Our eBPF hook triggers on this second execve(), which leads us to the root of our problem.

The second execve() is for the same process, but the args are totally different. Since the script itself is not an executable, the only thing “doing” anything in this process is the interpreter. So when our eBPF code writes this data to our storage (an eBPF map), it overwrites the data that is already stored.

So after the first execve(), we get:

And after the second execve(), we get

To solve this issue, all we have to do is check whether we have already recorded a path for that process; if we have, we don’t overwrite it.
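Here is a minimal sketch of that check, with a plain Python dict standing in for the eBPF map. The pid and paths are made up for illustration, and this is not our actual eBPF code. Inside a real eBPF program, a similar effect can probably be had by passing the BPF_NOEXIST flag to bpf_map_update_elem, which only creates an entry when the key is not already present.

```python
from typing import Dict


def record_naive(paths: Dict[int, str], pid: int, path: str) -> None:
    """Always store the latest path: the interpreter replaces the script."""
    paths[pid] = path


def record_first(paths: Dict[int, str], pid: int, path: str) -> None:
    """Store a path only if this pid has none yet: the script survives."""
    paths.setdefault(pid, path)


# Hypothetical pid and paths: the hook fires for the script, then the interpreter.
events = [(4242, "/home/user/backup.sh"), (4242, "/bin/bash")]

naive: Dict[int, str] = {}
guarded: Dict[int, str] = {}
for pid, path in events:
    record_naive(naive, pid, path)
    record_first(guarded, pid, path)

print(naive[4242])    # /bin/bash
print(guarded[4242])  # /home/user/backup.sh
```

One thing to keep in mind with a first-write-wins rule is that the entry has to be removed when the process exits, otherwise a reused pid would inherit a stale path.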

While the solution above allows us to capture the path of a shell script and any interpreted script, we still have one more catch for shell scripts.

You see, most of the operations you do in a shell script are calls to other executables. For example, when you want to read a file in a shell script, you use the cat executable to do the job.

But this spawns a new process. So if we monitored a shell script that we knew opened file.txt, we would expect output like this:

But instead, we’d get something that looks more like this:

This is because the shell script never opens any file; instead, it delegates that job to another executable by starting a child process.

So if we want to accurately monitor shell scripts, we also have to track their children and treat them as a single process tree in our monitoring logic.
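A rough sketch of what treating them as a single tree could look like is below. The two dicts are hypothetical bookkeeping that would be filled in from fork and exec events, and the pids and paths are invented for the example.

```python
from typing import Dict, Optional

# Hypothetical bookkeeping, filled in from fork/exec events.
parent_of: Dict[int, int] = {}   # child pid -> parent pid
script_of: Dict[int, str] = {}   # pid -> script path recorded at exec time


def owning_script(pid: int) -> Optional[str]:
    """Walk up the parent chain until a pid with a recorded script is found."""
    seen = set()
    while pid not in seen:       # guard against cycles in bad data
        if pid in script_of:
            return script_of[pid]
        seen.add(pid)
        if pid not in parent_of:
            return None          # reached the top without finding a script
        pid = parent_of[pid]
    return None


# Made-up example: pid 100 runs the script, pid 101 is the `cat` it spawned.
script_of[100] = "/home/user/backup.sh"
parent_of[101] = 100

print(owning_script(101))  # /home/user/backup.sh
```

With a lookup like this, a file opened by cat can be attributed to the script that started it instead of showing up as an unrelated process.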
