Misusing Linux policy based routing for firewalling

While experimenting with my approach to "secure" computers, I found that one of the key ingredients is reducing the attack surface as much as possible, and unfortunately, one big attack surface of any Linux-based deployment is the kernel itself. Thus, one of the first things I did was to configure and recompile my own, extremely stripped-down version of the Linux kernel (based on the latest LTS branch).

Among the kernel features I removed was also a very important security-related subsystem, namely the Linux firewall, netfilter (with iptables as the old version and nf_tables as the new one).

Why remove the firewall subsystem when it's essential for security?

Because quite a few vulnerabilities have affected the Linux firewall subsystem, many of them granting root access either to a local unprivileged user or, worse, to an external attacker via RCE (Remote Code Execution).

Here are some of the most recent vulnerabilities affecting the Linux firewall subsystem, with the titles of their write-ups:

- CVE-2024-1086: "Flipping Pages: An analysis of a new Linux vulnerability in nf_tables and hardened exploitation techniques"
- CVE-2023-32233: "Linux kernel use-after-free in Netfilter nf_tables when processing batch requests can be abused to perform arbitrary reads and writes in kernel memory"
- CVE-2022-32250: "Linux kernel: a use-after-free write in the netfilter subsystem"
- CVE-2022-25636: "The Discovery and Exploitation of CVE-2022-25636"
- CVE-2021-22555: "CVE-2021-22555: Turning \x00\x00 into 10000$"

(and the list goes on...)

However, I still want the ability to firewall inbound and, especially, outbound traffic on my "secure" computer.

But, without the Linux firewall subsystem, how could we achieve this?

- one solution is to declare all ports as privileged:
  ```
  sysctl -w 'net.ipv4.ip_local_port_range = 65535 65535'
  sysctl -w 'net.ipv4.ip_unprivileged_port_start = 65535'
  ```
  thus, connecting sockets would require either root or the CAP_NET_BIND_SERVICE capability, and would also require explicitly binding the socket to a non-zero port;
- another solution is to use seccomp and either disable the socket-related syscalls, or write a policy that allows only a subset of calls (for example, for specific users, or for specific socket families and types; a seccomp filter only sees the raw syscall arguments, so restricting ports and destinations would need a user-space supervisor, e.g. via seccomp user notifications);
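To confirm that the two sysctls above are actually in effect, a check along these lines could be run. It is a minimal sketch: the function name is made up, and the sysctl root is a parameter (defaulting to /proc/sys) only so that it can be pointed at a copy of those files.

```shell
# Hypothetical helper: report whether both "all ports privileged" sysctls are set.
# $1 is the sysctl root; it defaults to /proc/sys.
check_privileged_ports() {
    root=${1:-/proc/sys}
    # ip_local_port_range holds two whitespace-separated numbers: low and high.
    read -r low high < "$root/net/ipv4/ip_local_port_range" || return 2
    read -r start < "$root/net/ipv4/ip_unprivileged_port_start" || return 2
    echo "ephemeral range: $low-$high, unprivileged ports start at: $start"
    # Succeed only if all three numbers match the values used above.
    [ "$low" -eq 65535 ] && [ "$high" -eq 65535 ] && [ "$start" -eq 65535 ]
}

# Usage: check_privileged_ports && echo hardened || echo "not hardened"
```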

However, neither of these approaches gives us the ability to restrict incoming traffic as such. They only let us restrict what local processes can listen on (thus inbound traffic) or connect to (thus outbound traffic).

Which brings me to my alternative solution: Linux policy based routing.

Unfortunately, policy-based routing is meant, as the name says, for "routing" and not for "firewalling", so it's worth stressing that I'm abusing this tool.

First of all, we must understand that routing applies only to outbound traffic, not inbound: it applies only to packets that are generated locally and must be sent somewhere else via a network device. (It also applies to forwarded traffic, when our node behaves as a router.)

It is also worth noting that policy-based routing is stateless, as opposed to stateful: it doesn't see connections (e.g. TCP) or related traffic (as Linux's conntrack does, for example for UDP), only individual packets. Thus, any "firewalling" we do with Linux policy-based routing can't take flows into account.

And, as a final note, thanks to a Linux kernel feature called reverse path filtering, we actually also get inbound traffic filtering, if it is enabled. In a few words, with sysctl -w 'net.ipv4.conf.all.rp_filter = 1', the kernel is instructed, before accepting and delivering a packet locally, to check whether there is a route for a packet with the source and destination addresses switched (hence the "reverse path" name), and, in this strict mode, whether that route leads back out through the interface the packet arrived on. If not, the packet is considered "martian" and dropped, because it is not coming from an expected route (it is also logged if net.ipv4.conf.all.log_martians is enabled).
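Per the kernel's ip-sysctl documentation, the rp_filter value used on an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, and martians are logged if log_martians is set in either of the two (for example with sysctl -w 'net.ipv4.conf.all.log_martians = 1'). A minimal sketch that prints those effective values per interface; the function name is hypothetical, and the sysctl root is a parameter only so it can be pointed at a copy:

```shell
# Hypothetical helper: print the effective rp_filter and log_martians per interface.
# Effective rp_filter = max(conf/all, conf/<iface>); log_martians = all OR <iface>.
# $1 is the sysctl root; it defaults to /proc/sys.
effective_rp_filter() {
    conf="${1:-/proc/sys}/net/ipv4/conf"
    read -r all_rpf < "$conf/all/rp_filter" || return 2
    read -r all_log < "$conf/all/log_martians" || return 2
    for dir in "$conf"/*/; do
        iface=$(basename "$dir")
        # "all" and "default" are not real interfaces ("default" seeds new ones).
        case $iface in all|default) continue ;; esac
        read -r rpf < "$dir/rp_filter" || continue
        read -r log < "$dir/log_martians" || continue
        if [ "$all_rpf" -gt "$rpf" ]; then rpf=$all_rpf; fi
        if [ "$all_log" -ne 0 ] || [ "$log" -ne 0 ]; then log=1; else log=0; fi
        echo "$iface rp_filter=$rpf log_martians=$log"
    done
}

# Usage: effective_rp_filter    # interfaces that receive traffic should show rp_filter=1
```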

For more details on Linux policy-based routing, see the ip-rule man page or this article.

Finally, here are the snippets I've used in my initial experiment:

(remember, these rules apply only to outbound packets, but thanks to reverse path filtering they also apply to inbound traffic in the opposite direction, with the sport and dport values switched;)

anything from a user-id of 2000 or above is not allowed to touch the network:
```
ip rule add priority 1537 type prohibit iif lo uidrange 2000-4294967294
```

we allow WireGuard UDP traffic (ipproto 17) from source port 51820 to destination port 51820:
```
ip rule add priority 1538 type unicast table main iif lo ipproto 17 sport 51820 dport 51820
```

we allow SSH server TCP traffic (ipproto 6) from source port 22 to any non-privileged port:
```
ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 sport 22 dport 1024-65534
```

(to also allow SSH client traffic, use a rule similar to the ones below, with dport 22;)

we allow the root user and other "normal" users (user-id in the range of 1000 to 1999) HTTP/HTTPS access:

```
ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 dport 80
ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 dport 443
ip rule add priority 1538 type unicast table main iif lo uidrange 1000-1999 ipproto 6 dport 80
ip rule add priority 1538 type unicast table main iif lo uidrange 1000-1999 ipproto 6 dport 443
```

we allow DNS client TCP and UDP traffic (to destination port 53) for any user-id in the range of 0 to 1999:
```
ip rule add priority 1538 type unicast table main iif lo uidrange 0-1999 ipproto 6 dport 53
ip rule add priority 1538 type unicast table main iif lo uidrange 0-1999 ipproto 17 dport 53
```

we allow NTP client UDP traffic (to destination port 123), and also NTS (Network Time Security) key-establishment client TCP traffic (to destination port 4460), but only for the user-id 122 (under which chrony is being run); for some reason, we also need to whitelist the root account, otherwise chrony fails to bind to the UDP socket;
```
ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 17 dport 123
ip rule add priority 1538 type unicast table main iif lo uidrange 122-122 ipproto 17 dport 123
ip rule add priority 1538 type unicast table main iif lo uidrange 122-122 ipproto 6 dport 4460
```
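Hard-coding 122 ties the rules to one particular installation. A sketch that looks the uid up at runtime and prints the three rules instead of running them; the function name is hypothetical, and the account name is passed as an argument because it is an assumption that differs between distributions (check which user chronyd actually runs as).

```shell
# Hypothetical helper: print the NTP/NTS rules for the account chronyd runs as.
# $1 is that account's name (an assumption; it differs between distributions).
ntp_rules() {
    if [ -z "$1" ]; then
        echo "usage: ntp_rules USER" >&2
        return 2
    fi
    uid=$(id -u "$1") || return 1
    prefix='ip rule add priority 1538 type unicast table main iif lo'
    echo "$prefix uidrange 0-0 ipproto 17 dport 123"        # root, needed as noted above
    echo "$prefix uidrange $uid-$uid ipproto 17 dport 123"  # NTP over UDP
    echo "$prefix uidrange $uid-$uid ipproto 6 dport 4460"  # NTS-KE over TCP
}

# Usage (review the output first, then run it as root): ntp_rules chrony | sh
```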

we allow DHCP client UDP traffic (from the client port 68 to the server port 67):
```
ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 17 dport 67 sport 68
```

we allow only the root user to ping other hosts (ipproto 1 is ICMP); this also allows other hosts to ping us:
```
ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 1
```

finally, some catch-all fallback rules drop any ICMP, TCP and UDP traffic that was not whitelisted above, or that couldn't find a proper route in the main routing table:
```
ip rule add priority 1539 type prohibit iif lo ipproto 1
ip rule add priority 1539 type prohibit iif lo ipproto 6
ip rule add priority 1539 type prohibit iif lo ipproto 17
```
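For review or re-application, the rules above can be collected in one place. This sketch only prints them in the format accepted by ip -batch (one ip command per line, without the leading ip); the function name is hypothetical, and uid 122 is the chrony uid from above.

```shell
# Sketch: every rule from this post, printed in `ip -batch` syntax.
# Nothing is applied here.
emit_rules() {
    allow='rule add priority 1538 type unicast table main iif lo'
    echo "rule add priority 1537 type prohibit iif lo uidrange 2000-4294967294"
    echo "$allow ipproto 17 sport 51820 dport 51820"                # WireGuard
    echo "$allow uidrange 0-0 ipproto 6 sport 22 dport 1024-65534"  # SSH server
    for uids in 0-0 1000-1999; do                                   # HTTP/HTTPS
        for port in 80 443; do
            echo "$allow uidrange $uids ipproto 6 dport $port"
        done
    done
    for proto in 6 17; do                                           # DNS over TCP, UDP
        echo "$allow uidrange 0-1999 ipproto $proto dport 53"
    done
    echo "$allow uidrange 0-0 ipproto 17 dport 123"                 # NTP (root)
    echo "$allow uidrange 122-122 ipproto 17 dport 123"             # NTP (chrony)
    echo "$allow uidrange 122-122 ipproto 6 dport 4460"             # NTS-KE (chrony)
    echo "$allow uidrange 0-0 ipproto 17 dport 67 sport 68"         # DHCP client
    echo "$allow uidrange 0-0 ipproto 1"                            # ping (root)
    for proto in 1 6 17; do                                         # catch-all
        echo "rule add priority 1539 type prohibit iif lo ipproto $proto"
    done
}

# Review the output first; then, as root and with iproute2's ip:
#   emit_rules | ip -batch -
```

Two things worth keeping in mind: the default lookup local rule at priority 0 is still consulted before all of these, and locally generated packets of any IP protocol other than ICMP, TCP and UDP (from uids below 2000) still fall through to the default lookup main rule at priority 32766. Also, these are IPv4 rules; IPv6 traffic would need an equivalent set added with ip -6 rule.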

A few observations:

- the way ip rules are matched is a bit interesting:
  - each rule is tried in turn, in order of priority;
  - matching stops when either a type prohibit or a type blackhole rule is found;
  - it also stops when a type unicast rule with table xyz is found and the destination is actually found in that table xyz; otherwise it continues;
- the iif lo argument is essential, as it selects only locally generated traffic (without it, I think all traffic breaks);
- table main is the "normal" Linux routing table; however, it can be replaced with some other table, perhaps allowing further filtering by treating routing tables as an ipset replacement: for example, a routing table used only for HTTP / HTTPS connections that explicitly lists all allowed destination IPs, plus a "default" blackhole route added to that same table, ip route add blackhole 0.0.0.0/0 table xyz;
- for some reason, for outbound connections (like HTTP/HTTPS), using sport 1024-65534 to state explicitly that the local end of these connections uses ephemeral ports breaks all outgoing traffic (the solution would be for the application to explicitly bind its local address to a non-zero port);
- unfortunately, the busybox ip applet doesn't understand the ipproto option, so one needs to use the ip tool from the iproute2 project;
- unfortunately, for some reason I haven't explored much, a catch-all prohibit rule without ipproto breaks all traffic;
- when pinging other hosts, one needs to explicitly state the outbound interface, e.g. ping -I eth0 1.1.1.1, otherwise (for some reason) it won't work (perhaps ping tries to detect the interface address via some other technique, and that fails?);
- there is no way to do any quantitative filtering, such as rate limiting or bandwidth throttling; for that, the Linux firewall subsystem is required.
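The routing-table-as-ipset idea from the observations above might look roughly like the following. Every concrete value is a hypothetical placeholder: table 100, the gateway 192.0.2.1 on eth0, and the allowed hosts (documentation-range addresses passed as arguments). The two generated rules are meant to replace the main-table HTTP/HTTPS rules for uids 1000-1999 rather than sit next to them, otherwise those main-table rules could still let the traffic through. As before, the function only prints ip -batch lines.

```shell
# Sketch of the "routing table as ipset" idea. Hypothetical values throughout:
# table 100, gateway 192.0.2.1 on eth0; the allowed IPv4 hosts are the arguments.
# Prints `ip -batch` lines; nothing is applied here.
web_allowlist() {
    table=${ALLOW_TABLE:-100}
    gw=${ALLOW_GW:-192.0.2.1}
    dev=${ALLOW_DEV:-eth0}
    for host in "$@"; do
        # One route per allowed destination; anything else hits the blackhole.
        echo "route add $host/32 via $gw dev $dev table $table"
    done
    echo "route add blackhole 0.0.0.0/0 table $table"
    for port in 80 443; do
        # Meant to replace the main-table HTTP/HTTPS rules for uids 1000-1999.
        echo "rule add priority 1538 type unicast table $table iif lo uidrange 1000-1999 ipproto 6 dport $port"
    done
}

# Usage (as root): web_allowlist 203.0.113.10 203.0.113.11 | ip -batch -
# Then check the decision for a locally generated packet with ip route get:
#   ip route get 203.0.113.10 ipproto 6 dport 443 uid 1000   # should return a route
#   ip route get 198.51.100.7 ipproto 6 dport 443 uid 1000   # should fail
```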

Have I sparked some new ideas?