While experimenting with my approach to "secure" computers, one of the key ingredients is reducing the attack surface as much as possible, and unfortunately, one big attack surface of any Linux-based deployment is the kernel itself. Thus, one of the first things I've done is to configure and recompile my own extremely stripped-down version of the Linux kernel (based on the latest LTS branch).
Among the kernel features that were removed
was also a very important security related subsystem,
namely the Linux firewall netfilter
(iptables being the old version, and nf_tables the new one).
Why remove the firewall subsystem when it's essential for security?
Because there have been quite a few vulnerabilities
that have impacted the Linux firewall subsystem,
many of them granting root access,
either to a local unprivileged user,
or worse, RCE (Remote Code Execution)
to an external attacker.
Here are some links to the most recent vulnerabilities impacting the Linux firewall subsystem:
- CVE-2024-1086: Flipping Pages: An analysis of a new Linux vulnerability in nf_tables and hardened exploitation techniques
- CVE-2023-32233: Linux kernel use-after-free in Netfilter nf_tables when processing batch requests can be abused to perform arbitrary reads and writes in kernel memory
- CVE-2022-32250: Linux kernel a use-after-free write in the netfilter subsystem
- CVE-2022-25636: The Discovery and Exploitation of CVE-2022-25636
- CVE-2021-22555: CVE-2021-22555: Turning \x00\x00 into 10000$
- (and the list goes on...)
However, I still do want the ability to firewall inbound traffic, and especially outbound traffic, in and out of my "secure" computer.
But, without the Linux firewall subsystem, how could we achieve this?
- one solution is to declare all ports as privileged:

      sysctl -w 'net.ipv4.ip_local_port_range = 65535 65535'
      sysctl -w 'net.ipv4.ip_unprivileged_port_start = 65535'

  thus, connecting sockets would require either root or the CAP_NET_BIND_SERVICE capability, plus it would also require explicitly binding the socket to an explicit non-zero port;
- another solution is to use seccomp and either disable the socket related syscalls, or write a policy that allows a subset of calls (for example, for specific users, or specific ports and destinations, etc.);
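As a quick sanity check for the first approach, here is a minimal sketch that classifies a few ports against the ip_unprivileged_port_start threshold. The helper name is_privileged_port is hypothetical, and the only kernel behaviour it assumes is that ports strictly below the threshold require root or CAP_NET_BIND_SERVICE in order to bind.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): succeeds when binding port $1
# requires root or CAP_NET_BIND_SERVICE, given the threshold $2
# (the value of net.ipv4.ip_unprivileged_port_start).
is_privileged_port() {
    # Port 0 means "let the kernel choose", so it is never privileged itself.
    [ "$1" -gt 0 ] && [ "$1" -lt "$2" ]
}

# Read the live threshold when available; else assume the usual default (1024).
sysctl_file=/proc/sys/net/ipv4/ip_unprivileged_port_start
if [ -r "$sysctl_file" ]; then
    start=$(cat "$sysctl_file")
else
    start=1024
fi

for port in 22 8080 65534 65535; do
    if is_privileged_port "$port" "$start"; then
        echo "port $port: privileged"
    else
        echo "port $port: unprivileged"
    fi
done
```

With the threshold set to 65535 as above, every port up to 65534 is classified as privileged, while port 65535 itself remains unprivileged, since the sysctl names the first unprivileged port.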
However, neither of these approaches gives us the ability to restrict incoming traffic. They only allow us to restrict what local processes can listen to (thus inbound traffic) or connect to (thus outbound traffic).
Which brings me to my alternative solution: Linux policy based routing.
Unfortunately, policy based routing is meant for, as the name says, "routing", and not for "firewalling", thus, it's worth stressing the fact that I'm abusing this tool.
First and foremost, we must understand that routing applies only to outbound traffic, not inbound, thus it applies only to packets that are generated locally and must be sent somewhere else via a network device. (It also applies to forwarded traffic, when our node behaves as a router.)
It is also worth noting that policy based routing is stateless,
as opposed to stateful,
meaning that it doesn't see connections (e.g. TCP),
or related traffic (like Linux's conntrack does for UDP),
but instead just individual packets.
Thus, any "firewalling" that we can do by using
Linux policy based routing can't take into account flows.
And, as a final note,
due to a feature called
reverse path filtering
present in the Linux kernel,
if enabled, we actually also get inbound traffic filtering.
In a few words,
with sysctl -w 'net.ipv4.conf.all.rp_filter = 1',
the kernel is instructed, before accepting and delivering a packet locally,
to check whether there is a route
for a packet that has the source and destination addresses switched
(thus the "reverse path" name),
and if not, to consider it as "martian",
causing it to be dropped (and logged, if log_martians is enabled),
because it is not coming from an expected route.
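Since the setting exists both globally and per interface, it can be useful to check what is actually in effect. The following is a minimal sketch, assuming the documented kernel behaviour that the larger of the conf/all and conf/<interface> values is used for each interface; the helper name effective_rp_filter is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the kernel uses the larger of conf/all/rp_filter ($1)
# and conf/<interface>/rp_filter ($2) when validating the source of a packet
# (0 = disabled, 1 = strict, 2 = loose).
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Read-only report of the effective value for every interface, if available.
conf_dir=/proc/sys/net/ipv4/conf
if [ -r "$conf_dir/all/rp_filter" ]; then
    all=$(cat "$conf_dir/all/rp_filter")
    for file in "$conf_dir"/*/rp_filter; do
        iface=${file%/rp_filter}
        iface=${iface##*/}
        case $iface in all|default) continue ;; esac
        echo "$iface: rp_filter=$(effective_rp_filter "$all" "$(cat "$file")")"
    done
fi
```

Note that, under this assumption, an interface that is explicitly set to 2 (loose mode) keeps that value even when all is set to 1, so it is worth checking each interface that should be filtered.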
For more details on Linux policy based routing, see the ip-rule man page or this article.
Finally, here are the snippets I've used in my initial experiment:
(remember, these rules apply only to outbound packets, but due to the reverse path filtering feature, they also apply to inbound traffic in reverse, by switching the sport and dport values;)

- anything from a user-id of 2000 or above is not allowed to touch the network:

      ip rule add priority 1537 type prohibit iif lo uidrange 2000-4294967294

- we allow WireGuard UDP traffic from source port 51820 to destination port 51820:

      ip rule add priority 1538 type unicast table main iif lo ipproto 17 sport 51820 dport 51820

- we allow SSH server TCP traffic from the source port 22 to any non-privileged port:

      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 sport 22 dport 1024-65534

  (if we also want to allow SSH client traffic, use a rule similar to the ones below;)
- we allow the root user and other "normal" users (user-id in the range of 1000 to 1999) HTTP/HTTPS access:

      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 dport 80
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 6 dport 443
      ip rule add priority 1538 type unicast table main iif lo uidrange 1000-1999 ipproto 6 dport 80
      ip rule add priority 1538 type unicast table main iif lo uidrange 1000-1999 ipproto 6 dport 443

- we allow DNS client TCP and UDP traffic (to destination port 53) for any user-id in the range of 0 to 1999:

      ip rule add priority 1538 type unicast table main iif lo uidrange 0-1999 ipproto 6 dport 53
      ip rule add priority 1538 type unicast table main iif lo uidrange 0-1999 ipproto 17 dport 53

- we allow NTP client UDP traffic (to destination port 123), and also NTP-secure client TCP traffic (to destination port 4460), but only for the user-id 122 (under which chrony is being run); for some reason, we also need to whitelist the root account, otherwise chrony fails to bind to the UDP socket:

      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 17 dport 123
      ip rule add priority 1538 type unicast table main iif lo uidrange 122-122 ipproto 17 dport 123
      ip rule add priority 1538 type unicast table main iif lo uidrange 122-122 ipproto 6 dport 4460

- we allow DHCP client UDP traffic (from the client port 68, to the server port 67):

      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 17 dport 67 sport 68

- we allow only the root user to ping other hosts, this also allows other hosts to ping us:

      ip rule add priority 1538 type unicast table main iif lo uidrange 0-0 ipproto 1

- at last, some catch-all fallback rules to drop any ICMP, TCP and UDP traffic not whitelisted above, or that can't find a proper route in the table main routing table:

      ip rule add priority 1539 type prohibit iif lo ipproto 1
      ip rule add priority 1539 type prohibit iif lo ipproto 6
      ip rule add priority 1539 type prohibit iif lo ipproto 17
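Since most of the allow rules above share the same shape, they can be generated by a small wrapper. This is only a sketch: the helper names run and allow_out are hypothetical, the SSH client rule (user-ids 1000 to 1999, destination port 22) is my assumption of what "a rule similar to the ones below" would look like, and 192.0.2.1 is just a documentation address used as a placeholder. By default it only prints the commands; the final ip route get line asks the kernel which route it would pick for such a packet, which should be a handy way to test the rules (given a recent enough iproute2).

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and
# run as root to actually apply them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: allow outbound traffic for a user-id range ($1), an IP
# protocol number ($2) and a destination port ($3), mirroring the rules above.
allow_out() {
    run ip rule add priority 1538 type unicast table main \
        iif lo uidrange "$1" ipproto "$2" dport "$3"
}

# Assumption: SSH client traffic for the "normal" users (TCP, destination port 22).
allow_out 1000-1999 6 22

# Ask the kernel how it would route such a packet for user-id 1000;
# the destination is a placeholder, replace it with a real SSH server.
run ip route get 192.0.2.1 uid 1000 ipproto tcp dport 22
```

Keeping the dry-run mode as the default makes it easy to review the generated rules before locking oneself out of a remote machine.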
A few observations:
- the way ip rules are matched is a bit interesting:
  - each rule is tried in turn, ordered by the priority;
  - it stops when either a type prohibit or type blackhole rule is found;
  - it also stops when a type unicast with table xyz rule is found, and the destination is actually found in that table xyz, else it continues;
- the iif lo argument is essential, as it selects only locally generated traffic; (without it, I think all traffic breaks;)
- the table main is the "normal" Linux routing table; however, it can be replaced with some other table, thus perhaps allowing further filtering, by treating routing tables as an ipset replacement; like for example, having a routing table only for HTTP / HTTPS connections, that explicitly lists all allowed destination IPs, and adding a "default" blackhole route, ip route add blackhole 0.0.0.0/0;
- for some reason, for outbound connections (like for example HTTP/HTTPS), if one uses sport 1024-65534 to explicitly state that the local address of these connections should use ephemeral ports, it breaks all outgoing traffic; (the solution would be for the application to explicitly bind the local address to an explicit non-zero port;)
- unfortunately, the busybox ip applet doesn't understand the ipproto option, thus one needs to use the ip tool from the iproute2 project;
- unfortunately, for some reason I haven't explored much, having a catch-all prohibit rule without the ipproto breaks all traffic;
- when ping-ing other hosts, one needs to explicitly state the outbound interface, e.g. ping -I eth0 1.1.1.1, else (for some reason) it won't work; (perhaps ping tries to detect the interface address via some other technique, and that fails?)
- there is no way to do any quantitative filtering, like for example rate limiting, bandwidth throttling, etc.; for that the Linux firewall subsystem is required;
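The idea of using a dedicated routing table as an ipset replacement could be sketched as follows. I have not tested this, and everything in it is an assumption to be adjusted: the table id 100, the gateway 192.0.2.1, the device eth0, the allowed destinations (all documentation addresses), and the helper names run and allow_list_table. As before, it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and
# run as root to actually apply them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholders (assumptions), adjust to the actual network:
TABLE=100            # arbitrary numeric id for the allow-list table
GATEWAY=192.0.2.1    # next hop towards the allowed destinations
DEVICE=eth0          # outbound interface

# Hypothetical helper: one route per allowed destination, then a blackhole
# default so that lookups for anything else in this table fail.
allow_list_table() {
    for dest in "$@"; do
        run ip route add "$dest" via "$GATEWAY" dev "$DEVICE" table "$TABLE"
    done
    run ip route add blackhole default table "$TABLE"
}

allow_list_table 198.51.100.10/32 203.0.113.0/24

# Point the HTTPS rule for "normal" users at the allow-list table instead of main.
run ip rule add priority 1538 type unicast table "$TABLE" \
    iif lo uidrange 1000-1999 ipproto 6 dport 443
```

The blackhole default is what should turn the table into an allow list: a destination that is not explicitly listed still matches a route in that table, so the lookup stops there instead of continuing to the next rule.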
Have I sparked some new ideas?