The reality is that security is not just technical implementation, but also actually getting people to use the solutions. “Stop disabling SELinux” is not a real answer when people do disable it, as one person in this thread has.
Another problem with complex security solutions is that they are hard to get right. Even if you enable and configure them, unless you are an expert it’s quite possible you left holes and gaps somewhere.*
Like so many other complex Linux security solutions, its effectiveness is limited because everything still shares the same kernel.
There is a good, if a bit dated, writeup about the problems with Linux security from an architectural perspective here: https://madaidans-insecurities.github.io/linux.html . The short version is that the Linux kernel is large and complex, and has a lot of attack surface. It’s a frequent source of vulnerabilities because attackers can hit it as long as they have access to the kernel, even if they are in a container/sandbox. For example, copyfail and dirtyfrag would punch through containers, but also punch through SELinux.
Now, SELinux can be used to restrict what a root shell can do after escalating… but that’s further complexity you have to learn, and then configure correctly as well.
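Since a lot of this comes down to whether SELinux is even still switched on, here is a minimal sketch for checking that on a box. It assumes selinuxfs is mounted at the usual /sys/fs/selinux and only reads the `enforce` file; the function name `selinux_mode` is my own, and it says nothing about whether the loaded policy is any good.

```python
from pathlib import Path


def selinux_mode(selinuxfs: Path = Path("/sys/fs/selinux")) -> str:
    """Return 'enforcing', 'permissive', or 'disabled' for this host."""
    try:
        value = (selinuxfs / "enforce").read_text().strip()
    except OSError:
        # No readable enforce file: SELinux is disabled, not built in,
        # or selinuxfs is mounted somewhere else (assumption noted above).
        return "disabled"
    return "enforcing" if value == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

Something like this is cheap to run from monitoring, which at least tells you when someone has quietly turned enforcement off.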
Ultimately, none of the Linux security solutions come anywhere near the isolation of simply running something in a virtual machine, which also happens to be a lot simpler and is something you can actually get people to use.
*(putting this at the bottom because it veers off topic) I have a broader problem with mentalities like this. I have noticed a pattern where many of the more effortful and toil-intensive security solutions are recommended by people who have the time, energy, and skills to execute them. They have a blind spot for the reality that not everyone is in the same situation as them.
For example, updating/patching software. Linux distros like RHEL or Debian have a policy where, within a stable release, they ship essentially only security updates and avoid feature updates or other behavior-changing fixes. This makes it safe to turn on automatic updates, so that security issues are handled automatically.
On the other hand, software like Windows likes to bundle breaking changes in with security updates. So automatic updates get disabled because “they might break something”. Then people don’t update, and environments get horrifically out of date, because not enough money, time, or people are put into the regular IT staff in charge of maintaining them.
But some environments have heroes: people who go around patching everything and keeping everything up to date and secure. When they see environments that don’t have everything patched, they usually give the advice “You should patch everything” (while simultaneously advising against auto updates), not understanding that these environments are lacking a key ingredient: themselves.
Sure, I could be a hero. I could “patch” everything manually. I could deploy SELinux. But that would only last until I get burnt out, or leave. Once I’m gone, SELinux, the patches, any similar security solutions are gone. I’ve met so many people, even in cybersecurity, that are apathetic about security, even though they might have cared once upon a time.
Like, copyfail and dirtyfrag would punch through containers, but also punch through SELinux.
User namespaces and optionally limited capabilities severely limit the usefulness of both of these exploits. K8s containers with user namespaces or rootless podman prevent host-root and only allow elevating to container root (host uid != 0) and cross container cache pollution (jump to other containers that use the same base image?)
Copyfail would punch through user namespaces to get root straight on the host. User namespaces only really protect you against vulnerabilities in non-kernel applications.
Limited capabilities/seccomp policies did help, though. In my admittedly limited testing, some of the vulnerabilities wouldn’t work in podman, but they would work in docker. This wasn’t due to user namespaces, but to podman having stricter capabilities/seccomp policies than docker by default.
This implies that even if you were using docker rootless, they still would have been able to break out and get root in one go.
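If you want to see the capability difference for yourself, a rough sketch like the one below decodes the `CapEff` mask from /proc/self/status into names. The bit numbers come from the kernel's capability list; the function names are mine, and any bits beyond the ones listed here are simply ignored.

```python
# Capability names indexed by bit number (as in <linux/capability.h>).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask_hex: str) -> list[str]:
    """Turn a hex capability mask into a list of capability names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def effective_caps(status_text: str) -> list[str]:
    """Find the CapEff line in /proc/<pid>/status content and decode it."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    raise ValueError("no CapEff line found")


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            print(effective_caps(f.read()))
    except OSError as exc:
        print(f"could not read /proc/self/status: {exc}")
```

Running the same script inside a default docker container and a default podman container and diffing the two lists is a quick way to see which capabilities one runtime drops that the other keeps. It does not show the seccomp side, which needs a separate comparison of the profiles.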
User namespaces don’t add that much security, in my opinion. Assuming your container has a non-root user inside, adding user namespaces just changes the number of CVEs/zero-days an attacker needs from 2 to maybe 3:
With a rootful container it’s:
1. Escalate to root (can be done before or after the container escape)
2. Escape the container (can be done before or after escalating to root)
With user namespaces it becomes:
1. Maybe escalate to root within the container first, to get the privileges or access to binaries needed to take advantage of a container escape exploit
2. Escape the container
3. Escalate to root on the host
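To make that last step concrete: with user namespaces, root inside the container is mapped to some unprivileged uid outside it, which is why a separate escalation is needed after escaping. Below is a small sketch that translates a uid through a uid_map. The three-column format (inside start, outside start, length) is what /proc/<pid>/uid_map uses; the mapping `0 100000 65536` in the usage note is a made-up example, and the function names are mine.

```python
from typing import Optional


def parse_uid_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map content into (inside_start, outside_start, length) ranges."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def outside_uid(ranges: list[tuple[int, int, int]], inside_uid: int) -> Optional[int]:
    """Translate a uid inside the namespace to the uid it maps to outside it."""
    for inside, outside, length in ranges:
        if inside <= inside_uid < inside + length:
            return outside + (inside_uid - inside)
    return None  # this uid has no mapping in the namespace


if __name__ == "__main__":
    try:
        with open("/proc/self/uid_map") as f:
            print(outside_uid(parse_uid_map(f.read()), 0))
    except OSError as exc:
        print(f"could not read /proc/self/uid_map: {exc}")
```

With the hypothetical mapping `0 100000 65536`, `outside_uid(parse_uid_map("0 100000 65536"), 0)` gives 100000, i.e. container root is just an ordinary user on the host. In a rootful container without a user namespace the map is the identity, so uid 0 stays uid 0.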
User namespaces are like every other Linux security solution: they are extremely complex, hard to configure, and they don’t actually add that much security for the trouble. The article I linked above has a section about them:
“Another example of these features is user namespaces. User namespaces allow unprivileged users to interact with lots of kernel code that is normally reserved for the root user. It adds a massive amount of networking, mount, etc. functionality as new attack surface. It has also been the cause of numerous privilege escalation vulnerabilities, which is why many distributions, such as Debian, had started to restrict access to this functionality by default.”
Their complexity makes them difficult to secure and execute properly, and adds a ton of attack surface to the kernel.
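If you are curious whether your own distro restricts unprivileged user namespaces, a rough check is to read the relevant sysctls. This sketch looks at `user.max_user_namespaces` (mainline) and `kernel.unprivileged_userns_clone` (a distro-patched knob found on Debian-style kernels, absent elsewhere). It is an assumption on my part that these two cover your system; restrictions applied through AppArmor, seccomp, or other means are not detected.

```python
from pathlib import Path
from typing import Optional


def read_userns_knobs(proc_sys: Path = Path("/proc/sys")) -> dict[str, Optional[int]]:
    """Read the user namespace sysctls; a missing or unreadable knob becomes None."""
    knobs = {
        "user.max_user_namespaces": proc_sys / "user" / "max_user_namespaces",
        "kernel.unprivileged_userns_clone": proc_sys / "kernel" / "unprivileged_userns_clone",
    }
    values: dict[str, Optional[int]] = {}
    for name, path in knobs.items():
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def unprivileged_userns_allowed(values: dict[str, Optional[int]]) -> bool:
    """Best-effort verdict based only on the two sysctls above."""
    if values.get("user.max_user_namespaces") == 0:
        return False
    if values.get("kernel.unprivileged_userns_clone") == 0:
        return False
    return True


if __name__ == "__main__":
    knobs = read_userns_knobs()
    print(knobs, unprivileged_userns_allowed(knobs))
```

A `True` here only means these two knobs are not blocking it, so treat it as a hint rather than proof.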
I mostly use docker and rootful podman for everything. You already need a CVE/zero-day to do a container breakout in the first place, so just keep your runtimes up to date and you should be good. If you really care about being proactive with security and preemptively preventing issues, user namespaces are not really a good solution; it’s better to just use a VM container runtime like kata or microvm, or a userspace kernel like gvisor or syd. They are pretty easy to use: you can just set them as your container runtime in docker, podman, or kubes, and things will mostly just work. Those (and other kernel isolation solutions) would have actually beaten dirtyfrag, copyfail, and similar recent vulns.
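To give an idea of how little work "set it as your container runtime" is for docker, here is a sketch that registers an extra runtime in a daemon.json-style dict and builds the matching `docker run` command. The `runtimes` key and the `--runtime` flag are docker's own; the runtime name `runsc`, the path /usr/local/bin/runsc, and the `alpine` image are assumptions for illustration, so substitute whatever your install actually uses. Podman and Kubernetes select runtimes through their own mechanisms, which this does not cover.

```python
import json


def add_runtime(daemon_config: dict, name: str, path: str) -> dict:
    """Return a copy of a daemon.json dict with one more OCI runtime registered."""
    updated = dict(daemon_config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes[name] = {"path": path}
    updated["runtimes"] = runtimes
    return updated


def docker_run_argv(runtime: str, image: str, *command: str) -> list[str]:
    """Build the argv for running an image under a specific runtime."""
    return ["docker", "run", "--rm", f"--runtime={runtime}", image, *command]


if __name__ == "__main__":
    # Hypothetical values: adjust the name and path to your installation.
    config = add_runtime({}, "runsc", "/usr/local/bin/runsc")
    print(json.dumps(config, indent=2))
    print(" ".join(docker_run_argv("runsc", "alpine", "dmesg")))
```

The printed JSON is what would go into docker's daemon.json (followed by a daemon restart), and after that the only per-container change is the one extra flag.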
As another example, just earlier on Lemmy someone dropped a zero-day that punches through SELinux: https://programming.dev/post/51103657
Small correction:
User namespaces and optionally limited capabilities severely limit the usefulness of both of these exploits. K8s containers with user namespaces or rootless podman prevent host-root and only allow elevating to container root (host uid != 0) and cross container cache pollution (jump to other containers that use the same base image?)
Kind of.
Dirtyfrag, for example, used user namespaces as one of the ways it would escalate. Most container runtimes restrict user namespace creation within user-namespaced containers (via seccomp/capabilities), so running dirtyfrag in a container wouldn’t have worked. But, at the same time, dirtyfrag is only possible in the first place because of the attack surface that user namespaces add.