Securing web applications beyond just containers
gemini.clehaxze.tw for Gemini) run on basically the same architecture as any modern web application: a backend that serves some files, talks to a database, reads some files and renders templates. It's my passion to make what I build absolutely secure. A backend is special in that it's a monolithic piece of software that talks to other services mostly through a single mechanism: TCP. In this post, I'm only gonna ramble about the stuff I find interesting.
How could we defend such a piece of software? What could we do to slow down attackers? And what can we do if the attacker gets ACE (Arbitrary Code Execution)? Namely, I'm interested in the defenses that the OS can help us with.
Not getting pwned
Let's assume that no matter which language or framework we use, we still have memory bugs that an attacker can use to gain ACE. It also doesn't matter that your backend is closed source or that the attacker doesn't have the binary. Blind ROP is a thing. We should always assume we can be hacked, and we should make the life of the hacker much more difficult.
Buffer overflows and heap corruption are among the most common ways hackers get ACE. One way to make it harder for the attacker is to use a hardened allocator. OpenBSD's default allocator is one. On Linux there are 2 options that I can think of. The first is hardened_malloc. It was designed for the GrapheneOS project, an Android ROM with a paranoid level of security. The second is the scudo allocator. It ships as part of the LLVM project, so the straightforward way to use it is to build your application with Clang.
On Linux I prefer hardened_malloc over scudo. hardened_malloc is a drop-in replacement that can be LD_PRELOADed, while scudo requires a flag to be set at compile time. So unless you are on NixOS or Gentoo, it's harder to use scudo.
Do note that the cost of using such allocators can be somewhat high, especially when using a managed language. In my experience, using hardened_malloc makes my compute-heavy endpoints 12% slower.
I can't tell you if the performance hit is worth the cost. 12% is not that much for most applications since the slow part is usually the DB. And in most cases a 12% optimization is easily achievable by reorganizing the code. However, if you're like me and have already optimized the code to the max, 12% may not be acceptable.
I hope this is not a problem in 2022 any more. Don't call CLI tools from your backend. If you have to, don't use system() to invoke the command. Instead, use execve() to run it. Or just use a library that does the job. execve passes the parameters directly to the new program instead of invoking a shell that parses them. Thus there's no chance for shell command injection.
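Here's a minimal sketch of what that looks like with plain POSIX calls. The helper name `run_capture` is made up for illustration. It forks, calls execve with an explicit argument vector, and captures the child's stdout through a pipe, so no shell ever sees the arguments.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with the given argv (no shell involved)
 * and capture its stdout into `out`.
 * Returns the child's exit status, or -1 on failure. */
int run_capture(const char *path, char *const argv[], char *out, size_t out_len)
{
    int pipefd[2];
    if (out_len == 0 || pipe(pipefd) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then replace the process image. */
        char *const envp[] = { NULL };   /* start from an empty environment */
        close(pipefd[0]);
        if (dup2(pipefd[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(pipefd[1]);
        execve(path, argv, envp);
        _exit(127);                      /* only reached if execve failed */
    }

    /* Parent: read what the child prints (up to the buffer size), then reap it. */
    close(pipefd[1]);
    size_t used = 0;
    ssize_t n;
    while (used < out_len - 1 &&
           (n = read(pipefd[0], out + used, out_len - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(pipefd[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling it with `/bin/echo` and the argument vector `{ "echo", "hello; id", NULL }` hands the string `hello; id` to echo as one literal argument. With system() the same string would be split at the `;` by the shell and `id` would run as a second command.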
Defending after breach
Man, the hacker still got ACE and thereby a reverse shell despite our best efforts. Whatever will we do... Well, there are still solutions. The (industry) standard is to put the backend inside some container like Docker, and only provide it with a minimal environment. So even if the hacker pwned the backend, he can't reach anything outside.
Well.. It's not like that's going to stop anyone. Sure, it's a step up from the attacker getting full access to the underlying system with limited user privilege. But he can still probe the surrounding system, make arbitrary HTTP requests and even try to escape the container.
pledge and unveil - limiting the attacker's reach
Containers must provide a baseline of commands to work. For example, most Docker images ship a /bin/sh to execute user commands, libc and related libraries must be readable by the application, etc. Hackers, after getting ACE, could invoke execve to get a shell, or even spawn an independent reverse shell using netcat. There must be some way to stop this. Like why would any web app need to access /usr/bin/nc? And why would it need to execute any commands outside of the CGI directory?
OpenBSD's unveil makes the process see only a subset of the filesystem. Thus, even if an attacker gained ACE, he cannot pop a reverse shell. Let's try an example:
    assert(fopen("/etc/passwd", "r") != NULL); // before unveil: everything is visible
    // allow access to the data and upload folder
    unveil("/home/webapp/data", "r");
    unveil("/home/webapp/upload", "rwc");
    unveil(NULL, NULL); // finalize. Further unveil calls will fail
    assert(fopen("/etc/passwd", "r") == NULL); // can't read outside
    execve("/bin/sh", NULL, NULL); // can't execute commands either
Hopefully you have already seen the power of this. Even without containers, unveil stops the attacker from reaching outside of what the application actually needs. In most cases, that's either nothing (handled by the CDN) or some static files that are public anyway. Most importantly, no commands can be executed because /bin is never unveiled. Good luck Mr. Hacker.
But.. the attacker still got ACE. He can upload his own nc nevertheless. Good question. OpenBSD's pledge protects us against this.
pledge is a promise that we (the developer) make to the kernel about what we'll do in the future. The kernel will kill the process if it makes any system call that's not covered by the promise. No sane webapp will call execve (unless you are PHP). And only very few need chmod. So we just leave them out of the promise.
    web_app_init();
    ...
    // We only need memory allocation, IO, networking and process/thread creation
    pledge("stdio rpath wpath cpath proc inet dns tmppath unix", NULL);
    execve("/bin/sh", NULL, NULL); // Not in the promise. Process killed
In the above example, we pledged only a few things, and some that attackers need are not among them. Thus, even if they got ACE and uploaded a shell, they can neither chmod nor execute it. Now practically the worst the attacker can do is try to execute SQL commands through ACE, which is much harder than popping a reverse shell and much less useful (not saying it isn't still a problem).
Being very honest, I'd rather run an application that is properly pledged and unveiled on OpenBSD than on Linux with a container. These two functions provide more protection and are simpler to use.
pledge and unveil replacements on Linux
Both unveil and pledge are OpenBSD-only features (and SerenityOS if you count that). So we can't use them on Linux. But there are libraries that try to simplify Linux's overcomplicated seccomp and landlock APIs.
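To give a sense of what those libraries hide, here's a rough sketch of denying just execve and execveat with raw seccomp. The function names `deny_exec` and `exec_is_blocked` are made up for illustration, and the filter is deliberately simplified: a production filter should also validate `seccomp_data.arch`, and this one returns EPERM instead of killing the process the way pledge does.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: install a seccomp filter that makes execve and
 * execveat fail with EPERM. Irreversible for the calling process.
 * Returns 0 on success, -1 on error. */
int deny_exec(void)
{
    struct sock_filter filter[] = {
        /* [0] load the syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* [1] execve?   yes: jump to [4] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 2, 0),
        /* [2] execveat? yes: jump to [4] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execveat, 1, 0),
        /* [3] everything else is allowed */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* [4] refuse with EPERM */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Needed so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}

/* Demo: apply the filter in a child and try to exec a shell there.
 * Returns 1 if the exec was refused with EPERM, 0 otherwise. */
int exec_is_blocked(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        if (deny_exec() != 0)
            _exit(2);
        char *argv[] = { "sh", "-c", "exit 0", NULL };
        char *envp[] = { NULL };
        execve("/bin/sh", argv, envp);
        _exit(errno == EPERM ? 1 : 0);   /* only reached if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

That's a hand-written BPF program just to express "no exec". Everything else pledge covers would need its own entries, which is why a wrapper library is attractive.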
Exile.h is a very cool library that provides a simple API to limit what a process can do. It's not a drop-in replacement for pledge and unveil.
    struct exile_policy *policy = exile_init_policy();
    // read-only access to the data folder
    exile_append_path_policies(policy, EXILE_FS_ALLOW_ALL_READ, "/home/webapp/data");
    // read and write access to the upload folder
    exile_append_path_policies(policy, EXILE_FS_ALLOW_ALL_READ | EXILE_FS_ALLOW_ALL_WRITE, "/home/webapp/upload");
    policy->vow_promises = exile_vows_from_str("stdio rpath wpath cpath thread inet unix");
    exile_enable_policy(policy);
The API is much more traditional C-like, but it combines what pledge and unveil do. Be aware that although exile.h uses pledge-style promises, they are in fact not the same. You can't share promises between the two.
landlock-unveil is my creation. Its aim is to be a drop-in replacement for unveil on Linux. Currently there are several behaviour differences that I'm aiming to solve when I get the time. It's not production ready yet. I use it on TLGS and this site nevertheless. Dogfooding.
    #define LLUNVEIL_USE_UNVEIL
    #include "llunveil.h"

    // exact same as landlock
    unveil("/home/webapp/data", "r");
    unveil("/home/webapp/upload", "rwc");
    unveil(NULL, NULL); // finalize. Further unveil calls will fail
Limitation of webapp architecture
Unfortunately the above is the best we can do for now. One of the biggest limitations is that webapps don't do privilege separation. Unlike Chrome, which has a GPU process, an audio process, etc., webapps usually have rendering, SQL, networking, etc. all in one single process. This makes development much easier. But it also means that OS-level APIs have to grant every thread in the process access to all of those resources.
Ideally there'd be some way to separate network IO, request processing, SQL ... all into their own processes, which then communicate through IPC. That way pledge could be used to stop the request processor from reaching the DB directly. I don't see how people would adopt this without some easy way to abstract the IPC completely away. Even so, the cost of IPC may be a showstopper.
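As a toy sketch of that split, assuming nothing beyond POSIX: the request side below never holds a DB connection, only one end of a socketpair. `db_worker`, `ask_db` and the query names are hypothetical, the worker answers a single whitelisted query with a canned reply, and a real design would keep the worker alive instead of forking per request.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical DB worker: the only process that would hold a DB connection.
 * Here it answers one whitelisted query with a canned reply and exits. */
static void db_worker(int fd)
{
    char query[64];
    ssize_t n = read(fd, query, sizeof(query) - 1);
    if (n <= 0)
        _exit(1);
    query[n] = '\0';

    const char *reply = strcmp(query, "count_posts") == 0 ? "ok" : "denied";
    if (write(fd, reply, strlen(reply)) < 0)
        _exit(1);
    _exit(0);
}

/* Request-processor side: send `query` to a freshly forked worker and
 * store its reply in `reply`. Returns 0 on success, -1 on failure. */
int ask_db(const char *query, char *reply, size_t reply_len)
{
    int sv[2];
    if (reply_len == 0 || socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        /* On OpenBSD the worker would pledge() here for DB access only. */
        db_worker(sv[1]);   /* never returns */
    }

    close(sv[1]);
    /* On OpenBSD this side could pledge("stdio", NULL): it has no way to
     * open a socket to the DB itself, only the already-open IPC channel. */
    ssize_t n = -1;
    size_t len = strlen(query);
    if (write(sv[0], query, len) == (ssize_t)len)
        n = read(sv[0], reply, reply_len - 1);
    close(sv[0]);
    waitpid(pid, NULL, 0);

    if (n < 0)
        return -1;
    reply[n] = '\0';
    return 0;
}
```

Even in this tiny form, every query costs a message round trip, and the request handler can only ask for what the worker chooses to expose. That's both the security benefit and the IPC cost.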
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes I do AI research. Chronic VRChat addict.
I run TLGS, a major search engine on Gemini. Used by Buran by default.
- marty1885 \at protonmail.com
- Matrix: firstname.lastname@example.org
- Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df