RestrictedPython
github.com
A Python interpreter with costed opcodes would be smart.
But then still there is no oracle to predict whether user-supplied [RestrictedPython] code never halts; so, without resource quotas from VMs or Containers, there is unmitigated risk of resource exhaustion from user-supplied and developer-signed code.
IIRC there are one-liners that DoS Python and probably RestrictedPython too.
JupyterHub Spawners and Authenticators were created to spin up [k8s] containers on demand - with resource quotas - for registered users with e.g. JupyterHub and unregistered users with BinderHub.
Someone already has a presentation on how it's impossible to fully sandbox Python without OS process isolation?
You can instead have them run their code on their machine with Python in WASM in a browser tab; but should you trust the results if you do not test for convergence by comparing your local output with other outputs? Consensus with a party of one or an odd number of samples.
So, reproducibility as scientific crisis: "it worked on my machine" or "it works on the gitops CI build servers" and "it works in any browser" and the output is the same when the user runs their own code locally as when it is run on the server [with RestrictedPython].
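A minimal sketch of the "compare your local output with other outputs" idea, assuming each run's output is captured as bytes. The function names (output_digest, majority_output) are made up for illustration; it only checks that a strict majority of runs agree byte-for-byte, which says nothing about whether the agreed output is correct.

```python
import hashlib
from collections import Counter


def output_digest(output: bytes) -> str:
    """Hash one run's captured output so runs can be compared cheaply."""
    return hashlib.sha256(output).hexdigest()


def majority_output(outputs):
    """Return (digest, votes) if one digest has a strict majority, else None.

    `outputs` is a list of bytes objects, one per independent run
    (local machine, CI server, browser tab, ...).
    """
    if not outputs:
        return None
    counts = Counter(output_digest(o) for o in outputs)
    digest, votes = counts.most_common(1)[0]
    # A strict majority is required; a 1-1 split is not a consensus.
    return (digest, votes) if votes * 2 > len(outputs) else None
```

With an odd number of samples and only two distinct outputs a tie cannot happen, which is presumably the point of asking for an odd count.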
I heard it was actually impossible to sandbox Python (and most or all other languages) with itself?
/? python sandbox RestrictedPython https://google.com/search?q=python+sandbox+RestrictedPython
> But then still there is no oracle to predict whether user-supplied [RestrictedPython] code never halts; so, without resource quotas from VMs or Containers, there is unmitigated risk of resource exhaustion from user-supplied and developer-signed code
This is actually something I was interested in a while ago, but in Scheme. For example GNU Guile has such a facility built in [0], though with a caveat that if you're not careful what you expose you could still make yourself vulnerable to a DoS attack (the memory-exhaustion kind). But if you don't expose any facility that allows the untrusted user to make large allocations at a time (in a single call) you should be fine (fingers crossed). I don't see a reason why the same mechanics couldn't be implemented in Python.
[0] https://www.gnu.org/software/guile/manual/html_node/Sandboxe...
Edit: in some sense this restricted python stuff also reminds me of Safe Haskell (extension) [1], which came out a bit ahead of its time and is by this point almost forgotten. Might become relevant again in the future.
[1] https://begriffs.com/posts/2015-05-24-safe-haskell.html - better overview than the wiki page
Java implementations have run-time heap allocator limits configurable with the -Xms and -Xmx options for minimum and maximum heap size, and -Xss for per-thread stack size:
java -Xms1024M -Xmx2048M -Xss1M -jar example.jar
RAM and CPU and network IO cgroups are harder limits than a process's attempts to bound its own allocation with the VM.
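For comparison, a rough Python analogue of a per-process limit, as a sketch assuming a POSIX system (the resource module is Unix-only, and RLIMIT_AS is not enforced on every platform). The helper name run_limited and the default numbers are hypothetical. The limits are applied by the kernel to a child process, so this is still weaker than cgroups or a VM, and it is not a sandbox: the child keeps filesystem and network access.

```python
import resource
import subprocess
import sys


def _cap(kind, value):
    """Lower one rlimit to `value`, never above the inherited hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(source, cpu_seconds=1, memory_bytes=512 * 1024 * 1024,
                wall_seconds=10):
    """Run `source` in a child interpreter with CPU and address-space caps."""

    def apply_limits():
        # Runs in the child between fork() and exec().
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    return subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode
        preexec_fn=apply_limits,  # documented as unsafe with threads
        capture_output=True,
        text=True,
        timeout=wall_seconds,  # wall-clock backstop for sleeping code
    )
```

A busy loop such as run_limited("while True: pass") should come back with a negative returncode (killed by a signal) after about cpu_seconds of CPU time, while code that just sleeps is only stopped by the wall-clock timeout.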
TIL about hardened_malloc. Python doesn't have hardened_malloc, and IDK how configurable hardened_malloc is in terms of self-imposed process resource limits. FWIU hardened_malloc groups and thereby contains allocations by size. https://github.com/GrapheneOS/hardened_malloc
There is a reason that EVM and eWASM have costed opcodes and do not have socket libraries (blocking or async).
The default recursion limit (read with sys.getrecursionlimit(), changed with sys.setrecursionlimit()) is 1000; so, up to 1000 frames, each of unbounded size: https://docs.python.org/3/library/sys.html#sys.setrecursionl...
Most (all?) algorithms can be rewritten with an explicit stack instead of recursion, thereby avoiding per-frame stack overhead in languages without TCO (Tail-Call Optimization), like Python.
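A small sketch of that rewrite, using the depth of a nested list as a stand-in problem (the function names are made up for illustration). The recursive version hits the interpreter's recursion limit on deep input; the version with an explicit stack keeps its pending work in a heap-allocated list instead.

```python
def depth_recursive(node):
    """Nesting depth of a list of lists; one Python frame per level."""
    return 1 + max((depth_recursive(child) for child in node), default=0)


def depth_iterative(root):
    """Same result, but pending work lives in an explicit list, not frames."""
    max_depth = 0
    stack = [(root, 1)]  # (node, depth of that node)
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for child in node:
            stack.append((child, depth + 1))
    return max_depth


def make_nested(levels):
    """Build [[[...[]...]]] with `levels` wrappers around an empty list."""
    nested = []
    for _ in range(levels):
        nested = [nested]
    return nested
```

The explicit stack can still grow without bound, so this trades a RecursionError for ordinary heap usage rather than removing the resource question.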
Kata Containers or gVisor or just RestrictedPython that doesn't run until it's checked into git [and optionally signed]?
Stackless Python had something like that (and is able to serialize and later resume the program state preemptively!) - but no sandboxing there unfortunately https://stackless.readthedocs.io/en/2.7-slp/library/stackles...
afaik, one problem with assigning costs to Python opcodes is that they are overloaded, so they may have different true time/memory costs depending on their arguments (types)
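To make the overloading problem concrete, here is a sketch of a flat-cost opcode budget built on CPython's sys.settrace with per-opcode events (frame.f_trace_opcodes, CPython 3.7+). The names run_with_opcode_budget and BudgetExceeded are hypothetical, and this is an illustration of the accounting idea only, not a sandbox: every opcode is charged one unit regardless of what it does.

```python
import sys


class BudgetExceeded(BaseException):
    """Raised by the tracer when the opcode budget runs out."""


def run_with_opcode_budget(source, budget):
    """Execute `source`, charging one unit per bytecode instruction."""
    used = 0

    def tracer(frame, event, arg):
        nonlocal used
        frame.f_trace_opcodes = True  # request per-opcode trace events
        if event == "opcode":
            used += 1
            if used > budget:
                raise BudgetExceeded(f"more than {budget} opcodes executed")
        return tracer  # keep tracing this frame

    namespace = {}
    code = compile(source, "<untrusted>", "exec")
    sys.settrace(tracer)
    try:
        exec(code, namespace)
    finally:
        sys.settrace(None)
    return namespace, used
```

The flat cost is exactly the weakness described above: a single binary-power or multiply opcode on huge integers, or one call into C code, costs one unit here while doing an arbitrary amount of work. The traced code can also swallow the exception with a bare except, after which CPython has already unset the trace function, so a real metering interpreter would need this inside the VM rather than bolted on.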
setup.py could or should execute in a RestrictedPython sandbox if run as root (even in a container).
Starlark (formerly Skylark) is also a restricted subset of Python, used for build configuration with Blaze, Bazel, Buck2, Caddy, and others:
"Starlark implementations, tools, and users" https://github.com/bazelbuild/starlark/blob/master/users.md
Sounds a lot like TCL's Safe interps?
Yeah, first thing that came to my mind too. Though at a deeper level than simply replacing global procs.
Overview for unfamiliar readers:
https://www.tcl.tk/software/plugin/safetcl.html
https://wiki.tcl-lang.org/page/Config+file+using+slave+inter...
Still blows my mind that TCL includes a firewall and virtual filesystem layer.
https://wiki.tcl-lang.org/page/firewall
https://wiki.tcl-lang.org/page/VFS
https://wiki.tcl-lang.org/page/tclvfs
You can even build a FUSE filesystem in TCL!
Oh, side note: Safe interps now have available time limits and a few other measures, though it seems memory allocation limits are a harder nut to crack due to all the places a memory allocation can occur in the TCL interpreter.