RestrictedPython

20 points by chriddyp 2 years ago · 9 comments

A Python interpreter with costed opcodes would be smart.
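
A minimal sketch of what "costed opcodes" means, using a toy stack machine rather than CPython itself. The opcode names, the cost table, and the `run` helper are all hypothetical and exist only for illustration; real systems such as the EVM define their own instruction sets and prices.

```python
class OutOfGas(Exception):
    """Raised when a program exhausts its gas budget."""


# Hypothetical per-opcode prices, chosen arbitrarily for this sketch.
COSTS = {"PUSH": 1, "ADD": 2, "MUL": 3, "JUMP": 4}


def run(program, gas):
    """Execute a list of (opcode, argument) pairs, charging gas per opcode.

    Returns the final stack and the unused gas.
    """
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        gas -= COSTS[op]  # charge before executing
        if gas < 0:
            raise OutOfGas(f"out of gas at pc={pc}")
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "JUMP":
            pc = arg  # unconditional jump; loops stay bounded by gas
            continue
        pc += 1
    return stack, gas


if __name__ == "__main__":
    print(run([("PUSH", 2), ("PUSH", 3), ("ADD", None)], gas=10))
    try:
        run([("JUMP", 0)], gas=100)  # an infinite loop
    except OutOfGas as exc:
        print("stopped:", exc)
```

The metering does not decide whether a program halts; it only guarantees that execution stops once the budget is spent.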

But then still there is no oracle to predict whether user-supplied [RestrictedPython] code never halts; so, without resource quotas from VMs or Containers, there is unmitigated risk of resource exhaustion from user-supplied and developer-signed code.

IIRC there are one-liners that DoS Python, and probably RestrictedPython too.
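
As a rough sketch of the kind of quota meant here, the standard-library `resource` module can impose kernel-enforced CPU and address-space limits on a child interpreter. This assumes a POSIX system; `run_limited` and `_cap` are hypothetical helper names, the default limits are arbitrary, and this is far weaker isolation than a VM or container (no filesystem or network restrictions).

```python
import resource
import subprocess
import sys


def _cap(kind, wanted):
    """Set soft and hard limits to `wanted`, never above the current hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(kind, (wanted, wanted))


def run_limited(source, cpu_seconds=2, memory_bytes=512 * 1024 * 1024,
                wall_seconds=10):
    """Run `source` in a separate interpreter with kernel-enforced limits."""

    def apply_limits():
        # Runs in the child process after fork() and before exec().
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    return subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=wall_seconds,  # wall-clock backstop for sleeping code
    )


if __name__ == "__main__":
    ok = run_limited("print(sum(range(10)))")
    print(ok.returncode, ok.stdout.strip())
    spin = run_limited("while True: pass", cpu_seconds=1)
    print(spin.returncode)  # nonzero: the kernel terminated the busy loop
```

The limits apply to the whole child process, so they hold regardless of what the code inside it does, which is the property in-process restrictions cannot offer.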

JupyterHub Spawners and Authenticators were created to spin up [k8s] containers on demand - with resource quotas - for registered users with e.g. JupyterHub and unregistered users with BinderHub.

Someone already has a presentation on how it's impossible to fully sandbox Python without OS process isolation?

You can instead have them run their code on their machine with Python in WASM in a browser tab; but should you trust the results if you do not test for convergence by comparing your local output with other outputs? Consensus with a party of one or an odd number of samples.

So, reproducibility as scientific crisis: "it worked on my machine" or "it works on the gitops CI build servers" and "it works in any browser" and the output is the same when the user runs their own code locally as when it is run on the server [with RestrictedPython].

I heard it was actually impossible to sandbox Python (and most or all other languages) with itself?

/? python sandbox RestrictedPython https://google.com/search?q=python+sandbox+RestrictedPython

mhitza 2 years ago

> But then still there is no oracle to predict whether user-supplied [RestrictedPython] code never halts; so, without resource quotas from VMs or Containers, there is unmitigated risk of resource exhaustion from user-supplied and developer-signed code
This is actually something I was interested in a while ago, but in Scheme. For example, GNU Guile has such a facility built in [0], though with the caveat that if you're not careful about what you expose, you could still make yourself vulnerable to a DoS attack (the memory exhaustion kind). But if you don't expose any facility that allows the untrusted user to make large allocations at a time (in a single call) you should be fine (fingers crossed). I don't see a reason why the same mechanics couldn't be implemented in Python.
[0] https://www.gnu.org/software/guile/manual/html_node/Sandboxe...
Edit: in some sense this restricted python stuff also reminds me of Safe Haskell (extension) [1], which came out a bit ahead of its time and is by this point almost forgotten. Might become relevant again in the future.
[1] https://begriffs.com/posts/2015-05-24-safe-haskell.html - better overview than the wiki page
- westurner 2 years ago
  
  Java implementations have run-time heap allocator limits configurable with the -Xms and -Xmx options for minimum and maximum heap size, and -Xss for per-thread stack size:
  java -jar -Xms1024M -Xmx2048M -Xss1M example.jar
  RAM and CPU and network IO cgroups are harder limits than a process's attempts to bound its own allocation with the VM.
  TIL about hardened_malloc. Python doesn't have hardened_malloc, and IDK how configurable hardened_malloc is in terms of self-imposed process resource limits. FWIU hardened_malloc groups and thereby contains allocations by size. https://github.com/GrapheneOS/hardened_malloc
  There is a reason that EVM and eWASM have costed opcodes and do not have socket libraries (blocking or async).
  The default recursion limit (see sys.setrecursionlimit()) is 1000 frames; so, the stack can grow to 1000 times the per-frame size, which is itself unbounded: https://docs.python.org/3/library/sys.html#sys.setrecursionl...
  Most (all?) algorithms can be rewritten with an explicit stack instead of recursion, thereby avoiding per-frame stack overhead in languages without tail-call optimization (TCO), like Python.
  Kata Containers or gVisor or just RestrictedPython that doesn't run until it's checked into git [and optionally signed]?
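
To illustrate the point above about replacing recursion with an explicit stack, here is a small sketch that measures the nesting depth of a list both ways. The function names and the `nest` helper are made up for this example.

```python
def depth_recursive(node):
    """Nesting depth via recursion: one interpreter frame per level."""
    if not isinstance(node, list):
        return 0
    return 1 + max([depth_recursive(child) for child in node], default=0)


def depth_iterative(node):
    """Same result with an explicit stack held in an ordinary list."""
    deepest = 0
    stack = [(node, 0)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, list):
            depth += 1
            deepest = max(deepest, depth)
            stack.extend((child, depth) for child in current)
    return deepest


def nest(levels):
    """Build a list nested `levels` deep, e.g. nest(3) == [[[]]]."""
    node = []
    for _ in range(levels - 1):
        node = [node]
    return node


if __name__ == "__main__":
    deep = nest(100_000)
    print(depth_iterative(deep))
    try:
        depth_recursive(deep)
    except RecursionError:
        print("recursive version hit the recursion limit")
```

The iterative version is bounded by heap memory rather than by the recursion limit, so it still needs an external memory quota when the input is untrusted.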
7373737373 2 years ago

Stackless Python had something like that (and is able to serialize and later resume the program state preemptively!) - but no sandboxing there unfortunately https://stackless.readthedocs.io/en/2.7-slp/library/stackles...
afaik, one problem with assigning costs to Python opcodes is that they are overloaded, so they may have different true time/memory costs depending on their arguments (types)
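
A small sketch of that overloading problem, using the standard `dis` module: the function below compiles to a single binary-operation instruction, yet the work done depends entirely on the operand types. `mul` and `binary_opnames` are hypothetical names, and the exact opcode name varies by CPython version, so the helper only filters on the `BINARY_` prefix.

```python
import dis


def mul(a, b):
    return a * b


def binary_opnames(func):
    """Names of the binary-operation instructions in func's bytecode."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("BINARY_")
    ]


if __name__ == "__main__":
    print(binary_opnames(mul))          # one instruction, whatever the types
    print(mul(2, 3))                    # a cheap integer multiply
    print(len(mul("ab", 1_000_000)))    # same opcode, allocates a large string
```

A flat price per opcode would charge both calls the same, so a realistic cost model would have to look at operand types and sizes at run time.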
westurner 2 years ago

setup.py could or should execute in a RestrictedPython sandbox if run as root (even in a container).
Starlark (formerly Skylark) is also a restricted subset of Python, used for build configuration with Blaze, Bazel, Buck2, Caddy, and others.
"Starlark implementations, tools, and users" https://github.com/bazelbuild/starlark/blob/master/users.md

jakobson14 2 years ago

Sounds a lot like TCL's Safe interps?

a-french-anon 2 years ago

Yeah, first thing that came to my mind too. Though at a deeper level than simply replacing global procs.
Overview for unfamiliar readers:
https://www.tcl.tk/software/plugin/safetcl.html
https://wiki.tcl-lang.org/page/Config+file+using+slave+inter...
- jakobson14 2 years ago
  
  Still blows my mind that TCL includes a firewall and virtual filesystem layer.
  https://wiki.tcl-lang.org/page/firewall
  https://wiki.tcl-lang.org/page/VFS
  https://wiki.tcl-lang.org/page/tclvfs
  You can even build a FUSE filesystem in TCL!
- jakobson14 2 years ago
  
  Oh, side note: Safe interps now have available time limits and a few other measures, though it seems memory allocation limits are a harder nut to crack due to all the places a memory allocation can occur in the TCL interpreter.
  https://www.tcl-lang.org/man/tcl8.5/TclCmd/interp.htm#M46
