CVE-2021-32471 – Input validation in Marvin Minsky 1967 Turing Machine

cve.mitre.org

139 points by wsces 5 years ago · 42 comments

Reader

A better link is the research paper:

> The universal Turing machine is generally considered to be the simplest, most abstract model of a computer. This paper reports on the discovery of an accidental arbitrary code execution vulnerability in Marvin Minsky's 1967 implementation of the universal Turing machine. By submitting crafted data, the machine may be coerced into executing user-provided code. The article presents the discovered vulnerability in detail and discusses its potential implications. To the best of our knowledge, an arbitrary code execution vulnerability has not previously been reported for such a simple system.

> A common strategy for understanding a problem is to reduce it to its minimal form. In the field of computer security,we may ask the question: "What is the simplest system exploitable to arbitrary code execution?" In this article, we pro-pose an answer to that question by reporting on the discovery that a well-established implementation of the universal Turing machine is vulnerable to a both unintentional and non-trivial form of arbitrary code execution.

OskarS 5 years ago

Haven't looked into this paper deeply, but this reads very strange to me:
> This paper reports on the discovery of an accidental arbitrary code execution vulnerability in Marvin Minsky's 1967 implementation of the universal Turing machine. By submitting crafted data, the machine may be coerced into executing user-provided code.
It's a universal Turing machine. Its whole purpose is running "user-provided code". That's what a Universal Turing machine does, it runs arbitrary Turing machines. This is a little bit like saying "we found a weakness in the Python interpreter whereby you can feed it specially crafted input that allows you to run arbitrary Python programs". Like... yeah... that's what it's supposed to do.
- tromp 5 years ago
  
  If you do read the paper, you see that the universal machine U expects both a description of the simulated machine T, and a description of the input to T. The exploit is that they can cause arbitrary code execution not by changing the former but the latter. They also suggest a fix to U that makes it robust against exploits in the description of T's input.
- segfaultbuserr 5 years ago
  
  > This is a little bit like saying "we found a weakness in the Python interpreter whereby you can feed it specially crafted input that allows you to run arbitrary Python programs".
  If my understanding is correct, it's more like, "we found a weakness in the Python interpreter whereby you can feed it a specially crafted input to a Python script that allows you override that script and forces the Python interpreter to do something else", which can be a reasonable point to make.
  BTW, if we keep using the Python analogy, this scenario sounds like a Python script that loads its configuration data from a Pickle file on the hard drive. In Python, Pickle doesn't validate the input and it's capable of modifying the internal state of the program. What the authors say is that malicious data in the Pickle file leads to code execution due to deserialization of untrusted data. Depending on your perspective, just like Python says a malicious Pickle file is not in its threat model, you can argue that the Universal Turing Machine is never meant to be protected from malicious data on the tape, and this is not a exploit, but an interesting thought experiment nevertheless.
  The paper says,
  > The universal machine, U, will be given just the necessary materials: a description, on its tape, of T and of [the initial configuration on T's own, simulated tape] s_x; some working space; and the built-in capacity to interpret correctly the rules of operation as given in the description of T. Its behavior will be very simple. U will simulate the behavior of T one step at a time [...]
  > [...] There is one obvious trust boundary in a universal Turing machine, U: the initial string on the tape of the simulated Turing machine, T. That string corresponds to the user-provided data of an ordinary computer program. Because the potential users may be unknown to the developers and administrators of the computer and its programs, it is common to view this data as untrusted. In our explorations of the universal Turing machine, we will make the same assumption. Therefore, if it were possible to execute arbitrary code without manipulating the program of T, but only by providing crafted data on T’s simulated tape, that would constitute a vulnerability.
  Basically the authors' reasoning is:
  1. An Universal Turing machine is an interpreter/simulation that is capable of executing code written for any Turing machine, provided by the user.
  2. The user code takes an external input - the initial content on the tape.
  3. It's possible to maliciously craft an input (the content on the tape) to the user code to hijack the simulation, without modifying the user code.
  - OskarS 5 years ago
    
    Ah, that makes a ton of sense. Thanks!
- qsort 5 years ago
  
  Yes, that's the joke.
  - UncleMeat 5 years ago
    
    No it isn't. The paper forces the universal turing machine to execute a different machine than the one on the input tape by providing crafted input to the intended simulated machine. It isn't a joke paper.
    U is fed machine T and input I. By only changing I you can force U to execute a different machine X.
    
    qsort 5 years ago
    
    Sure, but you achieve that by specifying I in a way you aren't really supposed to. You are effectively invoking undefined behavior.
    The implications of this are well understood. You'd have similar vulnerabilities if you ran raw machine code on, say, an x86 processor. Enforcing checks and sanitization in such a way that the 'exploit' can't happen is the exact kind of job you'd have to do in order to write an operating system.
  - OskarS 5 years ago
    
    I get that the CVE is a joke, but are you saying the paper is as well?
    
    toxik 5 years ago
    
    It’s not a joke, people saying otherwise did not understand the paper. The significance is debatable, it’s mostly a kind of collective facepalm for theoretical computer scientists.
    The CVE is a joke in the sense that you don’t really need a CVE for a universal Turing machine, as they are quite literally only academic.
    
    AreYouSirius 5 years ago
    
    i dont know if he says exactly that by i do say exactly that
    paper is essentially saying that by using simplest machine possible
    we prove it does not provide layers of security abstractions which were invented and deployed in last 40+ years to computers
    so thats like getting CVE for turning off ASLR, AppARMOR SElinux etc in linux kernel.
    
    qsort 5 years ago
    
    As far as I can tell, yes, it is.
    Showing a universal Turing machine is a proof of the fact that when you write stuff like the smn theorem or anything that uses universal functions you aren't writing nonsense.
    There is no "contract" between the universal machine and the hypothetical user. One could certainly ensure that, say, the input TM is properly formatted, or that "the runtime" will only use a certain set of cells, and so on. The difficulties and problems one would find in trying to do so are the same one would face when writing an operating system and a compiler.
  - mycall 5 years ago
    
    Wait until they try unicode, then the joke will be on them.
    
    jradd 5 years ago
    
    I cannot fathom how complicated arbitrary code execution could get with multi byte characters that could use shift registers, null bytes and byte order marks with determinism in a NOP slide on a heap spray.
    Filtering only printable user input helps but even bit map images can expose a heap to a sensitive registers that will execute some target specific generated shell code.
    https://en.m.wikipedia.org/wiki/NOP_slide.
    https://en.m.wikipedia.org/wiki/Heap_spraying

yabones 5 years ago

> NOTE: the discoverer states "this vulnerability has no real-world implications."

At least they're honest about it with this CVE...

stevekemp 5 years ago

Issues like this, in obsolete code, are a lot of fun. Even if they are essentially meaningless.
I reported CVE-2014-3423 back in the day, relating to GNU Emacs using a predictable filename when talking to the Mosiac browser. No choice, as that was what the browser required, something that wouldn't exist these days.
buitreVirtual 5 years ago

I'm not so sure. Some banks and airlines might still be running those machines.

avibhu 5 years ago

> NOTE: the discoverer states "this vulnerability has no real-world implications."

Not sure if declaring this is standard practice, but I had a good laugh.

zitterbewegung 5 years ago

Sure this CVE sounds like a joke but if you create a programming language that is non Turing complete it is much easier to secure than a Turing complete language.

Making a language that have the expressive power of finite state machines could be an example.

qsort 5 years ago

We are doing this already.
- Configuration languages (JSON, YAML, XML) are pure combinatorial logic.
- Regular expressions are... mostly not actually regular, but you get the idea.
- Some templating languages are deliberately less powerful than Turing machines, e.g. ST4 is context-free.
- Prepared SQL statements are a similar idea on a different axis.
The real question is whether a non-TC language could be useful for general purpose programming. Such a language might come with very strong guarantees (termination, time complexity, even correctness or a limited form of correctness), but they might be extra-cumbersome for 'normal' workloads.
- ludamad 5 years ago
  
  A non-TC language can be achieved just by forcing proofs. Oh, you want that program to run? Just prove that it is bounded memory and can't halt, etc
  - qsort 5 years ago
    
    Yeah, but that's the 'cumbersome' part. Could we imagine a non-TC language with simple syntax that's suitable for at least some of the task you'd use a TC language for?
    
    magmastonealex 5 years ago
    
    C compiled down to eBPF (as implemented with the Linux kernel) is not turing-complete. eBPF has a wide variety of uses from network filtering to hooking into certain syscalls.
    The bytecode is theoretically turing-complete, but the verifier present in the kernel ensures that the code contains no loops and accesses no out-of-bounds memory.
    With `clang`, you can compile C down to eBPF bytecode. It's a subset of C, with a lot of restrictions, but it is familiar and capable of some remarkable things past just simple packet parsing & filtering.
    
    qsort 5 years ago
    
    I had no idea, this is fascinating. Thanks for sharing.
    
    UncleMeat 5 years ago
    
    A useful subset of SQL is not turing-complete and is suitable for meaningful computational tasks. Datalog is also not turing-complete and is similarly useful.
    
    qsort 5 years ago
    
    Sure, more examples are CSP solvers and AMPL. They are all somewhat specific though. I was wondering if it would be possible to make one that's "general purpose" in the same sense you call programming languages "general purpose", i.e. one you'd write application software in.
    
    UncleMeat 5 years ago
    
    Datalog would be the closest I can think of. A bunch of static analysis engines are implemented in it.
tester34 5 years ago

What does Turing completness even mean in practice when it comes to non-theoretical languages?
- zitterbewegung 5 years ago
  
  You can't do anything like recursion.
  - klyrs 5 years ago
    
    I spent a while trying to parse this, and it doesn't make a lick of sense.
    Brainfuck is a classical example of a turing complete language that doesn't have a notion of functions. What does "recursion" mean here? Nothing, I think. So you don't need recursion for Turing completeness.
    Some people work on languages that support functions as first class objects, which don't natively support recursion. This is nice from a language design standpoint, because functions only see the scope of their arguments and can't look any further. The "Y Combinator" pattern is used to bootstrap recursion into these languages. You may have heard of it.
    Turing-incomplete languages can also support unlimited recursion. For example, take a Turing complete language, and equip it with a state of the art theorem prover. Before entering a loop, or calling a function, the theorem prover is allowed to spend up to 1 minute proving that the loop or function call will terminate. If the theorem prover times out, the program halts with an error. Some functions, such as a naive implementation of factorial, can be easily proved to halt for all inputs. So this language allows recursion, but by construction, all programs terminate, so it's turing incomplete.
    
    mousepilot 5 years ago
    
    I'm not that great a programmer, but this one seems easy to me, you write an interpreter in brainfuck that allows function definitions that support recursion, then include your recursive function definitions as an included constant then point your interpreter at the constant and tell it to run it. I assume bf allows constants but like I said I'm not that great a programmer.
    Obviously using a recursive descent parser is out but thats probably easy to work around right?
    I could be wrong tho so don't base mission critical code on this without rigorous testing!
    
    klyrs 5 years ago
    
    > I assume bf allows constants but like I said I'm not that great a programmer.
    Key to being a good programmer: at least read a bit about the language before speculating about it. Not sure what you mean by "allows constants", but you're only given increment and decrement operators, and a promise that fresh tape is zeroed out.
    Brainfuck supports iteration. Performing recursion in a higher level language will still look like iteration in Brainfuck.
    
    mousepilot 5 years ago
    
    100 percent agree, but honestly I couldn't make sense of the examples I looked at. It was probably just me tho.

al2o3cr 5 years ago

Meh. The "exploit" seems to rely on passing input that contains cells with values that are valid for the universal machine's tape alphabet, but not the simulated machine's.

It's as if you could pass a "null terminator" in a Unicode string that didn't match the "normal" null character.

mousepilot 5 years ago

I'm going to forward this to our local windows server admins, with the caveat that that it only applies if the particular windows server version is turing complete.

SilasX 5 years ago

@dang, this one should be merged here or vice versa:

https://news.ycombinator.com/item?id=27104125

userbinator 5 years ago

I think this should be considered a satire of what the "security industry" has become: finding any little thing that they can claim is exploitable, regardless of actual significance, and all the while using paranoia to slowly destroy general-purpose computing and user freedom.

Settings

CVE-2021-32471 – Input validation in Marvin Minsky 1967 Turing Machine

Keyboard Shortcuts