x86 mov is Turing complete: mov-only compiler

github.com

238 points by thejj 11 years ago · 59 comments

M2Ys4U 11 years ago

Who needs instructions? Page faults are Turing complete: https://github.com/jbangert/trapcc

pokle 11 years ago

HN discussion of the paper that this is based on: https://news.ycombinator.com/item?id=6309631

  • jordigh 11 years ago

    That paper is so funny:

        Finding Turing-completeness in unlikely places has long been a
        pastime of bored computer scientists.
    
    And

        Removing all but the mov instruction from future iterations of the
        x86 architecture would have many advantages: the instruction
        format would be greatly simplified, the expensive decode unit
        would become much cheaper, and silicon currently used for complex
        functional units could be repurposed as even more cache. As long
        as someone else implements the compiler.
    
    Well, I guess someone has implemented it now!

    • cwre 11 years ago

      Also:

      > Thus, while it has been known for quite some time that x86 has far too many instructions, we can now contribute the novel result that it also has far too many registers.

pslam 11 years ago

That's a fun program. Similar techniques to this are used to create "ROP" (return-oriented programming) exploits. You get enough building blocks to get general-purpose programming, and it's "just" a matter of mapping to those building blocks.

X86 "mov" is a bit of a cheat, though. It's a single mnemonic for what's really dozens of quite different instructions. For example, ARM has "mov" for register-register operations, but "ldr/str" for register-memory, while X86 just uses mov for everything. Even ARM mov optionally puts a barrel shifter into the path, so it's not that pure either.

  • userbinator 11 years ago

    Even if they weren't a single mnemonic, if you read the paper you'll find that reg = mem[reg + C], mem[reg + C] = reg, and reg = C appear to be the only 3 instruction types that are actually needed; these are only register-memory and register-constant data transfers. Even register-register isn't needed.
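
    Those three forms can be sketched as a tiny interpreter — a hedged illustration, not the paper's actual construction; the op names, register names, and the successor-table trick standing in for real arithmetic are all invented for the example:

```python
# A toy machine whose only operations are the three mov forms named above:
# reg = C, reg = mem[reg + C], and mem[reg + C] = reg. Arithmetic is done
# by table lookup -- here a successor table computes x + 1 with no add.

def run(program, regs, mem):
    for op, *args in program:
        if op == "const":               # reg = C
            reg, c = args
            regs[reg] = c
        elif op == "load":              # reg = mem[reg2 + C]
            reg, reg2, c = args
            regs[reg] = mem[regs[reg2] + c]
        elif op == "store":             # mem[reg2 + C] = reg
            reg, reg2, c = args
            mem[regs[reg2] + c] = regs[reg]
    return regs, mem

mem = list(range(1, 11)) + [0] * 10     # mem[i] == i + 1 for i in 0..9
regs = {"a": 0, "b": 0}
program = [
    ("const", "a", 4),                  # a = 4
    ("load",  "b", "a", 0),             # b = mem[a]      (successor of 4)
    ("store", "b", "a", 10),            # mem[a + 10] = b (write result out)
]
regs, mem = run(program, regs, mem)
print(mem[14])  # 5
```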

    • awalton 11 years ago

      register-register movs are needed because compilers are not the world's smartest code generators, and we don't have an infinite number of registers. The biggest problem arises when some calling convention expects certain things to be in certain registers - unless you do some pretty extreme long term planning in your register allocation code in the compiler, you're pretty likely to eventually wind up with some piece of information in a register that you need to move to another register before making or returning from a call.

      Of course, modern processors don't really care about register-register moves because they have hundreds of hidden intermediate registers and register renaming which makes those types of moves "free." (e.g. Haswell has 168 GP registers, only 16 of which are exposed through the AMD64 ISA.)

      • gsg 11 years ago

        Note that you don't need a dumb compiler, register pressure, or an instruction with physical register constraints to require a copy. Two-address instructions suffice: simply have a register for the first operand that contains a value that is used later. A copy is required to preserve the value stomped by the mutating instruction.

        In any case I believe the point of the GP is that any necessary copy can be done by going through memory, so register-register moves can be dispensed with where minimalism is the goal.

      • rdc12 11 years ago

        Do you have a citation for the 168 GP registers? Just wondering if Intel has "officially" stated the number they use in a given microarch.

    • pslam 11 years ago

      Yes, and these are effectively 3 different instructions sharing the same mnemonic. If you assemble them:

              mov     0x1000(%rax,%rax), %eax
              mov     %eax, 0x1000(%rax,%rax)
              mov     $7, %eax
      
      You get:

              0000000000000000	8b840000100000  	movl	0x1000(%rax,%rax), %eax
              0000000000000007	89840000100000  	movl	%eax, 0x1000(%rax,%rax)
              000000000000000e	b807000000      	movl	$0x7, %eax
      
      Which is basically 0x8b->Load, 0x89->Store, 0xb8->Constant.

  • vardump 11 years ago

    ARM mov can be used as a JMP <addr in reg>. The program counter is in fact register R15 on ARM.

  • jimmaswell 11 years ago

    If you're going down that path, all of x86 is used as a 'bytecode' that turns into microcode instructions anyway in modern processors.

    • pmalynin 11 years ago

      Not exactly, most instructions are not "microcoded" for performance reasons. But there is a class of instructions that is.

      • delroth 11 years ago

        Intel CPUs translate x86 to some internal RISC-like "uops" before doing optimizations and execution. This is a widely documented fact (though Intel doesn't really talk publicly about it as far as I know): see for example §2.1 in http://www.agner.org/optimize/microarchitecture.pdf

        • mattgodbolt 11 years ago

          For a (hopefully!) more accessible introduction to the crazy world of what really happens inside the x86, I did a talk at GOTO: http://youtu.be/hgcNM-6wr34 - it's a really fascinating subject!

        • userbinator 11 years ago

          For marketing reasons, Intel made quite a big commotion over the fact that its Pentium was "RISC-like" when it was first released, but in reality what it meant was using a decoder that could emit multiple uops in parallel instead of sequentially like it was in the 486.

          Even CPUs considered RISC today, like ARMs, need to decode instructions into one or more wider uops for efficient execution.

Rexxar 11 years ago

Is there a way to estimate how many times slower a program compiled to use only "mov" would be, compared to standard instructions? 3x, 10x, 100x, more?

  • DSMan195276 11 years ago

    It depends. In this case, I would say it would be at least 100x, probably much more. But that's because this only compiles BF (it is basically a POC after all), and thus you have to use that as an intermediate language. BF is extremely suboptimal, and the compiler here spits out a completely unoptimized one-to-one conversion of the BF source to assembly that uses only `mov` instructions. The size of the assembly balloons extremely quickly simply because it takes a lot of BF to express fairly simple things. Branching using only `mov` is also impossible, so the code instead uses a flag to control where the assembly writes its results. Because of this, even when you hit the condition of a loop, if your condition check is before the loop's code, you end up running the entire loop's code another time and just throwing out the results. It's extremely wasteful.

    Based on this POC alone, it's pretty much impossible to guess how much slower a real `mov`-only compiler would be vs. a regular compiler. It depends a lot on how much of this can be optimized. Directly compiling C to this via clang or gcc would be a good start. I have no idea how much work that would take though, and compiling anything more complex than BF is a real challenge in itself because of the branching problem.
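
    The flag-steered execution described above can be sketched like this — a hedged illustration only; the `on` flag, `NONZERO` table, and cell layout are invented for the example, not taken from the M/o/Vfuscator source:

```python
# Every "instruction" writes unconditionally; the on flag only selects
# whether the write lands in the real cell or a scratch cell, so disabled
# code still runs but its results are thrown away.

SCRATCH = 0
mem = [0] * 16
NONZERO = [0] + [1] * 15        # NONZERO[v] == (v != 0): a lookup, not a test

def target(on, addr):
    return [SCRATCH, addr][on]  # indexing stands in for a branch

# "while mem[2] != 0: mem[3] += 1; mem[2] -= 1", with no conditional jumps:
mem[2], mem[3] = 3, 0
for _ in range(10):                  # fixed sweeps; the real thing re-enters via one jmp
    on = NONZERO[mem[2]]
    mem[target(on, 3)] = mem[3] + 1  # increment, discarded once "off"
    mem[target(on, 2)] = mem[2] - 1  # decrement, discarded once "off"
print(mem[3], mem[2])  # 3 0
```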

    • 0xe2-0x9a-0x9b 11 years ago

      "compiling anything more complex than BF is a real challenge in itself because of the branching problem."

      That's not true.

  • 0xe2-0x9a-0x9b 11 years ago

    It is very hard to estimate the relative speed of a mov-only CPU (with a jump at the end of the program). The mov-only restriction applies only to how a program is input into the machine, it does not restrict the CPU architecture in any way. Obviously, the CPU would first analyze the whole mov-only program and convert it into something that is closer to x86 instructions: arithmetic instructions, unconditional jumps, conditional jumps. The success of this procedure, and ultimately the execution speed, depends on how many mov-patterns the CPU can recognize.

    Compilers would need to produce code that consists of standardized mov-only code patterns. This would help keep the CPU architectures relatively simple.

    Coming very close to native x86 performance seems possible, at least in theory.

cordite 11 years ago

Sounds like if we make a JS to BF transpiler, then we can do this with any language that can be transpiled to JS.

Just imagine the applications of this!

The next step would be to make this into an eventual Quine.

Sir_Cmpwn 11 years ago

The Github description is misleading:

> The single instruction C compiler

I was seriously impressed until I got further down.

  • andrewflnr 11 years ago

    Well, if it reaches a point where it can compile itself, you'll get your compiler written with just one instruction (assuming that's what you were hoping for).

    • Sir_Cmpwn 11 years ago

      I meant that this software compiles brainfuck, rather than C. I was expecting a C compiler from the description.

      • ackalker 11 years ago

        From README.md:

            M/o/Vfuscator 2.0 is a complete C compiler, and will be available soon.
        
        Just be patient, or submit patches :-)

billforsternz 11 years ago

Forgive my ignorance but I thought being Turing complete was important because a Turing complete machine can in principle perform any arbitrary computation. As far as I know the program counter cannot be a destination for an X86 MOV. So you can't branch with only MOVs. How can you do an arbitrary computation if you can't branch?

Maybe you have to reorganize if(a) { b; } else { c; } as a kind of permuted b; followed by c; such that either the permuted b; or the permuted c; are effectively no-ops depending on the value of a.

  • DSMan195276 11 years ago

    You're right, they never actually do any jumping. From what I can tell, they use a variable called `on` to control where every set of commands writes its results to. They do some funky `mov` stuff to do a comparison, and then set `on` with that comparison. They use a stack of `on` variables and then pop off the value of `on` from that stack when you exit a loop.

    They cheat to get a jump without using anything but `mov`'s by executing an illegal `mov` to `cs`, and then use the SIGILL generated to execute a signal handler, which is just the same spot that the code starts at. IMO, it doesn't really count, so a `mov`-only program requires at least one jmp instruction to work.

    I think loops are achieved by simply running the code to the end without saving any of the results, and then hitting the signal and going back to the beginning - ignoring all the code until you hit the loop you're executing. But, the `mov` code is fairly hard to understand so I could be wrong on that.

    • nitrogen 11 years ago

      Could one use self-modifying code (such as rewriting the SIGILL handler) to simplify this a bit?

  • Buge 11 years ago

    It also uses a single jmp instruction to branch back to the start in a loop.

    Two values a and b can be compared by storing 0 into * a and 1 into * b. Then a == b if * a is 1. Then this value can be used to index into an array to provide different behavior based on if a == b.
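
    That store/store/load sequence is easy to sketch; the Python dict below just simulates the three movs (the idea is the comparison from Dolan's paper, but this code is an illustration, not the M/o/Vfuscator's implementation):

```python
mem = {}

def equal(a, b):
    mem[a] = 0        # mov [a], 0
    mem[b] = 1        # mov [b], 1  -- overwrites the 0 iff b aliases a
    return mem[a]     # mov r, [a] -> 1 when a == b, else 0

print(equal(5, 5), equal(5, 7))  # 1 0
```

    The 0/1 result can then index a two-entry table of addresses, which is exactly the "different behavior" without branching described above.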

    Edit: apparently HN doesn't allow me to escape * with \, which the markdown spec says I should be able to do.

    • thaumasiotes 11 years ago

      > HN doesn't allow me to escape * with \, which the markdown spec says I should be able to do

      So? What does the markdown spec have to do with... anything? The XML spec says you have to begin your comment with a version number declaration, but you're not doing that.

  • nickodell 11 years ago

    It's not just movs. There's a single unconditional jump at the end that loops back to the beginning. Termination is accomplished by reading from address 0.

    It's based on this paper: http://www.cl.cam.ac.uk/~sd601/papers/mov.pdf

bsder 11 years ago

It's a superset of the One Instruction Computer https://en.wikipedia.org/wiki/One_instruction_set_computer
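
The canonical OISC is subleq ("subtract and branch if less than or equal to zero"). A minimal interpreter sketch — the memory layout and the ADD program below are invented for illustration:

```python
def subleq(mem, pc=0):
    """Run until pc goes negative; mem holds both code and data."""
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem

# ADD: B += A via a zero temp Z.  Layout: code at 0-8, Z at 15, A at 16, B at 17.
prog = [16, 15, 3,      # Z -= A   (Z = -A; <= 0, but target 3 is just the next instr)
        15, 17, 6,      # B -= Z   (B = B + A)
        15, 15, -1,     # Z -= Z   (result 0 <= 0, so jump to -1: halt)
        0, 0, 0, 0, 0, 0,
        0,              # Z
        2,              # A
        5]              # B
print(subleq(prog)[17])  # 7
```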

spiritplumber 11 years ago

Could be good to defeat RF snooping attacks.

exabrial 11 years ago

"-O" enables optimizations

Replaces movs with fewer movs??

gnu8 11 years ago

This is almost surely a dumb question, but what would happen if someone made a processor that only executed mov instructions, which would be extremely simple, and presumably would be really goddamn fast? Would that be competitive with today's fastest CPUs? If it were, what would be the advantages and disadvantages of this design?

  • gizmo686 11 years ago

    Well, Turing machines do not have random access memory (which changes the big-theta complexity of algorithms), and operations such as multiplication would be complicated and multi-cycle, whereas conventional CPUs can easily parallelize them.

    If you were comparing the performance of mov-only programs, a specialized CPU would likely be faster. Otherwise, there is a good reason we do not actually use Turing machines.

  • userbinator 11 years ago

    > what would happen if someone made a processor that only executed mov instructions, which would be extremely simple, and presumably would be really goddamn fast.

    The laws of physics get in the way. It's essentially the same reason why clock frequencies have stopped going up.

  • 0xe2-0x9a-0x9b 11 years ago

    "a processor that only executed mov instructions, which would be extremely simple"

    I believe such a CPU wouldn't end up simple due to competition for better performance. It would end up being more complex than x86 CPUs.

  • asdfaoeu 11 years ago

    Might want to look at the examples. They turn an equality check into two register stores and a load.

liotier 11 years ago

Extremely Reduced Instruction Set Computer.

Or is it a Ridiculous Instruction Set ?

raverbashing 11 years ago

When I read the title I thought "ah of course cmov is turing complete", but no, it's only mov apparently.

Which is quite amazing (and probably dog slow)
