x86 mov is Turing complete: mov-only compiler

github.com

238 points by thejj 11 years ago · 59 comments

M2Ys4U 11 years ago

Who needs instructions? Page faults are Turing complete: https://github.com/jbangert/trapcc

pokle 11 years ago

HN discussion of the paper that this is based on: https://news.ycombinator.com/item?id=6309631

  • jordigh 11 years ago

    That paper is so funny:

        Finding Turing-completeness in unlikely places has long been a
        pastime of bored computer scientists.
    
    And

        Removing all but the mov instruction from future iterations of the
        x86 architecture would have many advantages: the instruction
        format would be greatly simplified, the expensive decode unit
        would become much cheaper, and silicon currently used for complex
        functional units could be repurposed as even more cache. As long
        as someone else implements the compiler.
    
    Well, I guess someone has implemented it now!

    • cwre 11 years ago

      Also:

      > Thus, while it has been known for quite some time that x86 has far too many instructions, we can now contribute the novel result that it also has far too many registers.

pslam 11 years ago

That's a fun program. Similar techniques to this are used to create "ROP" (return-oriented programming) exploits. You get enough building blocks to get general-purpose programming, and it's "just" a matter of mapping to those building blocks.

X86 "mov" is a bit of a cheat, though. It's a single mnemonic for what's really dozens of quite different instructions. For example, ARM has "mov" for register-register operations, but "ldr/str" for register-memory, while X86 just uses mov for everything. Even ARM mov optionally puts a barrel shifter into the path, so it's not that pure either.

  • userbinator 11 years ago

    Even if they weren't a single mnemonic, if you read the paper you'll find that reg = mem[reg + C], mem[reg + C] = reg, and reg = C appear to be the only 3 instruction types that are actually needed; these are only register-memory and register-constant data transfers. Even register-register isn't needed.
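
    Those three forms can be sketched as a tiny interpreter — a hedged illustration, not the paper's actual construction; the op names, register names, and the successor-table trick standing in for real arithmetic are all invented for the example:

```python
# A toy machine whose only operations are the three mov forms named above:
# reg = C, reg = mem[reg + C], and mem[reg + C] = reg. Arithmetic is done
# by table lookup -- here a successor table computes x + 1 with no add.

def run(program, regs, mem):
    for op, *args in program:
        if op == "const":               # reg = C
            reg, c = args
            regs[reg] = c
        elif op == "load":              # reg = mem[reg2 + C]
            reg, reg2, c = args
            regs[reg] = mem[regs[reg2] + c]
        elif op == "store":             # mem[reg2 + C] = reg
            reg, reg2, c = args
            mem[regs[reg2] + c] = regs[reg]
    return regs, mem

mem = list(range(1, 11)) + [0] * 10     # mem[i] == i + 1 for i in 0..9
regs = {"a": 0, "b": 0}
program = [
    ("const", "a", 4),                  # a = 4
    ("load",  "b", "a", 0),             # b = mem[a]      (successor of 4)
    ("store", "b", "a", 10),            # mem[a + 10] = b (write result out)
]
regs, mem = run(program, regs, mem)
print(mem[14])  # 5
```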

    • awalton 11 years ago

      register-register movs are needed because compilers are not the world's smartest code generators, and we don't have an infinite number of registers. The biggest problem arises when some calling convention expects certain things to be in certain registers - unless you do some pretty extreme long term planning in your register allocation code in the compiler, you're pretty likely to eventually wind up with some piece of information in a register that you need to move to another register before making or returning from a call.

      Of course, modern processors don't really care about register-register moves because they have hundreds of hidden intermediate registers and register renaming which makes those types of moves "free." (e.g. Haswell has 168 GP registers, only 16 of which are exposed through the AMD64 ISA.)

      • gsg 11 years ago

        Note that you don't need a dumb compiler, register pressure, or an instruction with physical register constraints to require a copy. Two-address instructions suffice: simply have a register for the first operand that contains a value that is used later. A copy is required to preserve the value stomped by the mutating instruction.

        In any case I believe the point of the GP is that any necessary copy can be done by going through memory, so register-register moves can be dispensed with where minimalism is the goal.

      • rdc12 11 years ago

        Do you have a citation for the 168 GP registers? Just wondering if Intel has "officially" stated the number they use in a given microarch.

    • pslam 11 years ago

      Yes, and these are effectively 3 different instructions sharing the same mnemonic. If you assemble them:

              mov     0x1000(%rax,%rax), %eax
              mov     %eax, 0x1000(%rax,%rax)
              mov     $7, %eax
      
      You get:

              0000000000000000	8b840000100000  	movl	0x1000(%rax,%rax), %eax
              0000000000000007	89840000100000  	movl	%eax, 0x1000(%rax,%rax)
              000000000000000e	b807000000      	movl	$0x7, %eax
      
      Which is basically 0x8b->Load, 0x89->Store, 0xb8->Constant.

  • vardump 11 years ago

    ARM mov can be used as a JMP <addr in reg>. The program counter is in fact register R15 on ARM.

  • jimmaswell 11 years ago

    If you're going down that path, all of x86 is used as a 'bytecode' that turns into microcode instructions anyway in modern processors.

    • pmalynin 11 years ago

      Not exactly, most instructions are not "microcoded" for performance reasons. But there is a class of instructions that is.

      • delroth 11 years ago

        Intel CPUs translate x86 to some internal RISC-like "uops" before doing optimizations and execution. This is a widely documented fact (though Intel doesn't really talk publicly about it as far as I know): see for example §2.1 in http://www.agner.org/optimize/microarchitecture.pdf

        • mattgodbolt 11 years ago

          For a (hopefully!) more accessible introduction to the crazy world of what really happens inside the x86, I did a talk at GOTO: http://youtu.be/hgcNM-6wr34 - it's a really fascinating subject!

        • userbinator 11 years ago

          For marketing reasons, Intel made quite a big commotion over the fact that its Pentium was "RISC-like" when it was first released, but in reality what it meant was using a decoder that could emit multiple uops in parallel instead of sequentially like it was in the 486.

          Even CPUs considered RISC today, like ARMs, need to decode instructions into one or more wider uops for efficient execution.

Rexxar 11 years ago

Is there a way to estimate how many times slower a program compiled to use only "mov" would be, compared to standard instructions? 3x, 10x, 100x, more?

  • DSMan195276 11 years ago

    It depends. In this case, I would say it would be at least 100x, probably much more. But that's because this only compiles BF (it is basically a POC after all), and thus you have to use that as an intermediate language. BF is extremely suboptimal, and the compiler here spits out a completely unoptimized one-to-one conversion of the BF source to assembly that uses only `mov` instructions. The size of the assembly balloons extremely quickly simply because it takes a lot of BF to express fairly simple things. Branching using only `mov` is also impossible, so the code instead uses a flag to control where the assembly writes its results. Because of this, even when you hit the condition of a loop, if your condition check is before the loop's code, you end up running the entire loop's code another time and just throwing out the results. It's extremely wasteful.

    Based on this POC alone, it's pretty much impossible to guess how much slower a real `mov`-only compiler would be vs. a regular compiler. It depends a lot on how much of this can be optimized. Directly compiling C to this via clang or gcc would be a good start. I have no idea how much work that would take though, and compiling anything more complex than BF is a real challenge in itself because of the branching problem.
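
    The flag-steered execution described above can be sketched like this — a hedged illustration only; the `on` flag, `NONZERO` table, and cell layout are invented for the example, not taken from the M/o/Vfuscator source:

```python
# Every "instruction" writes unconditionally; the on flag only selects
# whether the write lands in the real cell or a scratch cell, so disabled
# code still runs but its results are thrown away.

SCRATCH = 0
mem = [0] * 16
NONZERO = [0] + [1] * 15        # NONZERO[v] == (v != 0): a lookup, not a test

def target(on, addr):
    return [SCRATCH, addr][on]  # indexing stands in for a branch

# "while mem[2] != 0: mem[3] += 1; mem[2] -= 1", with no conditional jumps:
mem[2], mem[3] = 3, 0
for _ in range(10):                  # fixed sweeps; the real thing re-enters via one jmp
    on = NONZERO[mem[2]]
    mem[target(on, 3)] = mem[3] + 1  # increment, discarded once "off"
    mem[target(on, 2)] = mem[2] - 1  # decrement, discarded once "off"
print(mem[3], mem[2])  # 3 0
```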

    • 0xe2-0x9a-0x9b 11 years ago

      "compiling anything more complex than BF is a real challenge in itself because of the branching problem."

      That's not true.

  • 0xe2-0x9a-0x9b 11 years ago

    It is very hard to estimate the relative speed of a mov-only CPU (with a jump at the end of the program). The mov-only restriction applies only to how a program is input into the machine, it does not restrict the CPU architecture in any way. Obviously, the CPU would first analyze the whole mov-only program and convert it into something that is closer to x86 instructions: arithmetic instructions, unconditional jumps, conditional jumps. The success of this procedure, and ultimately the execution speed, depends on how many mov-patterns the CPU can recognize.

    Compilers would need to produce code that consists of standardized mov-only code patterns. This would help keep the CPU architectures relatively simple.

    Coming very close to native x86 performance seems possible, at least in theory.

cordite 11 years ago

Sounds like if we make a JS to BF transpiler, then we can do this with any language that can be transpiled to JS.

Just imagine the applications of this!

The next step would be to make this into an eventual Quine.

Sir_Cmpwn 11 years ago

The Github description is misleading:

> The single instruction C compiler

I was seriously impressed until I got further down.

  • andrewflnr 11 years ago

    Well, if it reaches a point where it can compile itself, you'll get your compiler written with just one instruction (assuming that's what you were hoping for).

    • Sir_Cmpwn 11 years ago

      I meant that this software compiles brainfuck, rather than C. I was expecting a C compiler from the description.

      • ackalker 11 years ago

        From README.md:

            M/o/Vfuscator 2.0 is a complete C compiler, and will be available soon.
        
        Just be patient, or submit patches :-)

billforsternz 11 years ago

Forgive my ignorance but I thought being Turing complete was important because a Turing complete machine can in principle perform any arbitrary computation. As far as I know the program counter cannot be a destination for an X86 MOV. So you can't branch with only MOVs. How can you do an arbitrary computation if you can't branch?

Maybe you have to reorganize if(a) { b; } else { c; } as a kind of permuted b; followed by c; such that either the permuted b; or the permuted c; are effectively no-ops depending on the value of a.

  • DSMan195276 11 years ago

    You're right, they never actually do any jumping. From what I can tell, they use a variable called `on` to control where every set of commands writes its results to. They do some funky `mov` stuff to do a comparison, and then set `on` with that comparison. They use a stack of `on` variables and then pop off the value of `on` from that stack when you exit a loop.

    They cheat to get a jump without using anything but `mov`'s by executing an illegal `mov` to `cs`, and then use the SIGILL generated to execute a signal handler, which is just the same spot that the code starts at. IMO, it doesn't really count, so a `mov`-only program requires at least one jmp instruction to work.

    I think loops are achieved by simply running the code to the end without saving any of the results, and then hitting the signal and going back to the beginning - ignoring all the code until you hit the loop you're executing. But, the `mov` code is fairly hard to understand so I could be wrong on that.

    • nitrogen 11 years ago

      Could one use self-modifying code (such as rewriting the SIGILL handler) to simplify this a bit?

  • Buge 11 years ago

    It also uses a single jmp instruction to branch back to the start in a loop.

    Two values a and b can be compared by storing 0 into * a and 1 into * b. Then a == b if * a is 1. Then this value can be used to index into an array to provide different behavior based on if a == b.
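
    That store/store/load sequence is easy to sketch; the Python dict below just simulates the three movs (the idea is the comparison from Dolan's paper, but this code is an illustration, not the M/o/Vfuscator's implementation):

```python
mem = {}

def equal(a, b):
    mem[a] = 0        # mov [a], 0
    mem[b] = 1        # mov [b], 1  -- overwrites the 0 iff b aliases a
    return mem[a]     # mov r, [a] -> 1 when a == b, else 0

print(equal(5, 5), equal(5, 7))  # 1 0
```

    The 0/1 result can then index a two-entry table of addresses, which is exactly the "different behavior" without branching described above.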

    Edit: apparently HN doesn't allow me to escape * with \, which the markdown spec says I should be able to do.

    • thaumasiotes 11 years ago

      > HN doesn't allow me to escape * with \, which the markdown spec says I should be able to do

      So? What does the markdown spec have to do with... anything? The XML spec says you have to begin your comment with a version number declaration, but you're not doing that.

  • nickodell 11 years ago

    It's not just movs. There's a single unconditional jump at the end that loops back to the beginning. Termination is accomplished by reading from address 0.

    It's based on this paper: http://www.cl.cam.ac.uk/~sd601/papers/mov.pdf

bsder 11 years ago

It's a superset of the One Instruction Computer https://en.wikipedia.org/wiki/One_instruction_set_computer
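
The canonical OISC is subleq ("subtract and branch if less than or equal to zero"). A minimal interpreter sketch — the memory layout and the ADD program below are invented for illustration:

```python
def subleq(mem, pc=0):
    """Run until pc goes negative; mem holds both code and data."""
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem

# ADD: B += A via a zero temp Z.  Layout: code at 0-8, Z at 15, A at 16, B at 17.
prog = [16, 15, 3,      # Z -= A   (Z = -A; <= 0, but target 3 is just the next instr)
        15, 17, 6,      # B -= Z   (B = B + A)
        15, 15, -1,     # Z -= Z   (result 0 <= 0, so jump to -1: halt)
        0, 0, 0, 0, 0, 0,
        0,              # Z
        2,              # A
        5]              # B
print(subleq(prog)[17])  # 7
```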

spiritplumber 11 years ago

Could be good to defeat RF snooping attacks.

exabrial 11 years ago

"-O" enables optimizations

Replaces movs with fewer movs??

gnu8 11 years ago

This is almost surely a dumb question, but what would happen if someone made a processor that only executed mov instructions, which would be extremely simple, and presumably would be really goddamn fast? Would that be competitive with today's fastest CPUs? If it were, what would be the advantages and disadvantages of this design?

  • gizmo686 11 years ago

    Well, Turing machines do not have random access memory (which changes the big-theta complexity of algorithms), and operations such as multiplication would be complicated and multi-cycle, whereas conventional CPUs can easily parallelize them.

    If you were comparing the performance of mov-only programs, a specialized CPU would likely be faster. Otherwise, there is a good reason we do not actually use Turing machines.

  • userbinator 11 years ago

    > what would happen if someone made a processor that only executed mov instructions, which would be extremely simple, and presumably would be really goddamn fast.

    The laws of physics get in the way. It's essentially the same reason why clock frequencies have stopped going up.

  • 0xe2-0x9a-0x9b 11 years ago

    "a processor that only executed mov instructions, which would be extremely simple"

    I believe such a CPU wouldn't end up simple due to competition for better performance. It would end up being more complex than x86 CPUs.

  • asdfaoeu 11 years ago

    Might want to look at the examples. They turn an equality check into two register stores and a load.

liotier 11 years ago

Extremely Reduced Instruction Set Computer.

Or is it a Ridiculous Instruction Set ?

raverbashing 11 years ago

When I read the title I thought "ah of course cmov is turing complete", but no, it's only mov apparently.

Which is quite amazing (and probably dog slow)
