Reversing LZ91 from Commander Keen
lodsb.com
The article doesn't make this explicit, but this game is just using LZEXE. This was an executable file compressor that was very widely used in the early 90s. An early project by Fabrice Bellard!
Oh my goodness! I spent countless hours using LZEXE back in the day and never made the link. I never realized it was made by Fabrice Bellard!
From his homepage:
> "I wrote LZEXE in 1989 and 1990 when I was 17."
Incredible.
Also tools like CUP386 did that for free, but anyway interesting read.
Note that Commander Keen itself was also reverse engineered. The most active project is Commander Genius: https://clonekeenplus.sourceforge.io/ https://github.com/gerstrong/Commander-Genius (You will find some code by me in this. :))
Forgive me, as I've never disassembled anything myself(!), but would it not be helpful to be able to disassemble an executable into pseudo-code or something (I guess ideally something a bit higher level but re-compilable) alongside the assembly language? It seems to me that it could be much easier to understand what is happening that way, no?
You'd think so, but it turns out that reversing compiled code in an automated fashion doesn't usually produce very readable results:
Sometimes even assembly is too high level: https://www.youtube.com/watch?v=eunYrrcxXfw
Sometimes you get lucky and debug information is left in the binary.
Hex-Rays begs to differ.
this is why i wrote the unpacker in python so you can see what's happening :)
the reason i posted this isn't because there's a lack of LZEXE unpackers around, but because learning to reverse is hard and i wanted to share an example of the step by step process of what reversing looks like. it's very tedious and slow. functions like these are places where a lot of reversers are going to give up and look elsewhere, so i wanted to show an example of how to break it apart piece by piece instead of getting lost or giving up
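For anyone who has not seen what sits at the core of an unpacker like this, here is a minimal sketch of an LZ77-style back-reference copy. It is not the article's code and it does not parse LZEXE's actual bitstream: the token format (an int for a literal byte, a `(distance, length)` tuple for a copy) and the name `lz_decode` are assumptions made up for illustration.

```python
def lz_decode(tokens):
    """Expand a list of already-parsed tokens into bytes.

    Hypothetical token format (not LZEXE's real encoding):
      int                -> literal byte
      (distance, length) -> copy `length` bytes starting `distance` bytes back
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
            continue
        distance, length = tok
        if not 0 < distance <= len(out):
            raise ValueError("back-reference points outside the output")
        # Copy byte by byte: source and destination may overlap,
        # which is how short patterns get repeated many times.
        for _ in range(length):
            out.append(out[-distance])
    return bytes(out)


print(lz_decode([ord("a"), ord("b"), (2, 4)]))
```

The byte-by-byte copy is the part that tends to look strange in a disassembly: a copy whose length exceeds its distance reads bytes it has just written.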
It's not possible in most cases - unless you have the exact same version of the compiler that was used and you can figure out the build settings that were used (especially optimizations, but also stuff like include order), you can't recreate the assembler code from pseudo / C code.
Modernizations are especially tricky. Modern compilers can do all sorts of weird magic, sometimes combining two or more lines of code into one instruction. Old school compilers don't optimize much, which is part of why performance-critical parts of game engines were written in Assembler for a long time.
Not to mention that some stuff you can do in Assembler has no equivalent in higher-level code (e.g. dealing with raw stack frames), and even Assembler to byte code is nowhere near 1:1 reversible.
You might not get the exact same code, but it is certainly possible to generate C/pseudo-code from the binary.
IDA Pro and Ghidra can identify functions and generate the equivalent C code. I know that this is not the original code, but it does help a bit when you are trying to get an idea of what a large function is doing.
You're both right.
I've used Ghidra to reverse-engineer a game's serialization format[0] and, even though the C-ish result was marginally better than manually tracking registers across the disassembly, it was far from understandable.
A great deal of the work was cleaning up the resulting C into something that a human would've written instead of the garbage ASM-with-C-syntax that Ghidra produced.
That is nowhere near what OP was suggesting (although useful nonetheless).
I'm actually reverse-engineering a game myself... interestingly, for me Ghidra produces very good results, way better than IDA did ten years ago. On the other hand I may be lucky simply because 1996 Borland C++ is a pretty dumb, unoptimizing compiler and there is absolutely no copy protection or whatever present in the game, not even dead code elimination.
The only thing Ghidra lacks any knowledge of is how to deal with the FS register that is used for SEH on win32... it just marks it as in_FS_offset with no way to tell it that it can replace FS:[0xXX] with appropriate TIB access macros.
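One workaround is to post-process the exported listing yourself. The following is a rough sketch, assuming the accesses show up in the text as `FS:[0x..]`; the function name `annotate_fs_accesses` and the `TIB.` label style are made up for illustration and are not a Ghidra feature. The offsets are the first fields of the 32-bit `NT_TIB` structure.

```python
import re

# Offsets of some 32-bit NT_TIB fields, reachable through FS on win32.
TIB_FIELDS = {
    0x00: "ExceptionList",  # head of the SEH handler chain
    0x04: "StackBase",
    0x08: "StackLimit",
    0x18: "Self",           # linear address of the TIB itself
}

_FS_ACCESS = re.compile(r"FS:\[(0x[0-9A-Fa-f]+)\]")


def annotate_fs_accesses(text):
    """Replace known FS:[offset] accesses with a readable label.

    Unknown offsets are left untouched so nothing is silently mislabeled.
    """
    def rename(match):
        offset = int(match.group(1), 16)
        name = TIB_FIELDS.get(offset)
        return f"TIB.{name}" if name else match.group(0)

    return _FS_ACCESS.sub(rename, text)


print(annotate_fs_accesses("puVar1 = FS:[0x0]; x = FS:[0x40];"))
```

This only renames text; it does not give the decompiler any type information, so it is a readability aid rather than a real fix.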
Some delta compressors like Google Courgette actually do this.
Off-Topic but the domain-name for that site is perfect. (I wonder how many people these days would even recognize it!)
you have no idea how happy it makes me that people recognize the reference :)