Reversing LZ91 from Commander Keen

lodsb.com

75 points by samrussellbg 5 years ago · 17 comments

pdw 5 years ago

The article doesn't make this explicit, but this game is just using LZEXE. This was an executable file compressor that was very widely used in the early 90s. An early project by Fabrice Bellard!
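For anyone who wants to check a binary themselves, here is a minimal sketch of how an LZEXE-packed file could be recognised. It assumes the commonly documented layout: a 4-byte signature at offset 0x1C of the MZ header, "LZ09" for version 0.90 and "LZ91" for version 0.91. The function name `lzexe_version` is made up for illustration and is not taken from the article.

```python
# Assumption: LZEXE stores a 4-byte signature at offset 0x1C of the MZ header
# ("LZ09" for v0.90, "LZ91" for v0.91), as commonly documented for this packer.
LZEXE_SIGNATURES = {b"LZ09": "0.90", b"LZ91": "0.91"}
SIGNATURE_OFFSET = 0x1C


def lzexe_version(data: bytes):
    """Return the LZEXE version string if `data` looks LZEXE-packed, else None."""
    # Too short to contain the signature field at all.
    if len(data) < SIGNATURE_OFFSET + 4:
        return None
    # DOS executables start with "MZ" (or, rarely, "ZM").
    if data[:2] not in (b"MZ", b"ZM"):
        return None
    signature = data[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 4]
    return LZEXE_SIGNATURES.get(signature)


if __name__ == "__main__":
    import sys

    # Usage: python lzexe_check.py SOME.EXE  (script name and path are placeholders)
    with open(sys.argv[1], "rb") as f:
        header = f.read(SIGNATURE_OFFSET + 4)
    print(lzexe_version(header) or "no LZEXE signature found")
```

This only looks at the header signature; a packed file whose signature was deliberately altered would not be detected this way.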

  • TacticalCoder 5 years ago

    Oh my goodness! I spent countless hours using LZEXE back in the day and never made the link. I never realized it was made by Fabrice Bellard!

    From his homepage:

    > "I wrote LZEXE in 1989 and 1990 when I was 17."

    Incredible.

  • _kdave 5 years ago

    Also tools like CUP386 did that for free, but anyway interesting read.

albertzeyer 5 years ago

Note that Commander Keen itself was also reverse engineered. The most active project is Commander Genius: https://clonekeenplus.sourceforge.io/ https://github.com/gerstrong/Commander-Genius (You will find some code by me in this. :))

indentit 5 years ago

Forgive me, as I've never disassembled anything myself(!), but would it not be helpful to be able to disassemble an executable into pseudo-code or something (I guess ideally something a bit higher level but re-compilable) alongside the assembly language? It seems to me that it could be much easier to understand what is happening that way, no?

  • AnIdiotOnTheNet 5 years ago

    You'd think so, but it turns out that reversing compiled code in an automated fashion doesn't usually produce very readable results:

    https://derevenets.com/examples.html

  • samrussellbg (OP) 5 years ago

    this is why i wrote the unpacker in python so you can see what's happening :)

    the reason i posted this isn't because there's a lack of LZEXE unpackers around, but because learning to reverse is hard and i wanted to share an example of the step by step process of what reversing looks like. it's very tedious and slow. functions like these are places where a lot of reversers are going to give up and look elsewhere, so i wanted to show an example of how to break it apart piece by piece instead of getting lost or giving up

  • mschuster91 5 years ago

    It's not possible in most cases - unless you have the exact same version of the compiler that was used and you can figure out the build settings that were used (especially optimizations, but also stuff like include order), you can't recreate the assembler code from pseudo / C code.

    Modernizations are especially tricky. Modern compilers can do all sorts of weird magic, sometimes combining two or more lines of code into one instruction. Old school compilers don't optimize much, which is part of why performance-critical parts of game engines were written in Assembler for a long time.

    Not to mention that some stuff you can do in Assembler has no equivalent in higher-level code (e.g. dealing with raw stack frames), and even Assembler to byte code is nowhere near 1:1 reversible.

    • bugfix 5 years ago

      You might not get the exact same code, but it is certainly possible to generate C/pseudo-code from the binary.

      IDA Pro and Ghidra can identify functions and generate the equivalent C code. I know that this is not the original code, but it does help a bit when you are trying to get an idea of what a large function is doing.

      • kaoD 5 years ago

        You're both right.

        I've used Ghidra to reverse-engineer a game's serialization format[0] and, even though the C-ish result was marginally better than manually tracking registers across the disassembly, it was far from understandable.

        A great deal of the work was cleaning up the resulting C into something that a human would've written instead of the garbage ASM-with-C-syntax that Ghidra produced.

        That is nowhere near what OP was suggesting (although useful nonetheless).

        [0] https://github.com/alvaro-cuesta/townsclipper

        • mschuster91 5 years ago

          I'm actually reverse-engineering a game myself... interestingly, for me Ghidra produces very good results, way better than IDA did ten years ago. On the other hand I may be lucky simply because 1996 Borland C++ is a pretty dumb, unoptimizing compiler and there is absolutely no copy protection or whatever present in the game, not even dead code elimination.

          The only thing Ghidra lacks any knowledge of is how to deal with the FS register that is used for SEH on win32... it just marks it as in_FS_offset with no way to tell it that it can replace FS:[0xXX] with appropriate TIB access macros.

  • mips_avatar 5 years ago

    Some delta compressors like Google Courgette actually do this.

stevekemp 5 years ago

Off-Topic but the domain-name for that site is perfect. (I wonder how many people these days would even recognize it!)
