Overriding C++ virtual functions at run time

blog.visionappster.com

59 points by topiolli 5 years ago · 38 comments

> The C++ standard does not specify how virtual functions should be implemented. In practice, however, compilers generate a virtual function table and place a pointer to it as the first member of a class.

wishful thinking: https://gcc.godbolt.org/z/qWEe9r

badsectoracula 5 years ago

Not sure what you are trying to show, the object still has a vtable and is placed as the first member (and in your example, only) of the class, so that quote is correct.
Obviously, if you enable optimizations and one of those optimizations is avoiding the virtual call when the compiler thinks it isn't necessary, then sure, you won't get a virtual call everywhere.
But if your code is relying on implementation assumptions like having a vtable at the start of a class, then it should also make sure that this assumption holds by not trying to work around it (e.g. via final) and using compiler options that control that optimization (e.g. GCC has -fno-devirtualize).
It doesn't make much sense to both try and take advantage of implementation details and work against taking advantage of implementation details at the same time.
- MaulingMonkey 5 years ago
  
  The article talks about vtable patching in scenarios where you might not be able to recompile the original target, which limits your ability to add unusual flags like -fno-devirtualize, or remove common optimization flags like -O3.
  Which doesn't make the technique completely useless, but raising this "obvious" important caveat - that it's likely to be an imperfect patch on its own - when the article completely fails to do so, is worthwhile. I promise you there are C++ programmers out there who weren't aware of how aggressive optimizers can be.
  - badsectoracula 5 years ago
    
    The scenario is just an example of where it could be useful, but what you describe isn't something that is unique to patching the vtable - it is something that can happen with any form of function hotpatching, e.g. a library might use its own functions and some of those might be inlined by the compiler, so if you try to hotpatch one of those functions not all uses of that function will be replaced with your own.
    This is something that you should have in mind when hotpatching in general, but that doesn't make hotpatching any less useful.
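The point about already-resolved call sites escaping a patch can be sketched without touching a real vtable. This is only an illustration under stated assumptions, not the article's technique: `DispatchTable`, `call_indirect`, `call_direct`, and `patch_table` are hypothetical names, and the explicit function-pointer table merely stands in for a compiler-generated one.

```cpp
// Hypothetical stand-in for a compiler-generated vtable: an explicit, mutable
// table of function pointers. Real vtables are not reachable like this in
// standard C++.
using ValueFn = int (*)();

int original_value() { return 1; }
int patched_value()  { return 2; }

struct DispatchTable {
    ValueFn value;
};

// One shared table, analogous to a per-class vtable.
DispatchTable& table() {
    static DispatchTable t{&original_value};
    return t;
}

// Call site that looks the target up in the table on every call.
int call_indirect() { return table().value(); }

// Call site whose target was fixed ahead of time, as inlining or
// devirtualization would do. It never consults the table.
int call_direct() { return original_value(); }

// Run-time "hotpatch": redirect the table entry to the replacement.
void patch_table() { table().value = &patched_value; }
```

After `patch_table()` runs, `call_indirect()` reaches the replacement while `call_direct()` still reaches the original, which is the partial coverage described above for any form of hotpatching.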
- jcelerier 5 years ago
  
  > Not sure what you are trying to show, the object still has a vtable and is placed as the first member (and in your example, only) of the class, so that quote is correct.
  I would not say that it is correct - the "object"'s actual existence is only through the interpretation that is made of it by the functions that work on your bytes. There's no struct definition in compiled code, only functions that do something with memory at a given offset. I won't go as far as to say that "objects" don't exist once your code leaves C++'s abstract machine to enter the compiled code world... but quite close.
  With that said, 4 out of these 5 functions completely (and rightfully) disregard the vtable so if you're relying on that behaviour to be in place consistently to, say, fix a security issue in a given binary... you're in for some surprises.
  - badsectoracula 5 years ago
    
    With "the object" i meant the bytes in memory that make up an object without taking optimizations into account which can ignore parts of it (since, as i wrote, those can interfere with what you are trying to do).
    And yes, the compiler made 4 out of these 5 functions disregard the vtable, but this is again something the compiler did that you have control over - if you are trying to take advantage of such implementation-specific assumptions, you won't realistically use optimizations that break these assumptions nor write code that does not follow them.
    The only place where this can break is if a library uses its own functions, those functions are inlined in some places and you want to hotpatch them. But that is an issue with hotpatching 3rd party code you have no control over in general regardless of language (it can happen in C too, for example), not just with vtable hotpatching.
    
    jcelerier 5 years ago
    
    > this is again something the compiler did that you have control over
    I've rarely if ever heard of patching vtables in cases where you have access to the code - it's always been about fixing a binary you cannot recompile with some LD_PRELOAD trick or similar
- leni536 5 years ago
  
  In other words if your program relies on undefined behavior then make sure you look at your compiler's documentation and then you might be able to make it implementation defined.
  - formerly_proven 5 years ago
    
    The C++ standard is written in such a way that a vtable as the first member of a polymorphic class instance is the obvious way to satisfy the standard's demands. I don't think there is any mainstream C++ implementation that doesn't use vtables, although some older C++ compilers used a slightly different layout.
    The fact that inside a single CU an optimizing compiler can determine the targets of polymorphic dispatch statically is unrelated, because most usage scenarios (more than one CU, external modules) preclude such optimizations anyway.
    vtables aren't undefined, they are implementation defined and part of the C++ ABI. As soon as you expose something through the C++ ABI the compiler has to use the ABI's definitions.
    
    qppo 5 years ago
    
    People might bring up that the only ABI that is "stable" is Itanium, while MSVC has historically broken ABI stability quite regularly until fairly recently (2017, iirc?).
    That said, so much software on Windows relies on vtable layout that breaking their particular implementation in the ABI would be a massive breaking change, so it's unlikely that it will happen.
    
    gpderetta 5 years ago
    
    I'm not a Windows programmer, but IIRC vtable layout is very tightly tied to COM, so I doubt that MS has broken or will break this specific bit of the ABI anytime soon.
    
    leni536 5 years ago
    
    The layout of objects, including dynamic objects, is implementation defined.
    Accessing them through reinterpret_cast is however undefined. Don't expect that compilers won't screw you over this.
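To make the "implementation defined layout" point concrete, here is a hedged sketch that inspects the first pointer-sized bytes of a polymorphic object by copying them out with `std::memcpy` instead of dereferencing a `reinterpret_cast` pointer. It assumes an implementation that stores its vtable pointer at offset 0 (as the Itanium C++ ABI used by GCC and Clang does). The standard promises nothing about the value read, and `Shape`, `Triangle`, and `leading_word` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// Copies the first pointer-sized bytes of a polymorphic object into an integer.
// ASSUMPTION: the implementation keeps its vtable pointer at offset 0
// (true for the Itanium C++ ABI; not required by the C++ standard).
std::uintptr_t leading_word(const Shape& s) {
    static_assert(sizeof(Shape) >= sizeof(std::uintptr_t),
                  "expected room for a hidden pointer");
    std::uintptr_t word = 0;
    std::memcpy(&word, static_cast<const void*>(&s), sizeof word);
    return word;
}
```

On such an implementation, two `Triangle` objects yield the same leading word and a plain `Shape` yields a different one. Treat that as an observation about one ABI, not as portable behaviour.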
topiolli (OP) 5 years ago

Cool. Is this due to "final" and the fact that the compiler can figure out the target at compile time?
- MaulingMonkey 5 years ago
  
  One call site (the one that takes bar&) relies on the final to devirtualize. The other call sites merely rely on the compiler being able to determine the exact type involved at compile time to devirtualize.
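The distinction between the call sites can be sketched as follows. This is not the code behind the Godbolt link. `Base`, `Derived`, and the three functions are hypothetical, and whether a given compiler actually devirtualizes each call depends on its version and flags.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};

struct Derived final : Base {
    int value() const override { return 2; }
};

// Dynamic type unknown here: the compiler generally has to dispatch
// through the vtable.
int via_base_ref(const Base& b) { return b.value(); }

// Derived is final, so no further override can exist: the call may be
// bound statically even though only a reference is available.
int via_final_ref(const Derived& d) { return d.value(); }

// The exact type of the local object is visible at compile time: the call
// may be bound statically regardless of final.
int via_local_object() {
    Derived d;
    return d.value();
}
```

All three functions return the same result for a `Derived` object. The difference only shows up in the generated code, where the last two may contain no vtable load at all.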

bregma 5 years ago

We were overriding non-virtual functions at run time in the 8-bit days. Even in the feature article's case it would be easier and more reliable to patch the GOT (since he's using ELF on Linux).

It's hardly news but I guess it makes this common cracking technique more accessible.

saagarjha 5 years ago

Patching the GOT doesn't work for internal function calls, though.

mehrdada 5 years ago

As you might imagine, overwriting vtables in memory is a common technique to hijack control flow and make your program execute an attacker's code in an exploit.

saagarjha 5 years ago

Patching function pointers in general is a very desirable thing to be able to do when writing these kinds of things. vtables are interesting because unlike normal C function pointers they can only be used in a standards-compliant manner in a few very limited ways, so if you’re implementing control flow verification you can really ratchet up the security for these. For normal C function pointers, sadly, the best you can do is usually very little, if anything at all. Especially because the use of non-compliant constructs like forging ordinary function pointers is extremely common in things like language runtimes.

Someone 5 years ago

In Objective-C, that’s called “method swizzling”, and better supported by the runtime. See https://nshipster.com/method-swizzling/

And of course, Common Lisp has “change-class” (https://www.snellman.net/blog/archive/2015-07-27-use-cases-f..., discussed at https://news.ycombinator.com/item?id=734025) and Smalltalk has “become:” (https://gbracha.blogspot.com/2009/07/miracle-of-become.html. Short discussion at https://news.ycombinator.com/item?id=734025)

jamesu 5 years ago

Had to go a step further in a project and patch static functions in a codebase with no source. It’s certainly enlightening how much you can do with just a symbol map and type info.

I don’t think the article’s vtable layout is entirely accurate for gcc though - usually you’ll get 2 destructors at the start of the vtable (assuming the first virtual func declared is the destructor).

greesil 5 years ago

Are you willing to share any info on how you got stuck with this project, and what this "codebase" was? Can you call it a codebase if it doesn't have source code?
- topiolli (OP) 5 years ago
  
  I really cannot recall which library it was exactly. I worked on a project that had many proprietary dependencies. It might have been a GenICam camera driver that caused us a lot of other headaches as well.
- Google234 5 years ago
  
  I would guess a normal project that involves this would be making a hack for a video game
  - greesil 5 years ago
    
    Games ship with symbols unstripped?
    
    MaulingMonkey 5 years ago
    
    You'd be surprised what games accidentally ship. I've seen everything from pdbs to unoptimized debug builds accidentally included. If you don't make those mistakes, you might still have std::type_info (or custom equivalents) / RTTI info and __FILE__ / __LINE__ spam from macros, that can still sneak their way in.
    Suppose you fix all that - the game still has a good chance of including 1 or more scripting languages. They may be included complete in their original unobfuscated glory. Or the bytecode might include unstripped debug information. Or the C++ <-> script binding layer might still include unobfuscated string identifiers.
    If you're clever, you'll rebrand this lack of obfuscation as being "mod friendly" ;)
    
    diath 5 years ago
    
    Wouldn't that be useful for crash reporting?
    
    antiuniverse 5 years ago
    
    The standard approach to this, at least on Windows, is to build the debug symbols into a separate database (PDB file), and reconnect the addresses to the symbol names on the back end. Microsoft makes tons of symbols available for their own code via a symbol server which debuggers can query by the combination of a module hash and a relative virtual address.
    
    Google234 5 years ago
    
    Some games do, sometimes during a beta.

The_rationalist 5 years ago

I was wondering whether such a thing is possible for JVM based languages and it turns out it is: https://stackoverflow.com/questions/8273685/is-it-possible-t...

danmg 5 years ago

In Java, you can write an agent and intercept the bytecode for a class when it is loaded into the JVM by the class loader.
ASM is just a bytecode reader/writer/visitor library that can then modify it. You could also just do it statically by having the bytecode you want injected ready to go in an array or resource.
```
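   // e.g. inside an agent's java.lang.instrument.ClassFileTransformer.transform(...);
   // className uses '/' separators and may be null, hence the reversed equals.
   if ("a/b/c/d$e".equals(className))
       return fixedClassBytecode;
   else
       return bytecode;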
```
It's possible to create custom loaders (e.g., loading a class from an encrypted zip-file, or one that creates custom bytecode on the fly) and things which are trying to obfuscate what they're doing will have custom loaders, but this is a choke point that every class that's read in and instantiated must pass through.

MaxBarraclough 5 years ago

So, is it possible? I get the impression you can create a new class using a bytecode-manipulation library, but that's not the same as modifying an existing class at runtime.
I believe classes, once loaded by the JVM, cannot be changed. https://stackoverflow.com/a/43653466/
- CHY872 5 years ago
  
  It's generally possible to rewrite loaded classes. Some changes are possible (e.g. replacing a method's body), others are not (adding new methods). The state of the art library for doing this at the application layer is ByteBuddy https://github.com/raphw/byte-buddy#changing-existing-classe... but this functionality is used by plenty of tooling - profilers such as YourKit rewrite methods to add telemetry - I've seen some security libraries attempt to add additional security related hooks.
  - The_rationalist 5 years ago
    
    When would you use ByteBuddy versus https://asm.ow2.io ?
    
    CHY872 5 years ago
    
    ASM is far lower level than ByteBuddy. To write some ASM code you probably want to have some proficiency with the bytecode format itself (e.g. you're typically outputting individual JVM bytecode instructions such as invokespecial, athrow, etc.). Personally I'd probably always use ByteBuddy unless I had some very specific reason not to. For comparison, these two examples https://github.com/raphw/byte-buddy#changing-existing-classe... http://web.cs.ucla.edu/~msb/cs239-tutorial/ explain how to do a System.out.println wrapper around a method.
    In one case, the developer writes `System.out.println`. In the other, the developer must individually get the static field System.err and push it to the stack, then reference PrintStream's method println, including its descriptor (Ljava/lang/String;)V, which means it takes a String and returns void.

ppg677 5 years ago

FDO compilation will often de-virtualize and remove vtable indirection.

rurban 5 years ago

AutoCad does this in their ObjectARX technology, with a fixed compiler version, to support user or vendor provided plugins to extend classes. At runtime. For decades.

bitwize 5 years ago

And this is why despite trying real hard several times, they couldn't remove Autolisp completely...

rootlocus 5 years ago

This is all fine and well for single public inheritance. Multiple inheritance and virtual inheritance don't generate layouts this simple.
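A small sketch of why multiple inheritance complicates the picture: with two polymorphic bases, the second base subobject typically sits at a non-zero offset and carries its own hidden pointer, so "the vtable pointer is the first member" no longer describes the whole object. `Drawable`, `Serializable`, `Widget`, and `second_base_is_offset` are hypothetical names, and the exact layout is up to the ABI.

```cpp
struct Drawable {
    virtual ~Drawable() = default;
    virtual int draw() const { return 1; }
};

struct Serializable {
    virtual ~Serializable() = default;
    virtual int save() const { return 2; }
};

struct Widget : Drawable, Serializable {
    int draw() const override { return 10; }
    int save() const override { return 20; }
};

// True when the Serializable base subobject does not start at the Widget's
// own address, i.e. converting Widget* to Serializable* adjusts the pointer.
bool second_base_is_offset(const Widget& w) {
    const void* whole  = &w;
    const void* second = static_cast<const Serializable*>(&w);
    return whole != second;
}
```

On common ABIs this returns true and `sizeof(Widget)` is at least two pointers, so a patch aimed only at the pointer at offset 0 would miss calls made through a `Serializable` reference.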
