Introduction
I think I’ve found a solution to an old engineering problem: dynamic method dispatch. It happens when a program decides at runtime which piece of code to run. This mechanism is fundamental to method calls in JavaScript and Python, and it’s widely used in every OOP language (and now even in Rust). I explain all that in depth in the next section of the post.
The goal of this series of posts is to present my solution so you can poke holes in it and help me improve it. If you like it, I’ll implement it with production-ready code. To be extra clear: this is a side project I do for fun (i.e., outside of my day job).
My initial results show an average speedup of 5% on real-world programs. Five percent doesn’t sound like much… until you realize it’s the equivalent of half a CPU generation (source).
Because I’ve run these tests from home, often in my pajamas, I invite you to reproduce them and share your results with me.
My experiments focus on the JVM because of its openness and its advanced compiler/JIT (and, of course, because I’m poor). Nonetheless, the algorithms and concepts generalize to other computing platforms (e.g., x86, Python). I haven’t done that work (again, because I’m poor). If you’re interested in testing this on other platforms, feel free to reach out.
This post introduces the problem and its underlying concepts. In the next post, I’ll share my experiments and benchmarks. Then, I’ll explain my solution, along with the resulting performance improvement.
Dynamic Method Dispatch
If you already know what dynamic dispatch is, you can skip the rest of this section. If you don’t, read on.
A dynamic method dispatch happens when the program decides which code to execute at runtime (rather than at compile time). Interpreted languages like Python use this mechanism for most function calls, while compiled languages like Rust use it selectively (e.g., through trait objects).
Here’s an example to make it clearer:
// Parent class
class Animal {
    void makeSound() {
        System.out.println("Some sound");
    }
}

// Child classes
class Dog extends Animal {
    @Override
    void makeSound() {
        System.out.println("Woof!");
    }
}

class Cat extends Animal {
    @Override
    void makeSound() {
        System.out.println("Meow!");
    }
}

// Collections.shuffle works on a List, not an array
List<Animal> animals = new ArrayList<>(List.of(new Dog(), new Cat(), new Animal()));

// Shuffle the list to randomize the order
Collections.shuffle(animals);

// Loop through the list and call makeSound()
for (Animal animal : animals) {
    animal.makeSound(); // Actual method depends on the object's runtime type
}
The computer can’t predict at compile time which makeSound() method will be called (and neither can you, because of the shuffle call). Instead, it runs a “bit of additional code” at runtime to decide which method to run for each of these calls. That additional code is the dynamic method dispatch.
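To make that “bit of additional code” concrete, here is a hand-rolled sketch that simulates dispatch with an explicit lookup table keyed by the receiver’s type. This is an illustration, not what the JVM literally does (real implementations use per-class vtables and interface tables), but the shape of the runtime work is the same: look up the target, then jump to it.

```java
import java.util.Map;

public class DispatchSketch {
    // Each entry plays the role of a vtable slot for makeSound().
    static final Map<String, Runnable> VTABLE = Map.of(
        "Dog",    () -> System.out.println("Woof!"),
        "Cat",    () -> System.out.println("Meow!"),
        "Animal", () -> System.out.println("Some sound")
    );

    // The "additional code": resolve the target at runtime from the
    // receiver's type, then invoke it.
    static void dispatch(String runtimeType) {
        VTABLE.get(runtimeType).run();
    }

    public static void main(String[] args) {
        dispatch("Dog"); // prints Woof!
        dispatch("Cat"); // prints Meow!
    }
}
```

The lookup itself is cheap; the cost comes from the indirect jump it produces, as the next section explains.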
Each of these overheads is small, but they compound because they disrupt CPU branch prediction (and the prefetching associated with it). And since such call sites are executed billions of times per second in real-world programs, it ends up mattering.
Using the JVM JIT
The dynamic dispatch overhead is a well-known problem, and a lot of engineers and scientists have worked on it. Various optimization techniques exist, all of which are implemented by the JVM.
The JVM uses a JIT (Just-In-Time) compiler, which optimizes the program based on how often each piece of code runs. At first, the program runs in interpreted mode without any optimizations. With repeated execution, hot code gets compiled and optimized using all available techniques. For dynamic dispatch, the JVM relies on two main techniques (which can be combined):
- Inlining: the JIT replaces the dynamic dispatch with the method body itself, eliminating the dispatch entirely (in some cases; in others, it guards the inlined body with a type check, i.e., an if).
- Inline caching: when a particular call site repeatedly resolves to the same method, the JVM caches the result to speed up subsequent resolutions.
These are the two main techniques I’ve found in my research. Please let me know if I’ve missed any. Both approaches are described in great detail in the two papers I reference below.
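The inlining-with-a-guard case can be sketched by hand. The code below is my own analogue of what the JIT roughly produces for a call site that has only ever observed one receiver class; the class names are reused from the earlier example, and the guard shape is an illustration, not actual HotSpot output.

```java
class Animal {
    String sound() { return "Some sound"; }
}

class Dog extends Animal {
    @Override
    String sound() { return "Woof!"; }
}

public class GuardedDispatch {
    // Hand-written analogue of guarded inlining: if the receiver's class
    // matches the one the JIT has profiled (here, Dog), use the inlined
    // body; otherwise fall back to a regular virtual call.
    static String call(Animal a) {
        if (a.getClass() == Dog.class) {
            return "Woof!";   // inlined body of Dog.sound()
        }
        return a.sound();     // slow path: full dynamic dispatch
    }

    public static void main(String[] args) {
        System.out.println(call(new Dog()));    // fast path, prints Woof!
        System.out.println(call(new Animal())); // slow path, prints Some sound
    }
}
```

The guard is a cheap, well-predicted comparison, which is why a call site that keeps seeing the same class runs nearly as fast as a direct call.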
Previous Research
Another reason I want to use the JVM is that it’s an active research area, with super-smart research teams working on it. Surprisingly, Java’s interface dispatch performance has not been studied much. I could only find two relevant documents:
- Historical Perspective: A 2001 paper published in ACM SIGPLAN Notices (https://dl.acm.org/doi/10.1145/504311.504291) suggested that Java’s method dispatch, including interface dispatch, was “harmless” in terms of performance. This was groundbreaking at the time, as it suggested using interfaces wouldn’t lead to notable slowdowns in Java programs. Since then, however, computer architecture has changed significantly, with RAM becoming much slower relative to the CPU.
- 2015 Analysis: Aleksey Shipilev’s comprehensive study, “The Black Magic of (Java) Method Dispatch” (https://shipilev.net/blog/2015/black-magic-method-dispatch/), provides a more nuanced view. It shows that interface method calls can be 50% slower than direct method calls in some cases and almost as fast in others. However, Shipilev doesn’t provide data for real-world programs.
Next Steps
This post laid the groundwork by introducing the problem and discussing existing solutions. The next post will reproduce the most interesting results from these papers and extend them to real-world programs (which the papers don’t cover).