Benchmarks: Go's FFI is finally faster than GDScript (and Rust?) · quaadgras graphics.gd · Discussion #277

One of the most frequently raised concerns about using Go for graphics & game development is the FFI overhead. cgo (the layer that lets you call the C ABI from Go) has to, among other things, switch from a goroutine stack to a C stack, which has long contributed to more than 100ns of overhead per call. This meant that for a long time, the caveat of graphics.gd has been "sure, Go is way faster than GDScript in general, but calling into the engine is way slower".

I would like to thank @timklge, who's been spearheading the effort to put together the first set of FFI benchmarks for the community-maintained GDExtension languages. spritebench creates 20,000 extension nodes that each do some vector math and then make a call into the engine each frame. This provides a practical baseline measurement for FFI through Engine -> Go -> Engine.

[Images: "Before Go 1.26" | "Latest Benchmark Results"]

BenchmarkMethodBindCallWithReturnValue-32     	26411888	        44.17 ns/op	       0 B/op	       0 allocs/op
BenchmarkMethodBindCallThatReturnsVoid-32     	51284286	        23.30 ns/op	       0 B/op	       0 allocs/op
BenchmarkGDScriptCall-32                      	52647039	        23.10 ns/op	       0 B/op	       0 allocs/op

Yes, that's almost three times faster! graphics.gd is even ahead of 'safe' Rust (for now). Not only does graphics.gd have great tooling, type-safety, readability & portability, we now also know that Go can reach FFI performance equivalent to GDScript (which means it should be much faster in practice).
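To put the per-call numbers above into per-frame terms, here is a rough back-of-the-envelope sketch. It assumes one engine call per node per frame for 20,000 nodes (as spritebench is described), a 60 FPS frame budget (an assumption, not something spritebench mandates), and it takes 100ns as a stand-in for the older "more than 100ns" cgo cost. The helper name frameCostMicros is hypothetical, and the figures only cover call overhead, not the vector math or rendering.

```go
package main

import "fmt"

// frameCostMicros returns the total call overhead per frame, in microseconds,
// for `calls` engine calls that each cost `nsPerCall` nanoseconds.
func frameCostMicros(calls int, nsPerCall float64) float64 {
	return float64(calls) * nsPerCall / 1000
}

func main() {
	const nodes = 20000         // spritebench node count, assuming one engine call per node per frame
	const budget = 1e6 / 60.0   // assumed 60 FPS frame budget, in microseconds

	cases := []struct {
		name string
		ns   float64
	}{
		{"older cgo (assumed 100ns)", 100},   // lower bound taken from "more than 100ns"
		{"call with return value", 44.17},    // BenchmarkMethodBindCallWithReturnValue
		{"call that returns void", 23.30},    // BenchmarkMethodBindCallThatReturnsVoid
		{"GDScript call", 23.10},             // BenchmarkGDScriptCall
	}
	for _, c := range cases {
		us := frameCostMicros(nodes, c.ns)
		fmt.Printf("%-28s %8.1f us/frame (%.1f%% of the frame budget)\n", c.name, us, 100*us/budget)
	}
}
```

Under these assumptions, 20,000 calls cost roughly 2 ms per frame at 100ns each, versus under half a millisecond at the batched void-call rate.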

How?

Go 1.26 was released with cgo overhead reduced by approximately 30%. We didn't initially see such a wide improvement though, and I was sure that graphics.gd could do better. I had previously explored the option of optimizing cgo calls, and I thought that now, thanks to spritebench, was the perfect time to game some benchmarks and land some serious optimizations!

GDExtension exposes object_method_bind_ptrcall, a function that can be used to call a method on any engine object. What's great about this is that it has the same signature for any method, just different parameters. This makes it an excellent candidate for batching! So the biggest speedup comes from loading sequential calls made on the main thread into a ring buffer, which flushes everything in C when you need a result back from the engine or when the buffer is full.

That's why BenchmarkMethodBindCallThatReturnsVoid in the go test -bench results above has virtually no additional overhead versus GDScript: it's eligible for batching. In practice, this means that if you only call rendering functions and setters on the main thread, the cgo overhead is effectively nil. We've also landed a number of optimizations around method calls & engine objects (the internal representation for objects has been entirely reworked for higher performance).
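The batching idea described above can be illustrated with a small pure-Go simulation. This is only a sketch of the general technique, not the actual graphics.gd implementation: the call and batcher types, the flushFn callback (standing in for a single cgo crossing that would replay the queued ptrcalls in C) and the crossings counter are all hypothetical names made up for the example.

```go
package main

import "fmt"

// call is a hypothetical stand-in for one queued ptrcall. A real binding
// would carry a method bind, an object pointer and argument pointers.
type call struct {
	method string
	arg    int
}

// batcher queues void calls in a fixed-size ring buffer and hands them to
// flushFn in one go, which models crossing the Go -> C boundary once per batch.
type batcher struct {
	buf        []call
	head, size int
	flushFn    func(batch []call) // assumption: stands in for a single cgo call
	crossings  int                // number of simulated boundary crossings
}

func newBatcher(capacity int, flushFn func([]call)) *batcher {
	return &batcher{buf: make([]call, capacity), flushFn: flushFn}
}

// CallVoid queues a call whose result is not needed. It only flushes when
// the ring buffer is already full.
func (b *batcher) CallVoid(method string, arg int) {
	if b.size == len(b.buf) {
		b.Flush()
	}
	b.buf[(b.head+b.size)%len(b.buf)] = call{method, arg}
	b.size++
}

// CallWithResult needs an answer immediately, so pending calls are flushed
// first to preserve ordering, then the call itself crosses the boundary.
func (b *batcher) CallWithResult(fn func() int) int {
	b.Flush()
	b.crossings++
	return fn()
}

// Flush drains the queued calls in FIFO order with one simulated crossing.
func (b *batcher) Flush() {
	if b.size == 0 {
		return
	}
	batch := make([]call, 0, b.size)
	for i := 0; i < b.size; i++ {
		batch = append(batch, b.buf[(b.head+i)%len(b.buf)])
	}
	b.head = (b.head + b.size) % len(b.buf)
	b.size = 0
	b.crossings++
	b.flushFn(batch)
}

func main() {
	executed := 0
	b := newBatcher(8, func(batch []call) { executed += len(batch) })
	for i := 0; i < 20; i++ {
		b.CallVoid("set_position", i)
	}
	n := b.CallWithResult(func() int { return executed })
	fmt.Println(n, b.crossings) // prints "20 4"
}
```

In this toy run, 20 void calls plus one call that needs a result cost 4 simulated crossings instead of 21, which is the amortization effect the void-call benchmark benefits from.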

Web?

Yes, web performance has been improved too! In the hot paths, Go now bridges wasm2wasm without needing to go through the intermediate JS layer.

BenchmarkMethodBindCallWithReturnValue  	 1519410	       786.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkMethodBindCallThatReturnsVoid  	 5741487	       209.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkGDScriptCall                   	23573400	        50.62 ns/op	       0 B/op	       0 allocs/op

More!

We've also landed optimizations to avoid cgo entirely when calling into engine leaf functions. This means jumping directly into a function when it has been statically determined to be a leaf (one that doesn't make any further function calls). This affects about 25% of the GDExtension API and eliminates cgo overhead for many setters and getters.

Take this arm64 benchmark result as an example:

BenchmarkMethodBindCallWithReturnValue-10       148123444                8.114 ns/op           0 B/op          0 allocs/op
BenchmarkMethodBindCallThatReturnsVoid-10       91260824                13.10 ns/op            0 B/op          0 allocs/op
BenchmarkGDScriptCall-10                        92015772                12.84 ns/op            0 B/op          0 allocs/op

There's still room for improvement in cgo & web builds, but these results show that graphics.gd is now a competitive option for high-performance cross-platform [1] graphics!

[1] Keep in mind that the official C# bindings and SwiftGodot don't support web yet, so their web benchmark scores are zero!