Integrated assembler improvements in LLVM 19
maskray.me
Nice summary! Additional changes I have planned:
- Removing per-instruction timers, which add a measurable overhead even when disabled (https://github.com/llvm/llvm-project/pull/97046)
- Splitting AsmPrinterHandler (used for unwind info) and DebugHandler (used also for per-instruction location information) to avoid two virtual function calls per instruction (https://github.com/llvm/llvm-project/pull/96785)
- Remove several maps from ELFObjectWriter, including some std::map (changed locally, need to make PR)
- Faster section allocation, remove ELF "mergeable section info" hash maps (although this is called just ~40 times per object file, it is very measurable in JIT use cases when compiling many small objects) (planned)
- X86 encoding in general; this consumes quite some time and looks very inefficient -- having written my own x86 encoder, I'm confident that there's a lot of improvement potential. (not started)
Some takeaways on a higher level -- most of these aren't really surprising, but nonetheless are very frequent problems(/patterns) in the LLVM code base:
- Maps/hash maps/sets are quite expensive when used frequently, and sometimes can be easily avoided, e.g., with a vector or, for pointer keys, a pointer dereference
- Virtual function calls (and abstraction in general) come at a cost, especially when done frequently
- raw_svector_ostream is slow, because writes are virtual function calls and don't get inlined (I previously replaced raw_svector_ostream with a SmallVector&: https://reviews.llvm.org/D145792)
- Frequent heap allocations are costly, especially with glibc's malloc
- Many small inefficiencies add up (=> many small improvements do, too)
Big thanks for the recent performance changes! The "many small inefficiencies" point resonates – it definitely shows how performance is hurt in many small areas.
(I aim to write blog posts every 2-3 weeks, but this latest one was postponed... I wrote it in a relatively short time so that the gap would not be too long, and I really should take time to refine the post.)
Side note, but I was looking for pre-built binaries in the LLVM project's releases. Specifically, I was looking for clang+llvm releases for x86_64 Linux (preferably Ubuntu) to save some time (I have always had trouble compiling it) and to put it into my own `prefix` directory. It's kind of wild to see aarch64, armv7, powerpc64, x86_64_windows.. but not something like this. I am aware of https://apt.llvm.org/ and its llvm.sh, but as I said, I'd prefer it to live in its own `prefix`. Does anyone know where else there might be pre-built binaries? There used to be something just like that for v17, like https://github.com/llvm/llvm-project/releases/download/llvmo...
https://mirrors.edge.kernel.org/pub/tools/llvm/ provides a PGO-optimized LLVM toolchain. It is likely much faster than a distro-provided Clang.
You might also want to replace the malloc with mimalloc/snmalloc, which might yield ~10% performance boost.
oh this is nice! Thanks!
> It's kind of wild to see aarch64, armv7, powerpc64, x86_64_windows.. but not something like this.
Yeah, sorry, mostly my fault. I'd been producing these regularly and haven't done as well lately. I'll get one uploaded for 18 soon. :(
thank you, friend! you're awesome
Ok, 18.1.8 uploaded [1] [2].
[1] https://github.com/llvm/llvm-project/releases/tag/llvmorg-18...
[2] https://discourse.llvm.org/t/18-1-8-has-been-tagged/79726/10...
Fantastic, thank you! Can you share how you build and package it?
I can - it's well documented by the project itself [1].
Very nearly all of the work is done by the test-release.sh script.
In the first sentence, "[Intro to the LLVM MC Project]" was likely intended to be a link[1].
[1] https://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html
Thx. Fixed
TLDR: building projects with Clang is now about 4% faster due to optimizations in the way it internally handles assembly.
Perhaps more important, someone is going through MC and simplifying it. Decent chance that's a net reduction in bugs as well.
Thanks!