Show HN: I wrote a tool in Rust for tracking all allocations in a Linux process (github.com)
Neat. I made something similar for work a while back, but as an LD_PRELOAD library that intercepted calls to malloc and friends. It added extra space to every allocation so it could store a pointer at the end, pointing into a leaf node of a call-graph backtrace tree it maintained. Each node in the tree had lifetime allocated/freed block counts and byte counts by code site. The cool part was that it barely affected the performance of the application.
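Roughly, the shape of such a wrapper looks like the sketch below (a minimal reconstruction of the idea, not the actual library; tree_record_backtrace is a stubbed-out stand-in for walking the backtrace tree, the counters aren't atomic, calloc/realloc aren't wrapped, and blocks allocated before the library loads aren't handled):

    // LD_PRELOAD malloc/free wrapper: over-allocate each block and store a
    // pointer to a call-tree leaf (plus the requested size) in a trailer at
    // the end, so free() can credit the right call site.
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <malloc.h>   /* malloc_usable_size */
    #include <stddef.h>
    #include <string.h>

    struct tree_node { size_t live_bytes, live_blocks, total_blocks; };
    struct trailer   { struct tree_node *leaf; size_t size; };

    static void *(*real_malloc)(size_t);
    static void  (*real_free)(void *);

    /* stub: a real version would walk/extend a tree keyed on backtrace() frames */
    static struct tree_node *tree_record_backtrace(void)
    {
        static struct tree_node root;
        return &root;
    }

    /* resolving via dlsym has a bootstrap wrinkle; see the calloc story below */
    __attribute__((constructor)) static void init(void)
    {
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
        real_free   = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    }

    void *malloc(size_t size)
    {
        void *p = real_malloc(size + sizeof(struct trailer));
        if (!p) return NULL;
        struct trailer t = { tree_record_backtrace(), size };
        /* stash the trailer in the slack at the very end of the usable block */
        memcpy((char *)p + malloc_usable_size(p) - sizeof t, &t, sizeof t);
        t.leaf->live_bytes   += size;
        t.leaf->live_blocks  += 1;
        t.leaf->total_blocks += 1;
        return p;
    }

    void free(void *p)
    {
        if (!p) return;
        struct trailer t;
        memcpy(&t, (char *)p + malloc_usable_size(p) - sizeof t, sizeof t);
        t.leaf->live_bytes  -= t.size;
        t.leaf->live_blocks -= 1;
        real_free(p);
    }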
It made its own socket and thread to listen on it, and would dump a snapshot of the tree to anything that connected. I also had some tooling to diff two snapshots, since it was helpful to see whether particular stimuli caused persistent extra allocations. While finding the largest outstanding delta between allocated and freed bytes was great for finding leaks, sorting by lifetime count of blocks allocated was also fun. I remember a little puzzle game I enjoyed at the time would allocate and free tens of thousands of blocks as you dragged a line around for a second.
There was a tricky chicken-and-egg problem with LD_PRELOAD wrapping one of the allocation functions, because it was used internally by dlsym, which I was using to retrieve pointers to the real implementations (calloc, if I recall correctly). I hacked around it by making my library allocate bytes out of a static char array for the calloc call that happens while dlsym-ing for calloc. Debugging this was a nightmare, since it broke so early in the process's lifetime that GDB breakpoints didn't work yet. Tracking from a second process seems like a much simpler idea, and probably doesn't have too much of an impact on performance.
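The static-buffer workaround looks roughly like this (purely illustrative, assuming glibc, where dlsym itself calls calloc; a matching free() wrapper also has to detect and skip pointers that fall inside the static pool):

    // Bootstrapping a calloc wrapper: the very first calloc call happens
    // re-entrantly from inside dlsym(), so serve it from a static pool
    // instead of recursing forever.
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stddef.h>

    static char   bootstrap_pool[8192];   /* static storage: already zeroed */
    static size_t bootstrap_used;
    static void *(*real_calloc)(size_t, size_t);

    void *calloc(size_t nmemb, size_t size)
    {
        static int resolving;
        if (!real_calloc) {
            if (resolving) {
                /* dlsym re-entered us: hand out zeroed bytes from the pool */
                void *p = bootstrap_pool + bootstrap_used;
                bootstrap_used += nmemb * size;
                return p;
            }
            resolving = 1;
            real_calloc = (void *(*)(size_t, size_t))dlsym(RTLD_NEXT, "calloc");
            resolving = 0;
        }
        /* ...record the allocation in the backtrace tree here... */
        return real_calloc(nmemb, size);
    }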
If $JOB will let you throw it on GitHub, I'll try it!
Sounds interesting, but I'd very much appreciate knowing what the output and exploration capabilities look like in allocscope-view before jumping into installation; maybe add some screenshots to the README. Poking around the code, it looks like a curses-based interface.
Yeah, it's a curses-based interface, but with an option to output a text report for offline use.
Good idea to add screenshots.
That looks quite neat.
Though I'm currently not on x64 Linux, and since the main selling point seems to be the TUI, it would be great to have a couple of screenshots, or even better a GIF of an asciinema recording (or whatever people use now).
This might help https://twitter.com/KimballCode/status/1614276163005726720?c...
Could you compare/contrast its functionality to https://github.com/KDE/heaptrack ?
Interesting approach. How does the performance compare to something like https://github.com/koute/bytehound
Bytehound author here.
Just from a cursory look at the README:
> allocscope-trace attaches to another process as a debugger. By using breakpoints on memory allocation functions such as malloc it tracks allocations made by that process.
Looks like it's using breakpoints, so I'd expect it to be orders of magnitude slower. And looking at the source code, it's also using `libunwind`, so even if it weren't using breakpoints it'd still be at least another order of magnitude slower, since Bytehound has a custom unwinder that's specially optimized for this purpose.
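For anyone unfamiliar with the mechanism: a ptrace-based tracer typically patches a trap opcode over the entry point of malloc, so every call raises SIGTRAP and bounces through the tracer process, which is where the cost comes from. A rough sketch of planting the breakpoint (illustrative only, x86-specific, not allocscope's actual code):

    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>

    /* Save the original instruction word at malloc's entry point and patch
     * in a breakpoint. On each SIGTRAP the tracer records a backtrace, then
     * restores the byte, single-steps, and re-inserts the breakpoint. */
    static long plant_breakpoint(pid_t pid, uintptr_t malloc_addr)
    {
        long orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)malloc_addr, NULL);
        long patched = (orig & ~0xffL) | 0xcc;          /* 0xCC = INT3 */
        ptrace(PTRACE_POKETEXT, pid, (void *)malloc_addr, (void *)patched);
        return orig;
    }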
One advantage it has is that it can be attached to an already running process; Bytehound can't do that. (I have ideas for how I could do that, and it should be technically doable by dynamically injecting Bytehound's .so into the target process's address space, but so far I haven't needed it, so I haven't implemented it.)
Out of curiosity I ran a quick test on my private benchmark.
libbytehound.so (with extra debug assertions, because I'm too lazy to recompile in release mode): 4s
allocscope: did not finish after 4 minutes (I got bored waiting and CTRL+C'd it)
Yeah, that was my assumption as well, good to have it confirmed though. Thanks for your excellent work on bytehound!
Why is this being downvoted?
Edit: now this comment is being downvoted.
Thanks for sharing! I built a similar tool (also in Rust) which allows tracing system and library calls, and could be used for this purpose. I wanted to expose the functionality both as a library and CLI, but for now I’ve only published documentation on using the CLI.
If this project can trace memory allocations/deallocations and their call stacks in real time, that would be super useful, because then we could statistically profile which function keeps allocating without a matching free within a given time frame (when the memory is supposed to have been freed). Valgrind only tells you that there are memory leaks, not exactly where the leak is.
Depends on what you mean by "real time". My method of crawling the stack impacts the execution speed of the app you are tracing. I intend to do future work to minimize that impact.
With allocscope, you do get a callstack for the allocations which leak, though.
By real time I mean it won't severely slow a game down from 120 fps to 40 fps; that kind of real time.
Valgrind with --leak-check does include the stack trace of where unfreed memory was originally allocated.
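For example (standard valgrind flags; ./your_program is just a placeholder):

    valgrind --leak-check=full ./your_program

Each leak record in the report ends with the stack at which the lost block was allocated.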
I'd be curious to see how this ptrace tool performs compared with one that relies on ELF symbol interposition (à la LD_PRELOAD). Other heap profilers (heaptrack, libtcmalloc, etc.) use this method. Presumably the loader resolves the symbols once at load time, and there's little overhead in switching to the profiler code.
However, as a practical matter, those solutions might miss mmap, which some applications use for anonymous allocations.
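For concreteness, interposition-based profilers are typically launched as something like the following (libheapprofiler.so is a made-up name):

    LD_PRELOAD=/path/to/libheapprofiler.so ./your_program

The dynamic loader then binds malloc and friends to the profiler's definitions once at startup instead of trapping on every call.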
> I'd be curious to see how this ptrace tool performs compared with one that relies on ELF symbol interposition (a la LD_PRELOAD).
I've posted some very quick numbers in my comment here comparing it to Bytehound: https://news.ycombinator.com/item?id=34806401
> However, as a practical matter those solutions might omit mmap which some applications might use for anonymous allocations.
Bytehound also gathers mmaps. (:
Assuming you're running Linux, there are some eBPF programs that can accomplish this already, no breakpoints needed.
Do you have any that you would recommend specifically?
This blog post talks about some of the existing tools and their tradeoffs: http://mysqlentomologist.blogspot.com/2021/05/dynamic-tracin...
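As a taste of what those tools build on, a bpftrace one-liner can aggregate malloc request sizes by user-space stack via a uprobe on libc (the libc path varies by distro, and this only counts allocations; it doesn't match them with frees):

    sudo bpftrace -e 'uprobe:/usr/lib/x86_64-linux-gnu/libc.so.6:malloc { @bytes[ustack] = sum(arg0); }'

BCC's memleak tool goes further and periodically reports outstanding (not-yet-freed) allocations with their stacks.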
I really like the picture at the top, was that the work of stable diffusion?
Midjourney, actually. :)
Let me guess: it made the HN front page because Rust.
So it's like strace looking for brk()?
strace is limited to system calls, but this particular tool uses ptrace to trap calls to symbols like mmap, malloc, calloc, etc. This provides better resolution, because your allocator probably asks the system for large chunks of memory and carves allocations out of those instead of making each request one-for-one.
Sorry if this is a dumb question, but can't strace trace brk() calls?
And as kind of a follow up what is the easiest way to trace all allocations (brk() and mmap) but nothing else?
> can't strace trace brk() calls?
Absolutely.
> what is the easiest way to trace all allocations (brk() and mmap) but nothing else?
I don't think anything modern still uses the program break, but one should know brk and sbrk exist. To see deallocations, add munmap to the filter. Note that these represent operating-system allocations: programs usually request huge chunks and then manage that memory in user space in order to avoid system-call overhead. On many systems this memory won't actually count as used unless the process touches it and causes a page fault.

    strace -e mmap "$command"

FYI, there is a -e %memory alias in strace for all memory-related syscalls.
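So, to catch everything in that class (with -f to also follow child processes):

    strace -f -e trace=%memory "$command"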