Zml-smi: universal monitoring tool for GPUs, TPUs and NPUs

zml.ai

77 points by steeve 3 months ago · 12 comments

rdyro 2 months ago

Looks cool!

nvtop can actually support TPUs too via https://github.com/rdyro/libtpuinfo/ https://github.com/Syllo/nvtop/blob/76890233d759199f50ad3bdb...

serialx 2 months ago

Look into all-smi (https://github.com/lablup/all-smi). It supports every GPU imaginable, including Apple Silicon, and many AI accelerator cards.

mrflop 3 months ago

Renaming fopen64 to intercept library calls feels like a brittle hack masquerading as "sandboxing." Why not just upstream this hardware support to nvtop instead of fragmenting the ecosystem?

  • steeve (OP) 3 months ago

    sadly, sandboxing is something that can't be upstreamed. this way, sandboxing is kept in zml instead of patching mesa.

    as for nvtop, great program, but we missed a few features (such as sandboxing)

    • pstuart 2 months ago

      It looks cool, and I was excited to get monitoring for the NPU on my Ryzen AI 395+. Unfortunately, the NPU does not show up. NPU support in Linux really seems to be an afterthought.

      • steeve (OP) 2 months ago

        Weird, because we tried it. It doesn’t show anything?

        We use amdsmi to get metrics. I’ll investigate.

  • marwanet 2 months ago

    If this logic were pushed into nvtop, wouldn't the codebase become unmaintainable? Each vendor's interception method is going to be different.

imcritic 2 months ago

Is it capable of exposing metrics in Prometheus format?

152334H 2 months ago

"NPU" seems to refer to trainium only?

synergy20 2 months ago

It would be nice to have CPU usage added so I have everything in one place.

Currently I use btop, which shows basic GPU load along with CPU, network, etc.
