AMD AI Software Solved – MI300X Pricing, Perf, PyTorch, FlashAttention, Triton (semianalysis.com)
Honestly, the likes of OpenAI and Mosaic need to consider Nvidia a huge threat long term.
Nvidia has shown time and time again that they will royally fuck over anyone they have to in order to drive profit.
Then, if anyone dares talk negatively about them, they will just discontinue their access to hardware.
Not saying AMD is a savior, but having only ONE option will lead to long term issues.
This is a very real take. Large institutions tend to have teams dedicated to future-proofing and mitigating these risks; I'd hope and imagine it is on their radar.
I do hope that AMD gets their shit in order because we need the competition to keep this space energized.
> they will just discontinue their access to hardware
Yup - which is exactly what is going on in the cloud space right now.
Because AWS and GCP chose to innovate with their own accelerators, Nvidia heavily favoured Azure for a while. Recently, GCP seem to have capitulated somehow and so are back on the bandwagon. Oracle, of course, never had any hope of success in cloud without leaning on some form of non-technical manipulation, which is why they were the first on board with DGX Cloud.
Sadly I don't see AMD as the solution, since they too have associated themselves more with Azure than the other clouds.
> Because AWS and GCP chose to innovate with their own accelerators, Nvidia heavily favoured Azure for a while.
AWS and GCP work on competitors to Nvidia's products, so Nvidia favors Azure, which is not doing that, and this is somehow Nvidia's fault or even a problem?
Looks more like Nvidia was hedging its bets in case AWS or GCP succeeded at developing competitive AI chips and then transitioned completely away from Nvidia.
It's not a problem from Nvidia's perspective, and AWS/GCP wouldn't have a right to complain. But users might reasonably conclude that this is a problem, and regulators might likewise consider it unacceptable for Nvidia to have this much market power, if they actually had the technical expertise to examine such things.
GPT-4, please port my CUDA to work on AMD.
A non-trivial amount of effort has gone (and is going) into this; see AMD's HIPIFY tool.
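For anyone curious what that porting actually looks like, here is a toy sketch (my illustration, not AMD's code) of the kind of source-to-source renaming hipify-perl performs; the real tool also handles kernel launch syntax, cuBLAS/cuDNN mappings, and much more:

    # Toy illustration of hipify-style CUDA -> HIP source translation.
    import re

    # A small, illustrative subset of the renames the real tool applies.
    CUDA_TO_HIP = {
        "cuda_runtime.h": "hip/hip_runtime.h",
        "cudaMalloc": "hipMalloc",
        "cudaMemcpy": "hipMemcpy",
        "cudaFree": "hipFree",
        "cudaDeviceSynchronize": "hipDeviceSynchronize",
        "cudaError_t": "hipError_t",
    }

    def toy_hipify(cuda_source: str) -> str:
        """Rewrite known CUDA identifiers to their HIP equivalents; leave the rest alone."""
        pattern = re.compile("|".join(re.escape(k) for k in CUDA_TO_HIP))
        return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], cuda_source)

    src = "#include <cuda_runtime.h>\ncudaMalloc(&ptr, n);\ncudaDeviceSynchronize();"
    print(toy_hipify(src))
    # #include <hip/hip_runtime.h>
    # hipMalloc(&ptr, n);
    # hipDeviceSynchronize();

The mechanical rename is the easy part; as the reply below notes, getting real pipelines to actually run is another matter.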
Unfortunately, this does not work. I've tried.
Somewhat related: starting a new Ubuntu-PyTorch-CUDA project? That'll be 10-15 gigabytes, please. Is there some way to strip down the individual deps? I imagine it's the way it is because the drivers and the PyTorch source are probably machine-generated before compilation, to some extent. Is there any hope of Triton becoming a lightweight interstitial layer in the stack that lets me do the hardware codegen after pulling, without having to rebuild absolutely everything, while also supporting more accelerators?
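For what it's worth, Triton already does the per-hardware codegen at runtime rather than at install time: a kernel like the one below is JIT-compiled for whatever GPU is present the first time it launches. Whether that ever shrinks the install footprint is a separate question. A minimal sketch, following the standard Triton vector-add tutorial (assumes a working GPU build of PyTorch and Triton):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    out = torch.empty_like(x)

    # The kernel is compiled for the local GPU the first time this launch runs.
    grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
    assert torch.allclose(out, x + y)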
Here is what really annoys me:
The dependencies are such a mess that even if you try to install only pytorch-cpu, at some point some random package will cause pytorch-cuda and those 10GBs to be installed.
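A partial mitigation: the CPU-only wheels live on their own index (e.g. pip's --index-url https://download.pytorch.org/whl/cpu), and it's worth verifying at runtime which build actually landed in the environment, since a transitive dependency can silently swap it out. A quick check (torch.version.hip is only meaningful on ROCm builds, hence the getattr guard):

    # Check which PyTorch build actually got installed in this environment.
    import torch

    print("torch:", torch.__version__)                          # e.g. "2.1.0+cpu", "2.1.0+cu121", "2.1.0+rocm5.6"
    print("CUDA build:", torch.version.cuda)                    # None on CPU-only and ROCm builds
    print("ROCm build:", getattr(torch.version, "hip", None))   # None unless it's a ROCm build
    print("GPU visible:", torch.cuda.is_available())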
I use Nix to manage my machine learning development environment: https://github.com/nixvital/ml-pkgs
Sure, after building, the binary is HUGE. But I only have to build it once and cache it so that all my workstations and training servers can use it.
Is ROCm actually usable in this year's machine learning ecosystem? Can I just drop in any PyTorch model that was developed on CUDA and expect it to work?
> Is ROCm actually usable in this year's machine learning ecosystem?
I don't know, as I'm only just now building out my first AMD-based ML machine to run ROCm. All I can really say is that AMD really does seem to be making a genuine effort to get ROCm to that level. See the two links I submitted yesterday[1][2] for more details.
The two things in particular that stand out to me from all this are:
1. They are at least publicly declaring their intention to make ROCm a player in AI/ML. Previously there was at least a perception (and quite possibly a reality) that ROCm was more focused on other HPC workloads and not really AI / ML. AMD seems committed to changing that.
2. It seems that they are finally serious about getting ROCm working on their consumer Radeon cards. Even though 5.6 didn't include the long hoped-for announcement of such support, the blog post they put out did at least officially declare their intent to do so in a release this fall. And maybe more to the point, the batch of changes in 5.6 did actually include some fixes for problems encountered running on Radeon cards, even though they aren't yet officially listed as supported.
On my projects it kind of works, given the usual driver-installation work. Depending on the card, you'd have to set some env flags like HSA_OVERRIDE_GFX_VERSION to make it not crash.
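For anyone hitting that, the override needs to be in the environment before the ROCm runtime initializes, i.e. before torch is imported. The value 10.3.0 below is the one commonly used for RDNA2 consumer cards (making them report as gfx1030); other cards need other values, so treat it as an example rather than a recipe:

    import os

    # Must be set before torch (and thus the ROCm/HSA runtime) is imported.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    import torch

    if torch.cuda.is_available():              # ROCm builds expose the GPU through the torch.cuda API
        x = torch.randn(1024, 1024, device="cuda")
        print("matmul checksum:", (x @ x).sum().item())   # simple smoke test on the GPU
    else:
        print("No ROCm-visible GPU; check the driver install and supported-card list.")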
On an MI250+ system, or other similar architectures that mirror what El Capitan is going to look like, ROCm is stable and there are PyTorch + CuPy backends for it. It mostly just works. If you have custom kernels as part of your pipeline, you'd need to convert them from CUDA to HIP though.
If you're looking for something on AMD consumer cards...then you have to keep waiting.
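On the "it mostly just works" point: ROCm builds of PyTorch expose the HIP backend through the usual torch.cuda / "cuda" device API, so CUDA-era model code generally runs unmodified. A minimal sketch, assuming a working ROCm install:

    import torch
    import torch.nn as nn

    # Unmodified "CUDA-style" code: on a ROCm build of PyTorch the "cuda" device
    # string is backed by HIP, so nothing below needs to change for AMD hardware.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
    batch = torch.randn(32, 512, device=device)

    with torch.no_grad():
        out = model(batch)

    print(out.shape, "ran on", device)

Custom CUDA kernels are the exception, as noted above: those have to go through HIP (hipify or a hand port) before they'll build against ROCm.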
Obviously not there yet, otherwise AMD stock would have doubled.
A recent latent.space podcast with geohot discusses a lot of related information: his approach to using AMD, and the challenges of getting ML to work on anything other than Nvidia.
I've been waiting years, and will wait more, for PyTorch etc. to work under Mesa rather than install ROCm.
Mesa is a 3D library for GPU rendering; does it do compute the way Nvidia's GPUs do? We're talking about whether Mesa has its own small kernels that run matrix multiplications in parallel at scale. As far as I can tell, Mesa at the moment is not going to work for any ML framework like PyTorch.
Mesa advertises support for OpenCL[1], so the idea of using it as an ML backend isn't ridiculous. But I can't speak to whether or not anybody has actually tried to make that work, or where it stands.
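If anyone wants to poke at it, enumerating whatever OpenCL implementation Mesa exposes (Clover, or rusticl on newer Mesa) is easy enough with pyopencl; whether an ML framework could usefully sit on top of that stack is the open question:

    # List whatever OpenCL platforms/devices are visible, e.g. Mesa's Clover or rusticl.
    import pyopencl as cl

    for platform in cl.get_platforms():
        print("Platform:", platform.name, "|", platform.version)
        for device in platform.get_devices():
            print("  Device:", device.name, "| compute units:", device.max_compute_units)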
I think OpenCL lost the battle in the ML era; CUDA crushed it, followed these days by newcomers like SYCL and ROCm.
Oh yeah. I didn't mean to suggest this as something anybody would want to do for any kind of serious use. Just pointing out that, in the abstract, the idea of doing it isn't totally ridiculous.
Is this real? Is ROCm not horrible to use now? Is anybody using it right now? If so, what card, and how is it working?