AMD leaps after launching AI chip that could challenge Nvidia dominance

68 points by ek750 2 years ago · 88 comments

Reader

AMD gatekeeps this functionality behind it's non-consumer cards. They don't realize that having a consumer card and being able to develop on it is a gateway to using AMD. I can use CUDA on any Nvidia card I buy. I can't believe they are so incredibly dense on this.

65a 2 years ago

You can run inference and training on consumer AMD cards today. It works fine, including llama.cpp, stable diffusion, hugging face transformers, etc. Way cheaper for a given performance/VRAM target as well.
- thrtythreeforty 2 years ago
  
  Maybe so. But it isn't confidence inspiring when I go to see which cards are supported and I see this issue:
  https://github.com/ROCm/ROCm/issues/1714
  With Nvidia cards, I know that if I buy any Nvidia card made in the last 10 years, CUDA code will run on it. Period. (Yes, different language levels require newer hardware, but Nvidia docs are quite clear about which CUDA versions require which silicon.) I have an AMD Zen3 APU with a tiny Vega in it; I ought to be able to mess around with HIP with ~zero fuss.
  The will-they-won't-they and the rapidly dropped support is hurting the otherwise excellent ROCm and HIP projects. There is a huge API surface to implement and it looks like they're making rapid gains.
  - Symmetry 2 years ago
    
    That's from 2022. AMDs move to start generally supporting consumer cards is very recent.
    
    echelon 2 years ago
    
    Where's the official show of support? I'll believe it when I see it.
    
    Symmetry 2 years ago
    
    They're listed as supported on their website and they work. I'm not sure what there is besides that.
    https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h...
    You have to click on the "Radeon" tab for the commercial cards.
    Yes, it's annoying that they only officially support Ubuntu 22.04 but it is official support and you can get other OSs and cards to work.
  - anuraaga 2 years ago
    
    The article specifically is about AI. Don't most useful LLM models require too much RAM for consumer Nvidia cards and also often need those newer features, making it irrelevant that a G80 could run some sort of cuda code?
    I'm not particularly optimistic that ecosystem support will ever pan out for AMD to be viable but this seems to be giving a bit too much credit to Nvidia for democratizing AI development, which is a stretch.
    
    nightski 2 years ago
    
    First of all, LLMs are not the only AI in existence. A lot of ML, stats, and compute can be run on consumer grade GPUs. There are plenty of problems that aren't even applicable with an LLM.
    Second, you absolutely can run and fine tune many open source LLMs on one or more 3090s at a time..
    But being able just to tinker, learn to write code, etc.. on a consumer GPU is a gateway to the more compute focused cards.
  - zamalek 2 years ago
    
    There's a difference between officially supported, and supported. My 6900XT, an unsupported card, works just fine.
    
    thrtythreeforty 2 years ago
    
    Then they should indicate that! Putting me off from considering an AMD card for purchase is very detrimental to building a userbase.
    
    zamalek 2 years ago
    
    I 100% agree with that. The override envar (HSA_OVERRIDE_GFX_VERSION) is also buried deep in their documentation. NVIDIA is eating AMD's breakfast with GTX3060s while they are trying to peddle 7900XTs.
  - 65a 2 years ago
    
    Pretty sure my Radeon R9-285 would work if I force gfx802 offload arch when building for ROCm, but...what are you going to do with decade-old VRAM support? 2gb is not enough for anybody.
Symmetry 2 years ago

That's something that's started changing over the last few months. Official support for the RX 7900 GPUs for Linux has been added to the most recent versions of ROCm and over on the ROCm subreddit people are reporting success getting other RDNA 3 cards working. On Windows you've got consumer cards from the previous generation getting official support too.
This is, obviously, way overdue and it might not be enough to let AMD get back into the race but
latchkey 2 years ago

In my eyes, the real problem is that there is no cost effective developer access to high end cards, like the MI300x. This breaks the developer flywheel that you would normally point at consumer cards for.
Where can you rent time on one? Traditionally, AMD has only helped build super computers, like Frontier and El Capitan, out of these cards.
This time around Azure [0] and other CSP's (cloud service providers) are working to change that. I will have the best of the best of their cards/systems for rent soon.
[0] https://techcommunity.microsoft.com/t5/azure-high-performanc...
LeanderK 2 years ago

lol, this is so stupid. Don't they realise that people usually develop locally and train on a server? You don't need a super beefy GPUs to do it, so you buy nvidia. So people are used to nvidia, debug and fix bugs etc. It's not a very smart decision, looks like the decision makers have no idea what's going on.
- paulmd 2 years ago
  
  AMD’s opencl runtime also historically has an incredible number of bugs and paper features that make any sort of portability between nvidia and AMD cards quite difficult - you are running a special build and a different codepath for AMD anyway, there was no gain from using the ostensibly portable approach.
  - latchkey 2 years ago
    
    The gain is that you can't find time on NVIDIA cards right now. Decentralizing away from a single provider lowers the risk on your business significantly.
    Case in point, OpenAI closed new signups because they couldn't keep up with demand and they literally have all the resources in the world to make things happen.
riffic 2 years ago

*its
- joshstrange 2 years ago
  
  It's a possessive apostrophe: https://www.grammarly.com/blog/possessive-apostrophe/
  - skyyler 2 years ago
    
    Did you read that article you linked?
    >The important thing to remember is don’t use possessive apostrophes with any pronouns, either possessive pronouns or possessive adjectives.
    >If you see an apostrophe with a pronoun, it must be part of a contraction.
    >its—possessive adjective of it
    >it’s—contraction for “it is”
    "its" would be correct in the root comment.
    
    joshstrange 2 years ago
    
    No I didn't read it fully, I knew what the concept was called, found an article, skimmed it until I was sure it was talking about the thing I thought it was. I'm almost positive I was taught in school that "it's" is a valid possessive apostrophe case and honestly I think it's stupid that it's not. I find "its" more confusing personally.
    
    skyyler 2 years ago
    
    >I knew what the concept was called, found an article, skimmed it until I was sure it was talking about the thing I thought it was
    You just managed to summarise why having conversations with strangers is so difficult on the internet these days.
    Instead of considering that you were incorrect, even for a moment, you sought an article that you thought would confirm the ideas that you already had. Without even actually reading it, you used it as evidence that you were correct all along.
    Even though the article very clearly illustrates that you were mistaken.
    Fascinating.
    
    joshstrange 2 years ago
    
    You do understand that's not really what happened here right?
    The concept IS called possessive apostrophe and literally until this minute I wasn't aware that "it's" isn't grammatically correct when the "it" in the sentence is being used possessively. I didn't just find an article I thought agreed with me and fire it off, I thought "it's" was valid and riffic didn't know about possessive apostrophes (which again, I had wrong in this case). I didn't look for "it's" in the article because that wasn't up for debate in my mind (again, I can't stress this enough, I was wrong about the usage, any comment I've ever made uses "it's" in this case because that's how I thought it was used), I was just looking for the concept as a whole to link. I got a minor piece wrong and you want to lump me in with everyone who picks the first article that "agrees" with them.
    
    skyyler 2 years ago
    
    I'm not lumping you in with anyone, I merely commented on how you sent a link because you thought it advanced your position when it actually advanced the position of the person you were "correcting". It's a behaviour I see all the time now, from all sorts of people.
    I certainly don't think less of you specifically for doing this.
    Thanks for being open to discussion, though!
    
    metabagel 2 years ago
    
    > "I got a minor piece wrong"
    You were entirely wrong.
    I generally remember this bit of grammar as the apostrophe replacing the missing letters of the contraction (which is not needed for the possessive situation).
    
    mft_ 2 years ago
    
    Thank you for owning the mistake - you're already ahead of many people on the internet :)
    
    riffic 2 years ago
    
    just sound it out in your head. If you see an apostrophe with it's, it usually means "it is"
    Otherwise someone oopsed and you can leave a dumb grammar comment on HN
    
    HDThoreaun 2 years ago
    
    You’re thinking of its’
    
    riffic 2 years ago
    
    Please prove that is a valid construction.
  - riffic 2 years ago
    
    that's nonstandard though, lol.
    it's means it is or it has

FredPret 2 years ago

AMD needs to launch drivers, not chips. Who’s going to develop ML models on a card that can’t connect to a popular ML model?

I look at their financial performance and it’s staggering how they’ve missed the boat - and this is during a huge boom on gaming, crypto, and AI.

Compare:

https://valustox.com/AMD

https://valustox.com/NVDA

qaq 2 years ago

That is the confusing part say you need to hire 100 people at 1 mil. per year comp to get drivers to a good state. Thats 1/3 of their quarterly profit but would prob double the revenue in a few years.
- rafaelmn 2 years ago
  
  From what I've heard they aren't offering competitive salaries, so introducing people at insane comps would probably destroy existing teams(if there was no salary bumps to match)/budgets (if there was). Doubt you can do much with new hires in 1 year in such environment, by the time you start seeing results it's probably too late to capitalize on this bubble.
  More likely, they wait to see how the AI HW startups shake up and then acquire the ones that have anything worth paying for.
  - logicchains 2 years ago
    
    >From what I've heard they aren't offering competitive salaries
    Seems like a common thing in hardware companies, they chronically underpay, which for some reason hardware/electrical engineers seem to accept, but that makes them a last-choice for competent software engineers, who have much better-paying options.
    
    lesuorac 2 years ago
    
    > which for some reason
    Didn't you just explicitly say the reason? A SWE can go off and make Google in their Gargage; a EE can't make a fab in their garage.
    
    fragmede 2 years ago
    
    Even if they could make a fab, it would still be a logistical nightmare to scale from 1-1,000 users. Meanwhile, my SaaS company could have 100,000 users thanks to the cloud, and I wouldn't even have to get up from my desk.
    
    djmips 2 years ago
    
    AMD doesn't even own a fab.
    
    lesuorac 2 years ago
    
    Non-sequitor [1]. AMD not having a fab doesn't mean a EE can run a fab out of their garage.
    It may be relatively easier for people to make new chips without needing AMD/Intel (see all of fang company making their own). But it's still companies with lots of money making new chips and not people in their garage.
    [1]: https://en.wikipedia.org/w/index.php?title=Non_sequitur_(fal...
    
    p_j_w 2 years ago
    
    They own licenses to the insanely expensive software you need to build a modern IC.
  - xchip 2 years ago
    
    I heard the same too (speaking for a friend)
- wongarsu 2 years ago
  
  It's not just drivers though. Nvidia has invested close to two decades into documentation, teaching materials, developer tooling and libraries for CUDA, plus all the work on gaining mindshare.
  You could probably get 80% there by dedicating enough AMD developers to improving AMD support in existing AI frameworks and software, in parallel with improving drivers and whatever CUDA equivalent they are betting on right now. But it would need a massive concerted effort that few companies seem to be able to pull off (probably it's hard to align the company on the right goals)
- hardware2win 2 years ago
  
  1mil comp? Haha, what?
  Salaries at semico companies are not even close to this
  Also why would you even need ppl this good? People who earn 1 mil offer way, way more than just tech skills
  - logicchains 2 years ago
    
    NVidia is a semico and they pay much better than AMD: https://www.indeed.com/companies/compare/Amd-vs-Nvidia-b78a5... . That's a big part of why they're so far ahead now and their drivers are much better. If AMD wants to catch up to NVidia within a reasonable timeframe, they wouldn't just need to match it, they'd need to pay way better than NVidia to attract the best people.
    
    djmips 2 years ago
    
    Agreed - Nvidia has made software a core pillar since the nineties.
visarga 2 years ago

That's not such a big concern, LLMs run on all things today. It won't be that hard to make them work on AMD. Before 2020 we had much more architectural diversity.
- FredPret 2 years ago
  
  The last time I tried getting Tensorflow / Pytorch to work on my good (and cheap) AMD card, I literally ended up just buying an Nvidia card.
  I’m just one guy but my experience carries over to subsequent business decisions made by me, and there are many like me.
  - Workaccount2 2 years ago
    
    I'll throw my hat in the ring, I didn't buy an nvidia card, I simply gave up on trying to get it working.
    
    FredPret 2 years ago
    
    Nvidia drivers on Linux aren’t a walk in the park either, but at least there’s a way to do it.
- dharma1 2 years ago
  
  they've had a long time to get ML working as well as with CUDA and CuDNN, mostly failed attempts so far. How close are they?
bigbillheck 2 years ago

> AMD needs to....
https://www.amd.com/en/newsroom/press-releases/2023-10-10-am...
Workaccount2 2 years ago

Whats insane is that AMD has been known for it's shit drivers for over a decade now...and nothing has happened to address this. Like surely everyone internally knows it, all the execs know it, the board knows it, investors know it...but somehow it has never been addressed.
At this point it's almost like it has to be intentional, like some perceived tradeoff ingrained in the culture that generates shit software.
- logicchains 2 years ago
  
  >At this point it's almost like it has to be intentional, like some perceived tradeoff ingrained in the culture that generates shit software.
  They're underpaying their hardware engineers, and if they wanted to hire good software engineers they'd need to pay more, which would cause their hardware engineers to demand better pay too.
  - FredPret 2 years ago
    
    The price of underpaying their employees like this can be seen very clearly in their revenue graph.
machinekob 2 years ago

They are not only missed AI boom, they are also seems to be overpriced by a huge margin compared to Intel or even NVDA.

brucethemoose2 2 years ago

Wait... this headline totally wrong.

This is the actual source[1]:

> The AMD Instinct M1300A APU was launched in January 2023 and blends a total of 13 chiplets, of which many are 3D stacked, creating a single chip package with 24 Zen 4 CPU cores fused with a CDNA 3 graphics engine and eight stacks of HBM3 memory totaling 128GB.

Its literally a typo (or renamed SKU?) for the MI300A. So... the street is jumping on AMD because of a typo echoed by a ton of outlets?

https://www.datacenterdynamics.com/en/news/genci-upgrades-ad...

jacoblambda 2 years ago

It is a typo but in a different way. They are referring to the MI300X (mentioned again later).
The discussion on the MI300X was on HN like 12 hours ago (after the AMD announcement event yesterday):
https://news.ycombinator.com/item?id=38550271
https://www.youtube.com/watch?v=tfSZqjxsr0M
- brucethemoose2 2 years ago
  
  That's technically not new either?
  Are they just talking about MI300X availability?
  - jacoblambda 2 years ago
    
    It very much is though. the MI300X may have been known about prior to the event but specs and performance were all under embargo until December 6.

w-m 2 years ago

From yesterday: AMD MI300 performance – Faster than H100, but how much?

https://news.ycombinator.com/item?id=38550271

xnx 2 years ago

> Advanced Micro Devices shares were marked 2% higher in premarket trading

2% is a "leap"?

It looks like NVDA is up ~1.5% since yesterday.

jraby3 2 years ago

They are up 6% today which is significant.

DeathArrow 2 years ago

Do they have a good CUDA alternative?

no_wizard 2 years ago

Can't speak to how it compares to CUDA however they are developing ROCm[0]
[0]: https://www.amd.com/en/products/software/rocm.html
- zamalek 2 years ago
  
  I can attest that it works really well on my 6900xt. Compiling CUDA kernels is merely a matter of a using a #define shim. Also, provided you download the ROCm pytorch (and force compatibility with the HSA_OVERRIDE_GFX_VERSION) everything just works.
- echelon 2 years ago
  
  Next to nothing is written that targets this. Not Stable Diffusion, not RVC, not Vall-E, not Tacotron, not Tortoise, ...
  Maybe the LLM space is better about this, but the generative media side definitely isn't.
  AMD has a market share of 0% here, and nobody publishes models with AMD support.
  - Symmetry 2 years ago
    
    The things is you can actually run Stable Diffusion.
    And I got PyTorch working on my AMD 7900 XT graphics card recently, though it was a bit of a hassle to do so.
    
    godshatter 2 years ago
    
    You can also run Stable Diffusion in cpu mode, if you don't mind it being slower. I have an NVIDIA card but it's not powerful enough to run it. I'm on Ubuntu.
    
    echelon 2 years ago
    
    > though it was a bit of a hassle to do so.
    Incredible understatement. And the diverse set of community tools also breaks down.
    We're still a year or more out from proper AMD support in the ecosystem.
  - coffeebeqn 2 years ago
    
    Could there be a compiler/transpiler from CUSA to whatever AMD is pushing ?
    
    puzzlingcaptcha 2 years ago
    
    That's what https://github.com/ROCm/HIPIFY is (as a part of ROCm)
    
    zamalek 2 years ago
    
    I think it's llama.cpp that simply #defines all cuda_ functions to rocm_ (99% name-name). Porting seems to be that trivial.
dekken_ 2 years ago

it's called hip, and it's mostly the same
AMD have their own thrust gpu impls, so from a high level they are somewhat interchangeable
2OEH8eoCRo0 2 years ago

I don't know, but couldn't people use LLMs to drastically lower the cost of switching? Converting a codebase to use a different platform doesn't require creativity.
- gumballindie 2 years ago
  
  I think we shouldnt even try, because LLMs will simply design new hardware and software all by its own. All we need to do is sit and watch, while collecting UBI.
machinekob 2 years ago

No

TheBigSalad 2 years ago

It barely moved. This stock goes up and down 5% all the time. It was at the current price a week ago.

ChrisArchitect 2 years ago

Some more discussion yesterday: https://news.ycombinator.com/item?id=38548330

cloudengineer94 2 years ago

They gotta keep working on the software side of things like drivers, FSR and more.

AMD is so far behind on this.

In the other hand though, their 3D Cache chips are amazing

bryanlarsen 2 years ago

On Linux, AMD drivers are far ahead of Nvidia. Switching from a 970 to a 3750 fixed a bunch of visual glitches for me.

riffic 2 years ago

msn.com, really?

ek750OP 2 years ago

what would you prefer? Would WSJ or other paywalled sites make you happy?
Don't make me start submitting aol links.

machinekob 2 years ago

This time AMD will win for sure (Copium)

mupuff1234 2 years ago

AMD market cap is bigger than Intel.

gafage 2 years ago

According to the news and tech blogs AMD is always ahead of nvidia in all regards. But then, you have real life...

eganist 2 years ago

> According to the news and tech blogs AMD is always ahead of nvidia in all regards. But then, you have real life...
Do you have examples of both sides of this claim?
- gafage 2 years ago
  
  One side of this claim: https://www.msn.com/en-us/lifestyle/shopping/amd-leaps-after...
  The other side of this claim: sales numbers of GPUs.
  - eganist 2 years ago
    
    > One side of this claim: https://www.msn.com/en-us/lifestyle/shopping/amd-leaps-after...
    The stated claim was "According to the news and tech blogs AMD is always ahead of nvidia in all regards." Which necessitates sources that aren't the parent article being discussed. Because "always" and "in all regards" were invoked, not just AI
    > The other side of this claim: sales numbers of GPUs.
    Your claim "But then, you have real life..." implies that the news sites you didn't cite are wrong in claiming AMD technical supremacy, not that sales numbers are off.
    To restate the question: do you have news references stating that AMD is always and in all regards ahead of nvidia? And then do you have real life data points to prove that said news sites are wrong?

Settings

AMD leaps after launching AI chip that could challenge Nvidia dominance

Keyboard Shortcuts