How Graphics Cards Work
extremetech.com
Anyone have a better article addressing the question in the title? This one felt more like a grab bag of basic terminology with a little bit of history thrown in - less how they work and more how to compare the specs of related cards.
As an aside, untextured polygons are making a huge comeback as an art style. I can't think of it even having been common in the early era of GPUs like the article suggests. I personally really like the look, but it could end up overused just like other lo-fi styles. Strangely, it's referred to as "low poly" even though the defining feature is the lack of textures; often the poly count is fairly high, it's just not hidden from view.
It's quite misleading to call a GPU's SIMD lanes "cores" and then compare that count to the number of actual CPU cores (which have SIMD as well).
In fact it's often mentioned that the reason GPUs are fast is that they have "a lot of dumb cores". It's true, but that's somewhat unrelated to the SIMD width.
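To make the counting problem concrete, here is a rough sketch. All the numbers are made up for illustration and don't describe any real product, and `simd_lanes` is just a helper name I picked:

```python
def simd_lanes(units, simd_pipes_per_unit, lanes_per_pipe):
    """Count fp32 SIMD lanes, the quantity GPU marketing tends to call "cores"."""
    return units * simd_pipes_per_unit * lanes_per_pipe

# Hypothetical GPU: 40 core-like units, each with 4 SIMD pipes of 32 lanes.
gpu_units = 40
gpu_lanes = simd_lanes(gpu_units, 4, 32)

# Hypothetical CPU: 16 cores, each with 2 SIMD pipes of 16 fp32 lanes.
cpu_units = 16
cpu_lanes = simd_lanes(cpu_units, 2, 16)

print(f"GPU: {gpu_units} core-like units, {gpu_lanes} lanes")
print(f"CPU: {cpu_units} cores, {cpu_lanes} lanes")
# Comparing GPU lanes to CPU cores mixes two different units of measure.
print(f"GPU 'cores' vs CPU cores: {gpu_lanes / cpu_units:.0f}x")
# Comparing lanes to lanes is the like-for-like number.
print(f"GPU lanes vs CPU lanes:   {gpu_lanes / cpu_lanes:.0f}x")
```

With these invented figures the gap shrinks a lot once both sides are counted in lanes, which is the point: the headline "core" ratio is mostly an artifact of what gets counted.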
One of the big reasons you can get away with dumb cores is that, since you have so many independent threads (actual threads, each with an independent PC), you can go with super-wide SMT (up to, say, 64-way instead of 2-way for Intel HT).
This way you keep a deep pipeline full cheaply (by issuing from each thread in turn), while a CPU has to scramble and spend power/area to do so (branch pred., OoO, ...). Well, I guess the huge register files are not quite cheap but still.
The barrel processor style is to me a way more defining characteristic of GPU vs CPU design than the SIMD capabilities.
Then of course, wide SIMD on top of that is good for flops.
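A toy model of the latency-hiding argument, as a sketch only. It assumes one issue slot per cycle, strict round-robin between threads, and that a thread can't issue again until its previous instruction has finished `latency` cycles later. The function name and parameters are mine, not from any real simulator:

```python
def barrel_utilization(num_threads, latency, cycles):
    """Fraction of cycles in which a strict round-robin issuer issues something."""
    ready_at = [0] * num_threads  # first cycle at which each thread may issue again
    issued = 0
    for cycle in range(cycles):
        t = cycle % num_threads  # strict round robin: it is thread t's turn
        if ready_at[t] <= cycle:
            ready_at[t] = cycle + latency  # thread is busy until its result is back
            issued += 1
        # otherwise the issue slot is wasted this cycle
    return issued / cycles

for n in (1, 4, 8, 16):
    print(n, "threads:", barrel_utilization(n, latency=8, cycles=800))
```

In this model utilization grows roughly as threads / latency until it saturates at 1.0, so once there are at least as many threads as pipeline stages, the pipeline stays full with no branch prediction or OoO machinery at all. The cost is holding register state for every one of those threads.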
A minor correction: as far as I know, none of the modern GPUs or CPUs by Intel, AMD, or Nvidia are barrel processors. The Xeon Phis were the only exception, but I don't know whether they kept that part of the design after KNC.
edit: This document [1], on pages 26-28, describes execution on both Pascal and Volta GPUs and how they differ. AMD and Intel GPUs do something similar.
[1] http://images.nvidia.com/content/volta-architecture/pdf/volt...
I guess I think of SMT as the logical successor to barrel processing, in a way: as long as you can store more state and have independent instruction streams, you get a full(er) pipeline.
You're right, though, that GPU schedulers are more advanced than plain round robin, since they work around stalls and issue instructions from multiple threads per cycle when the backend can take it.
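A rough sketch of why working around stalls matters, under toy assumptions: one issue slot per cycle, a fixed per-thread latency, and a scheduler that either takes strictly the next thread in line or scans ahead for the first ready one. `issue_utilization` and its parameters are hypothetical names, and real GPU schedulers (which can also issue from several threads per cycle) are far more involved:

```python
def issue_utilization(latencies, cycles, skip_stalled):
    """Fraction of cycles with an instruction issued.

    latencies[t] is how many cycles thread t is busy after each issue.
    skip_stalled=False models plain round robin; True scans for a ready thread.
    """
    n = len(latencies)
    ready_at = [0] * n
    issued = 0
    nxt = 0  # thread at the head of the round-robin order
    for cycle in range(cycles):
        # Plain round robin only considers the head; the smarter one looks past it.
        order = [(nxt + k) % n for k in range(n)] if skip_stalled else [nxt]
        chosen = next((t for t in order if ready_at[t] <= cycle), None)
        if chosen is not None:
            ready_at[chosen] = cycle + latencies[chosen]
            issued += 1
        # Advance past whichever thread was served (or past the stalled head).
        nxt = ((chosen if chosen is not None else nxt) + 1) % n
    return issued / cycles

# One slow (say, memory-bound) thread alongside three fast ones.
mix = [12, 2, 2, 2]
print("plain round robin:", issue_utilization(mix, 1200, skip_stalled=False))
print("skipping stalls:  ", issue_utilization(mix, 1200, skip_stalled=True))
```

In this setup plain round robin keeps handing slots to the slow thread while it is still waiting, whereas the stall-aware version gives those slots to whichever thread is ready and keeps the pipeline full.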
I have no experience with KNC/KNL, but I wish I did :-)
> I have no experience with KNC/KNL, but I wish I did :-)
No you don't, believe me.
To add to this: "Why CUDA 'Cores' Aren't Actually Cores, ft. David Kanter" (Gamers Nexus, 2018 April 18, 18 min.) https://youtu.be/x-N6pjBbyY0
The ads on this make mobile reading near impossible. It's a shame ads are still this invasive.
adblock