Libcu++: Nvidia C++ Standard Library

github.com

226 points by andrew3726 5 years ago · 138 comments

fanf2 5 years ago

Whenever a new major CUDA Compute Capability is released, the ABI is broken. A new NVIDIA C++ Standard Library ABI version is introduced and becomes the default and support for all older ABI versions is dropped.

https://github.com/NVIDIA/libcudacxx/blob/main/docs/releases...

  • MichaelZuo 5 years ago

    It’s interesting that they use the word "broken" to describe incompatible machine code. Well, if the code is recompiled for each new version, then it’s different from the old machine code by definition. Does any major software vendor support older versions of the ABI or machine code?

    • geofft 5 years ago

      > Does any major software vendor support older versions of the ABI or machine code?

      Yes, this is extraordinarily common. The ABI is an interface: a promise that new versions of the machine code for a library can still be used by binaries compiled against the old one. There's new machine code, but there's no "by definition" about whether they make that promise or not.

      glibc (and the other common libraries) on basically all the GNU/Linux distros does this: that's why it's called "libc.so.6" after all these years. New functions can be introduced (and possibly new versions of functions, using symbol versioning), but old binaries compiled against a "libc.so.6" from 10 years ago will still run today. (This is how it's possible to distribute precompiled code for GNU/Linux, whether NumPy or Firefox or Steam, and have it run on more than a single version of a single distro.)

      Apple does the same thing; code linked against an old libSystem will still run today. Android does the same thing; code written to an older SDK version will still run today, even though the runtime environment is different.

      Oracle Java does the same thing: JARs built with an older version of the JDK can load in newer versions.

      Microsoft does this at the OS level, but - notably - the Visual C++ runtime does not make this promise, and they follow a similar pattern to what Nvidia is suggesting. You need to include a copy of the "redistributable" runtime of whatever version (e.g. MSVCR71.DLL) along with your program; you can't necessarily use a newer version. However, old DLLs continue to work on new OSes, and they take great pains to ensure compatibility.

      • aronpye 5 years ago

        Excellent comment, I was wondering how glibc handled backwards compatibility.

        Is symbol versioning an ELF object file thing, or is it more universal than that?

        • geofft 5 years ago

          Almost all of the time, they do it via just adding new features and not breaking old ones.

          But yeah, GNU/Linux and Solaris both have symbol versioning as part of ELF (I'm not sure if other executable formats have it; it doesn't actually require very much out of the format). The approach, roughly, is that each symbol in the file is named something like "memcpy@GLIBC_2.2.5", and if you see symbol versions in the library you're linking against, you include those references. The dynamic linker is also smart enough to resolve unqualified symbols against some default version the library specifies. This is important for backwards compatibility, for the ability of distros to add symbol versions when upstream doesn't have them yet, and for things like dlsym("memcpy") to keep working. When they make a backwards-incompatible change (e.g., old memcpy supports overlapping ranges, new memcpy does not promise to do the right thing and you need to use memmove instead), they add a new version (e.g., "memcpy@GLIBC_2.14"). Anything compiled against the newer library will reference the new version, but an implementation of the old version still sticks around for older binaries.

          And yes, there were older versions before libc.so.6 - libc.so.5 was used, I think, in the 1990s, but they've avoided changes since then. (The approach used there is that you can install both of them on a single system, but "libc.so" symlinks to one of them, and that name is used when you compile code. When you run gcc -lfoo, it looks for libfoo.so, but if the library has a header saying that its "real" name, called its "SONAME", is libfoo.so.1, the compiled program looks for libfoo.so.1 and not libfoo.so.) Now you only need a single glibc version, and it works with many years of updates.

    • haberman 5 years ago

      > Does any major software vendor support older versions of the ABI or machine code?

      The C++ Standards Committee has been prioritizing ABI compatibility at the cost of performance for the last decade or so (mostly in the standard library, as opposed to the language itself, as I understand it). Some people (especially people from Google) have been arguing that this is the wrong priority, and that C++ should be more willing to break ABI. See:

      https://cppcast.com/titus-winters-abi/

      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p186...

      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p213...

      Disclosure: I work at Google with several of the people advocating for ABI breaking changes.

      • blelbach 5 years ago

        You'll notice some people from NVIDIA are authors on those papers too! :)

    • my123 5 years ago

      Note here that your binaries will continue to run even on future driver versions - and future hardware - that's what PTX is for, as the standard libraries are statically linked in.

      It's just your object files that aren't compatible, so that you can't mix and match libraries built with different CUDA versions into the same binary.

      • blelbach 5 years ago

        Yep, this is a good summary (good enough that perhaps I should put something similar in the docs).

    • londons_explore 5 years ago

      Famously Microsoft does with Windows. That's how an exe file from 25 years ago can still run today.

      • moonchild 5 years ago

        Yes, but GPU architecture changes very frequently.

        Shaders from 15 years ago still work, but they're compiled on-the-fly to a GPU-dependent format. I expect you don't want to have to recompile an entire c++ stdlib every time you recompile your own code.

      • higerordermap 5 years ago

        Do they use some kind of ABI versioning?

      • formerly_proven 5 years ago

        Running 32-bit x86 code on an AMD64 machine is possible on most operating systems which supported both, and that probably has more to do with AMD64 supporting that execution model.

        • londons_explore 5 years ago

          Try that on Linux and you'll find most libraries no longer have the same entry points and that various data structures have changed leading to fun fun crashes...

          The kernel itself has maintained (mostly) ABI compatibility though.

          • formerly_proven 5 years ago

            That's a "you're holding it wrong" problem, though. Projects like GTK or Qt never claimed they'd be backwards-compatible for 26 years (Qt has specific backwards-compatibility API and ABI guarantees and is, in my experience, pretty diligent about them), so if you want a binary to work for a long time, you have to ship your own versions of these. Libraries like Xlib, on the other hand, are very stable and much more similar to the Win32 API in that respect.

            In theory Linux has versioning for libraries; in practice it is never used correctly and is useless anyway, since distros generally only keep around one version of everything. Even if you linked against a specific version (e.g. libfoobar.so.2.21 instead of libfoobar.so.2, which will break if you don't recompile and/or patch the source), it wouldn't exist _anyway_ after a few updates. And that's mostly because distros never promised you'd be able to run binaries built outside their packaging infrastructure; it being common practice and sometimes working doesn't imply it's guaranteed to work.

            That's why C applications linking only these "basic" libraries (libc, Xlib, zlib, ...) are regarded as so stable and portable: they're built and linked against system components which rarely change. (Keep in mind that you need to build this kind of binary on ancient systems, otherwise glibc will make sure it won't work everywhere.)

            • XorNot 5 years ago

              This is one of those things it feels like all the content-addressable initiatives should be able to solve somehow. With near-ubiquitous internet access, why can't a program ship with a list of standard-library hashes it'll link against, and my distro go fetch them from IPFS or whatever if they're not local?

              • patrec 5 years ago

                Nix basically already does this, apart from the decentralised distributed cache (there is a centralised one, and you can easily set up your own, too). All references, including those to dynamically linked libraries, are via a unique, content-addressable hash -- where "content" currently still happens to be the content of the build recipe and all dependencies and sources, recursively, not the built artefact. There is work on referencing artefacts by the binary output hash, though, because that obviously has better security properties when you want a non-centralised cache; the main problem is that a lot of software still has no reproducible build.

              • pantalaimon 5 years ago

                The solution for this is now Docker, Flatpak, Snap, …

                Just ship the whole environment and only rely on the stable kernel API.

                • surajrmal 5 years ago

                  This is very wasteful. On servers in the cloud that may be a reasonable approach, but there are still devices that are memory-, storage-, and/or network-constrained enough that it's not. It's still necessary to have relatively stable interfaces, such that most things can share the same version of a dependency and there still exists the ability to deduplicate dependencies between different programs. I do agree that the current OS approaches to handle this are not great and there is room for new models, but Docker containers are not a holistic solution.

                  • formerly_proven 5 years ago

                    The Windows Component Store (aka Side-by-Side aka WinSxS) is sort of a content-addressed store for DLLs and the like, except the content-addressing isn't facilitated by a literal cryptographic hash over the contents, but instead by the logical identity of the component (name+version but more). And, it doesn't fetch anything automatically. Writes to certain paths are just intercepted and redirected into it, while storing an association somewhere else that the app that did that (or was installed doing that) wants that particular component (or at least, that's how I think that works).

              • rkeene2 5 years ago

                This is what AppFS does, as well as CernVM-FS, though AppFS has more features

    • blelbach 5 years ago

      We change the mangling of all the symbols by changing the inline namespace that they are in, regardless of whether or not functional ABI breaks occurred. That's why it says the ABI is broken on major releases. We do this to try and loudly break people who are trying to depend on ABI stability, instead of silently failing them.

  • quotemstr 5 years ago

    There should be no expectation of C++ ABI compatibility. Do you want your system to be ABI compatible or do you want it to evolve? You can't have both. You have to pick one. I favor evolution.

    • retrac 5 years ago

      A properly designed ABI is capable of expansion. The design risk is not so much being backed into a corner, as just accumulating a great deal of obsolete cruft over the years/decades.

      Win32 is a great example of this. It has been extensively overhauled, and best practice for writing a new application today is quite different from 25 years ago, but unmodified Windows 95 applications still usually run correctly.

      • quotemstr 5 years ago

        The C++ standard library is a bit different from a platform API. The C++ standard library is a template system designed to conform exactly to your particular program. It should always be statically linked. I see zero advantage to dynamic linking of libc++. A platform API, on the other hand, is designed to be stable, safe, and flexible, not fast --- because programs in general shouldn't be calling into the system so much. The performance costs we accept in a platform API (e.g., user-to-kernel privilege transition) would be totally unacceptable as part of an STL data structures implementation.

        • pjmlp 5 years ago

          Except when OSes happen to be written in C++, like most modern non-UNIX/POSIX clones, and security updates come into play.

          • quotemstr 5 years ago

            The OS implementation language is irrelevant: there's no reason that applications and the OS core need to use the same C++ standard library.

lionkor 5 years ago

> Promising long-term ABI stability would prevent us from fixing mistakes and providing best in class performance. So, we make no such promises.

Wait, NVIDIA actually gets it? Neat!

lars 5 years ago

It really is a tiny subset of the C++ standard library, but I'm happy to see they're continuing to expand it: https://nvidia.github.io/libcudacxx/api.html

  • shaklee3 5 years ago

    Nvidia has had many members on the c++ standards committee for a while.

    • blelbach 5 years ago

      I'm not sure how making a large contribution to the C++ Standard is a problem?

      • shaklee3 5 years ago

        Sorry, that was not my intent. I was pointing out that Nvidia has made significant contributions already to the c++ standard, so this is not the first thing they've done.

  • roel_v 5 years ago

    Yeah, really tiny... At first I thought 'wow this is a game changer', but then I looked at your link and thought 'what's the point?'. Can someone explain what real problems you can solve with just the headers in the link above?

    • jpz 5 years ago

      I guess that the point is that when writing CUDA code (which looks like C++), you can use these libraries which are homogenous with CPU code.

      Looking at the functions, chrono/barrier etc. require CPU-level abstractions, so using the STL versions (which are for the CPU) isn't really going to work.

    • happyweasel 5 years ago

      It runs on the GPU?

    • TillE 5 years ago

      I would have expected the <algorithm> header, but instead...synchronization primitives? std::chrono? I'm completely baffled about how that would be useful, but that's probably because I know very little about CUDA.

      • blelbach 5 years ago

        GPUs are parallel processors. So, yes, synchronization primitives are the highest priority.

        We focused on things that require /different/ implementations in host and device code.

        The way you implement std::binary_search is the same in host and device code. Sure, we can stick `__host__ __device__` on it for you, but it's not really high value.

        Synchronization primitives? Clocks? They are completely different. In fact, the machinery that we use to implement both the synchronization primitives and clocks has not previously been exposed in CUDA C++.

  • blelbach 5 years ago

    Today, you can use the library with NVCC, and the subset is small. We'll be focusing on expanding that subset over time.

    Our end goal is to enable the full C++ Standard Library. The current feature set is just a pit stop on the way there.

RcouF1uZ4gsC 5 years ago

For everyone wondering where all the data structures and algorithms are: vector and several algorithms are implemented by Thrust. https://docs.nvidia.com/cuda/thrust/index.html

Seems the big addition of libcu++ over Thrust would be synchronization.

davvid 5 years ago

Here's a somewhat related talk from CppCon '19: "The One-Decade Task: Putting std::atomic in CUDA"

https://www.youtube.com/watch?v=VogqOscJYvk

jlebar 5 years ago

This is super-cool.

For those of us who can't adopt it right away, note that you can compile your cuda code with `--expt-relaxed-constexpr` and call any constexpr function from device code. That includes all the constexpr functions in the standard library!

This gets you quite a bit, but not e.g. std::atomic, which is one of the big things in here.

BoppreH 5 years ago

Unfortunate name: "cu" is the most well-known slang term for "anus" in Brazil (population: 200+ million). "Libcu++" is sure to cause snickering.

  • unrealhoang 5 years ago

    It’s "penis" in Vietnamese (pop. 80M). I guess people don’t really care, since tech language is usually English.

  • blelbach 5 years ago

    "cu" is a pretty common prefix for CUDA libraries. cuBLAS, cuTENSOR, CUTLASS, CUB, etc.

    It gets worse if you try to spell libcu++ without pluses:

    libcuxx, libcupp (I didn't hate this one, but my team disliked it).

    We settled on `libcudacxx` as the alphanumeric-only spelling.

  • jcampbell1 5 years ago

    These things never seem to matter even in English. How many times have you heard someone say “I don’t like Microsoft”, followed by “that’s what she said”.

    • matheusmoreira 5 years ago

      That joke appears in a lot of Microsoft memes though. Not sure if posting some is appropriate here. Probably not.

  • gumby 5 years ago

    cu is, or was back in the day, a standard Unix utility (call up) — connect to another machine via modem.

    It doesn’t appear to be in Ubuntu any more but still in openbsd, netbsd, and macos!

    You can’t win these namespace collisions: I have friends whose names are obscenities in other languages I speak.

  • CyberDildonics 5 years ago

    Wait until you see the namespace the standard library is under.

    Although maybe short words that are slang in languages different from what something was written in aren't a big deal.

  • amelius 5 years ago

    "CU" is also an abbreviation of "see you". I don't think it causes much awkwardness, but I could be wrong.

    • ufo 5 years ago

      As a Brazilian, I can confirm that we chuckle whenever we see someone use that word :)

  • NullPrefix 5 years ago

    This only affects developers. Limited scope.

    Wasn't there something related about Microsoft Lumia phones?

  • nitrogen 5 years ago

    Do chemists have similar problems working with copper, whose chemical symbol is Cu?

    • matheusmoreira 5 years ago

      Probably not. I heard a few jokes during high school and that's it. Not even that funny. I remember my class had a lot more fun with iron(II) hydroxide: when the compound's name is pronounced in portuguese it sounds like the teacher is threatening to screw over two students.

  • gswdh 5 years ago

    In all honesty, out of all the combinations of two- and three-letter acronyms, there’s bound to be a language out there where the meaning is crude. I recall something on here recently being rude in Finnish or Swedish. We’re professionals; it’s just a name. Who cares?

einpoklum 5 years ago

1. How do we know what parts of the library are usable on CUDA devices, and which are only usable in host-side code?

2. How compatible is this with libstdc++ and/or libc++, when used independently?

I'm somewhat suspicious of the presumption of us using NVIDIA's version of the standard library for our host-side work.

Finally, I'm not sure that, for device-side work, libc++ is a better base to start off of than, say, EASTL (which I used for my tuple class: https://github.com/eyalroz/cuda-kat/blob/master/src/kat/tupl... ).

...

partial self-answer to (1.): https://nvidia.github.io/libcudacxx/api.html apparently only a small bit of the library is actually implemented.

  • blelbach 5 years ago

    > apparently only a small bit of the library is actually implemented.

    Yep. It's an incremental project. But stay tuned.

    > I'm somewhat suspicious of the presumption of us using NVIDIA's version of the standard library for our host-side work.

    Today, when using libcu++ with NVCC, it's opt-in and doesn't interfere with your host standard library.

    I get your concern, but a lot of the restrictions of today's GPU toolchains comes from the desire to continue using your host toolchain of choice.

    Our other compiler, NVC++, is a unified stack; there is no host compiler. Yes, that takes away some user control, but it lets us build things we couldn't build otherwise. The same logic applies for the standard library.

    https://developer.nvidia.com/blog/accelerating-standard-c-wi...

    > Finally, I'm not sure that, for device-side work, libc++ is a better base to start off of than, say, EASTL (which I used for my tuple class: https://github.com/eyalroz/cuda-kat/blob/master/src/kat/tupl... ).

    We wanted an implementation that intended to conform to the standard and had deployment experience with a major C++ implementation. EASTL doesn't have that, so it never entered our consideration; perhaps we should have looked at it, though.

    At the time we started this project, Microsoft's Standard Library wasn't open source. Our choices were libstdc++ or libc++. We immediately ruled libstdc++ out; GPL licensing wouldn't work for us, especially as we knew this project had to exchange code with some of our other existing libraries that are under Apache- or MIT-style licenses (Thrust, CUB, RAPIDS).

    So, our options were pretty clear; build it from scratch, or use libc++. I have a strict policy of strategic laziness, so we went with libc++.

    • justicezyx 5 years ago

      How does this library work?

      There appears to be an LLVM libc++ bundled in as part of the repo. What's the purpose of that libc++?

      • blelbach 5 years ago

        That involves a few diagrams, but essentially, we have two layers:

        - the libcu++ layer, which has some of our extensions and implementations specific to our platform.

        - the libc++ layer, which is a modified upstream libc++.

        A header in the libcu++ layer defines the libc++ internal macros in a certain way, and then includes the applicable libc++ header.

        This is the current architecture, but we're moving towards a more integrated approach where almost everything is in the libc++ layer.

Mr_lavos 5 years ago

Does this mean you can do operations on structs that live on the GPU hardware?

  • shaklee3 5 years ago

    You have been able to do that for a long time with UVA.

    • blelbach 5 years ago

      Since Unified Memory. UVA, or Unified Virtual Addressing, just ensured that a GPU-private object wouldn't have the same address as a CPU-private object.

      • shaklee3 5 years ago

        You're right, sorry. Mixing up terms.

        • blelbach 5 years ago

          Not your fault, we don't make it easy. The acronyms are terrible! That's why I typically spell out the full term.

          My first week at NVIDIA:

          Me, to very senior engineer: something something UVM.

          Very senior engineer: What's UVM?

          Me: Unified Virtual Memory.

          Very senior engineer: Don't call it that, call it Unified Memory, no abbreviation. TLAs are evil.

          Me: What's TLA?

          Very senior engineer: Three letter acronym.

gj_78 5 years ago

I really do not understand why a (very good) hardware provider is willing to create/direct/hint custom software for the users.

Isn't this exactly what a GPU firmware is expected to do? Why do they need to run software in the same memory space as my mail reader?

  • blelbach 5 years ago

    NVIDIA employs more software engineers than hardware engineers.

    > Why do they need to run software in the same memory space as my mail reader ?

    It is a lot more expensive to build functionality and fix bugs in silicon than it is to do those same things in software.

    At NVIDIA, we do as much as we possibly can in software. If a problem or bug can be solved in software instead of hardware, we prefer the software solution, because it has much lower cost and shorter lead times.

    Solving a problem in hardware takes 2-4 years minimum, requires massive validation efforts, and has huge physical material costs and limitations. After it's shipped, we can't "patch" the hardware. Solving a problem in software can sometimes be done by one engineer in a single day. If we make a mistake in software, we can easily deploy a fix.

    At NVIDIA we have a status for hardware bugs called "Won't Fix, Fix in Next Chip". This means "yes, there's a problem, but the earliest we can fix it is 2-4 years from now, regardless of how serious it is".

    Can you imagine if we had to solve all problems that way? Wait 2-4 years?

    On its own, our hardware is not a complete product. You would be unable to use it. It has too many bugs, it doesn't have all of the features, etc. The hardware is nothing without the software, and vice versa.

    We do not make hardware. We make platforms, which are a combination of hardware and software. We have a tighter coupling between hardware and software than many other processor manufacturers, which is beneficial for us, because it means we can solve problems in software that other vendors would have to solve in hardware.

    > I really do not understand why a (very good) hardware provider is willing to create/direct/hint custom software for the users.

    Because we sell software. Our hardware wouldn't do anything for you without the software. If we tried to put everything we do in software into hardware, the die would be the size of your laptop and cost a million dollars each.

    You wouldn't buy our hardware if we didn't give you the software that was necessary to use it.

    > Isn't this exactly what a GPU firmware is expected to do ?

    Firmware is a component of software, but usually has constraints that are much more similar to hardware, e.g. long lead times. In some cases the firmware is "burned in" and can't be changed after release, and then it's very much like hardware.

  • Const-me 5 years ago

    > Isn't this exactly what a GPU firmware is expected to do?

    The source data needs to appear on the GPU somehow. Similarly, the results computed on GPU are often needed for CPU-running code.

    GPUs don’t run an OS and are limited. They can’t access the file system, and many useful algorithms (like a PNG image codec) are a poor fit for them. Technically I think they can access source data directly from system memory, but doing that is inefficient in practice, because GPUs have a special piece of hardware (called a copy command queue in D3D12, or a transfer queue in Vulkan) to move large blocks of data over PCIe.

    That library implements an easier way to integrate CPU and GPU pieces of the program.

  • dahart 5 years ago

    What do you mean about running in the same memory space? Your operating system doesn’t allow that. Is your concern about using host memory? This open source library doesn’t automatically use host memory, users of the library can write code that uses host memory, if they choose to.

    How would a firmware help me write heterogeneous bits of c++ code that can run on either cpu or gpu?

    • blelbach 5 years ago

      > What do you mean about running in the same memory space? Your operating system doesn’t allow that. Is your concern about using host memory?

      Actually, the basis of our modern GPU compute platform is a technology called Unified Memory, which allows the host and device processors to share access to memory spaces. We think this is the way forward.

      Of course, there's still the process isolation provided by your operating system.

    • gj_78 5 years ago

      IMHO, the question is not whether we need code to run on CPUs and GPUs; we do need that. The question is whether the GPU seller has to control both sides. Until I buy a CPU from nvidia, I want to keep some kind of independence.

      When will we be able to use a future riscv-64 CPU with an nvidia GPU? Will we leave that answer to nvidia?

      • blelbach 5 years ago

        > IMHO, the question is not whether we need code to run on CPUs and GPUs; we do need that. The question is whether the GPU seller has to control both sides.

        The question is not about running code on CPUs, or running code on GPUs. It's about running code on both CPUs and GPUs at the same time. It's about enabling the code on the CPU and the code on the GPU to seamlessly interoperate with each other, communicate with each other, move objects and data to and from each other.

        Who do you expect to make that happen?

        > Until I buy a CPU from nvidia I want to keep some kind of independence

        You can buy a CPU from NVIDIA, check out our Tegra systems. We also sell full systems, like DGX platforms, which use a 3rd party CPU.

        > When will we be able to use a future riscv-64 CPU with an nvidia GPU? Will we leave that answer to nvidia?

        Who else would answer this question?

        Okay, you want to use <insert some future CPU> with our GPU.

        Who is going to design and build the interconnect between the CPU and the GPU?

        Who is going to provide the GPU driver?

        The CPU manufacturer? Why would they do that? They don't make any money from selling NVIDIA products. Why should they invest effort in enabling that?

      • dahart 5 years ago

        You can use this library to write code that runs on both risc-v and a GPU! You seem to be pretty confused about what this library is. It’s not exerting any control. It’s open source! It’s strictly optional, and it only allows developers to do something they actually want, to write code that will compile for any type of processor that a modern c++ compiler can target.

        • gj_78 5 years ago

          Again, I see what you mean. I am even against nvidia advising developers to use such-and-such a C++ library (be it GNU's). It is not their role to do that. We need smarter and shinier GPUs from nvidia, not software.

          I would say .... The hardware must be sold independently of the software ... but it is a bit too complex, I know.

          • dahart 5 years ago

            I'm not understanding your point at all. You don't think developers should be able to write C++ code for the GPU?

            What do you even mean about 'it is not their role to do that.' and 'hardware must be sold independently of the software'?? Why are you saying this? Software interfaces are critical for all GPUs and all CPUs, just ask AMD & Intel. There is no such thing as CPU or GPU hardware independent of software. Plus, the specific library here is being sold independently of the hardware, it is doing exactly what you say you want, it's separate and doesn't require having any other nvidia hardware or software. (I can't think of any good reasons to use it without having some nvidia hardware, but it is technically independent, as you wish.)

            • gj_78 5 years ago

              > You don't think developers should be able to write C++ code for the GPU?

              To be clear, I don't think nvidia-paid developers should be able to write C++ code for an nvidia-sold GPU. The world will be better if any developer (paid by nvidia or not) is able to write code for any GPU (sold by nvidia or not). It is not nvidia's role to say how or when software will be written. Their hardware is good, and that's more than OK.

              AI/CUDA code written specifically for nvidia is useless/deprecated in the long term. A lot of brain waste.

              • jki275 5 years ago

                That doesn’t make any sense.

                You’re free to write whatever you want. This is Nvidia providing interfaces to their hardware for those of us who don’t want to write them for ourselves.

                It’s a gift. Take it or don’t. How in the world you can say Nvidia shouldn’t be allowed to write software for their GPUs makes no sense at all. Should the government stop them? Any developer can write anything they want - but Nvidia is obviously going to support their own hardware. How does it make any sense otherwise?

                All code is “deprecated in the long term” for a long enough “long term”. That doesn’t equal useless. Your comment is nonsensical.

                • blelbach 5 years ago

                  > It’s a gift.

                  I wouldn't say it's a gift, though; it's part of what you pay for when you buy one of our products.

                  Sure, it's not listed as a spec on the box, but users expect that we're going to provide them with a good software stack and support it.

              • blelbach 5 years ago

                > To be clear, I don't think nvidia-paid developers should be able to write C++ Code for a nvidia-sold GPU.

                I'm not sure what you're saying here? You think another company or organization should write all the software for our hardware?

                I don't think you understand the semiconductor industry.

                Our business model relies on hardware and software engineers working closely together, as I've described in other replies.

                We would not be able to produce a viable product that is solely raw hardware.

                Also, what motivation does this other organization or company have to create software for our hardware?

                > The world will be better if any developer (paid by nvidia or not) is able to write code for any GPU (sold by nvidia or not).

                This library is something that is designed to help you write Standard C++ code that runs on our GPU. Standard C++ runs everywhere.

                > It is not nvidia's role to say how or when software will be written.

                Providing the SDKs and toolchains to program our platform is definitely part of our role in the ecosystem.

                > Their hardware is good and that's more than OK.

                Our hardware is useless without our software.

                > AI/CUDA code written specifically for nvidia is useless/deprecated in the long term. A lot of brain waste.

                I expect CUDA will be around for a while.

                • jameshilliard 5 years ago

                  I get the impression nvidia puts out a lot of their hardware supporting software themselves because they are hostile to open source community collaboration in general. This could be because nvidia is a big fan of vendor lock in.

                  > Also, what motivation does this other organization or company have to create software for our hardware?

                  Typically an organization like this would be a user of nvidia hardware, that's like asking what motivation Microsoft has for writing software/toolchains for Intel hardware. Maybe this attitude is why nvidia is notorious for having a terrible open source software experience for Linux graphics.

                  > Our hardware is useless without our software.

                  It's not (https://nouveau.freedesktop.org/wiki/), but nvidia is considered one of the worst hardware vendors when it comes to proper open source driver/toolchain support. The situation used to be even worse, but it's still not great now.

                  • dahart 5 years ago

                    Aren’t we commenting in response to the posting of a new open source library that helps support a standard tool chain?

                    • jameshilliard 5 years ago

                      From my understanding it's an open source library to support a Nvidia hardware specific partially(mostly?) proprietary toolchain.

                      • dahart 5 years ago

                        All implementations of a standard are hardware specific. This library is implementing part of the C++ standard, and the source is open. I don't understand why you're complaining about lack of openness in response to increasing openness. Would you prefer that the library was closed and proprietary?

                  • blelbach 5 years ago

                    > I get the impression nvidia puts out a lot of their hardware supporting software themselves because they are hostile to open source community collaboration in general. This could be because nvidia is a big fan of vendor lock in.

                    It's not hostility, it's about agility. More so than other hardware vendors, we rely on really tight integration between hardware and software.

                    In some situations, we find a hardware bug that would require another manufacturer to do a "respin" (i.e. restart the manufacturing process with a new, fixed design). Because we have tight control over the software stack, we can work around that bug. It's faster for us to do this when we have full control. Also, sometimes these bugs have security implications, etc.

                    That said, we've been moving in the direction of open source for a long time.

                    > It's not (https://nouveau.freedesktop.org/wiki/)

                    The other fellow was suggesting we should write no software at all. Nouveau's struggles are an excellent example of how difficult it is to write software for hardware without the engagement and interaction of the manufacturer of that hardware.

                    TL;DR: you're talking about whether our software should be open source versus closed source; the other fellow was suggesting we shouldn't have software at all.

                    • mambru 5 years ago

                      I think Nouveau would be in better shape with a bit of collaboration from NVIDIA.

                      About libcu++, you guys would not be doing it if it weren't a differentiator thanks to Universal Memory. All that is OK until GPUs become a strategic topic for governments, then you may be regarded as monopolistic.

                      Painting it as 'OSS friendly' or 'open' or 'standards compliant', when code written like that is likely to remain NVIDIA-only for many years, seems intentionally deceptive.

                      Also, there is an industry standard that uses standard C++ (SYCL), which you refuse to participate in or implement.

                    • jameshilliard 5 years ago

                      > In some situations, we find a hardware bug that would require another manufacturer to do a "respin" (e.g. restart the manufacturing process with a new, fixed design).

                      Sure, but I think that's got less to do with the tight hardware/software integration and more to do with the type of bug and the skills of the engineers tasked with finding workarounds. When you can make changes to the entire OS by working with the community, you can often work around more severe bugs than would otherwise be possible.

                      > More so than other hardware vendors, we rely on really tight integration between hardware and software.

                      Which is fine when the software is reasonably open, plenty of software is tightly coupled to specialized hardware, but when a hardware manufacturer like Nvidia with a significant market-share does everything their own way and effectively refuses to work with the community in sufficient capacity when it comes to open source integration you end up with a situation where others are having to put in extra effort to make things work. This is a major reason why a lot of people are freaking out when it comes to Nvidia buying ARM, Nvidia has a pretty bad reputation among the open source community when it comes to these issues.

                      > Because we have tight control over the software stack, we can work around that bug.

                      I really don't see why that tight control is necessary, Nvidia isn't all that unique here either, many vendors have hardware bugs that are ultimately worked around in the mainline Linux kernel for example.

                      > It's faster for us to do this when we have full control.

                      Participating in community/open development processes doesn't preclude Nvidia from directly distributing hot-fixes or even make distributing hot-fixes more difficult in any meaningful way. Mainlining proper Linux drivers can also make distribution of hot-fixes easier as they would get included in distro packaging systems and update cycles effectively automatically. One thing to keep in mind is that if Nvidia were to step up and dedicate a properly sized team to mainline open source driver development Nvidia would have a lot more control over the direction of the open source driver development for Nvidia hardware.

                      > Nouveau's struggles is an excellent example of how difficult it is to write software for hardware without the engagement and interaction of the manufacturer of that hardware.

                      Of course, that's really the big issue: Nvidia refuses to properly engage in community driver development, and this lack of engagement is the source of a lot of animosity from developers. In general, reverse-engineered drivers like Nouveau are a last resort, usually attempted only when a vendor refuses to properly engage with the open source communities. Since graphics drivers are probably the most complex kernel drivers out there, this is especially problematic for hardware like Nvidia GPUs.

                      > That said, we've been moving in the direction of open source for a long time.

                      Hopefully that is true, but it's something we've heard before. To be fair, things did look like they were on the right track, but progress seems to have stalled somewhat back in 2017 when Alexandre Courbot left Nvidia, as he seemed to be the main developer spearheading engagement with the Nouveau project. A company of Nvidia's size can easily afford to dedicate a full-time development team to open driver development; it would be great if Nvidia would step up to the plate and put in the resources to maintain the mainline kernel drivers for Nvidia hardware.

              • dahart 5 years ago

                With libcu++, Nvidia is not saying how or when software should be written. Because the library meets the C++ standard, it does exactly what you said you want: it allows any developer to write code for any GPU (or CPU!). The library is doing the thing you're asking for. AMD & Intel can support the same code with only namespace changes, using their own implementations, because it's open and written to the open standard.

          • blelbach 5 years ago

            > It is not their role to do that.

            You are incorrect.

            NVIDIA employs more software engineers than hardware engineers.

            > We need smarter and more shining GPUs from nvidia, not software.

            Software is a part of the GPU. You get better GPUs by having hardware and software engineers collaborate together.

            It is extremely expensive to put features into hardware. It costs a lot of money and takes a very long time. It takes 2-4 years at a minimum to put features into hardware. And there are physical constraints; we only have so many transistors.

            If we make a mistake in hardware, how are we supposed to fix it? At NVIDIA we have a status for hardware bugs called "Fix in Next Chip". The "Next Chip" is 2-4 years away.

            So what do we do? We solve problems in software whenever possible. It's cheaper to do so, it has a quicker turnaround time, and most importantly, we can make changes after the product has shipped.

            > I would say .... The hardware must be sold independently of the software ... but it is a bit too complex, I know.

            We don't sell hardware and you don't want to buy hardware. Trust me, you wouldn't know what to do with it. It's full of bugs and complexity.

            We sell a platform that consists of hardware and software. The product doesn't work without software.

            If we tried to make the same product purely in hardware, the die would be the size of your laptop and would cost a million dollars.

scott31 5 years ago

A pathetic attempt to lock developers into their hardware.

  • jpz 5 years ago

    They seem to be pushing the frontier of innovation in GPU compute. It seems a little unfair to call that pathetic, whatever strategic reasons they have for finding OpenCL unappetising (which, in truth, simply enables their sole competitor).

    Their decision making seems rational; of course it's not ideal if you're a consumer. We would like the ability to play NVidia off against AMD Radeon.

    Convergence to a standard has to be driven by the market, but it's impossible to drive NVidia there because they are the dominant player and it is 100% not in their interests.

    It doesn't mean they're a bad company. They are rational actors.

  • blelbach 5 years ago

    > A pathetic attempt to lock developers into their hardware

    Ah-ha, you've caught us! Our plan is to lock you into our hardware by implementing Standard C++.

    Once you are all writing code in Standard C++, then you won't be able to run it elsewhere, because Standard C++ only runs on NVIDIA platforms, right?

    ... What's that? Standard C++ is supported by essentially every platform?

    Darnit! Foiled again.

  • daniel-thompson 5 years ago

    I think CUDA itself is the locking attempt; this is just a tiny cherry on top.

  • pjmlp 5 years ago

    The other vendors are to blame for sticking with outdated C and printf-style debugging.

    • einpoklum 5 years ago

      1. printf-style debugging is what we use on NVIDIA hardware too.

      2. OpenCL 2.x allows for C++(ish) source code. Not sure how good the AMD support is though.

      • pjmlp 5 years ago

        1. Ever heard of Nsight and Visual Studio plugins?

        2. OpenCL 2.0 was a failure, so OpenCL 1.2 got renamed as OpenCL 3.0. C++ bindings were dropped and SYCL is now backend agnostic.

        • einpoklum 5 years ago

          > 1. Ever heard of Nsight and Visual Studio plugins?

          Those are apples and oranges... also, you forget cuda-gdb.

          > OpenCL 1.2 got renamed as OpenCL 3.0. C++ bindings were dropped

          Well, yes, but also no. They were made optional, and transitioned to some other C++-cum-OpenCL initiative:

          https://github.com/KhronosGroup/Khronosdotorg/blob/master/ap...

          I'm not exactly sure how this differs and what's usable in practice though.

          • pjmlp 5 years ago

            While SYCL might stand a chance against CUDA, thanks to it being backend agnostic and a compiler-neutral standard, C++ for OpenCL is a clang-specific project, and it remains to be seen whether it will ever get any adoption.

            > For C++ kernel development, the OpenCL Working Group has transitioned from the original OpenCL C++ kernel language, defined in OpenCL 2.2, to the ‘C++ for OpenCL’ community, open-source project supported by Clang. C++ for OpenCL provides compatibility with OpenCL C, enables developers to use most C++17 features in OpenCL kernels, and is compatible with any OpenCL 2.X or OpenCL 3.0 implementation that supports SPIR-V™ ingestion.

            https://www.khronos.org/news/press/khronos-group-releases-op...

            • einpoklum 5 years ago

              > C++ for OpenCL is a clang-specific project, and it remains to be seen whether it will ever get any adoption.

              About adoption, you're right. But being Clang-specific is not an issue: OpenCL C kernels are likewise compiled by an implementation-specific OpenCL compiler, so the fact that compilation from C++ for OpenCL to SPIR-V is clang-based is not a problem. We can still compile our host-side code with whatever compiler we like; only the runtime compiler of OpenCL C++ kernels will be clang-based.

  • gj_78 5 years ago

    Agree++. They are good at hardware and should stay that way.

    • my123 5 years ago

      The thing is: that hardware isn't very usable without good software, and an easy to use software stack at that.

      That's what NVIDIA understood and made them what they are today.

      • gj_78 5 years ago

        A lot of hardware has builtin software, either inside a firmware or as a driver. Keeping the software part in firmware leaves customers free to use any kind of OS. Using host cpu and memory is bad design IMHO.

        • dahart 5 years ago

          Can you elaborate on what you mean? This is an open source library for developers to write code that can compile without changes on both CPU and GPU. This solves a problem that can’t be solved in firmware, and this is not a case of nvidia using cpu and host memory - whether to use cpu and host memory is strictly up to the developer.

          • gj_78 5 years ago

            Sorry, I was wrong about the cpu and host memory part. I meant: having the GPU seller control/write code that plays with host cpu and memory is bad. Let people use their own gcc/g++ or whatever compiler and publish the specs. Unless they also start selling CPUs.

            • dahart 5 years ago

              This is gcc or whatever compiler, it is not nvidia's compiler. This library does not give nvidia any "control" over host operations, it gives developers another tool.

              They did publish the specs, it's open source. BTW, Nvidia's acquisition of ARM means that it will be selling CPUs.

              P.P.S., the driver runs on the host, so your proposed alternative doesn't address the point you think you're making.

              • gj_78 5 years ago

                I did not say the library controls anything; Nvidia controls the library: its features, its roadmap, its bug fixes, its development efforts (people), etc. All these choices are made by Nvidia. It is not just another tool, it is the tool that is closest to hardware evolution.

                Nvidia buying ARM is not good news for me. The same way I don't like them making software, I also don't like them selling CPUs or seafood. They are good at GPUs and that's OK.

                The drivers are usually running in kernel space and do not involve much interaction with users. Firmware, on the other hand, is hardware-close software and can be gradually replaced by continuous hardware improvements without the user, the software, or the OS noticing.

                • blelbach 5 years ago

                  > I did not say the library controls anything; Nvidia controls the library: its features, its roadmap, its bug fixes, its development efforts (people), etc. All these choices are made by Nvidia.

                  libcu++ is a fork of LLVM's libc++, which we do not control. We contribute upstream and engage with that community.

                  libcu++ is an implementation of the C++ Standard Library, which is controlled by an ISO standardization committee with ~300 members, 10 of whom work at NVIDIA.

                  > The same way I don't like them making software, I also don't like them selling CPUs or seafood. They are good at GPUs and that's OK.

                  We employ more software engineers than hardware engineers. We don't sell hardware, we sell software + hardware.

                  We manufacture and sell CPUs today. I like to think we're quite good at it.

                  > The drivers are usually running in kernel space and do not involve much interaction with users.

                  Incorrect. The core part of the driver, called the Resource Manager (RM), runs in the kernel space. Each different SDK (CUDA, OpenGL, Vulkan, etc) has its own "user mode driver", which is a shared library that interacts with RM. It's hard to say what the split is, but I'd say roughly half of what you think of as "the driver" is in user mode.

                  > Firmware, on the other hand, is hardware-close software and can be gradually replaced by continuous hardware improvements without the user, the software, or the OS noticing.

                  Firmware runs on the GPU. You can't do everything from the GPU.

                  But neither firmware nor drivers have anything to do with the toolchain that you use to write heterogeneous programs. That's what this is a part of.

                • dahart 5 years ago

                  > It is not just another tool, it is the tool that is closest to hardware evolution.

                  This is an open source library that meets the C++ standard, which is designed and contributed to by many companies, not just nvidia. Like AMD and Intel, Nvidia does release some proprietary things that your complaints might apply to, but this is not one of them.

        • my123 5 years ago

          > Keeping the software part in firmware leaves customers free to use any kind of OS

          Raspberry Pi initially shipped with such a graphics stack, with the Arm side just being a communication driver in the kernel and an RPC stack in user-space.

          It isn't a good idea (for numerous reasons, including security) and is even more closed in practice than what ships today.

          • gj_78 5 years ago

            Raspberry Pi is not marketed for graphics the way nvidia markets their GPUs. What I mean is that the firmware runs on a (usually small) cpu and memory sold as part of the GPU. No security issues here, as the main security issue is plugging the whole GPU into your PC in the first place.

            • my123 5 years ago

              With the complexity of GPU driver stacks, what you are asking for is not firmware, but a multi-GHz set of CPUs dedicated to that purpose.

              Plus, RPC would be needed all the time, and its latency would tank performance.

              It'd also not be tinkerable at all, unlike what we have today; it's advocating for exactly the opposite of open.

        • kortex 5 years ago

          That sounds like vendor binary-blob SDK libraries, only everything is an RPC and you're not even in the same memory space, aka distributed computing, except you have no control over the device stack. Sounds kinda awful to me.

        • blelbach 5 years ago

          > A lot of hardware has builtin software, either inside a firmware or as a driver.

          Correct.

          > Keeping the software part in firmware leaves customers free to use any kind of OS.

          Do you mean firmware, or firmware and driver?

          You can't do everything in firmware.

          > Using host cpu and memory is bad design IMHO.

          How do you propose that you program the GPU then?

          The CPU has to interact with the GPU. Some software has to manage that interaction.

          That said, we are not talking about either a driver or firmware. This is a part of our toolchain. It is a library that you use when writing a heterogeneous program.

    • blelbach 5 years ago

      We employ more software engineers than hardware engineers. Our hardware doesn't really do much in isolation, software is part of the product.

      • gj_78 5 years ago

        The question is not about the head count. How many software engineers at nvidia produce software that is expected to run/compile on the host CPU of the customer, like this library? I expect not too many.

        • blelbach 5 years ago

          The majority of software engineers at NVIDIA write software that runs on the host CPU.

          The majority of software written at NVIDIA (by any metric, lines of code, number of projects, etc) runs either solely on the CPU, or on both the CPU and the GPU.
