The beauty and simplicity of the good old C-style void* in C++

66 points by movd128 4 days ago · 177 comments

stinos a day ago

> Why should people complexify and uglify their C++ code with the uint8_t pointer (or std::byte), when void* works just fine??

Fair point (although to be honest: 'complexify' feels a bit of an exaggeration here to me), but the answer to this why is simple: document and express intent clearly. The compiler gave you an error first such that you're forced to consider what you're doing. Any seasoned C++ developer seeing this knows what this reinterpret_cast means.

> Wow. With std::span the complexity-meter bumps in the red zone and goes even higher!

Same remark: yes, it's a bit more text to read, but again: to me (and many others I'm guessing) this clearly expresses intent. I also do not find it particularly hard to read. I mean, it's C++, you're likely going to encounter templates at one point or another, except in super specific software perhaps. But no-one also ever argued the C++ learning curve was easy, and trying to make it easier by refusing to use features which were added for good reasons and instead going back to constructs which are the very source of those reasons seems a bit backwards.

> As a nice addition, if you use SAL annotations, the function could be decorated a bit to help code analyzers detecting memory bugs

Some might also say it complexifies and uglifies the code. And in any case makes it non-portable on top of that.

VulgarExigency a day ago

It seems unlikely that this is the case, as the author appears to be experienced, but the post reads like the author has never had to maintain a "simple" and "beautiful" function that was mangled into incomprehensibility over the years, and where if a more expressive type signature had been written from the start, it would have restricted the damage caused over time.
- zahlman a day ago
  
  > a "simple" and "beautiful" function that was mangled into incomprehensibility over the years, and where if a more expressive type signature had been written from the start, it would have restricted the damage caused over time.
  ...Can you give a concrete example? I've been programming literally since the 80s and that doesn't ring true at all for me.
  - locknitpicker 14 hours ago
    
    > ...Can you give a concrete example? I've been programming literally since the 80s and that doesn't ring true at all for me.
    Even this week I stumbled upon legacy code that started off with a clean function, void DoSomething(Foo). Then a few years passed and someone started using Foo to handle two scenarios, let's call them Left and Right. They could have simply introduced two new types, FooLeft and FooRight. But no. Instead they kept Foo after adding a few extra optional fields, and extended DoSomething(Foo) as
    DoSomething(Foo foo, bool isLeft, bool isRight)
    This took place during the mid 2010s.
    Where have you been during all this time?
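A minimal sketch of the alternative described above, assuming hypothetical `FooLeft` and `FooRight` types whose fields are invented here for illustration (the real legacy code is not shown in the thread):

```cpp
#include <cassert>
#include <string>

// Hypothetical types: the fields are invented for illustration only.
struct FooLeft  { int offset; };
struct FooRight { std::string label; };

// One overload per scenario: the compiler selects the branch, and a call
// like DoSomething(foo, true, true) cannot be written at all.
std::string DoSomething(const FooLeft& foo) {
    return "left:" + std::to_string(foo.offset);
}

std::string DoSomething(const FooRight& foo) {
    return "right:" + foo.label;
}

// Usage sketch.
inline void do_something_usage() {
    assert(DoSomething(FooLeft{3}) == "left:3");
    assert(DoSomething(FooRight{"x"}) == "right:x");
}
```

With two types, the contradictory flag combinations of the `bool isLeft, bool isRight` version have no representation.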
  - jerf a day ago
    
    Have you been in static types the whole time? It's a really, really common failure state in dynamic programming languages, the Everything Function, that started out as something simple, but then someone added a flag to make it also do this other thing, and then you need a flag to only do that other thing sometimes, and someone needed to operate on multiple things so they made a string parameter also optionally an array, and later someone allowed it to also be an object with this one method, or maybe another method if it's present because some other team implemented that before the first one and can't switch now, and before you know it it's a free-for-all of people adding flags and options and type analysis and if statements and you have a complete mess. Especially if this function is shared by many disparate teams, each of whom isn't "allowed" to break the others, though a single team can fail this way plenty fine.
    You can still do this in static languages, but they do push back a bit more because you don't get the flexibility that dynamic languages offer when it comes to accepting a huge variety of different input types.
    I've torn a few of these apart over the years. Never fun. Haven't tried with AI but suspect that would only be a quantitative change rather than a qualitative change. The fundamental problem with fixing these is lack of information about the exponential complexity of possible call mechanisms and the AI will have the exact same information problems I will, just faster.
    Edit: One of them that I tore apart ended up being two entirely separate functions slammed together into one by historical contingency. I don't just mean that I broke the functionality down into multiple functions, that's a basic tool of how you tear these down and is nothing of note. I mean that one of the "everything functions" I tore down had two distinct calling patterns that were distinct functions that not only shouldn't have been festooned with so many options, but never should have been one function at all because they weren't even conceptually the same thing or even particularly related.
    Think of it as two stages of a straight-line process, that were just jammed together because of the fact they got called at similar times, and the original writers weren't clear on the unrelated nature of the tasks and nobody was able to see it through all the obfuscation until I sat down, very deliberately, and I realized this as I was tearing it apart. I don't remember the details, I tend to remember things very conceptually and thus I have a hard time remembering the details of functions with no conceptual purity, but you can get close by thinking of the function as validating incoming parameters, and then applying the parameters to a database. And people were so confused that despite the fact this function, when tickled correctly, could do it all in one shot, sometimes, kinda, with some caveats, there were places where this function was called first to validate (with flags to shut off the application), and then to apply (with flags to shut off the validation). And to be clear, I mean, I did not realize it either even from my contact with the function over the years. It was only when I sat down with it for hours and systematically tore it down that I figured that out.
  - VulgarExigency a day ago
    
    I can't, as my employer owns the code, not me, but there are several examples in one of the Ruby codebases I unfortunately maintain where I can see this degeneration happen via the git history. A small 8 line method with just two parameters slowly grows in complexity over time, until one day one of the original parameters supports two different shapes, and later on it's not that easy to understand which shape it should have in the specific conditional branch you're trying to fix, and the last person to touch that code left the company 4 years ago.
    The fault, of course, ultimately lies with the people who wrote and approved this nonsense, but types, or at least type hints, help to avoid this issue.
    
    antiframe a day ago
    
    Can you point to it? That doesn't sound like "the language forced all this extra baggage on it due to 'safety'" so much as the developers kept adding functions to the function without rethinking if and how they should.
    
    VulgarExigency a day ago
    
    My point was not about the safety of the code, it was about the expressiveness, which is also what the comment I replied to was about. If the parameter has an explicit type (instead of no type, as is normal in Ruby, or `void*`, which is the C equivalent), it forces the developer to consider the design of the function, instead of taking the path of least resistance because they're inexperienced/incompetent/a large language model/burnt-out to the point where even the thought of opening the file makes them feel the not-anxiety of burnout/<insert reason here>.
    
    dpark 15 hours ago
    
    > instead of no type, as is normal in Ruby, or `void*`, which is the C equivalent
    “void *” is not the equivalent of “no type” from Ruby. “void *” says “I operate on raw memory”. It says exactly the same thing as “byte *”.
    For sure you should generally not write a function that accepts a “void *” and then internally casts it to some concrete pointer type and operates on that type, but the problem there is the internal behavior, not the choice of byte vs void pointer.
    
    lostglass a day ago
    
    Forcing developers to consider more and more is harmful though. You're arguing to put all of the forethought upfront, when you have the least context and least understanding of what can go wrong, and carrying that complexity forward rather than starting simple and refactoring over time.
    
    dpark 15 hours ago
    
    You’re really citing a mess in a Ruby code base caused by lack of typing as evidence for why void * is problematic in C/C++?
    These are so wildly different cases that the comparison isn’t meaningful. This is like saying you should wear a helmet while playing tennis because sometimes helmets save bicyclists lives.
    
    locknitpicker 14 hours ago
    
    > You’re really citing a mess in a Ruby code base caused by lack of typing as evidence for why void * is problematic in C/C++?
    If you read GP's post you'll understand it exemplifies exactly the issue that the likes of (void *) present in C.
    I mean, read the message, particularly this:
    > later on it's not that easy to understand which shape it should have in the specific conditional branch you're trying to fix
    That is exactly the purpose of void *. By design. It's a pointer to an unspecified type. The unspecified type is exactly why this thing is used.
    
    dpark 6 hours ago
    
    > later on it's not that easy to understand which shape it should have in the specific conditional branch you're trying to fix
    This is not idiomatic C. I have no doubt that someone (likely many someones) have written a function that takes a void * and then internally does some insane half baked dynamic typing. But I’ve never seen it and it’s not common.
    You also cannot fix this behavior by changing the pointer type. The type of the pointer is essentially meaningless in this case.
    > That is exactly the purpose of void *. By design. It's a pointer to an unspecified type. The unspecified type is exactly why this thing is used.
    This is also the purpose of byte * in the examples. Coercing an arbitrary pointer from void to byte doesn’t accomplish anything. It’s lipstick on a pig at best.
thomasmg a day ago

I don't have a strong opinion what is better in this case, but my view is:
> document and express intent clearly
Arguably, the void* does that as well?
> Any seasoned C++ developer seeing this knows what this reinterpret_cast means.
Same for void*?
> it's a bit more text to read
If you have to call it many times, this adds up.
> Some might also say it complexifies and uglifies the code
I think the point is that it adds security, which the other options don't. And, it doesn't add complexity on the caller, but only at one place: the implementation.
> makes it non-portable on top of that.
This can be solved.
- stinos a day ago
  
  > Arguably, the void* does that as well?
  Sort of (I mean: seeing void* and a size probably means 'arbitrary sequence of bytes' or something like that, but well it's void* so it can be like anything whereas with std::span you get more of a hint what's going on just based on the type), but not at the callsite which is what the author is referring to when it's about reinterpret_cast.
  > I think the point is that it adds security, which the other options don't
  Imo span also does that to some extent, but already when writing the code and not afterwards in e.g. static analysis. E.g. if I get a std::span<const char> I'd have to do counterintuitive things to misuse it. Annotating a void* still leaves it a void* which I then need to cast to char* if I think that's what is intended.
  Don't get me wrong: I've written my fair share of void* but these days I really feel like there's almost always a better thing which can be used instead. Though I do admit that since I've written and consumed a lot of code with such alternatives I'm not hindered by readability/apparent complexity of it anymore but I understand that's not the same for everyone.
  - dpark 15 hours ago
    
    > whereas with std::span you get more of a hint what's going on just based on the type
    You don’t if it’s a span of bytes (or equivalent).
    Encoding the length in a span is a meaningful thing. But the fact that it holds a random memory pointer labeled “byte” instead of “void” doesn’t change anything.
- throwaway27448 a day ago
  
  > Arguably, the void* does that as well?
  How do you figure? The type is a pointer to quite literally anything, including nothing (ie a pointer that cannot be dereferenced). If you're working with bytes, indicate this with the type.
  - addaon 16 hours ago
    
    > The type is a pointer to quite literally anything
    No, it can only be a pointer to an object. It can't be a pointer to a function, for example.
  - dpark 15 hours ago
    
    All pointers in C/C++ can point to “nothing”. Swapping “void *” for “byte *” is basically an aesthetic choice.
    
    throwaway27448 3 hours ago
    
    Right. So why not use a typed reference?
    
    dpark 3 hours ago
    
    1. void * is a type
    2. The functions in question here operate on raw memory. void * conveys the same information as byte *. It’s not untyped vs typed. It’s two different types that convey the same information but require slightly different mechanics to obtain.
    3. The author explained his reasoning for preferring void * over byte * (or uint8_t *)
    4. A reference also isn’t the same thing as a pointer in C++
repelsteeltje a day ago

+1
And SAL annotations aren't even C++ proper.
MaulingMonkey a day ago

> Fair point (although to be honest: 'complexify' feels a bit of an exaggeration here to me)
Both uint8_t and std::byte require a header (<cstdint> or <cstddef>) which may expose you to platform x config specific build failures if you do any conditional #including, and the latter is a whole damn enum class with a strange aversion to arithmetic, where `byte |= 1` becomes `byte |= std::byte(1)`, `byte += 1` becomes `byte = std::byte(std::to_integer<std::uint8_t>(byte) + 1);`, and both become something you can accidentally step into in your full debug builds because it's an actual function call (at least on MSVC - still extra instructions on clang/gcc, but I can see the dang call instruction on MSVC!) instead of a compiler built in.
Not to mention, neither is vanilla C++03... I threw a `std::byte` example in a quick godbolt snippet and MSVC wouldn't compile without adding /std:c++17, because of course it defaults to earlier. Which is silly, but that's also the story of my life.
And don't get me wrong - that's all relatively minor - but it's all for middling to negative value IME. `void*` is frequently clearer - it's a signal that it's an opaque blob at this point in the code, and that something else will try to give it meaning later. I struggle to think of a single bug that I've encountered, that would've been caught by the compiler had I used `std::byte` over `unsigned char` or `void*`. And conversely, I've seen APIs accepting `std::byte` but requiring higher alignment, where with `void*` I might not have dropped my guard as much.
> `std::span`
At least manages to bind pointer and size into a single variable, which IME at least has the advantage of eliminating some bugs (e.g. mismatching pointers and sizes) and allowing some nifty utility functions to become a lot more wieldy. You can do things like feed it an array and not have to do any of your own `sizeof(...)` shenanigans. At this point you're possibly getting into positive expected value, but I'm going to eye roll at pull requests refactoring `void*` based stuff to use it unless I see at least one actual concrete example of calling code improving alongside it - I don't want just hypothetical theoretical ergonomics, I want actual concrete ergonomics!
- account42 7 hours ago
  
  > and the latter is a whole damn enum class with a strange aversion to arithmetic, where `byte |= 1` becomes `byte |= std::byte(1)`, `byte += 1` becomes `byte = std::byte(std::to_integer<std::uint8_t>(byte) + 1);`, and both become something you can accidentally step into in your full debug builds because it's an actual function call (at least on MSVC - still extra instructions on clang/gcc, but I can see the dang call instruction on MSVC!) instead of a compiler built in.
  That's the point, std::byte is for opaque bytes. You're not expected to do arithmetic directly just like you can't do arithmetic on void.
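For readers who have not used `std::byte`, here is a small sketch of the two operations being discussed. It assumes C++17 or later, and the helper names `set_low_bit` and `increment` are made up for this example:

```cpp
#include <cassert>
#include <cstddef>   // std::byte, std::to_integer (C++17)
#include <cstdint>

// Bitwise operators exist for std::byte, but only between two bytes.
inline std::byte set_low_bit(std::byte b) {
    b |= std::byte{1};   // `b |= 1` would not compile
    return b;
}

// Arithmetic is not provided, so it needs a round trip through an integer.
inline std::byte increment(std::byte b) {
    const auto n = std::to_integer<std::uint8_t>(b);
    return static_cast<std::byte>(static_cast<std::uint8_t>(n + 1));
}

// Usage sketch.
inline void byte_ops_usage() {
    assert(std::to_integer<int>(set_low_bit(std::byte{0x10})) == 0x11);
    assert(std::to_integer<int>(increment(std::byte{0xFF})) == 0);
}
```

Whether that friction is a defect or the intended signal that the bytes are opaque is exactly what the two comments above disagree on.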
- AnimalMuppet a day ago
  
  > `void*` is frequently clearer - it's a signal that it's an opaque blob at this point in the code, and that something else will try to give it meaning later.
  And that's fine, until something else gives it the wrong meaning later. If you're just plumbing, and you're pumping around opaque blobs, if somewhere in the plumbing you connect the wrong source to the wrong destination, you get no warning.
  - MaulingMonkey 20 hours ago
    
    > if somewhere in the plumbing you connect the wrong source to the wrong destination, you get no warning.
    A valid concern. I've been in the position of having to fix those bugs.
    I'm not going to recommend `qsort` over `std::sort` or anything like that. I've seen `void*` "user data" pointers in C APIs and decided to stuff runtime checkable handle values in there instead of real pointers to blobs. In Rust, this can mean avoiding some unnecessary `unsafe { ... }` blocks, since while reading anything from a `*const c_void` requires such things, reading from a global `Mutex<HashMap<*const c_void, Arc<dyn Any>>>` or similar nonsense does not. I'll eat the performance hit unless I'm worrying about an actual hot path!
    And I'm not going to stand in the way of newtype wrappers around void pointers either. I'll +1 the PRs with `class ASpecificKindOfBlob { void* data; size_t length; };` and suchlike as well, if `std::span`s aren't your type of thing, and abide by measures to centralize such plumbing in one place such that the opportunities to make a mistake are fewer.
    But sometimes a blob is just a blob, and breaking out `std::byte` is putting lipstick on a pig. And it's not even the right color lipstick.
  - ryandrake a day ago
    
    Yes, in this case, void* is kind of smelly. If the intent of your function is to receive a const struct MyCustomData*, then that should be the type of the argument. If you later need to handle a const struct MyOtherCustomData*, you can add an overload that takes that argument. Or use a template as others pointed out. Use the type system to help you, so you're warned if you try to pass the method const struct BadCustomData* by accident.
    If you truly don't know what the underlying structure of the "blob" of data is, sure, go ahead and use void* and explicitly convert the pointer type when you know what it is, but at least add a comment that you're entering the danger zone.
    
    MaulingMonkey 20 hours ago
    
    To be fair, the `void*` is already a pretty big hint that you're in the danger zone.
drivebyhooting a day ago

Span will increase compilation time for no useful reason. It’s not any safer at the call site.
- ndiddy a day ago
  
  Span lets you use a ranged for loop to iterate over the contents without worrying about exceeding the bounds, which is safer than pointer+size if that's all you'll be doing. C++26 also introduces .at() for span, and the new hardened standard library enforces bounds checking when using operator [] on a span.
  - drivebyhooting a day ago
    
    The caller still needs to construct the span correctly.
    
    ndiddy a day ago
    
    You can pass both C arrays and some STL containers (i.e. std::vector, std::array) into a function that takes a span, and the span will get constructed automatically. You have to construct it manually if all you have is a pointer and a length, but I don't know what you'd expect to happen there.
    
    drivebyhooting 18 hours ago
    
    It’s not semantically safe to pass arbitrary vectors into a generic buffer copying function. The T in a vector<T> could have internal pointers or worse things.
    Either the objects are simple and trivially copiable, or you need a proper serialization library.
    Sure you can use span to generalize slices and iteration, but I don’t think that’s the point of the article.
    
    fluffybucktsnek 17 hours ago
    
    In comparison to a plain void* and a separate size, it's still an improvement. As others mentioned, void* suffers from the same problem (it might point to a type that is not trivially copyable), except it has more opportunities for mistakes.
    In contrast, with span, you can instantiate only to span<uint8_t> (or something similar) and you'd still be able to accept other buffer types (such as vector<uint8_t>, array<uint8_t>, etc.). Alternatively, you can make T bounded to be trivially copyable. You can't do that with void*.
- throwaway27448 a day ago
  
  Well if compilation time is an issue, you chose the worst possible language to use. But if you must use C++, you should use the mechanisms that best communicate intent.
  - drivebyhooting a day ago
    
    It’s this kind of attitude that perpetuates bad compilation time.
    Templates have to get parsed and instantiated over and over again. Then you need link time optimization to deduplicate all the redundant copies of the same code.
    
    pjmlp 8 hours ago
    
    No they don't, modules exist, as do external template libraries.
    
    fluffybucktsnek 21 hours ago
    
    If you are using C++, your last concern will be compilation times. By this point, just use C.
    
    drivebyhooting 18 hours ago
    
    Untrue. Lots of effort is spent optimizing compilation time at big FAANG companies. And there are a lot of established techniques for creating “compiler firewalls” and explicitly instantiating templates once.
    
    fluffybucktsnek 17 hours ago
    
    But these compilation times optimizations don't significantly undermine the other goals. Given that we're talking about std::span<T>, a pretty small template all things considered, I think practical evidence (e.g. actual cases) of impact is needed.
    
    drivebyhooting 14 hours ago
    
    The problem is not the size of the span template, it’s putting whatever logic into a header file instead of a sealed compilation unit.
    In a void* function prototype, whatever network code or gnarly dependency is all shielded behind a compilation unit. If you make it a template function the compiler will have to (re)process a lot of code. You could make the interface templatized in a header file and have the actual implementation use void * or char * pointer. That would recover good compiler performance.
    I don’t think span provides much if any safety for the implementer of the library function. I’m also not convinced it’s more ergonomic for the caller.
    
    throwaway27448 3 hours ago
    
    It is, however, more readable and maintainable.
    
    fluffybucktsnek 2 hours ago
    
    You don't have to make the function templated to use span. Given:
    void DoSomething(void* p, size_t numBytes);
    Presuming p to mean a buffer of bytes, the direct declaration equivalent using span would be:
    void DoSomething(std::span<std::uint8_t> /* or std::span<char> */ p);
    No templated logic in header files necessary. The only template instantiation is std::span, which, in theory, should already be used in most files. The author argues this still makes the code more complex, because of the need of reinterpret_cast, but does it actually?
    std::span provides multiple ways of safely accessing data. For one, it provides a contiguous iterator, so you get access to the algorithms library basically for free. Second, you get safer accessors to the data inside, such as at, and even [] can be protected through contracts. Finally, even if you don't care about or can't use these features, tying the pointer and length together reduces chances for variable confusion.

gignico a day ago

> It seems that some people are really losing the taste for good readable code.

It seems that some people never had taste for good reliable code. Use `void *` and now any error whatsoever is a direct undefined behavior. Moreover `std::span` clearly says that you are *not* taking ownership of the memory (even though the language does not check it of course), while `void *` does not.

I understand that people can have many things to say about C++, and I do as well, but `std::span` should have been there decades ago and is such a life saver in these situations. A truly zero-cost abstraction which effectively saves you from a lot of troubles.

trumpdong a day ago

There's lots of UB in C-family execution models. Some of which is not actually UB because the implementation defines it - e.g. aligned DWORD-sized memory access is atomic on Windows because Microsoft said it is.
By choosing to use this language you choose to navigate the UB. Otherwise you'd be writing in Go, or Python.
It is possible to write reliable code despite the presence of UB in a language just like it's possible to drive to work every day for 20 years despite most of the directions you can point the car leading to an immediate crash. That's a needle with a much thinner eye than UB in C, and most people manage it. Mainly it means being very careful about lifetime and ownership. The Linux kernel manages it 99% of the time simply by being careful about lifetime and ownership, and that's a project with a huge number of contributors who don't intimately know each other's modules. In the Linux kernel you can't just say "new whatever" - you must have a plan for a lifetime of that whatever, and other people will review it.
I agree with you about std::span.
- arcticbull a day ago
  Yeah but also, quick question:
  struct S { char c; int i; };
  struct S a = {0};
  struct S b = {0};
  memcmp(&a, &b, sizeof(a)) == ...
  If you answered 0, you'd be wrong, the answer is undefined, thanks to padding, initialization and alignment rules. Padding bytes are undefined, and not guaranteed to be initialized to zero even if the variable is declared static (where the members would be zeroed).
  This is why the compiler is angry at the post writer, and why the reinterpret_cast is needed. Ideally if they wanted to do something with the data, they'd unbox the structure.
  That's why it's not a good idea to use void* to pass arbitrary data interchangeable with bytes. It's a location, it makes no representation as to what's there and how to interact with it. Let alone who owns it.
  std::span solves two problems here. One is the ownership problem. The other is that span<T> is a T[]. void* is god only knows.
  The post asserts:
  > The code is very clear and straightforward: you pass a pointer to the custom data structure, and its size in bytes. That’s it. Simple and clear.
  This is unfortunately entirely false in C thanks to the aforementioned alignment/padding UB (and of course inner pointers). This is addressed with std::span. You'd still have to reinterpret_cast your structure to get the UB.
  > Why should people complexify and uglify their C++ code with the uint8_t pointer (or std::byte), when void* works just fine??
  tl;dr: because it doesn't. It just kinda looks like it does if you squint, and it's going to lead to the gnarliest bugs in the world.
  - saagarjha a day ago
    
    Padding bytes are initialized to zero if you zero initialize the aggregate. It is hard to keep those bytes as zero but at initialization this much is guaranteed.
    
    arcticbull 16 hours ago
    
    I looked into it some more and it's actually worse.
    For static or thread storage, in C11 and later, ={0} will guarantee padding is zeroed. For automatic storage, per C11 6.7.9, only subobjects are required to be zeroed. Padding is not. [1]
    In C23 initializing with ={} will give you zeroed padding, initializing with ={0} will not.
    [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
  - porridgeraisin a day ago
    
    > even if the variable is declared static
    No, for static even padding bytes are zero.
    For automatic, yes it may effectively turn a = {} to a.member = 0, leaving the padding bytes uninitialised. Or on copies like a = b it may not copy padding bytes.
- pjc50 a day ago
  
  > Some of which is not actually UB because the implementation defines it
  No - if something is UB in the spec, it's UB. The implementation will do something, sure, but what it does is not fixed and may even change based on compiler version and optimization level.
  > DWORD-sized memory access is atomic on Windows because Microsoft said it is
  Well, Intel said it is. Mind you I don't think there are any 32-bit native architectures where aligned dword access isn't atomic. Unaligned, on the other hand ...
  - trumpdong a day ago
    
    "Undefined behavior" in the C standard literally means "behavior which this C standard does not put any requirements on" - it says so in the definitions section of the C standard. Other things can still put requirements on it. MSVC isn't just a C++ compiler - it's a C++ compiler for x64 Windows and therefore follows the rules of C++, x64, and Windows all at once.
  - simiones a day ago
    
    > No - if something is UB in the spec, it's UB.
    A compiler is still free to ignore the spec and declare that something is not UB. However, this is very much compiler based, not platform based. Windows might guarantee that aligned DWORD-sized memory accesses are atomic, but that doesn't mean Clang when compiling for Windows would respect this - but MSVC might.
    
    singpolyma3 a day ago
    
    No, a compiler obviously cannot do this. Nothing is undefined behaviour under a known compiler, version, and settings. UB means you can't know what the code does in general, not that you can't know what it does in a very specific case.
    
    simiones a day ago
    
    UB has 2 very different implications:
    1. It means that even if your program happens to work, it can't be portable
    2. It means that even if your program happens to work today, it might stop working tomorrow when you add some new code, when you change some compiler flags, or when you do even a minor compiler upgrade
    Of course, a compiler can't address 1. However, a compiler can very much address item 2. If Microsoft were to say "in MSVC, we define integer overflow to wrap", then they would guarantee that `INT_MAX + 1` will produce `INT_MIN` regardless of any optimization settings, any compiler upgrades, any other changes to the code. Of course, compiling the exact same program with Clang or GCC might cause it to crash or corrupt memory or anything else - but as long as you stuck with MSVC, your program would have perfectly defined semantics.
    This is similar to using compiler extensions or intrinsics - they are not portable and not defined by the standard, maybe even explicitly defined to NOT be supported per the standard (such as variable length arrays in C++ in GCC), but they are nevertheless perfectly safe as long as you stick to your chosen compiler.
    Edit to add: the integer overflow example is not just a theoretical possibility - lots of C++ compilers provide the `-fwrapv` flag; when using that flag, signed integer overflow is no longer UB for that program, it is defined just the same as unsigned integer overflow.
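For comparison, the same wrapping result can be obtained without relying on any compiler flag. This is a different technique from `-fwrapv`, sketched under the assumption of C++20 (where the unsigned-to-signed conversion is defined as modular); `wrapping_add` is a made-up name:

```cpp
#include <cassert>
#include <climits>

// Portable wrapping addition: unsigned arithmetic is defined to wrap, and
// since C++20 converting the result back to int is defined as modular.
constexpr int wrapping_add(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}

// Checked at compile time, where UB would be a hard error.
static_assert(wrapping_add(INT_MAX, 1) == INT_MIN);

// Usage sketch.
inline void wrapping_add_usage() {
    assert(wrapping_add(2, 3) == 5);
    assert(wrapping_add(INT_MIN, -1) == INT_MAX);
}
```

Plain `INT_MAX + 1` remains UB under the standard; the flag or a helper like this one is what gives it a defined meaning.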
- repelsteeltje a day ago
  
  There is a difference between UB in C, and something being undefined in some version of Microsoft C on Windows.
  Much of C's UB is specifically, intentionally left undefined in the standard to express that code relying on some specific way it is handled is not proper, portable C. Indeed, the DWORD-sized memory access being atomic doesn't apply to MS Windows prior to version 3.0 running on an 80286.
  It's UB because the ISO C spec says it's UB.
pjmlp a day ago

That is quite common among C developer culture, play loose and brace for impact.
delta_p_delta_x a day ago

> A truly zero-cost abstraction
Sadly the MSVC ABI makes std::span and std::string_view a pessimisation:
https://github.com/tringi/win64_abi_call_overhead_benchmark
https://godbolt.org/z/7baaox7re
- usrnm a day ago
  
  Sounds like a compiler bug to me. It is a valid reason to avoid them in some rare cases right now, but it doesn't make the feature itself bad
  - j16sdiz a day ago
    
    Those are ABI. Unless it is inlining them, the overhead is to stay.
    
    usrnm a day ago
    
    ABI changes do happen. gcc had an ABI change in std::string because of C++11. It was long and painful, but everyone survived, the world did not end
    
    delta_p_delta_x a day ago
    
    > ABI changes do happen
    Will never happen on Windows, especially not in user-mode libraries, and especially not something this pervasive.
    
    pjmlp a day ago
    
    Contrary to the FOSS compile from source culture, other platforms have a different point of view on ABI breaks.
    Which is why Valve ended up using Proton.
    
    gpderetta a day ago
    
    I'm pretty sure GCC has been ABI stable far longer than MSVC, which used to break ABI every release.
    GCC was forced to break the std::string ABI by the C++11 standard and they have been lobbying against ABI breaks ever since.
    
    wahern a day ago
    
    GCC/libstdc++ just changed the ABI for std::variant: https://gcc.gnu.org/gcc-16/changes.html#libstdcxx
    
    pjmlp a day ago
    
    You haven't used Windows in a while I imagine.
    MSVC has stabilised the ABI since VS 2015, we are on VS 2026 now.
    Due to customer pressure to stop doing exactly that, to the point some ISO C and C++ stuff that requires breaking the ABI has not been implemented thus far.
    I am quite certain that I will find ABI breaks in GCC release notes since Slackware 2.0, when I used it for the first time.
spacechild1 a day ago

> but `std::span` should have been there decades ago
Absolutely! I now use it consistently in all new projects where I can afford to mandate C++20. I guess nobody bothered to make a proposal before...
- pjmlp a day ago
  
  They did in C, from one of the language authors even, and it was not accepted.
  https://www.nokia.com/bell-labs/about/dennis-m-ritchie/varar...
  By the way, both Extended Pascal, Mesa/Cedar and Modula-2 have them, under the name of open arrays.
  Basically it took Go, C# and others for C++ to finally get its span.
  C probably never will.
  - spacechild1 a day ago
    
    Everybody knows that C++ did not invent the concept of spans and that it was late to the party. It doesn’t change the fact that (presumably) nobody made a proposal to the C++ standard.
    
    tialaramex a day ago
    
    > It doesn’t change the fact that (presumably) nobody made a proposal to the C++ standard.
    There were proposals about this for many years. C++ is just a terrible programming language, standardized by a committee (WG21) which exists in large part to boost the ego of one man, Bjarne Stroustrup.
    N3851 for example wants to name this idea "array_view" which like "string_view" is an impressively unwieldy name for a core language feature, because of course neither of these were actually proposed as core language features even though that's what they naturally should be -- but it is basically the slice type or as you (and modern C++) call it a "span".
    It's true that you can't change facts but what you've got here was a belief which was unfounded, not a fact.
    
    spacechild1 a day ago
    
    > There were proposals about this for many years.
    I wrote "presumably", but you are 100% correct. I'm always happy to be proven wrong.
    N3851 actually deals with multi-dimensional spans and goes way beyond a simple slice/span type. To me it seems closer to std::mdspan than std::span.
    The earliest proposal I could find that does propose something similar to std::span dates back to 2012: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n33...
    I really don't understand why this was not pursued further. At the very least, this should have made it into C++17 together with std::string_view.
    > because of course neither of these were actually proposed as core language features even though that's what they naturally should be
    Should it really? What would this even look like in C++? IMO std::span works perfectly fine as a library type.
    > C++ is just a terrible programming language, standardized by a committee (WG21) which exists in large part to boost the ego of one man, Bjarne Stroustrup.
    That's certainly not the reason why it was standardized. Pre-C++98 was the wild west, with every compiler offering its own (incompatible) idea of what C++ is. Yes, there are many problems with design by committee in general (and the C++ committee in particular), but there was a very good reason for standardizing the language. The committee is not a one man show and there are many occasions where Bjarne has publicly voiced his frustration and disagreement.
    
    tialaramex a day ago
    
    > The committee is not a one man show
    Of course it isn't, all the great egotists need a parade of sycophants to heap praise on them, you've doubtless seen modern US "Cabinet meetings" in which TV hosts newly elevated to run parts of the US government compete with experienced politicians as they all try to offer the most effusive praise for their snoring God King.
    Personally, I'd throw up, but then I'm very much of Groucho Marx's view on such things.
    
    spacechild1 a day ago
    
    Are you seriously comparing the C++ standard committee to the Trump administration? I know you have an axe to grind, but this is getting ridiculous.
    Where exactly have you seen this "parade of sycophants" in the C++ standards committee?
    As far as I know, Bjarne is just a regular committee member with just as many votes as everyone else and no veto powers. The committee frequently accepts or rejects proposals against his will. For a recent example, see his harsh criticism of the new 'contracts' feature in C++26.
    
    tialaramex 19 hours ago
    
    Yes, I am seriously making that comparison. It's not as bad of course but it's certainly enough to make me cringe.
    > Where exactly have you seen this "parade of sycophants" in the C++ standards committee?
    AIUI The committee itself operates under the "Chatham House Rule" in which participants agree not to tell anybody who said anything and so we can only see group outcomes for the committee itself. For example 100% affirmative votes for Bjarne's "Profiles" proposal. At 100% everybody who had the opportunity to vote "Against" has to admit that er, they didn't, because that's just maths - but you won't now find anybody who was enthusiastic, somehow a room full of people who all now remember being uncertain voted affirmatively anyway. How about that.
    > Bjarne is just a regular committee member
    For almost a decade, WG21 has a "Direction Group" with a handful of members which insists that while as you say everybody is just a "regular committee member" their group ought to set the "direction" for the language and thus the committee. The exact membership of the Direction Group varies over time, but of course Bjarne Stroustrup has always been a member of this group. The group (whatever its present membership) writes only unanimously, which means everything it says has been agreed by Bjarne Stroustrup, and it cites as its reference for how to set the direction several books about C++ all written by that same Bjarne Stroustrup.
    So, sure, Bjarne is "just a regular committee member" in the same way that Britain's Prime Minister is "just a regular Member of Parliament" that is, very much in theory but not at all in practice.
    
    spacechild1 11 hours ago
    
    If Bjarne was so powerful, how come they voted contracts into C++26 despite his strong concerns? How come he publicly vents his frustration with the direction the language is taking?
    
    tialaramex 7 hours ago
    
    Bjarne isn't god and I didn't say he was. So no, he isn't all-powerful.
    Bjarne has always been frustrated by the failures of C++ and has always blamed them on other people. He's an egotist, they're always like that, I find it exhausting.
    
    spacechild1 5 hours ago
    
    You have some valid points, but I think you could have made them without the gross hyperbole and slander.
    
    pjmlp a day ago
    
    Microsoft made the proposal for C++, after Midori project, and Office security improvements.
    Which by your comment, you have no clue about how it came to be.
    Proposal is linked in another comment of mine.
    
    spacechild1 a day ago
    
    Well, you could have linked an actual proposal instead of dropping some cool facts about C, Extended Pascal, Mesa/Cedar and Modula-2, as if that explained anything.
locknitpicker a day ago

> I understand that people can have many things to say about C++, and I do as well, but `std::span` should have been there decades ago (...)
Decades is kind of a stretch. C++11 introduced smart pointers, and finally getting C++0x out of the door was already a major victory. Given the history of C++, it would be unrealistic to introduce something like std::span before C++17.
Meantime, some organizations are still struggling to migrate to something like C++14.
- pjmlp a day ago
  
  It could have been there since the beginning, given that open arrays (aka spans) already existed in other languages, and there was even a failed proposal from Dennis Ritchie regarding C.
  The C++ span proposal came from Microsoft,
  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p01...
  - locknitpicker a day ago
    
    > already existed in other languages
    This argument is moot. The issue with spans is not that they require cutting edge technology to deliver.
    Before commenting, perhaps you should research why even Dennis Ritchie himself could not sell his idea to C.
    It's funny how every single idea that's rejected is blindly lauded as brilliant but silenced due to some kind of conspiracy, and only the ideas that emerged are somehow bad, unacceptable, or late. Is the point to feel outraged?
    
    pjmlp a day ago
    
    Easy, even one of the authors could not change WG14's mind towards security.
    Governments, related cybersecurity agencies, and companies are the ones getting outraged when looking at money spent in cyber attacks due to memory corruption issues.
    
    wahern a day ago
    
    WG14 adopted variably modified types, a kind of dependent type. From a security standpoint it offers all the same qualities. It also in principle was easier to integrate from a backwards compatibility standpoint, with the exception of struct member analogs (which we now have but aren't yet standardized).
    Maybe we would have been better off with Ritchie's counter proposal. But neither proposal was chiefly concerned with security, thus no proposals for, e.g., automatic bounds checking.
    
    wahern a day ago
    
    Just to be clear, I often think we would have been better off with Ritchie's proposal, assuming it would have seen at least as much adoption in implementations and usage as variably modified types, which sadly remained poor for many years after C99, and arguably still poor. But being better off doesn't mean being in a drastically better situation than we are today from a security perspective. The proposed alternatives were prerequisites for substantively improving security, but far from sufficient. And the delay in adopting and refining variably-modified types has cost much more than whatever marginal benefit Ritchie's proposal offered. Ditto for other gaps, like better facilities for handling arithmetic, e.g. overflow and mixed type comparisons. The first step in addressing overflow only came with C23 (overflow checking routines), and the latter only in the forthcoming C2y (typesafe, mixed-signedness min/max, etc).
    
    uecker 18 hours ago
    
    The support for variably modified types is excellent, if you discount MSVC which is lacking support for modern C anyway (it seems to catch up a bit though).
    
    wahern 17 hours ago
    
    Real-world usage certainly remains poor. Using pointers to VM types remains annoying, and I wish the committee would settle on a solution to the ordering of VM parameters. But, yeah, the VM types are solid in GCC and clang and should be used more.
    
    locknitpicker a day ago
    
    > Easy, even one of the author's could not change WG14 mind towards security.
    Your comment conveys a hefty dose of ignorance on the topic. I recommend you read the proposal's arguments, including how it required breaking the ABI.
    
    pjmlp a day ago
    
    Are you asserting that WG14 never had the necessary skills among all the members to help improve this proposal, or dare to bring another one during the last 40 years?
    
    uecker 18 hours ago
    
    This has not much to do with skill. Standardization does not work like this, and I told you this before.
    
    pjmlp 8 hours ago
    
    Yeah, I keep missing the existing practice in secure C compilers that is supposed to land on a WG14 paper.
    You can tell me as many times as you wish; I will keep playing my trumpet until I see something in C that resembles what Modula-2 was offering in safety, nowadays brought back by Zig.
    
    locknitpicker 14 hours ago
    
    > Are you asserting that (...)
    No, I called out your opinionated ignorance on the topic.
    > WG14 never had the necessary skills among all the members to help improve this proposal (...)
    Frankly I don't think you even understand what you're arguing. I mean, to start off, if an idea is feasible then do you think it needs a full blown committee joining forces to magically fix all the problems?
    > (...) or dare to bring another one during the last 40 years?
    Again, you are showing a hefty dose of ignorance in the topic. Do you even understand that anyone can put together a proposal? That's how the process works. If you feel so strongly about it, where is yours? What have you been doing for the past 40 years?
    
    pjmlp 8 hours ago
    
    Who is more ignorant here, you repeating calling me that, or WG14 not doing anything to improve security since the Morris worm in 1988?
- gignico a day ago
  
  Afaik std::span does not need anything that was not in C++98 already, or am I missing something?
  - locknitpicker a day ago
    
    > Afaik std::span does not need anything that was not in C++98 already, or am I missing something?
    You're missing the fact that following C++98 it took around 13 years to get the next version of the standard published.

IX-103 a day ago

Ick. The entire article starts from the fundamentally flawed premise that "you want a function that takes a blob of memory as an argument". Then they discuss bytewise access into structures.

Passing around void pointers is simply not a safe thing to do in C++. You can't do anything with a void pointer, so you're probably going to cast it as something else. Use that type instead, so that your caller knows they need to pass a valid pointer to that type. If the pointer has the wrong alignment then that will result in undefined behavior. If you need to support multiple pointer types, use templates.

And, unless there are some really weird circumstances, you actually don't want to access your structures bytewise. Offsets can shift with compiler flags/versions. If you want serialization, please use a serialization library that correctly handles all of the odd cases. These can be quite efficient.

I've only actually had to munge bytes in a class once. Somebody decided that a previously POD class that was passed between processors with different memory spaces needed a virtual function, so I had to overwrite the vtable when I received it to make it valid.
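To make the alignment point concrete, here is a rough sketch of a check that a `void*` is suitably aligned for a type before it gets cast to that type. `IsAlignedFor` and `AlignmentExample` are hypothetical names, and the sketch assumes the usual flat mapping from pointers to `std::uintptr_t`, which the standard leaves implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when p meets the alignment requirement of T.
// Assumes pointer-to-integer conversion preserves the address value.
template <typename T>
bool IsAlignedFor(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Small self-check: the start of an aligned buffer passes, and the same
// address shifted by one byte fails for any T with alignof(T) > 1.
inline void AlignmentExample() {
    alignas(8) unsigned char buffer[16] = {};
    assert(IsAlignedFor<std::uint32_t>(buffer));
    if (alignof(std::uint32_t) > 1) {
        assert(!IsAlignedFor<std::uint32_t>(buffer + 1));
    }
}
```

A function that takes `const std::uint32_t*` instead of `const void*` pushes this requirement onto the caller through the type, which is the point being made above.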

AnimalMuppet a day ago

In general, you're right.
The exception that I could think of is a "dump memory" function. You take a pointer to something (who cares what it is), and print out the bytes there. That I could see taking a void*.
But that's a really limited case. In general, yes, you do not want to be dealing with blobs of memory as arguments. You want to be dealing with things that are known to be the right kind of thing as arguments.
- eddd-ddde a day ago
  
  But parent is right, you have to cast it anyways before reading from it, so might as well take the right type from the beginning.
  - TuxSH a day ago
    
    Anything can be aliased by char, unsigned char, std::byte (as well as signed char in C), and usually uint8_t == unsigned char, thus by extension any valid void pointer can be cast to u8*.
    Thus void*+size is usually the right type if one only cares about the memory representation of an object (cstring functions like memcpy, etc.)
    Most likely one would have both overloads:
        void Hexdump(const void *p, size_t size); // (1)
        template<typename T> // (2)
        inline void Hexdump(const T &obj) { return Hexdump(&obj, sizeof(T)); }
    With (2) being a wrapper to (1) that compilers will almost always inline, avoiding monomorphization costs (and (2) can also accept rvalues as argument).
    (1) could also take std::span<const u8>, but (void*, size) is the more common idiom, more convenient to use and to read, as it is unambiguous which overload it is.
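A minimal, self-contained sketch of that two-overload pattern might look as follows. To keep it easy to check, it returns a hex string rather than printing, so the name `ToHex` (and `ToHexExample`) is an assumption of this sketch and not the `Hexdump` from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// (1) The untyped worker: reads `size` bytes starting at `p`.
// Reading any object through unsigned char* is permitted by the aliasing rules.
inline std::string ToHex(const void* p, std::size_t size) {
    static const char digits[] = "0123456789abcdef";
    const auto* bytes = static_cast<const unsigned char*>(p);
    std::string out;
    out.reserve(size * 2);
    for (std::size_t i = 0; i < size; ++i) {
        out.push_back(digits[bytes[i] >> 4]);
        out.push_back(digits[bytes[i] & 0x0F]);
    }
    return out;
}

// (2) The typed convenience wrapper: deduces the size from the type.
template <typename T>
inline std::string ToHex(const T& obj) {
    return ToHex(&obj, sizeof(T));
}

// Small self-check covering both overloads.
inline void ToHexExample() {
    const unsigned char raw[] = {0xde, 0xad, 0x00};
    assert(ToHex(raw, sizeof raw) == "dead00");  // overload (1)
    const unsigned char one = 0x0f;
    assert(ToHex(one) == "0f");                  // overload (2)
}
```

The template body is a single forwarding call, so only the non-template worker contains real code.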

kstenerud a day ago

The real question here is: WHY are you passing a blob of memory rather than a struct that uses the type system to describe and enforce what the contents are?

I don't mean dressing up an anonymous pointer, which the author rightly complains about. I mean WHY are you making an API that takes such a pointer to an unknown type to begin with? Whenever you change the structure within that blob, your type checker won't flag that the receiver hasn't been updated to handle it.

Even worse: nothing's stopping you from accidentally passing in the wrong type.

And now you have a SEGV. Or a security hole.

scott01 a day ago

For type erasure (it’s sometimes useful), custom allocators, I/O, for example.
- okanat a day ago
  
  Then pass a uint8_t* with size, aka a span<uint8_t>.
fp64 a day ago

I think the article names hashing as a use-case, which I can somewhat still agree. Operations that only depend on the bytes, I guess. But yeah, most things worth saying about this article have been said here already
- kstenerud a day ago
  
  Sure, if the function is expected to not treat the data as anything but bytes, then it might be acceptable in narrow circumstances.
  But in such a case I'd argue FOR the ceremony, as a way of declaring from the API "The input is a sequence of bytes that I won't treat as anything other than a sequence of bytes", and declaring from each and every call site: "This is not a mistake; we really are 'converting' this struct to a series of bytes for this function to consume".
  Then anyone auditing the code knows the intent by the shape of the types, and would quickly flag any typecasting shenanigans within the receiver function.
  But even then, hashing a struct will rapidly bring you into the land of dragons and fairies. Abandon all hope if you have floats or UTF-8 (which have multiple representations for the same values).
  Far better to remain type-aware if you value your sanity.
  - account42 6 hours ago
    
    A more immediate concern for hashing by treating a struct as a bag of bytes is padding.
  - fp64 a day ago
    
    I agree, the original article is rather questionable. I do not write code like the article advocates for. I would probably go for overloads for each data type I have considered and tested, or maybe something fully templated, or std::span/boost::span (a hash function is, interestingly enough, the very example the boost docs give to illustrate boost::span).
- xigoi a day ago
  
  Hashing everything based on the byte representation breaks when you have a type where equality does not imply byte equality. Such as… floats (+0 and -0 are equal, but have different byte representation).
  - fp64 a day ago
    
    Depends on the use-case, hashing can also be used for checking integrity/change in which case you exactly want the behavior that only bit-exact-equality is desired, even for arbitrary structs. Maybe that's somewhat niche, I mention it as I have such a use-case actually.
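The +0 / -0 point can be checked with a few lines. This is only a sketch, assuming IEEE 754 doubles (true on mainstream platforms); `SameBytes` and `SignedZeroExample` are made-up names.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true when two doubles have identical object bytes.
inline bool SameBytes(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}

// Small self-check: the two zeros compare equal as values,
// yet their byte representations differ in the sign bit.
inline void SignedZeroExample() {
    const double pos = 0.0;
    const double neg = -0.0;
    assert(pos == neg);
    assert(!SameBytes(pos, neg));
}
```

A byte-wise hash would therefore put `0.0` and `-0.0` in different buckets, which is wrong for a value-equality hash table but is exactly what an integrity or change-detection check wants.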
- throwaway27448 a day ago
  
  Even then, accepting a uint8_t* would make this intent clearer.
9rx a day ago

> Whenever you change the structure within that blob, your type checker won't flag that the receiver hasn't been updated to handle it.
The relevant type is "blob". There is no further structure. If the function that accepts void* is trying to extract structure out of the blob, there is a bug in that function and the type checker should already catch you trying to extract structure from something that isn't there.
> I mean WHY are you making an API that takes such a pointer to an unknown type to begin with?
It's not unknown in any meaningful sense. It is known to be a sequence of 'arbitrary' datums of a given length, which is the exact type of input required for the scenario given.
As the article explores, some argue that you should define that sequence with a concrete type, but the article states that doing so doesn't offer any additional value, as it posits that void* already communicates the same. In other words, it suggests that void* is already the concrete type for that kind of input.
locknitpicker a day ago

> The real question here is: WHY are you passing a blob of memory rather than a struct that uses the type system to describe and enforce what the contents are?
I completely agree. It's particularly egregious when the blogger complains that the complexity and ugliness lies in the type casting to force an incompatible type where it doesn't belong, and use a reinterpret_cast of all things.
This doesn't even feel like a strawman argument anymore. This sounds like a coding horrors entry.

cherryteastain a day ago

In C++ you'd just do

    template <typename T>
    void DoSomething(T const& data);

or if T* is supposed to point to a tightly packed buffer

    template <typename T>
    void DoSomething(std::span<T> data);

as the author pointed out. I don't see how that is ugly or more complicated than the original void* approach.

There is no need to pass the size of T or length of the span: the former is just a sizeof(T) away and the latter is a data.size() away.

In fact, a lot of codebases would outright ban the uint8_t* and reinterpret_cast trick the author is complaining about via clang-tidy rules.

delta_p_delta_x a day ago

The blogger and the blog says:

> BTW: As a nice addition, if you use SAL annotations

> Windows C++ Programming

Not everyone will see the irony, but the Windows user-mode application and library suite and the kernel now very heavily rely on the safety mechanisms of C++ that the author calls 'complex', 'uglif[ied]', and has 'los[t] the taste for good readable code'. I'm of course referring to the Windows Implementation Library: https://github.com/microsoft/wil This is explicitly an effort from MS WinDev to make Windows C++ code safer. User-mode applications writing native Windows code can and absolutely should use it, too.

Any time I see `void*` in C++ I ring-fence it as a C-ism and make sure I `reinterpret_cast`. For me, a bag of bytes is `std::span<std::byte>`. void* is a memory location with no provenance, no ownership, no size information, nothing. Do I even know if it is this program's memory, or some shared memory construct, or maybe even a pointer into GPU memory? No for all.

C likes to play fast and loose and its proponents call it 'beautiful and simple', I call it a segfault/use-after-free/double-free waiting to happen.

pjmlp a day ago

It goes even further: their beloved C code is compiled by compilers written in C++, and that includes the standard library, which exposes C++ implementations as extern "C" functions.
It is a pity that Microsoft backtracked on their C support.
WWDC is happening this week, one set of announcements at State of the Union was how Apple replaced a few C, Objective-C and C++ components, including at OS level with Swift.
- repelsteeltje a day ago
  
  Interesting. I know nothing about Apple, but maybe you can explain how idiomatic Swift handles blobs and how that interfaces with C or C++ around void ptrs, std::spans etc.?
  - pjmlp a day ago
    
    Those are unsafe buffers, and have specific primitives to handle them, Swift also has span, and interoperability with Objective-C and C++ code.

voidUpdate a day ago

> "An interesting question you may ask in C++ is: “How would you declare a function that takes a blob of memory as input?”"

> "Now, suppose that you want to pass to this function a custom structure, like this:"

You would create another function that actually works based off that structure, rather than using your first function which operates on a set of bytes in memory. That way it's readable, like they want, and type-safe

trumpdong a day ago

I find this to be a snarky non-answer. You really think everyone should write their own memcpy for every POD type they want to memcpy?
- mfost a day ago
  
  There's no need: there's std::copy already.
  Or maybe the idea was to create a typesafe template wrapper around the generic function which is also very common and really nice. No need to create one wrapper per type, a single template should work.
  - adrian_b a day ago
    
    And how was std::copy implemented?
    Your answer is valid only for a programming language which assumes that the standard library is implemented in another language.
    C/C++ are supposed to be languages in which any program can be written, unlike languages like Java or Python.
    
    pjmlp 8 hours ago
    
    You cannot implement modern C++ in pure C++ nowadays, as the standard library requires compiler intrinsics, just like a language like Java.
    
    AnimalMuppet a day ago
    
    Yes, C++ has those parts. You can use them to write something like std::copy. But use them, once, to write std::copy, and get that one implementation right (or have the library writers do it for you), and then use std::copy everywhere, rather than using void* everywhere. This makes your program much more type safe and less buggy, while still letting you write the parts where you really need to use the down-and-dirty stuff.
- AnimalMuppet a day ago
  
  In addition to what mfost said, there's also no need because C++ assignment is member-by-member copy unless otherwise specifically implemented for the type. If you have a POD, then that's what you get with assignment; there's no need to call memcpy at all.
  (The difference is that memcpy will copy padding bytes, and the assignment operator may not. But if you depend on the values of the padding bytes, you have major problems...)
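The "single template wrapper" idea from this sub-thread could be sketched roughly like this. `CopyObject`, `Point` and `CopyExample` are hypothetical names used only for illustration; the `static_assert` is an addition of this sketch that rejects types for which a byte-wise copy would not be valid.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: one template gives every trivially copyable type
// a type-checked front end to std::memcpy.
template <typename T>
void CopyObject(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte-wise copy is only valid for trivially copyable types");
    std::memcpy(&dst, &src, sizeof(T));
}

struct Point {
    int x;
    int y;
};

// Small self-check: the wrapper and plain assignment give the same result.
inline void CopyExample() {
    const Point a{1, 2};
    Point viaMemcpy{};
    CopyObject(viaMemcpy, a);
    Point viaAssignment{};
    viaAssignment = a;
    assert(viaMemcpy.x == viaAssignment.x && viaMemcpy.y == viaAssignment.y);
}
```

Passing a `Point` and a `double` to `CopyObject` fails to compile, which a raw `memcpy(void*, const void*, size_t)` call would happily accept.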

adrian_b a day ago

The "void*" of C solves a frequently encountered problem, but it has an inappropriate and misleading name, because such a pointer points to something, it does not point to nothing. Moreover, it does not have a size.

The correct solution in a programming language is to have a primitive bit string type (with a length that is a byte multiple) and to have a concise way (e.g. with dedicated symbolic operators) to write a type conversion from any data type to a bit string and a type conversion from a bit string to any data type.

Then the operations that make sense for arbitrary bit strings, e.g. copying, moving, input/output operations (e.g. file read and write), applying Boolean functions, shall have formal parameters of this type.

Much of what I have described here already existed in the language IBM PL/I, more than 60 years ago, except that in it only the conversion towards a bit string was explicit, with the built-in function "bit", while the conversions from bit strings to other data types were done implicitly, upon variable assignment.

Like any kind of array, a bit string must have an associated size, so there should be no need to specify it explicitly as a separate parameter.

gpderetta a day ago

Making DoSomething a template because span is a template is a non-sequitur.

If DoSomething works with untyped bytes, it should require a std::span<byte> (or const byte if read only). Incidentally the standard provides a convenient as_bytes(std::span<T>)->std::span<const byte>. There isn't an equally convenient helper to convert a single object to a span of bytes, but one is easy to write.

As to why one should use span: a) it helps make sure that the size travels together with the pointer for some additional safety, b) it is more convenient to work with byte ranges than void ptrs (which do not support pointer math), c) it helps a bit in communicating intent: in C++ void* is used more often for type erasure than for byte-related things.

Panzerschrek 14 hours ago

  void DoSomething(
   _In_reads_bytes_(numBytes) const void * p,
   _In_ size_t numBytes );

It's an anti-pattern in C++, which causes a lot of bugs and security vulnerabilities. That's why the C++ Core Guidelines recommend using std::span.

arcadialeak a day ago

char* is an exception to strict aliasing rules of C++ precisely to facilitate the author's use case. You would still need a reinterpret_cast to make it work, but it's actually good because it makes the intent clearer, and the cast would have still happened either way to read the raw bytes.

quietbritishjim a day ago
That was my first instinct too, but nothing the author said indicates they actually need non-strict aliasing. If the function had been:
```
   void DoSomething(void* src, void* dst, size_t numBytes);
```
... then it would be a different matter since maybe you want to allow src and dst to alias. Although, even then, they're still allowed to alias so long as the function accesses them both through char*, so the function signature can still use void*.
(Going deeper, non-strict aliasing applies to any pointers of the same type passed to a function. So if src and dst were both cast to float* inside the function, and if they really are both of that type (technically "an object of type float exists at the pointed-to location") then they can still alias. The char* exception is the only case that you can access a memory location through two different types of pointer and they can still alias.)
It's interesting the author mentions uint8_t. It's certainly more explicit than char, but it doesn't have the same aliasing guarantee (very strictly speaking - in practice it's almost always an alias for unsigned char or char, which does).
- myrmidon a day ago
  
  This is actually pretty annoying in embedded programming in C, because you'd often really prefer to use a uint8_t buffer[] for serialization functions (e.g. to write arbitrary data on some bus etc.) over char*, but you'd actually lose the aliasing permissiveness that you need (if you are strictly sticking to the standard-- this is often ignored in practice).
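One common way to sidestep the aliasing question for serialization buffers is to move values in and out with `std::memcpy`, which is defined regardless of the buffer's element type. The sketch below is written in C++ with hypothetical names (`StoreU32`, `LoadU32`, `RoundTripExample`); it stores the value in native byte order, so it is not a portable wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: write a 32-bit value into a byte buffer (native byte order).
inline void StoreU32(std::uint8_t* out, std::uint32_t value) {
    std::memcpy(out, &value, sizeof value);
}

// Hypothetical helper: read a 32-bit value back out of a byte buffer.
inline std::uint32_t LoadU32(const std::uint8_t* in) {
    std::uint32_t value;
    std::memcpy(&value, in, sizeof value);
    return value;
}

// Small self-check: a value survives the trip through the uint8_t buffer.
inline void RoundTripExample() {
    std::uint8_t buffer[sizeof(std::uint32_t)] = {};
    StoreU32(buffer, 0x12345678u);
    assert(LoadU32(buffer) == 0x12345678u);
}
```

Compilers typically turn these small fixed-size `memcpy` calls into plain loads and stores, so the buffer can stay a `uint8_t` array without leaning on the `char` exception.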

rfgplk a day ago

> void DoSomething(const void* p, size_t numBytes)

would be something like

    template <typename T> void DoSomething(const T& ref)

or

    template <typename T> void DoSomething(const T& ref, size_t numBytes)

or, C++20-style,

    void DoSomething(const auto& ref)

If the class you're passing in already provides a size-like member function:

    template <typename T>
        requires requires(T t) { t.size(); }
    void DoSomething(const T& x) { ... x.size(); }

> void DoSomething(const uint8_t* p, size_t numBytes)

This is awful: you lose type info irreversibly.

> template <typename T> void DoSomething(std::span<T> data)

You can do this but the above examples work just as well.

> Or maybe something even more complicated, like this?

    template <typename T, std::size_t N> void DoSomething(std::span<T, N> data)

    // Or this?
    template <typename T, std::size_t N> void DoSomething(std::span<const T, N> data)

This is more explicit, not more complicated...

> In this way, we still keep the clarity and simplicity of the function invocation: > DoSomething(&data, sizeof(data));

Stripping types is not a good idea, especially because you'll run into object lifetime issues _REALLY QUICKLY_. You need to guarantee that the object is trivially copyable.

Bizwen 8 hours ago

Don’t people remember the real history anymore? The original C did not include the void type, and therefore void* did not exist either. Original C used char* to represent pointers of any type and directly cast char* to double*. You can read the first edition of The C Programming Language from 1978 to learn about this. void and void* were essentially invented based on practices from early C++ and were eventually incorporated into ANSI C.

mwkaufma a day ago

If you know you want bytes -- A void* of unknown provenance cast to anything other than char* is UB so just skip the middleman and use char*.

s28l a day ago

char* can also be a C-style string. std::byte has the same special treatment in the standard as char and unsigned char, with the added benefit of not being used for other purposes (i.e. ASCII character or uint8, respectively).
- mwkaufma a day ago
  
  I was trying to appeal to OP's maladaptive "C++ that looks like C is more legitimate" aesthetic preference, haha.

jurschreuder a day ago

I would argue that you would not really know what the compiler would be up to with the memory void* points to.

To make sure I would put it in some kind of container.

Ailrk 17 hours ago

Lots of ways to criticize modern c++ but I don’t think void* vs span is a valid one.

newsoftheday a day ago

> On the other hand, with the “safe and modern” uint8_t prototype, the function call gets more complicated, as you need to add a type cast

To me, that is a feature, not an issue.

akkaygin a day ago

> In fact, std::span is a class template, and somebody would suggest to make the function that processes the generic memory blob a function template! Really? Something like this??

Yes.

delegate a day ago

It depends on what your function does with that memory. If the fn expects any kind of structure at that address, you and your callers are on your own: the compiler can't help if the caller passes the wrong thing. Worse, accessing that memory might not immediately crash, but lead to strange side effects in your program.

Dynamic languages can handle this with reflection, but with void* you can only pray nobody makes the mistake.

zahlman a day ago

Many years ago I remember going through the Boost library and seeing C-style casts that seemed entirely gratuitous. I tried replacing them with what I was pretty sure were the equivalent C++ reinterpret_casts, and the result didn't compile. I never did figure it out.

s28l a day ago

C style cast can be either a static_cast or reinterpret_cast, but it can also be a const_cast or a static+const or reinterpret+const. Finally, it will perform a static_cast that bypasses private inheritance (because the alternative would be to fall back to a reinterpret_cast, which is wrong if the static_cast needs to apply an offset to the pointer)
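A small sketch of that last case, using hypothetical types `Padding`, `Base`, and `Derived`. The address comparisons assume a common ABI where base subobjects are laid out in declaration order; the standard does not guarantee that layout.

```cpp
#include <cassert>

struct Padding { int pad = 0; };
struct Base { int value = 42; };

// Base is a private, non-first base, so converting Derived* to Base* needs
// both an access bypass and a pointer adjustment.
class Derived : private Padding, private Base {};

inline Base* ToBaseCStyle(Derived* d) {
    // Behaves like static_cast but ignores the private inheritance.
    // static_cast<Base*>(d) would be rejected here: Base is inaccessible.
    return (Base*)d;
}

inline const void* AddressAfterReinterpret(Derived* d) {
    // Compiles, but applies no offset. Dereferencing the result would be
    // undefined behavior, so only the address is returned.
    return reinterpret_cast<Base*>(d);
}

inline void ExampleCasts() {
    Derived d;
    assert(ToBaseCStyle(&d)->value == 42);
    // Assumption: Base sits after Padding, so the adjusted pointer differs
    // from the address of the whole object.
    assert(static_cast<const void*>(ToBaseCStyle(&d)) !=
           static_cast<const void*>(&d));
    assert(AddressAfterReinterpret(&d) == static_cast<const void*>(&d));
}
```

This is one way a mechanical C-style to `reinterpret_cast` replacement can change behavior: the replacement still compiles but points at the wrong subobject.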
- zahlman a day ago
  
  > a static_cast that bypasses private inheritance (because the alternative would be to fall back to a reinterpret_cast, which is wrong if the static_cast needs to apply an offset to the pointer)
  That may well have been it, then. I would think that if it could have been expressed naturally, it would have been.
  (I don't think I ever used private inheritance in my own C++ code. I'm not a huge fan of inheritance at all, so.)
pjmlp a day ago

To add on top of the sibling comment, C style casts are too loose, and the main reason for the new C++ style casts is improved type safety.
So instead of anything goes, there is some additional type checking depending on the type of cast being made.

jayd16 a day ago

Here's a thought experiment. Is void* something we should add to other languages?

Would anyone argue yes?

ndesaulniers a day ago

I find this point to be generally why C can typically beat C++ in terms of code size; generic functions operating on void* are much less type safe, but the tradeoff is code size. Those template instantiations add up.

arka2147483647 a day ago

The best part of void* is that it is very terse. Both in definitions, and in access.

All cpp alternatives are more wordy.

I wonder how this conversation would go if there was an equally terse, but also typesafe, cpp alternative.

nine_k a day ago

I first wanted to compare the use of void * to the use of a chainsaw. But then I realized that a chainsaw has many more safety features.

DennisL123 a day ago

<snark> Arguably the one thing C++ is great at is its type system. Makes total sense to cast it away. </snark>

pwdisswordfishq a day ago

> “Hey, why do you use the unsafe old C-style void* pointer?

Exactly, one should avoid unnecessarily erasing pointer target types. Luckily, C++ gives much better tools for that than C. This should have been a tem—

> Use some safe explicit type like uint8_t, which clearly represents an 8-bit byte!”

Sigh. Out of the frying pan, into the fire.

girfan a day ago

Partly serious, partly in jest: so type systems are no good?

void-star a day ago

This!

… Is why I picked my name.

themafia a day ago

I'm not a fan of C++ precisely because of template noise, but what you gain with span, in that the pointer and the length are joined together, seems to outweigh the complaints on style.

Isn't there a way to make this an alias anyways?

singpolyma3 a day ago

Next realize you can just use C instead of C++ at all!

jeffbee a day ago

While I sympathize with the aesthetic theme of this post, I warn against the temptation to do what the post proposes, which is to try to compute the checksum of an object represented by void pointer and extent. There are many dangers here, one of which is that the checksum may read uninitialized memory, making the checksum meaningless. Another is that the pointer implicitly converted to void may be a different address depending on the type of the object in the calling function, if the type has multiple base classes. Further, your void reader may be reading derived class data you were unaware of, such that hashing a Base pointer twice yields different results because a member of Derived was placed by the compiler in the tail padding of Base.

In other words, don't do this. C++17 introduced has_unique_object_representations type trait which tells whether it is safe to do this to a given type. It is pretty much always false.
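A sketch of how that trait could guard a byte-wise checksum. `Packed`, `Padded`, and `ByteSum` are hypothetical, the byte sum is only a stand-in for a real hash, and the two `static_assert`s on the example types assume a typical ABI where `Padded` has padding after `tag`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical example types.
struct Packed { std::uint32_t a; std::uint32_t b; };
struct Padded { std::uint8_t tag; std::uint32_t value; };

// Assumption: a typical ABI, where Padded has padding bytes after tag.
static_assert(std::has_unique_object_representations_v<Packed>);
static_assert(!std::has_unique_object_representations_v<Padded>);

// Stand-in for a real hash: sums the object's bytes, but only for types
// whose equal values are guaranteed to have equal bytes.
template <typename T>
std::uint32_t ByteSum(const T& obj) {
    static_assert(std::has_unique_object_representations_v<T>,
                  "T has padding or non-unique representations");
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        sum += p[i];
    }
    return sum;
}

inline void ExampleCalls() {
    const Packed a{1, 2};
    const Packed b{1, 2};
    assert(ByteSum(a) == ByteSum(b));  // equal values, equal bytes
    // ByteSum(Padded{1, 2});          // would not compile
}
```

A `void*` signature has no way to express this check, because the type is gone by the time the function sees the pointer.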

drysine a day ago

>a function that hashes some input data (using SHA-256, or whatever hash algorithm)

Along with padding bytes.

> Why should people complexify and uglify their C++ code with the uint8_t pointer (or std::byte), when void* works just fine??

That was the intention of reinterpret_cast - make ugly code look ugly.

squirrellous a day ago

One could argue the reinterpret_cast makes the intent more explicit which is a good thing.

That said I don’t have much against the use of void* or even char* here. If it works in C, it works in C++ just fine. std::span is not the right tool for this.

api a day ago

Real programmers use uintptr_t for pointers.

_the_inflator a day ago

I think that the author is right in everything he says and yes, there is beauty in it.

However, the antithesis is also correct that there exist better solutions to solve the issues.

Both premises hold true.

I have an extensive assembler coding background on 6510, M68000, and i486. I had a very hard time accepting that something could be solved faster and more stable in a higher order language while the downside is more memory, more CPU etc.

More and more it turns out that programming languages are something accidentally read by machines and written by humans, even though this premise got destroyed lately by AI.

However, what I love about C++ is, that it has a basic canon of commands that can be used to build nearly everything while looking extremely ugly and hard to grasp if you don't read very slowly and accurately - so it is a very error prone and dangerous thing that rightfully got substituted by better constructs that allow for better distinctions as well as usage.

I could do everything in assembler (Hey Python users: you know that in the end everything ends up as machine code, don't you?) but it takes 100 times longer and is constantly reinventing the wheel.

Have you ever started to get into the intricacies of bit signs? No? Well, you should definitely, and to this day it gave me a lasting impression when I started wrapping my head around it, when I was 10 to 11 years old hacking my way into the world of assembler programming on C128.

You don't want to take every concept into consideration. You don't want to take interoperability into consideration. All the time!

You want to focus on the problem to solve, not the implications of the implementations all the time.

I am having such a blast very often using Python since it just works without much cognitive distraction about which language construct to use in order to get the machine doing what you want. It is capable enough to ensure, within boundaries, that the compiler makes the best decision given the context, which is up to analysis.

That's why I stopped using C++, or more precisely stopped any attempt at being smart or fancy. I have to re-read and maintain the code months to years later, and history has shown that I don't marvel at how magically the line works and how brutally smart I was at the time, but simply hate myself for obscuring something in one line that could have been well understood if I had used 10 lines, while the compiler doesn't give a damn anyway.

C++ is still necessary but every discussion to this day is about the point you made: every digit counts - and also which position, context etc. You got to be very prolific in order to put into a line what other put into 10.

Is it worth it? No.

In the early days it was the correct decision. Memory was scarce, CPUs were slow, and the language was small compared to today.

The last time I felt comfortable with an "assembler kind of feeling" was with JavaScript before ES6. Peak jQuery level, with the coolest concept only JavaScript has: Function.prototype.toString()

John Resig, who revealed this secret to me, will have his place in my Olympus of programming heroes; it opened my eyes to the beauty of higher order languages.

I admire C++, but so do I Python.

But I hope I won't have to ever use C++ again.

zahlman a day ago

> Hey Python users: you know that in the end everything ends up as machine code, don't you?
I don't understand where you're trying to go with this call-out, especially if you're also describing yourself as a Python user.
But, like, no, not really; ordinarily, Python is bytecode-compiled and then the bytecodes are interpreted. There's machine code doing the interpretation, but that interpretation is not transformation.
pjc50 a day ago

One of these days I want to do a "typesafe macro assembler" that actually is the language people think that C is.
- bcjdjsndon a day ago
  
  You could call it rust and just rename the remaining UB as "unsafe" and call the problem solved
- rfgplk a day ago
  
  I'm actually working on something similar to this. Basically works as an additional preprocessed layer over C++. You can get some crazy results, but it's tricky to integrate with existing build tools without causing havoc.
- pjmlp a day ago
  
  It already exists, MASM and TASM, and related derivatives, with high level control flow pseudo instructions and less UB.
- trumpdong a day ago
  
  You could even give it identical syntax to C!
repelsteeltje a day ago

There will always be cases (like audio processing, car brakes, pacemakers) where hard real-time constraints prohibit GC languages (as well as l1 cache, instruction reordering and other optimizations). Also, consider that Python's performance frequently originates in its bindings to libraries written in C, C++, Fortran, Rust.
I recently ran a few Java benchmarks and found that an array holding a bunch of objects incurred approx 3x the number of bytes compared to the expected number based on underlying class data structure. With current RAM prices, that is something to consider if you're building software that's meant to scale. Mileage may vary, but I expect JavaScript or Python will be similar.
So, sure. There is a case to be made for ergonomics and dev velocity. And premature micro optimizations might take your focus away from good systems architecture. But I've frequently found the need to peel off leaky abstractions and to understand and be savvy at low level stuff too. Nothing wrong with studying the guts of a C64 or Amiga, today.
Python, Java or TypeScript are good educational tools, but you'd be doing yourself a disservice if you'd confine yourself to them without understanding computers.
- simiones a day ago
  
  Note that, while complex, there exist GCs that can handle both soft real time and even hard real time constraints - especially for Java. Memory overhead is a problem with GC languages, though, and that one is by design.
- zahlman a day ago
  
  > There will always be cases (like audio processing, car brakes, pacemakers) where hard real-time constraints prohibit GC languages (as well as l1 cache, instruction reordering and other optimizations). Also, consider that Python's performance frequently originates in its bindings to libraries written in C, C++, Fortran, Rust.
  Sure. And every Python programmer who has any interest in those use cases learns about the issues quickly. Or more to the point: a big chunk of them are things you'd only do if you were employed to do them, and employers are setting the language requirements already. And Python programmers in particular are well aware of compiled-language bindings; that's the reason they're trying to use the packages that make package installation non-trivial.
  Huge swaths of use cases don't require performance.
  > Python, Java or TypeScript are good educational tools, but you'd be doing yourself a disservice if you'd confine yourself to them without understanding computers.
  This is an extremely strange thing to say when replying to someone who just described having extensive experience with C++ and multiple flavours of assembly.
  > I recently ran a few Java benchmarks and found that an array holding a bunch of objects incurred approx 3x the number of bytes compared to the expected number based on underlying class data structure.
  It was also strange to say that if you also had this experience yourself. A solid "understanding of computers" would have given you a better mental model of what Java needs to allocate. Results like this are because "the expected number" was not well thought out.
  > if you're building software that's meant to scale.
  ... And yet everyone just keeps pumping out Electron apps. Curious, that.
- bcjdjsndon a day ago
  
  What's your point here? The higher the language level, the less performant they are? Why are we using anything but assembler then?

adev_ a day ago

This post is, honestly speaking, a bag of garbage and ill advice:

> Some good old habit from C can still be positively used in C++, like the void* pointer and the size parameters.

That's garbage.

There is a clear interest in passing both size AND pointer in a single parameter like `std::span<std::byte>`: it binds both values together and guarantees that you do not mess up the size of your buffer.

Pass "data" and "size" parameters through a chain of 5 function calls and there is a non-zero probability that you passed "other_size" instead of "size" somewhere. This pattern happens everywhere in old C codebases and has been the source of countless security vulnerabilities and random buffer overflows for decades.

All modern languages (including freaking minimalist Golang) have now a "slice/span" concept built in.

It is not just to annoy programmers (and allow them to complain about 'complexity' in blog posts) but because it is a major improvement in terms of memory safety and in terms of reducing user errors.

> It seems that some people are really losing the taste for good readable code.

If 'span<std::byte>' or 'span<char>' are unreadable for you, the problem is not span, the problem is you.

These are concepts that have existed for decades in almost all modern programming languages.

Even in conservative C++, it exists since 2014 in the GSL, in Qt and in boost.

And the interface is no different from vector...no excuse here... It is itself the most basic data-structure in C++.

> Why should people complexify and uglify their C++ code with the uint8_t pointer (or std::byte), when void* works just fine??

Sure. Let's extend the logic: I do propose also to replace all typed arguments with a void* pointer.

Because after all: 'It will just work fine', right?

Type-safety and clear interfaces are overrated, we could all use only bytes and remove interfaces altogether to get a closer experience of Fortran 77.

/irony

> Or maybe something even more complicated, like this? > template <typename T, std::size_t N> void DoSomething(std::span<T, N> data)

First, that is nonsense.

If you want to pass a mutable buffer of byte, the correct signature is:

``void DoSomething(std::span<std::byte> data)``

There is no need for template signature here. You are making things up.

Second, there is also no need for the N parameter.

``span<Type,N>`` is only used when enforcing a buffer with its size known at compile time is desirable. It can be for vectorization (e.g. the buffer is a multiple of the SIMD width) or to make it explicit in the interface (e.g. for a block cipher).

> states that the pointer points to input read-only memory (_In_reads_)

You do that by using `std::span<const std::byte>` in any C++ codebase.

The fact that he brags about that as "an advantage" of separate parameter passing just shows how little is known here.

> My Pluralsight Courses

The kind of C++ code proposed in this blog post would be refused straight away in any PR in almost any serious organization with a proper review process.

So bragging about it on a blog while proposing some C++ teaching is audacious to say the least.

> To finish on that.

The sad thing is that there would be very valid criticism on `std::span<std::byte>`:

- Span does not do bounds checks on access by default. Which is a bad design decision in 2026.

- It has an impact on compilation time due to the header inclusion

- std::byte is annoying to work with because it is a hack around an enum instead of a proper C++ builtin type.

But the blog post misses all these points entirely and sticks to complaining about 'Old C being better' the same way your family Grand-Uncle still brags about 'leaded gasoline being better' for his 70s Pontiac.

bcjdjsndon a day ago

Makes you wonder why OP is using cpp to begin with if they're suggesting void*

bcjdjsndon 11 hours ago

How in Nora does that get downvoted? It's the mainstream view on it
