Mysterious Moving Pointers

55 points by camblomquist 2 years ago · 67 comments


IMHO this is another case where C++'s hidden layers of complexity hides bugs that would've been obvious in plain C. In fact for this particular use-case I'd probably use indices instead of pointers.

limaoscarjuliet 2 years ago

I remember the C++ of ca. 1994, when I started my career. It was C with Objects back then. And it was great! C++ was a better C, and it was easy to convince any C dev to jump to it.
I recently had to work with C++ code and... it is not a happy story anymore:
- Lots of magic like described here
- F**ing templates - for those who like them, did you ever see a C++ core file? Or tried to understand a single symbol?
- Standardization that feels like pulling more Boost into the language, which means more templates. Which makes core files incomprehensible.
It used to be that the average dev who knew C could read and work with simple C++. This is no longer true. C++ is no longer a better C.
P.S. Example from recent core file, one line in the stack trace: boost::asio::asio_handler_invoke<boost::asio::detail::binder2<core::AsyncSignalCatcher<2, 15, 17>::waitForSignal<XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)> >(XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)>&&)::<lambda(const boost::system::error_code&, int)>, boost::system::error_code, int> > (function=...)
- tialaramex 2 years ago
  
  > I remember the C++ of ca. 1994, when I started my career. It was C with Objects back then
  It is possible, and perhaps even likely, that the C++ you wrote in 1994 was indeed this "better C" and maybe even the C++ you read, the language Stroustrup wrote about in 1985, although "C with Objects" is much older still. The ISO document (C++ 98 aka ISO 14882:1998) was four years into your future in 1994, but the committee to write that document had existed for quite some time. Stroustrup's 1991 Second Edition of his appropriately already big book "The C++ Programming Language" explains that the committee have accepted Templates, although I believe at that point they didn't realise they'd inadvertently thus added an entirely new meta-language which is programmed differently than the rest of C++.
  Boost comes much later, it's only about as old as the actual ISO document.
  - thriftwy 2 years ago
    
    I believe that not many developers really wanted C++ and craved Object Pascal instead.
    I don't like Pascal but I have to admit that Object Pascal is a succinct and successful addition of OO to a language with manual memory management, whereas C++ is neither.
    
    pjmlp 2 years ago
    
    As someone that went from Turbo Pascal to C++, while enjoying Borland's ecosystem, the reasoning was clear.
    A programming language that at the time offered similar security and features, with much better portability.
    Never was a big C fan, with Object Pascal and C++ in my toolbox.
  - pjmlp 2 years ago
    
    Turbo C++ for Windows 3.1, released in 1993, already had support for templates in its initial design form.
    Borland C++ 2.0, or 3.0, rewrote BIDS from preprocessor magic into templates.
    CSet++ for OS/2, also from the early 1990s, offered template-based collections too.
    By 1996, MFC introduced template based collection classes as well.
  - limaoscarjuliet 2 years ago
    
    I started with Stroustrup's C++, it already had templates back then.
- hyperman1 2 years ago
  
  Templates as in C++ are a historical accident.
  Erwin Unruh demonstrated to the C++ committee how they had accidentally created a second, compile-time programming language inside the primary language. He did this by calculating primes and printing them in error messages.
  People got carried away a bit with the power made available by this new language. Unfortunately there are no basic ergonomics in template metaprogramming because it was not intended to be a programming language.
- forrestthewoods 2 years ago
  
  boost is just awful. Don’t use boost.
  templates are totally fine for basic containers. Much better than C macro shenanigans.
  The world is still looking for a “better C”. Zig, Odin, Jai and probably more are all trying. Of the three I think Jai is the closest. But it’ll be a while.
- inetknght 2 years ago
  > - F**ing templates - for those who like them, did you ever see a C++ core file? Or tried to understand a single symbol?
  I have 20+ years of experience writing C++.
  Yes, I've looked at "core" C++ headers and source. The most annoying part to me is style (mixed tabs and spaces, curly braces at wrong places, parenthesis or not, the usual style complaints). But other than that they're very readable to a seasoned C++ engineer.
  I've also tried to understand symbols. You're right, they're difficult. But there's also tooling available to do it automatically. Even if you don't want to use the tools, there is a method to the madness and it's documented...
  Let me ask ChatGPT:
  > What tool lets me translate an exported symbol name to a C++ name?
  c++filt
  It's categorized as a demangler. That's your search term to look for (I had to remember what it was).
  Then I asked:
  > Is there a function in the standard library which allows to mangle or demangle a given name or symbol?
  It tells me about `__cxa_demangle` for GCC. While I had forgotten about that, I was pretty sure there is one (or perhaps something similar) in the standard library.
  It also suggests using a library function such as `abi::__cxa_demangle`. Hah, that's what I was looking for. It's an implementation-specific (eg, compiler-specific) API used as an example. It is mentioned on the `std::type_info::name()` page here:
  https://en.cppreference.com/w/cpp/types/type_info/name
  So, to continue replying to you: yes, it's annoying but it's solvable with tools that you can absolutely integrate into your IDE or command-line workflow.
  > - Standardization that feels like pulling more Boost into the language, which means more templates.
  The boost libraries are open source and their mailing lists are active. If you don't like a given library because it has too many templates then you could make one with fewer templates.
  And, as standardization goes, it's also quite open source. The C++ committee is very open and receptive to improvements. The committee are volunteers (so their time is limited) and (usually) have their own improvements to the standard that they want. So you have to drive the changes you want (eg, actively seek feedback and engagement).
  > P.S. Example from recent core file, one line in the stack trace:
  I've seen much longer -- I've seen templates entirely fill a terminal buffer for a single line. That's extremely rare, definitely not fun, and debuggability is absolutely a valid reason to refactor the application design (or contribute library changes).
  I find it useful to copy the template vomit into a temporary file and then run a formatter (eg clang-format), or search/replace `s/([<({[])/\1\n/g` and manually indent. Then the compiled type is easier to read.
  Some debuggers also understand type aliases. They'll replace the aliased type with the name you actually used, and then separately emit a message (eg, on another line) indicating the type alias definition (eg, so you can see it if you don't have a copy of the source)
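  For the demangling step discussed above, here is a minimal sketch of a wrapper around `abi::__cxa_demangle`. It assumes a GCC or Clang toolchain that ships `<cxxabi.h>`; the function name `demangle` is made up for illustration, and none of this is part of the C++ standard library.

  ```cpp
  #include <cxxabi.h>  // GCC/Clang (Itanium ABI) extension, not standard C++

  #include <cassert>
  #include <cstdlib>
  #include <memory>
  #include <string>

  // Returns the readable form of a mangled name, or the input unchanged
  // when the string is not a valid mangled name.
  std::string demangle(const char* mangled) {
      assert(mangled != nullptr);
      int status = 0;
      // __cxa_demangle returns a malloc()ed buffer, so std::free releases it.
      std::unique_ptr<char, void (*)(void*)> text(
          abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
      return (status == 0 && text) ? std::string(text.get())
                                   : std::string(mangled);
  }
  ```

  With such a toolchain, `demangle("_Z3foov")` gives `foo()`, which is the same translation `c++filt` performs on the command line.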
  - GrumpySloth 2 years ago
    
    >> - F**ing templates - for those who like them, did you ever see a C++ core file? Or tried to understand a single symbol?
    > I have 20+ years of experience writing C++.
    > Yes, I've looked at "core" C++ headers and source.
    https://en.wikipedia.org/wiki/Core_dump
    
    inetknght 2 years ago
    
    Ahh, core dump, not "core" file. Ok, slightly ambiguous and I feel sheepish about it.
    I've also looked at core files in GDB, not directly. They're about as unintelligible as a core dump from a C program. It all depends on -fno-omit-frame-pointer: with it, it's usable. Without it, good luck because it involves un-convoluting the optimizer. That's both for C and C++.
quotemstr 2 years ago
In far more cases, zero-cost abstractions make obvious or impossible bugs that would be hard to spot in C programs, e.g. memory lifetime rule violations. And you could make a similar argument that C obscures bugs that would be obvious in assembly. High level languages are a blessing, and programmers who avoid them are only decreasing their own productivity and that of those around them.
The problem the article highlights appears to be an implementation defect: in my libstdc++ test just now, we do, in fact, mark the list as nothrow move constructible. The standard should mandate that std::list be infallibly moveable.
Are we going to indict a whole programming model based on an isolated implementation bug? If so, well, isn't doing that from a "C is better" perspective the galactic black hole calling the kettle black?
```
    #include <cstdio>
    #include <list>
    #include <type_traits>
    #include <vector>

    struct Node;

    struct Connection {
        Node *from, *to;
    };

    struct Node {
        std::vector<Connection> connections;
    };

    int
    main()
    {
        printf("%d\n", std::is_nothrow_move_constructible_v<std::list<Node>>);
        return 0;
    }
```
- camblomquistOP 2 years ago
  
  I mentioned it in a side note that I trimmed because there were so many that it spilled into the footer (faster to trim the article than to fix the CSS,) but Microsoft is the only implementation of the big three that doesn't mark the move constructor here as nothrow. The standard doesn't require it so it's valid for MSVC to do things the way they do, it just creates problems like this that would arguably be harder to find the cause of if one had to build code for multiple platforms.
  - quotemstr 2 years ago
    
    Right. My point is that 1) this is a quality-of-implementation issue in MSVC, 2) the standard should be phrased such that the MSVC implementation is illegal, and 3) the C++ standard library solves a lot more problems than it creates despite having warts like this and C++ having some unfortunate defaults (e.g. mutability by default).
mhh__ 2 years ago

On the flip side C projects often end up with things like linked lists and pointers to [it depends] because that lack of abstraction, while obvious, encourages the programmer to take shortcuts (otherwise no time for actual logic)

TillE 2 years ago

Storing a pointer to memory that you did not explicitly allocate is always a red flag, I think. You really need to understand how everything works, and be very careful.

I would default to just using std::unique_ptr<Node> in a situation like this, especially since using std::list suggests performance isn't critical here, so the additional indirection probably doesn't matter.

kentonv 2 years ago

unique_ptr seems inappropriate here since the pointers aren't unique. shared_ptr doesn't even work because it looks like this data structure is representing a graph and would expect to have cycles. Perhaps you could use some sort of weak pointer that gets nulled out when the target object is destroyed, but that would not have fixed the bug here, just replaced segfaults with some more controlled exception or panic.
In fact, full-on garbage collection wouldn't have prevented the bug here, and could arguably have made it worse. The problem is that nodes were unexpectedly being copied and then the originals deleted. With GC, you'd still have the copy, but never delete the originals, so you end up with split brain where there are multiple copies of each node and various pointers point to the wrong ones. That'd be pretty painful to debug!
IMO the language-level problem, if there is one, is that C++ is too willing to copy in cases where you would expect it to move instead. This is, of course, for backwards compatibility with the before-times when there was no move and copying was the only logical thing to do. But I think life would be better these days if all non-trivial copies had to be requested explicitly.
- inetknght 2 years ago
  
  > Perhaps you could use some sort of weak pointer that gets nulled out when the target object is destroyed
  std::shared_ptr comes with std::weak_ptr. Reference counting is a rather ham-fisted approach but is certainly a solution.
  > IMO the language-level problem, if there is one, is that C++ is too willing to copy in cases where you would expect it to move instead.
  IMO that's not a problem in the language but a problem with the engineer (misunderstanding when std::move is necessary) and the tooling (linter/static analyzer not clearly identifying that something should be moved instead, and raising a linter warning for it).
  For that matter, the places where I see std::list used aren't places where "performance isn't important" but rather places where an inexperienced engineer was put in charge of implementation and a senior engineer accepted it. I can't remember the last time I accepted someone using std::list in a code review because there has always been a better design available even if it necessitated some teaching. If a stable pointer address is needed then indeed a smart pointer is the correct solution (perhaps std::vector<std::unique_ptr>). There are other reasons I've had coworkers cite for using std::list (eg constant allocation time) but that's generally resolved with std::vector.reserve(upper bound to size) or eg a slab allocator (unfortunately, I'm not aware of a standard-provided slab allocator, though to be fair I'm not very familiar with C++ standard allocators in general).
  > I think life would be better these days if all non-trivial copies had to be requested explicitly
  While I don't agree superficially (smells like bringing along deep-copy problems), I think the idea merits some thought experiments.
  It would be fairly trivial to do that for non-plain-old-data types by deleting the copy constructor/operator (so it cannot happen implicitly) and providing a `make_copy(...)` function instead.
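  A minimal sketch of that idea, assuming C++17. The `Node` class and the `build_nodes` helper are hypothetical, and `make_copy` is written as a member function here although a free function would work just as well.

  ```cpp
  #include <cassert>
  #include <string>
  #include <type_traits>
  #include <utility>
  #include <vector>

  // Hypothetical type: implicitly movable, but copies must be requested.
  class Node {
  public:
      explicit Node(std::string name) : name_(std::move(name)) {}

      Node(const Node&) = delete;             // no silent copies
      Node& operator=(const Node&) = delete;
      Node(Node&&) noexcept = default;        // moves stay implicit and cheap
      Node& operator=(Node&&) noexcept = default;

      // The only way to duplicate a Node is to ask for it by name.
      Node make_copy() const { return Node(name_); }

      const std::string& name() const { return name_; }

  private:
      std::string name_;
  };

  static_assert(!std::is_copy_constructible_v<Node>);
  static_assert(std::is_nothrow_move_constructible_v<Node>);

  // Usage sketch: growing the vector can only move, never copy.
  std::vector<Node> build_nodes() {
      std::vector<Node> nodes;
      Node first("a");
      nodes.push_back(first.make_copy());  // explicit copy
      nodes.push_back(std::move(first));   // explicit move
      // nodes.push_back(first);           // would not compile
      assert(nodes.size() == 2);
      return nodes;
  }
  ```

  Because the copy constructor is deleted, a container that would otherwise fall back to copying fails to compile instead of silently duplicating elements.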
  - kentonv 2 years ago
    
    Agreed that std::list is never the best solution. In every case where I want linked list semantics, it's because I want to be able to dynamically add and remove objects from the list without any allocations at all. The only way to achieve that is an intrusive linked list design, which std::list is not...
    
    inetknght 2 years ago
    
    > I want to be able to dynamically add and remove objects from the list without any allocations at all. The only way to achieve that is an intrusive linked list design
    No it isn't.
    std::array<std::optional<std::pair<T, std::pair<std::size_t /* prev index */, std::size_t /* next index */>>>, 256U /* or whatever your maximum size is */>.
    Indexing does come with a wart: you'd need a sentry value for a "no prev" or "no next" value, I'd just use std::numeric_limits<size_t>::max() for that. And of course when you move objects within the container you'd need to update the indices.
    If you don't know your upper size bound at compile time, then replace `array` with `vector`, and reserve your runtime-known upper bound. As long as you never violate your upper bound then no (re-)allocations occur (unless T's constructor allocates).
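    The index-linked layout described above could be sketched roughly as follows, assuming C++17. `IndexList` and its member names are invented for this illustration, a small `Slot` struct stands in for the nested `std::pair`, and free slots are found with a simple linear scan rather than a free list.

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <limits>
    #include <optional>
    #include <utility>
    #include <vector>

    // Fixed-capacity doubly linked list whose links are indices into a
    // std::array, so insert and erase never touch the heap (unless T does).
    template <typename T, std::size_t Capacity>
    class IndexList {
    public:
        // Sentinel meaning "no previous" or "no next" element.
        static constexpr std::size_t npos =
            std::numeric_limits<std::size_t>::max();

        // Appends a value; returns its slot index, or npos if the list is full.
        std::size_t push_back(T value) {
            std::size_t slot = 0;
            while (slot < Capacity && slots_[slot].value) ++slot;  // free slot?
            if (slot == Capacity) return npos;
            slots_[slot].value = std::move(value);
            slots_[slot].prev = tail_;
            slots_[slot].next = npos;
            if (tail_ != npos) slots_[tail_].next = slot; else head_ = slot;
            tail_ = slot;
            return slot;
        }

        // Unlinks the element stored in `slot` and frees the slot for reuse.
        void erase(std::size_t slot) {
            assert(slot < Capacity && slots_[slot].value);
            Slot& s = slots_[slot];
            if (s.prev != npos) slots_[s.prev].next = s.next; else head_ = s.next;
            if (s.next != npos) slots_[s.next].prev = s.prev; else tail_ = s.prev;
            s.value.reset();
        }

        // Copies the elements out in list order (allocates; for inspection).
        std::vector<T> to_vector() const {
            std::vector<T> out;
            for (std::size_t i = head_; i != npos; i = slots_[i].next) {
                out.push_back(*slots_[i].value);
            }
            return out;
        }

    private:
        struct Slot {
            std::optional<T> value;
            std::size_t prev = npos;
            std::size_t next = npos;
        };
        std::array<Slot, Capacity> slots_;
        std::size_t head_ = npos;
        std::size_t tail_ = npos;
    };
    ```

    Elements never move once placed, so slot indices stay valid handles until the element is erased.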
    
    kentonv 2 years ago
    
    In most (quite possibly all) cases I don't know the upper bound even at runtime. So, this ends up requiring allocation.
    
    inetknght 2 years ago
    
    You don't think it's useful to calculate some reasonable upper bound if only to avoid resource exhaustion?
    
    kentonv 2 years ago
    
    Not really:
    * In my experience arbitrary limits intended to prevent "resource exhaustion" tend to lead to incidents where the limit was hit but resources weren't really exhausted, so it just caused failures for no reason. Very common example is the ulimit on open file descriptors. The default limits feel like they were set last century and haven't been updated to account for the fact that we have way more memory these days. We should really just be enforcing an overall limit on memory usage (per-user or per-cgroup) and expect that to include the file descriptor table.
    * If I did choose a limit, it would be quite high (to avoid aforementioned unnecessary incidents), but in practice most of the lists I'm thinking of would never get near that size, so pre-allocating arrays would waste a lot of memory.
    
    inetknght 2 years ago
    
    So you don't know how much to allocate and you also don't want to allocate too much.
    It sounds like you want something built on top of std::vector but with your own rules about when to reserve() and how much.
    
    kentonv 2 years ago
    
    No, for the use cases I'm thinking of, I really do just want an intrusive linked list. Why would I want to invent a convoluted way to adapt std::vector when an intrusive linked list does exactly what I want?
    (Example use case: Some number of objects want to register themselves as observers on some event, and unregister themselves later, in arbitrary order. The same object may register and unregister itself many times. std::unordered_set<Observer*> would be the best fit if performance doesn't matter but an intrusive linked list requires no allocation at all on behalf of the list.)
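    A minimal sketch of that observer use case with an intrusive list, assuming C++17. `Observer`, `EventSource` and `Summer` are hypothetical names, and the sketch assumes every observer is removed before it is destroyed and that the `EventSource` is never moved while observers are registered.

    ```cpp
    #include <cassert>

    // The list links live inside the observer itself, so registering and
    // unregistering never allocates.
    class Observer {
    public:
        Observer() = default;
        Observer(const Observer&) = delete;             // links must not be copied
        Observer& operator=(const Observer&) = delete;
        virtual ~Observer() = default;
        virtual void on_event(int value) = 0;

    private:
        friend class EventSource;
        Observer* next_ = nullptr;
        Observer** prev_link_ = nullptr;  // address of the pointer pointing at us
    };

    class EventSource {
    public:
        // Registers an observer at the front of the list: O(1), no allocation.
        void add(Observer& o) {
            assert(o.prev_link_ == nullptr);  // must not already be registered
            o.next_ = head_;
            o.prev_link_ = &head_;
            if (head_) head_->prev_link_ = &o.next_;
            head_ = &o;
        }

        // Unregisters an observer: O(1); a no-op if it is not registered.
        void remove(Observer& o) {
            if (!o.prev_link_) return;
            *o.prev_link_ = o.next_;
            if (o.next_) o.next_->prev_link_ = o.prev_link_;
            o.next_ = nullptr;
            o.prev_link_ = nullptr;
        }

        void fire(int value) {
            for (Observer* o = head_; o; o = o->next_) o->on_event(value);
        }

    private:
        Observer* head_ = nullptr;
    };

    // Example observer that sums the values it sees.
    struct Summer : Observer {
        int total = 0;
        void on_event(int value) override { total += value; }
    };
    ```

    The same object can register and unregister any number of times, in any order, and the list itself never owns or allocates anything.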
    
    inetknght 2 years ago
    
    Let me remind you what you said which I quoted:
    > I want to be able to dynamically add and remove objects from the list without any allocations at all. The only way to achieve that is an intrusive linked list design
    So with that, let me reply to your most recent comment:
    > No, for the use cases I'm thinking of, I really do just want an intrusive linked list. Why would I want to invent a convoluted way to adapt std::vector when an intrusive linked list does exactly what I want?
    You said an intrusive linked list is the only way to achieve that design. I'm telling you it's not the only way to implement a zero-allocation linked list.
    If an intrusive linked list is exactly what you want then great for you. You haven't clearly articulated exactly why you must have an intrusive linked list. I've given you an alternative. You're rejecting it out-of-hand because you don't want to "invent a convoluted way to adapt" already-existing containers.
    > Why would I want to invent a convoluted way...
    You only think it's convoluted. I argue that it's not convoluted at all.
    There are many reasons you would not always be able to use an intrusive linked list. Intrusive linked lists are... intrusive. They require that you:
    - directly modify T (again, might not be possible if you don't own the code for T, eg it comes from a third party library)
    - or have T as a member (which also might not be possible, particularly whether T must be created by a factory and how convoluted that factory is)
    - or else have capability to inherit from T (which might not be possible, see `final` keyword).
    You haven't stated any of these (or any other) reasons. You simply stated that an intrusive linked list is the "only" way to achieve zero-allocation linked lists. Without stating why you must have an intrusive list, I have merely offered another option. If you think using the standard library is convoluted here then I wonder what other data structure concepts you will have trouble understanding and suggest that C++ isn't the right language for you.
- o11c 2 years ago
  
  The good news is that since C++ containers aren't special to the language, you can just implement your own wrapper classes that disable the copy ctor (and provide an explicit `.clone()` instead). Coupled with `#pragma GCC poison` it is pretty easy to blacklist legacy footguns in source files at least (though not in headers without some aggressive work).
  ... and yet, almost all vulnerabilities in C++ code are still written in C style, not even legacy C++.
  - kentonv 2 years ago
    
    Yeah I pretty much only use my own alternate container implementations (from KJ[0]), which avoid these footguns, but the result is everyone complains our project is written in Kenton-Language rather than C++ and there's no Stack Overflow for it and we can't hire engineers who know how to write it... oops.
    [0] https://github.com/capnproto/capnproto/blob/v2/kjdoc/tour.md
    
    mabster 2 years ago
    
    Every work place I've worked at has their own container classes for performance reasons.
    We did use some C++ standard library but very sparingly. For example, std::unique_ptr<> was fine, but std::shared_ptr<> was not because of the way it implicitly stores its reference, so we had our own implementation that exposed the reference.
    When I was doing console games this was a must. You would often have restrictions where the container itself should live in a different area of memory to the items which is much easier to manage with your own classes.
    
    o11c 2 years ago
    
    The problem with that is that it is not providing the standard library API, but rather its own API. A good alternative should only remove footguns (ideally, to the point that you can replace with typedefs and get correct behavior just with less protection in case of future changes).
    Aside, it's embarrassing that all these alternative libraries fail to implement automatic correct signed/unsigned mixing. It's not like it's particularly hard to implement!
    
    kentonv 2 years ago
    
    Many std library footguns are inherent to the API...
im3w1l 2 years ago

I mean the article explains the original author did take explicit steps to keep the Nodes fixed in memory (std::list), because they knew it could be a problem if the Nodes moved.
They just got tripped up by an obscure language feature that made the Nodes move anyway.

jnwatson 2 years ago

This is a great reminder of the pox that was Microsoft of the early part of the millennium. Besides an allergy to investing in web standards, they were woefully behind in their language support. Their non-adoption of modern C++ standards held client security back for a decade, and arguably held language standards development back.

pjmlp 2 years ago

There is a certain irony in complaining about Microsoft while praising everyone else in regards to C and C++ compilers, as if outside the beloved GCC, in an age where clang did not exist, the other proprietary compilers were an example of perfection.
Apparently the folks didn't learn their lesson with Web standards, given the power they gave Google to transform the Web into ChromeOS.
chaboud 2 years ago

That hardly seems fair.
(Microsoft was doing this to C++ well before the early part of the millennium…)
Edit: and in non-joke fairness, Microsoft has really come a long way in this regard.
jinchengJL 2 years ago

Having only worked with gcc and clang, OP’s code looked completely fine to me and I was baffled why many comments think the code is at fault. Judging by this page [1], I agree this is entirely MSVC’s doing.
[1] http://howardhinnant.github.io/container_summary.html
- Joker_vD 2 years ago
  
  Yeah, Microsoft absolutely should've made their containers use throwing move-constructors in violation of the standard, in order to better adhere to the standard.
  In reality, it's a defect in the standard, plus a quality-of-implementation issue in the Microsoft's C++ Standard Library.

fransje26 2 years ago

I must be misunderstanding something from this article. With:

struct Node {

    std::vector<Connection> connections;

};

struct Connection {

    Node *from, *to;

};

Does this mean that to create the vector of connections, Nodes are created, and references are taken to store in the Connections? And then the Nodes are stored in the list, with std::move()?

I don't understand why you would want to go down that road. Intuitively, I would assume that you are not safe from an object copy somewhere down the line and your graph then comes crashing down like a house of cards. Wouldn't it make more sense to store the nodes as pointers? If you like to live dangerously, something like:

struct Graph {

    std::vector<std::list<Node*>> nodes;

};

Or better:

struct Graph {

    std::vector<std::list<std::unique_ptr<Node>>> nodes;

};

The latter will give you plenty of warnings if you do not copy Nodes around with std::move().

Or less performant, but maybe safer, std::shared_ptr<Node>, together with:

struct Connection {

    std::weak_ptr<Node> from, to;

};

so that you have some check guarantees before access?
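
A rough sketch of the `std::shared_ptr` plus `std::weak_ptr` variant suggested above, assuming C++17. The `Graph` layout, the `id` field, and the helpers `add_node`, `target_id` and `weak_connection_demo` are made up for illustration and are not the article's code.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <vector>

struct Node;

struct Connection {
    std::weak_ptr<Node> from, to;  // non-owning, so cycles cannot leak
};

struct Node {
    int id = 0;
    std::vector<Connection> connections;
};

struct Graph {
    std::list<std::shared_ptr<Node>> nodes;  // the graph owns the nodes

    std::shared_ptr<Node> add_node(int id) {
        auto node = std::make_shared<Node>();
        node->id = id;
        nodes.push_back(node);
        return node;
    }
};

// Returns the id of the node a connection points to, or -1 if it is gone.
int target_id(const Connection& c) {
    if (auto node = c.to.lock()) return node->id;
    return -1;
}

// Usage sketch: a stale connection is detected instead of dereferenced.
void weak_connection_demo() {
    Graph g;
    auto a = g.add_node(1);
    auto b = g.add_node(2);
    Connection c{a, b};
    a.reset();
    b.reset();                   // the graph is now the only owner
    assert(target_id(c) == 2);
    g.nodes.clear();             // nodes destroyed
    assert(target_id(c) == -1);  // lock() fails, no dangling access
}
```

The check in `target_id` turns a would-be dangling dereference into a value the caller can handle, at the cost of reference counting on every access.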

camblomquistOP 2 years ago

I don't think you're misunderstanding, it's a strange choice to make even if the example here loses context to make it easier to make the point. It wasn't code that anyone currently here wrote and because it worked with the old compiler, nobody really touched it. I believe the nodes are pushed into the list before adding connections but I don't think that changes the point you're trying to make. That said, one's intuition is different from another's and I don't think it's unfair to assume that pushing into a vector of lists won't cause every existing list to be copied. For that to be the actual behavior is kind of disgusting. It may make more sense to create pointers here but that's a larger change to deal with and ensure correctness versus just swapping the vector out for another list. I don't claim to like that solution but it seems to me like legacy C++ code in general is fragile enough as it is, the less I have to change to fix bugs the better.
- fransje26 2 years ago
  
  Thank you for the clarification!
  > It may make more sense to create pointers here but that's a larger change to deal with and ensure correctness versus just swapping the vector out for another list. I don't claim to like that solution but it seems to me like legacy C++ code in general is fragile enough as it is,
  That's more than fair enough. In such code, more often than not, you make what you think is a small change, and you end up with an entire cascade of unexpected side-effects that pop-up because the original assumptions and the testing scope of those assumptions were lost. A can of worms best left unopened.
  By the way, wouldn't such an error be something that gets detected by the "new" ASAN functionality that has been added to the newer MSVC toolchain you are using?

wakawaka28 2 years ago

This is a noob mistake, not a huge mystery. It's not always wrong to store raw pointers to STL container elements, but if you do then you must take care of reallocations.

If you find storing pointers to elements too perilous, you should probably just make a container of pointers instead.
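
As a small sketch of the container-of-pointers approach, assuming C++17: the hypothetical `NodeStore` below hands out raw pointers that stay valid while its internal vector reallocates, because only the `std::unique_ptr` handles are relocated, never the nodes themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int id = 0;
};

class NodeStore {
public:
    // Creates a node and returns a pointer that stays valid until the
    // store is destroyed, no matter how often the vector regrows.
    Node* add(int id) {
        nodes_.push_back(std::make_unique<Node>());
        nodes_.back()->id = id;
        return nodes_.back().get();
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
};

// Usage sketch: the first pointer survives many reallocating inserts.
void stable_pointer_demo() {
    NodeStore store;
    Node* first = store.add(0);
    for (int i = 1; i <= 1000; ++i) store.add(i);
    assert(first->id == 0);
    assert(store.size() == 1001);
}
```

With a plain `std::vector<Node>`, the same `first` pointer would be invalidated by the first reallocation.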

D13Fd 2 years ago

> This is a noob mistake, not a huge mystery.
The interesting part is why it worked in VS2013 but failed in VS2022.
- wakawaka28 2 years ago
  
  >The interesting part is why it worked in VS2013 but failed in VS2022.
  I thought the explanation he came up with was interesting but such a change could have easily been brought about by changes to reallocation parameters in the implementation.

gsliepen 2 years ago

There is a lesser known cousin to std::vector that doesn't have to move nor copy its elements when adding new elements, and that is std::deque.
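
A short sketch of that property, assuming C++17. The function name is hypothetical; the behavior relied on is the standard's rule that inserting at either end of a `std::deque` invalidates iterators but leaves pointers and references to existing elements valid.

```cpp
#include <cassert>
#include <deque>

struct Node {
    int id = 0;
};

// Grows a deque to `count` elements and reports whether a pointer taken
// to the first element still refers to that element afterwards.
bool first_element_stays_put(int count) {
    assert(count >= 1);
    std::deque<Node> nodes;
    nodes.push_back(Node{0});
    Node* first = &nodes.front();
    for (int i = 1; i < count; ++i) {
        nodes.push_back(Node{i});  // invalidates iterators, not pointers
    }
    return first == &nodes.front() && first->id == 0;
}
```

This only covers insertion at the ends; inserting in the middle of a deque invalidates pointers and references as well.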

vardump 2 years ago

Right. But of course std::deque comes with a cost; iteration is slower and memory footprint is bigger.

olliej 2 years ago

I know C++ the language but not the STL (the overwhelming abundance of UB and total lack of safety make it anathema), so my question is why the STL allows/requires copying rather than moving here, depending on whether an object has a nothrow move constructor?

Note I’m not asking about move constructor vs memmove/cpy but rather the use of copy constructor vs move depending on exception behavior? Is it something like prefer no throw copy constructor over “throwing” move?

chaboud 2 years ago

That’s a bit like saying you know C++ but not streams or templates, or C but not floating point operations. It’s probably worth learning STL.
Anyway, the reason to use move instead of copy is for performance. Move constructors are faster because they can leave the source object modified (e.g., take over control of a pointer to deep contents). This falls apart when the move constructor can throw, because the container might be part way through a resize when this happens, leaving the object before the exception modified and the code in an unrecoverable state.
Basically, unless we can be super duper 100% certain we’re going to make it through the operation without throwing an exception, we’re going to copy, leaving the objects in question in an unaltered state, and holding to the promises of the standard.
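The standard library encodes this same decision in `std::move_if_noexcept`. The sketch below assumes C++17; the two structs and the helper `relocates_by_move` are hypothetical, and it only inspects the result type rather than showing how any particular `std::vector` implementation is written.

```cpp
#include <type_traits>
#include <utility>

struct SafeMove {
    SafeMove() = default;
    SafeMove(const SafeMove&) = default;
    SafeMove(SafeMove&&) noexcept = default;
};

struct RiskyMove {
    RiskyMove() = default;
    RiskyMove(const RiskyMove&) = default;
    RiskyMove(RiskyMove&&) {}  // not noexcept: may throw, as far as callers know
};

// True when std::move_if_noexcept yields an rvalue (so a move happens),
// false when it falls back to a const lvalue reference (so a copy happens).
template <typename T>
constexpr bool relocates_by_move() {
    return std::is_rvalue_reference_v<
        decltype(std::move_if_noexcept(std::declval<T&>()))>;
}

static_assert(relocates_by_move<SafeMove>());
static_assert(!relocates_by_move<RiskyMove>());
```

A type that is not copyable at all still gets moved even if its move constructor may throw, since there is nothing to fall back to.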
- olliej 2 years ago
  
  I phrased that badly - it should have been “I don’t know every edge case in the STL, and so I don’t know why this would have different behavior”.
  However thanks for explaining the issue. This one is obvious and I just completely failed to think about how you ensure the source object is in a safe state if an exception occurs part way through moving the source data. It seems to imply the old MSVC behaviour was incorrect in such a scenario, but I hadn’t considered that possibility so assumed it was correct and therefore didn’t think of why this behaviour is required.
  My solution is of course to simply not allow exceptions because the c++ model of everything implicitly throwing is just as annoying as Java’s “let’s explicitly annotate everything” model albeit with different paths to sadness.
  - chaboud 2 years ago
    
    Fair enough. Honestly, there are very few people in the world who could confidently claim that they know all of the STL. The first place I worked at disallowed it (MSVC 5 was fresh then, so it was somewhat understandable), and we had our own performance centric data structures. But the value of container classes and promises about performance first really hit me when I went to my next job and dug in on the STL. Truly eye opening stuff, and absolutely available for reading (which was pretty cool at the time).
- inetknght 2 years ago
  
  > leaving the object before the exception modified and the code in an unrecoverable state
  It isn't likely to leave the code in an unrecoverable state even if recovery is calling std::terminate (or worse).
  It is likely to leave the data in an unrecoverable state. Imagine that a vector of 4 items was resized -- the first two objects move successfully, but the third one throws an exception. Then in your move function, you catch that and decide to undo your changes before propagating the error. Then when you're undoing the changes the first object throws an exception when it's being moved back. Oof! At best you've got multiple active exceptions (legal if you're in the catch handler, but should be rare and definitely should be avoided) and at worst your data is indeed unrecoverable (thus one of many reasons why std::terminate is the default option when multiple exceptions are alive on the same stack).
  - chaboud 2 years ago
    
    Sure, but assumptions about the state of data are made in real world code. It’s just a mess, which is why the code in question needs to break into jail and very explicitly indicate that it isn’t going to fire off exceptions (and hold to it). Honestly, C++ move semantics and default behaviors could be its own very lengthy conversation. It’s why the bulk of my C++ code over the years has been explicit about references and copies (since most of my C++ code has been about high performance real time rendering or data analysis).
tialaramex 2 years ago

The overwhelming C++ priority beating both safety and performance, often to the consternation of the performance people, is backwards compatibility with dusty archaic code. If it was written by somebody whose funeral was last century, WG21 thinks it's important that it still compiles in your C++ 23 compiler whenever you get one of those. Not crucial. Not so that they actually defined the language sensibly to avoid compatibility problems, but important enough to trump mere performance or safety concerns.
Last century, move didn't exist. The terrible C++ move (which is basically an actual "destructive move" plus a default create step) was invented for C++ 11 which was, as the name suggests, standardised only in 2011.
So back then everybody was using copy assignment semantics. Your compiler might be smart enough, especially for trivial cases, to spot the cheap way to deliver the required semantics, but it might not, especially as things get tricky (e.g. a std::vector of std::list), and semantically it's definitely a copy, not a move.
As a result the "non-move" that you're astonished by is how all C++ code last century was written, the semantics you're just assuming as necessary didn't even exist in ISO C++ 98 and it is considered important that such code still works.
kentonv 2 years ago

I think the other replies may have misunderstood your question. I think you are asking:
Why does std::vector<T> require T's move constructor to be noexcept (or else it falls back to copying instead)?
The reason goes something like this:
When std::vector<T> grows, it needs to move or copy all of its elements into a new, larger-capacity array. It would prefer to move them, since that's a lot more efficient than copying (for non-trivial types). But what happens if it moves N elements, and then the move constructor for element N+1 throws an exception? Elements 0-N have been moved away already, so the vector is no longer valid as-is. Should it try to move those elements back to the original array? But what if one of those moves fails?
The C++ standards body decided to sidestep this whole problem by saying that std::vector<T> will refuse to use T's move constructor unless it is declared noexcept, so the above problem can't happen.
In my opinion, this was a huge mistake. Intuitively, everyone expects that when an std::vector<T> grows, it's going to move the elements, not make a ton of copies. Often, these copies result in hidden performance problems. Arguably the author of this post is lucky that in their case, the copies resulted in outright failure, thus revealing the problem.
There seem to be two other possibilities:
* std::vector<T> could simply refuse to compile if the move constructor was not `noexcept`. I think this could have been done in a way that wouldn't have broken existing code, if it had been introduced before move constructors existed in the wild -- unfortunately, that ship has now sailed and this cannot be done now without breaking people.
* std::vector<T> could always use move constructors, even if they are not declared `noexcept`, and simply crash (std::terminate()) in the case that one actually throws. IMO this would be fine and is the best solution. Move constructors almost never actually throw in practice, regardless of whether they are declared as such, because move constructors are almost always just "copy pointer, null out the original". You don't put complex logic in your move constructor. And anyway, C++ already has plenty of precedent for turning poorly-timed exceptions into terminations; why not add another case? But I think it's unlikely the standards committee would change this now.
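The copy-versus-move behaviour described above can be observed directly. This is a minimal sketch, assuming two hypothetical element types (`ThrowingMove`, `NoexceptMove`) that differ only in whether the move constructor is declared `noexcept`, and a hypothetical helper that counts how elements are relocated when the vector is forced to reallocate.

```cpp
#include <vector>

struct Counts { int copies; int moves; };

// Hypothetical type: move constructor is NOT declared noexcept.
struct ThrowingMove {
    static Counts& counts() { static Counts c{0, 0}; return c; }
    ThrowingMove() {}
    ThrowingMove(const ThrowingMove&) { ++counts().copies; }
    ThrowingMove(ThrowingMove&&) { ++counts().moves; }
};

// Hypothetical type: identical, except the move constructor is noexcept.
struct NoexceptMove {
    static Counts& counts() { static Counts c{0, 0}; return c; }
    NoexceptMove() {}
    NoexceptMove(const NoexceptMove&) { ++counts().copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++counts().moves; }
};

// Builds a vector of four elements, forces one reallocation,
// and reports how the existing elements were transferred.
template <class T>
Counts relocations_on_growth() {
    std::vector<T> v(4);             // four default-constructed elements
    T::counts() = Counts{0, 0};      // ignore anything counted so far
    v.reserve(v.capacity() + 1);     // must allocate a larger buffer
    return T::counts();
}
```

With the mainstream standard library implementations, `relocations_on_growth<ThrowingMove>()` reports four copies and no moves, while `relocations_on_growth<NoexceptMove>()` reports four moves and no copies.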
- Joker_vD 2 years ago
  
  Honestly, trying to move the elements back and calling std::abort if that fails seems fine. It is indeed an exceptional happenstance, and how quickly you can recover from it is probably not as important as being able to recover correctly. And who catches exceptions around resize()/push_back() anyway?
inetknght 2 years ago

Throw specifications do not change function call binding behavior.
Move constructor and move operator will bind to an R-value reference if the move constructor or move operators are available. Conversely, if those functions which declare to not throw anything do end up throwing something then the result is std::terminate.
The only things that determine whether to use a move or copy is whether the reference is an R-value and whether the source or destination is const.
You can declare a {const R-value move operator (not a constructor) for the left-hand side} and/or {const R-value move operator or constructor for the right-hand side} of the argument. But you won't be able to modify anything not marked mutable. You shouldn't do that though: that's a sure way to summon nasal demons. That said, I see it fairly often from less experienced engineers, particularly when copy-pasting a copy operator intending to modify it for a move operator.
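The point about constness can be illustrated with a small sketch. `Tracker` and the two helper functions are hypothetical names made up here. `std::move` on a const object yields a `const Tracker&&`, which cannot bind to the ordinary `Tracker&&` move constructor, so overload resolution quietly picks the copy constructor instead.

```cpp
#include <utility>

// Hypothetical type that records which constructor built it.
struct Tracker {
    bool copied = false;
    bool moved = false;
    Tracker() {}
    Tracker(const Tracker&) : copied(true) {}
    Tracker(Tracker&&) noexcept : moved(true) {}
};

// std::move on a const object gives const Tracker&&: only the copy constructor matches.
bool const_source_is_copied() {
    const Tracker src;
    Tracker dst(std::move(src));
    return dst.copied && !dst.moved;
}

// std::move on a non-const object gives Tracker&&: the move constructor is chosen.
bool mutable_source_is_moved() {
    Tracker src;
    Tracker dst(std::move(src));
    return dst.moved && !dst.copied;
}
```

Note that this is plain overload resolution. The `noexcept` check that `std::vector` performs during growth is a separate, library-level decision layered on top of it.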
wffurr 2 years ago

What do you use instead of std::vector, map, unique_ptr, etc?
I have a hard time thinking of C++ and the STL as separate. Even our internal utilities and such tend to be STL-like although often with safer defaults.
- tialaramex 2 years ago
  
  Lots of the C++ standard library, including the STL containers, isn't provided in freestanding C++. Now, in reality freestanding C++ has been kind of a joke - the committee for years barely bothered to keep it working - especially compared to freestanding C (which is well defined and used all over the place) and say Rust no_std (likewise) - and so many embedded systems may have the entire standard library notionally available even though parts of it are definitely nonsense for them and they've got local rules saying not to use the parts that would definitely explode in their environment... but many C++ programmers who have worked under such rules just reflexively avoid the STL's containers and maybe its algorithms even in an environment where those would work.
- olliej 2 years ago
  
  Essentially the same things but reimplemented safely - see WTF in webkit.
  There are still issues (the iterator API used by for(:) is very hard to make safe without terrible perf issues, though I was looking at this recently and the compilers are doing much better than they used to).
  Things like unique_ptr and shared_ptr do not meaningfully improve the security of c++ despite being presented as if they did (all serious c++ projects already had smart pointers before the stl finally built them in so presenting them as a security improvement is disingenuous), and because of the desire to have shared_ptr be noninvasive it’s strictly worse than most other shared ownership smart pointers I’ve used.
  - mabster 2 years ago
    
    Yep, we've always had our own implementation of std::shared_ptr<> for this reason.
    Either the reference is elsewhere (and now you have to dereference another area of memory occasionally which is the worst case for cache performance), or it's alongside your object. If it's alongside your object it's better to know it's there for padding, etc.
    And it's easy to forget to allocate for the alongside case, so you can have hidden poor performance.
quuxplusone 2 years ago

I have a blog post on the topic here: https://quuxplusone.github.io/blog/2022/08/26/vector-pessimi...
The TLDR is: Using `move_if_noexcept` instead of plain old `move` can help you provide the "strong exception guarantee." For what _that_ is, see cppreference: https://en.cppreference.com/w/cpp/language/exceptions#Except...
and/or the paper by Dave Abrahams that introduced the term, "Exception-Safety in Generic Components: Lessons Learned from Specifying Exception-Safety for the C++ Standard Library." https://www.boost.org/community/exception_safety.html
> Is it something like prefer no throw copy constructor over “throwing” move?
Almost. If move won't throw (or if copy isn't possible), we'll move. But given a choice between a throwing move and any kind of copy, we'll prefer copy, because copy is non-destructive of the original data: if something goes wrong, we can roll back to the original data. If the original data's been moved-from, we can't.
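The selection rule can be checked at compile time. This is a minimal sketch with three hypothetical types and a hypothetical helper `would_move`. It inspects the reference type returned by `std::move_if_noexcept`: an rvalue reference means the element will be moved, a const lvalue reference means it will be copied.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical types covering the three cases discussed above.
struct ThrowingMoveCopyable {
    ThrowingMoveCopyable() {}
    ThrowingMoveCopyable(const ThrowingMoveCopyable&) {}
    ThrowingMoveCopyable(ThrowingMoveCopyable&&) {}           // may throw
};

struct NoexceptMoveCopyable {
    NoexceptMoveCopyable() {}
    NoexceptMoveCopyable(const NoexceptMoveCopyable&) {}
    NoexceptMoveCopyable(NoexceptMoveCopyable&&) noexcept {}  // cannot throw
};

struct ThrowingMoveOnly {
    ThrowingMoveOnly() {}
    ThrowingMoveOnly(ThrowingMoveOnly&&) {}                   // may throw, no copy available
};

// True when std::move_if_noexcept hands back an rvalue reference (a move).
template <class T>
constexpr bool would_move() {
    return std::is_rvalue_reference<
        decltype(std::move_if_noexcept(std::declval<T&>()))>::value;
}

static_assert(!would_move<ThrowingMoveCopyable>(), "throwing move + copy: copy");
static_assert(would_move<NoexceptMoveCopyable>(), "noexcept move: move");
static_assert(would_move<ThrowingMoveOnly>(), "no copy possible: move anyway");
```

The third case is the one that gives up the strong guarantee: with no copy constructor to fall back on, a throwing move is used anyway.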
mgaunard 2 years ago

UB is a feature; people who keep on fighting it are such a pain.
Regarding your question, nothrow operations are essential to maintaining invariants. And maintaining invariants is how you make code correct in a world where UB exists.
- Joker_vD 2 years ago
  
  Yes, if the programmer maintains certain invariants, C's flavour of UB allows the compiler to take advantage of those invariants for performance gains, by omitting run-time checking for those invariants.
  The problem with this flavour of UB being a programmer's promise "this is fine, trust me, no run-time checks needed" to the compiler is that a) it's made by the programmer by omitting said run-time checks, and that often happens accidentally, not intentionally; b) the compilers are really bad at pointing out to the programmer places where they took advantage of such promises, which really complicates the task of writing conforming programs. Every time I add two ints, I promise to the compiler that an overflow won't happen: and of course, the moment a UB happens, all invariants cease to hold, so trying to find the initial bug where you've accidentally broken one of the invariants turns into a nightmare.
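  To make the `int` addition example concrete, here is a minimal sketch of the run-time check that a bare `a + b` silently omits. `checked_add` is a hypothetical helper, not a standard function. It tests the operands against the limits before adding, so the overflowing addition is never evaluated.

  ```cpp
  #include <limits>

  // Hypothetical helper: stores a + b in out and returns true,
  // or returns false (leaving out untouched) if the sum would overflow.
  bool checked_add(int a, int b, int& out) {
      if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;  // would exceed INT_MAX
      if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;  // would go below INT_MIN
      out = a + b;  // safe: the checks above rule out overflow
      return true;
  }
  ```

  Writing `a + b` without such a check is exactly the implicit promise described above.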
  - mgaunard 2 years ago
    
    A program's correctness goes way beyond memory safety, and is entirely at the mercy of the programmer doing a good job.
    This is true regardless of whether the language has undefined behaviour or not.

SleepyMyroslav 2 years ago

This is another nice reminder to people who build systems where there is only one language implementation. Implementations always change.

beyondCritics 2 years ago

Don't assume! The basic mantra of software development.
