Weird compiler bug – Same code, different results
blog.zaita.com

If you read the article you'll find that this is a bug in the MinGW64 libraries.
It's not a compiler bug, it's not a problem in C/C++, or even in IEEE-754 floating point math. The MinGW64 thread library simply forgot to initialize the FPU correctly.
He'd have had the same problem if he wrote his code in assembly language and then ran it in a MinGW64 thread.
Yes this is technically true, but it manifests itself through use of the MinGW compiler. When I first noticed the problem, there was very little indication as to the cause.
TBH I credit dumb luck more than anything for finding the cause of this. It took many hours.
> This means that floating point arithmetic is non-associative. In that A + B != B + A.
The equation is the commutative property, not associative. IIRC addition in IEEE-754 is commutative.
The property that fails is (A+B)+C = A+(B+C).
Op here. Yes, you are correct. I was very sleep deprived when I wrote this (newborns, eh). I'll update this.
I didn't want to go too far into the background of floating point math in the post as it wasn't ultimately the issue, just something I was mindful of.
I think you made the wrong correction in the article:
> Update: Thanks to gus_massa and wiml @ HackerNews for pointing out I had used associative instead of commutative.... This means that floating point arithmetic is noncommutative [i]n that A + B != B + A.
Floating-point arithmetic is commutative, but not associative. So for floating-point numbers,
A + B = B + A always, but
A + (B + C) != (A + B) + C in many cases.
The problem comes about from significant digits and rounding - A + B might round up to 1, but B + C might round down to zero, etc. The point remains that the order in which you do things actually does matter in floating-point arithmetic, but it's the order you compute the "inner parentheses," not the actual order of numbers in (e.g.) the FPU registers.
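A tiny self-contained illustration of that point (the values are chosen purely for this sketch):

    #include <cstdio>

    int main() {
        double a = 1.0, b = 1e-16, c = 1e-16;
        // Commutativity holds for IEEE-754 addition:
        std::printf("a + b == b + a -> %s\n", (a + b == b + a) ? "true" : "false");
        // Associativity does not: the two groupings round differently.
        std::printf("(a + b) + c = %.17g\n", (a + b) + c); // 1e-16 absorbed twice -> 1
        std::printf("a + (b + c) = %.17g\n", a + (b + c)); // 2e-16 survives -> 1.0000000000000002
    }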
Thank you for your reply. I'm going to just lift your explanation as it's much better than my own.
I would have never thought that the output of some floating point operations depends on the „reset of the floating point package“. What gives?
The processor itself supports configuring some aspects of floating point behavior; on x86 this is the MXCSR register. The most important one is rounding mode: you can choose round-to-nearest, round-down, round-up, or round-towards-zero. Less common options include whether to raise an exception on various out-of-range conditions, and whether to treat denormal numbers as zero (faster but less accurate and not IEEE 754 compliant).
The C standard library has functions to do this configuration; search for fesetenv. _fpreset is not a standard function but, at least in Wine's implementation [1], just resets the relevant configuration registers to an initial state.
[1] https://github.com/wine-mirror/wine/blob/e909986e6ea5ecd49b2...
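For reference, a minimal sketch of the standard <cfenv> interface; fesetenv(FE_DFL_ENV) is roughly the portable counterpart of what _fpreset does:

    #include <cfenv>
    #include <cstdio>

    int main() {
        switch (std::fegetround()) {
            case FE_TONEAREST:  std::puts("rounding: to nearest (the default)"); break;
            case FE_DOWNWARD:   std::puts("rounding: toward -infinity");         break;
            case FE_UPWARD:     std::puts("rounding: toward +infinity");         break;
            case FE_TOWARDZERO: std::puts("rounding: toward zero");              break;
            default:            std::puts("rounding: unknown");                  break;
        }
        std::fesetenv(FE_DFL_ENV);  // reset the floating point environment to its default
    }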
Very informative! It even sets SSE2.
At a minimum, when there is a switch between threads the floating point state of the old thread needs to be saved and the floating point state of the incoming thread loaded. This includes such things as rounding mode.
If the thread package was not setting up the "saved" state for a new thread before switching to it then that state could be not what was desired. The best thing to do would probably be to copy the current FP state of the parent at the moment the thread is created.
On a long series of calculations changing the rounding mode could well be enough to cause the difference in results the poster saw.
I would tend to the opinion that redoing the calculation with different rounding modes is in fact a not bad way to figure out how reliable the results are in the first place.
(the rounding mode is not the only thing affected by not saving and restoring the FP state correctly)
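A hedged sketch of that experiment using the standard <cfenv> rounding-mode controls (the volatile discourages constant folding; strictly you would also want #pragma STDC FENV_ACCESS ON where it's supported):

    #include <cfenv>
    #include <cstdio>

    int main() {
        volatile double a = 1.0, b = 3.0;

        std::fesetround(FE_DOWNWARD);
        double down = a / b;

        std::fesetround(FE_UPWARD);
        double up = a / b;

        std::fesetround(FE_TONEAREST);  // restore the default
        std::printf("rounded down: %.17g\nrounded up:   %.17g\n", down, up);
    }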
Things like rounding mode and denormal behavior can be controlled by CPU flags. I'm not familiar with x86, but this is definitely the case on PPC.
Yes, that makes sense, thank you!
Some kind of internal state specific to Microsoft's libm? It doesn't look like _fpreset() is present anywhere else.
Though in general, IEEE-754 math does have a small amount of global state, for stuff like rounding modes and subnormals. Perhaps the spawned thread's fenv was unexpected.
I'm not super convinced by this blog post anyway, since it immediately confuses associativity with commutativity.
Op here. Yea this only occurs on Windows with MinGW. Visual C++ and Clang do not have this issue. None of the Linux compilers I tested had this issue either.
And yea you're right on the associative mistake. Will correct this :)
I currently build Windows binaries via cross-compilation on Linux using gcc-mingw-w64; I assume that is affected by the bug?
Most likely. You can check with the pastebin code https://pastebin.com/thTapSgn . Local and Thread outputs should be the same.
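Not the pastebin code itself, but a minimal sketch of the same check: run an arbitrary chain of operations (the compute() below is just a stand-in) locally and in a std::thread, then compare.

    // may need -pthread when building with g++ on some platforms
    #include <cmath>
    #include <cstdio>
    #include <thread>

    static double compute() {
        double x = 0.0;
        for (int i = 1; i <= 1000; ++i)
            x += std::pow(1.0 / i, 1.5);  // arbitrary chain of floating point work
        return x;
    }

    int main() {
        double local = compute();

        double threaded = 0.0;
        std::thread t([&] { threaded = compute(); });
        t.join();

        std::printf("Local:  %.17g\nThread: %.17g\n", local, threaded);
        std::puts(local == threaded ? "Outputs match" : "Outputs differ!");
    }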
TBH, I had no idea this was a thing either. I was incredibly confused when I identified a single piece of code that was producing a variation in results.
Probably the author is accidentally getting 80-bit "extended double" arithmetic.
That should be clear from looking at the assembly, shouldn't it? Don't think the floating point environment can change that but I could be wrong.
Floating-point rounding modes seem more likely to me. The author should be able to dump the floating point configuration to confirm, I'm sure.
The same x87 instructions are used for arithmetic whether the operands are float, double, or extended; the difference is a precision setting in the x87 control word.
Well, in the cases where I've looked at generated assembly (not that often), the xmm registers are used even for scalar operations, which I thought was the default option for gcc on x86-64, but I suppose it might differ on different systems (or perhaps 32-bit mode was used for some reason).
Right, if you're using xmm regs you're getting double or single precision. Sometimes the x87 regs get used and that's when "accidentally computed with extra precision and then double-rounded" comes up.
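If you want to see which precision mode the x87 unit is actually in, a quick sketch (assuming GCC/Clang inline-asm syntax on an x86 target):

    #include <cstdint>
    #include <cstdio>

    int main() {
        std::uint16_t cw;
        __asm__ volatile ("fnstcw %0" : "=m"(cw));  // store the x87 control word
        // Bits 8-9 are the precision-control field:
        // 00 = single (24-bit), 10 = double (53-bit), 11 = extended (64-bit)
        std::printf("x87 control word: 0x%04x, precision control = %u\n",
                    cw, static_cast<unsigned>((cw >> 8) & 0x3u));
    }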
>"Compiler bug? But how… Looking at the code, the only things that could create the bug were the operators and pow() call. These are part of the C++ standard offering so it’s highly unlikely that either of these would have an error. What next? Off to GodBolt to try some other compilers to see if I get wonky results on them. MSVC++… nope, Clang… nope, GCC-Trunk on Linux… nope. I am only getting this weird behaviour in TDM-GCC. Time to try some other MingW64 compilers. Nuwen… Yes, Mingw64.. yes, TDM-GCC… yes.
Confirmed: It’s a bug in MinGW64.
When I create a new thread and run floating point operations in that thread, I get slightly different answers."
Interesting!
Do not use C/C++ for numerical code where accuracy is needed. These languages are not specified to conform to IEEE 754 and you are absolutely asking for trouble.
My copy of C11 says
Annex F (normative) IEC 60559 floating-point arithmetic
F.1 Introduction
This annex specifies C language support for the IEC 60559 floating-point standard [...] previously designated ANSI/IEEE 754-1985.

> Support for Annex F (IEEE-754 / IEC 559) of C99/C11
> The Clang compiler does not support IEC 559 math functionality. Clang also does not control and honor the definition of __STDC_IEC_559__ macro. Under specific options such as -Ofast and -ffast-math, the compiler will enable a range of optimizations that provide faster mathematical operations that may not conform to the IEEE-754 specifications. The macro __STDC_IEC_559__ value may be defined but ignored when these faster optimizations are enabled.
If you make use of Clang, you won't find support for Annex F, and in point of fact you can't even check the __STDC_IEC_559__ macro to find out whether it will support it.
The story with GCC is... More complicated. It guarantees it'll follow Annex F for only some operations [0].
So the upshot is... Most people can't tell when and how Annex F might be followed.
[0] https://gcc.gnu.org/onlinedocs/gcc-7.4.0/gcc/Floating-point-...
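For what it's worth, the macro check looks like this (and, as noted above, a compiler may define the macro yet stop honouring it under fast-math style flags):

    #include <cstdio>

    int main() {
    #ifdef __STDC_IEC_559__
        std::puts("__STDC_IEC_559__ is defined: the implementation claims Annex F conformance");
    #else
        std::puts("__STDC_IEC_559__ is not defined: no Annex F guarantee");
    #endif
    }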
C "supports" whatever the hardware and compiler feel like supporting. It absolutely does not mandate anything. In particular, there are a number of particularly simple compiler optimizations that are not forbidden, though they are not technically correct according to IEEE 754, such as algebraic reassociation and simple commutativity. Moreover, C allows subexpressions to be computed in higher precision (e.g. 80 bit "long double"), which is observable. That last one is primarily due to the x87 FPU coprocessor design that has given us a good 35 years of headaches. Good riddance to that!
Keep in mind that "C++" in the abstract doesn't exist; only implementations do. The compilers all have ways of enforcing compliance.
Intel C++ seems to be pretty naughty in that it effectively has fast-math on by default.
In practice, all of the C compilers will (or can be made to) conform to IEEE 754, although #pragma STDC FENV_ACCESS ON support is very spotty (and consequently so is non-default rounding-mode support), though Clang/LLVM has been working on this for the past several months.
(That reminds me, I need to harangue the llvm-libc folks to add that pragma to their fpenv code for correctness' sake.)
The clang docs explicitly say that they don't support Annex F at all. Is this story finally going to change? So that you have at least partial support?
No, they consistently perform optimizations that violate IEEE 754 rules, such as reassociation and commutation of expressions. These are difficult to observe (unless you are in the habit of observing -0 vs 0), but yes, you can observe them. They are very precisely specified by IEEE 754 and C/C++ compilers perform illegal optimizations.
They do those optimizations when you enable fast-math flags. Some compilers (e.g., icc) enable those by default. When fast-math is not enabled, then the IEEE 754-violating optimizations are disabled.
Source: I've been working on such optimizations this past week. And that means establishing precisely which flags I need to have enabled in order for optimization to kick in.
This is very true. Once upon a time -O3 would enable fast-math. We're very careful to ensure we use test models to verify the outputs of release binaries. Enabling fast-math, as you have said, optimises the equations in a way that produces different results.
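A minimal sketch of that kind of verification: build the same source twice and diff the output (file name and flags below are just illustrative):

    // sum_check.cpp -- build twice and compare:
    //   g++ -O2             sum_check.cpp -o strict && ./strict
    //   g++ -O2 -ffast-math sum_check.cpp -o fast   && ./fast
    #include <cstdio>

    int main() {
        const double x[] = {1e16, 1.0, 1.0, 1.0, 1.0, -1e16};
        double s = 0.0;
        for (double v : x)
            s += v;   // -ffast-math allows the compiler to reassociate this reduction
        std::printf("sum = %.17g\n", s);  // the two builds may print different sums
    }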
Op here. There is a huge amount of scientific code written in C++. Just have a look through https://www.coin-or.org/
This code was specifically C++ because we had the intention of integrating with libraries like CppAD and ADOL-C.
So, what should be used?
The correct - but useless - answer is: any language that's IEC 60559:xxxx conformant.
Examples are: Fortran, R, C#, Java, D and many others.
You'd need to refer to a language's specification to find out, though some aren't very well defined. Zig comes to mind; it supports the data types and offers strict mode, but doesn't explicitly state its IEEE 754:2008 compliance.
Last I checked, C# allowed the compiler to perform the computation with higher precision than what the code requires and then narrow it down to the expected precision whenever it wants (you can trigger a narrowing by explicitly casting from a floating point type to itself). Which makes writing correct floating point code practically impossible.
COBOL
Seriously? Anything but floating point if you care about precision and accuracy.