Beware of Fast-Math

simonbyrne.github.io

317 points by blobcode 3 days ago


orlp - 3 days ago

I helped design an API for "algebraic operations" in Rust: <https://github.com/rust-lang/rust/issues/136469>, which are coming along nicely.

These operations are

1. Localized, not a function-wide or program-wide flag.

2. Completely safe. -ffast-math includes assumptions such as that there are no NaNs, and violating those assumptions is undefined behavior.

So what do these algebraic operations do? Well, one by itself doesn't do much of anything compared to a regular operation. But a sequence of them is allowed to be transformed using optimizations which are algebraically justified, as-if all operations are done using real arithmetic.

smcameron - 2 days ago

One thing I did not see mentioned in the article, or in these comments (according to ctrl-f anyway) is the use of feenableexcept()[1] to track down the source of NaNs in your code.

    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
will cause your code to get a SIGFPE whenever a NaN crawls out from under a rock. Of course it doesn't work with fast-math enabled, but if you're unknowingly getting NaNs without fast-math enabled, you obviously need to fix those before even trying fast-math. They can be hard to find, and feenableexcept() makes finding them a lot easier.
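
For anyone who wants to try it, a minimal self-contained sketch (my own, glibc-specific: feenableexcept() is a GNU extension, so it needs _GNU_SOURCE and linking with -lm):

    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdio.h>

    int main(void) {
        /* Trap invalid operations, division by zero and overflow as SIGFPE. */
        feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

        volatile double zero = 0.0;       /* volatile so 0/0 isn't folded away */
        double result = zero / zero;      /* 0/0 raises FE_INVALID -> SIGFPE right here */
        printf("%f\n", result);
        return 0;
    }

Run it under a debugger and the SIGFPE lands on the exact instruction that first produced the NaN, which beats grepping printouts.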

[1] https://linux.die.net/man/3/feenableexcept

emn13 - 3 days ago

I get the feeling that the real problem here is the IEEE specs themselves. They include a huge bunch of restrictions, each of which individually isn't relevant to something like 99.9% of floating point code, and probably not a single one of them, even in aggregate, is relevant to a large majority of code segments out in the wild. That doesn't mean they're not important, but some of these features should have been locally opt-in, not opt-out. And at the very least, standards need to evolve to support the hardware realities of today.

Not being able to auto-vectorize seems like a pretty critical bug given hardware trends that have been going on for decades now; on the other hand sacrificing platform-independent determinism isn't a trivial cost to pay either.
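
To make the auto-vectorization point concrete, here's a sketch of my own (not from the article) of a strict reduction versus a manually reassociated one:

    #include <stddef.h>

    /* Strict IEEE semantics: every addition depends on the previous one,
       so the compiler must keep this exact order and cannot vectorize. */
    float sum_strict(const float *x, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += x[i];
        return s;
    }

    /* Reassociated: four independent partial sums map naturally onto SIMD lanes. */
    float sum_reassoc(const float *x, size_t n) {
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        float s = (s0 + s1) + (s2 + s3);
        for (; i < n; i++)   /* handle the leftover tail elements */
            s += x[i];
        return s;
    }

The second version is essentially the rewrite -fassociative-math lets the compiler do for you, and because the partial sums round differently, the result can change in the last bits.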

I'm not familiar with the details of OpenCL and CUDA on this front - do they have some way to guarantee a specific order of operations such that code always has a predictable result on all platforms and nevertheless parallelizes well on a GPU?

Sharlin - 3 days ago

> -funsafe-math-optimizations

What's wrong with fun, safe math optimizations?!

(:

teleforce - 2 days ago

“Nothing brings fear to my heart more than a floating point number.” - Gerald Jay Sussman

Is there any IEEE standards committee working on FP alternatives, for example Unum and Posit [1][2]?

[1] Unum & Posit:

https://posithub.org/about

[2] The End of Error:

https://www.oreilly.com/library/view/the-end-of/978148223986...

storus - 2 days ago

This problem shows up even on Apple MPS with PyTorch in deep learning, where fast math is used by default in many operations, leading to garbage output. I hit it recently while training an autoregressive image generation model. Here is a discussion by folks who hit it as well:

https://github.com/pytorch/pytorch/issues/84936

leephillips - 2 days ago

This part was fascinating:

“The problem is how FTZ is actually implemented on most hardware: it is not set per-instruction, but instead controlled by the floating point environment: more specifically, it is controlled by the floating point control register, which on most systems is set at the thread level: enabling FTZ will affect all other operations in the same thread.

“GCC with -funsafe-math-optimizations enables FTZ (and its close relation, denormals-are-zero, or DAZ), even when building shared libraries. That means simply loading a shared library can change the results in completely unrelated code, which is a fun debugging experience.”
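
On x86-64 the thread-wide switch being described is the MXCSR register. A small sketch (my own, SSE-specific) of how little it takes to change every subsequent operation on the thread:

    #include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE (SSE3) */
    #include <stdio.h>
    #include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */

    int main(void) {
        volatile double tiny = 1e-310;        /* a subnormal double */
        printf("before: %g\n", tiny * 0.5);   /* prints ~5e-311 */

        /* Roughly what a fast-math-built library does at load time:
           flip FTZ and DAZ in MXCSR for the whole thread. */
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

        printf("after:  %g\n", tiny * 0.5);   /* prints 0: subnormals are gone */
        return 0;
    }

The scary part is that a shared library can run those two _MM_SET_* lines from an initializer, and the host application never finds out.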

Sophira - 3 days ago

Previously discussed at https://news.ycombinator.com/item?id=29201473 (which the article itself links to at the end).

cycomanic - 2 days ago

I think this article overstates the importance of these problems, even for scientific software. In the scientific code I've written, noise processes are often orders of magnitude larger than what is discussed here, and I believe this applies to many (most?) simulations modelling the real world (i.e. physics, chemistry, ...). At the same time, enabling fast-math has often yielded a very significant (>10%) performance boost.

Regarding the discussion of -fassociative-math in particular: I assume that most people writing code that translates a mathematical formula into a simulation will not know which order of operations would be the most accurate, and will simply codify their own derivation of the equation being simulated (which could have operations in any order). So if this switch changes your results, it probably means you should have a long hard look at the equations you're simulating and at which ordering gives you the most correct results.

That said I appreciate that the considerations might be quite different for libraries and in particular simulations for mathematics.

chuckadams - 2 days ago

I haven't worked with C in nearly 20 years and even I remember warnings against -ffast-math. It really ought not to exist: it's just a super-flag for things like -funsafe-math-optimizations, and the latter makes it really clear that it's, well, unsafe (or maybe it's actually funsafe!)

datameta - 2 days ago

Luckily, outside of mission-critical systems, like in demoscene coding, I can happily use "44/7" as a 2pi approximation (my beloved)
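
(For the record, 44/7 ≈ 6.2857 while 2π ≈ 6.2832, so it's only about 0.04% off.)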

zinekeller - 3 days ago

(2021)

Previous discussion: Beware of fast-math (Nov 12, 2021, https://news.ycombinator.com/item?id=29201473)

quotemstr - 2 days ago

All I want for Christmas is a programming language that uses dependent typing to make floating point precision part of the type system. Catastrophic cancellation should be a compiler error if you assign the output to a float with better ulps than you can get with worst-case operands.

Affric - 3 days ago

For non-associativity what is the best way to order operations? Is there an optimal order for precision whereby more similar values are added/multiplied first?

EDIT: I am now reading Goldberg 1991

Double edit: Kahan Summation formula. Goldberg is always worth going back to.
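
For reference, a minimal Kahan (compensated) summation sketch in C, which is exactly the kind of carefully ordered code that -fassociative-math is allowed to "simplify" back into a plain, less accurate sum:

    #include <stddef.h>

    double kahan_sum(const double *x, size_t n) {
        double sum = 0.0;
        double c = 0.0;              /* running compensation for lost low-order bits */
        for (size_t i = 0; i < n; i++) {
            double y = x[i] - c;
            double t = sum + y;
            c = (t - sum) - y;       /* algebraically zero; numerically, the rounding error */
            sum = t;
        }
        return sum;
    }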

hyghjiyhu - 2 days ago

One thing I wonder is what happens if you have an inline function in a header that is compiled with fast-math in one translation unit and without it in another.

cbarrick - 2 days ago

This page consistently crashes on Vivaldi for Android.

Vivaldi 7.4.3691.52

Android 15; ASUS_AI2302 Build/AQ3A.240812.002

boulos - 2 days ago

I've also come around to considering -ffast-math harmful. It's still useful for finding optimization opportunities, but in the modern (AVX2+) world, I think the risks outweigh the benefits.

I'm surprised by the take that FTZ is worse than reassociation. FTZ being environmental rather than per-instruction is certainly unfortunate, but that's true of rounding modes on x86 generally. And I would argue that most programs are unprepared to handle subnormals anyway.

By contrast, reassociation definitely allows more optimization, but it also prohibits you from specifying the order precisely:

> Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result.

I haven't followed standards work in forever, but I imagine that the introduction of std::fma gets people most of the benefit. That, combined with something akin to volatile (if it actually worked), would probably be good enough for most people. Known, numerically sensitive code paths would be carefully written, while the rest of the code base can effectively be "meh, don't care".
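
A quick sketch of the explicit-FMA point using C's fma() from <math.h> (the same idea as std::fma). You may need -ffp-contract=off (and -lm) to see the difference, since some compilers already contract a*b + c into an fma on their own:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double a = 1e16;
        double b = 1.0000000000000002;   /* 1 + 2^-52, the next double above 1.0 */
        double c = -1e16;

        /* a*b rounds once before the add; fma rounds only once, at the very end. */
        printf("a*b + c    = %.17g\n", a * b + c);     /* 2                    */
        printf("fma(a,b,c) = %.17g\n", fma(a, b, c));  /* ~2.2204460492503131  */
        return 0;
    }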

JKCalhoun - 2 days ago

> Even compiler developers can't agree.

> This is perhaps the single most frequent cause of fast-math-related StackOverflow questions and GitHub bug reports

The second line above should settle the first.

dirtyhippiefree - 2 days ago

I’m stunned by the following admission: “If fast-math was to give always the correct results, it wouldn’t be fast-math”

If it’s not always correct, whoever chooses to use it chooses to allow error…

Sounds worse than worthless to me.

eqvinox - 3 days ago

I wish the Twitter links in this article weren't broken.

razighter777 - 2 days ago

The thing that strikes the most fear into me is seeing floating point used for real-world currency. Dear god. So many things can go wrong. I always use unsigned integers counting the number of cents. And if I gotta handle multiple currencies, then I'll use or make a wrapper class.
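
A minimal sketch of the integer-cents approach; the type and helper names here are made up for illustration:

    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint64_t cents; } money_t;   /* one currency, smallest unit only */

    static money_t money_add(money_t a, money_t b) {
        return (money_t){ a.cents + b.cents };
    }

    int main(void) {
        money_t price = { 1999 };   /* $19.99 */
        money_t tax   = {  160 };   /* $1.60  */
        money_t total = money_add(price, tax);
        printf("$%llu.%02llu\n",
               (unsigned long long)(total.cents / 100),
               (unsigned long long)(total.cents % 100));   /* prints $21.59 */
        return 0;
    }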

sholladay - 2 days ago

Correctness > performance, almost always. It’s easier to notice that you need more performance than to notice that you need more correctness. Though performance outliers can definitely be a hidden problem that will bite you.

Make it work. Make it right. Make it fast.

mg794613 - 2 days ago

Haha, the neverending cycle.

Stop trying. Let their story unfold. Let the pain commence.

Wait 30 years and see them being frustrated trying to tell the next generation.

rlpb - 3 days ago

> I mean, the whole point of fast-math is trading off speed with correctness. If fast-math was to give always the correct results, it wouldn’t be fast-math, it would be the standard way of doing math.

A similar warning applies to -O3. If an optimization in -O3 were to reliably always give better results, it wouldn't be in -O3; it'd be in -O2. So blindly compiling with -O3 also doesn't seem like a great idea.
