NaN, the not-a-number number that isn't NaN
piccalil.li
77 points by tobr 7 days ago
> That’s also the reason NaN !== NaN. If NaN behaved like a number and had a value equal to itself, well, you could accidentally do math with it: NaN / NaN would result in 1, and that would mean that a calculation containing a NaN result could ultimately result in an incorrect number rather than an easily-spotted “hey, something went wrong in here” NaN flag.
While I'm not really against the concept of NaN not equaling itself, this reasoning makes no sense. Even if the standard was "NaN == NaN evaluates to true" there would be no reason why NaN/NaN should necessarily evaluate to 1.
I definitely support NaN not being equal to NaN in boolean logic.
If you have x = "not a number", you don't want 1 + x == 2 + x to be true. There would be a lot of potential for false equivalencies if you said NaN == NaN is true.
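A quick console check of both points (any standards-conforming engine should agree): NaN / NaN is NaN rather than 1, and with x = NaN no false equivalence like `1 + x === 2 + x` appears. `Object.is`, which uses the SameValue algorithm, is JavaScript's escape hatch for when you do want NaN to match itself.

```javascript
// Under IEEE 754 rules, NaN never compares equal to anything, itself included.
const x = NaN;

console.log(NaN / NaN);         // NaN: division does not "cancel" NaNs
console.log(1 + x === 2 + x);   // false: no spurious equivalence
console.log(x === x);           // false: equality is not reflexive for NaN
console.log(Object.is(x, x));   // true: SameValue treats NaN as itself
```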
--
It could be interesting if there was some kind of complex NaN number / NaN math. Like if x is NaN but 1x / 2x resulted in 0.5 maybe you could do some funny mixed type math. To be clear I don't think it would be good, but interesting to play with maybe.
Maybe the result of NaN === NaN should be neither true nor false but NaB (not a bool).
It should throw a compile-time error. Anything like this that lets an invalid or meaningless operation evaluate at compile time is ripe for carrying uncaught errors into run time.
NaN is, by definition, not equal to NaN because NaNs are not comparable. The comparison does have a definitive Boolean result, though: false is correct.
The concept of NaN long predates the language that uses ===, and is part of a language-agnostic standard that doesn't consider other data types. Any language choosing to treat the equality (regardless of the operator symbol) of NaN differently would be deviating from the spec.
In R, NA (which is almost, but not quite, like NaN) actually comes in a separate type for each result, so you have logical NA, NA_integer_, NA_real_, etc. It's super confusing.
It is a minor nuisance, but I think there's ultimately a pretty good reason for it.
Old-school base R is less type-sensitive and more "do what I mean", but that leads to slowness and bugs. Now we have the tidyverse, which among many other things provides a new generation of much faster functions with vectorized C implementations under the hood, but this requires them to be more rigid and type-sensitive.
When I want to stick an NA into one of these, I often have to give it the right type of NA, or it'll default to logical NA and I'll get type errors.
> When I want to stick an NA into one of these, I often have to give it the right type of NA, or it'll default to logical NA and I'll get type errors.
Yeah, I know. I hit this when I was building S4 classes, which are similarly type-strict.
Again, I think this was the right decision (pandas's decision was definitely not), but it was pretty confusing the first time.
I think a better rationale is that NaN does not have a single binary representation, yet in software one may not be able to tell the different representations apart.
An f32 NaN has 22 bits that can have any value, originally intended to encode error information or other user data. Also, there are two kinds of NaN: quiet NaNs (qNaN) and signalling NaNs (sNaN), which behave differently when used in calculations (sNaNs may raise exceptions).
Without looking at the bits, all you can see is NaN, so it makes sense to not equal them in general. Otherwise, some NaN === NaN and some NaN !== NaN, which would be even more confusing.
I don't think that logic quite holds up because when you have two NaNs that do have the same bit representation, a conforming implementation still has to report them as not equal. So an implementation of `==` that handles NaN still ends up poking around in the bits and doing some extra logic. It's not just "are the bit patterns the same?"
(I believe this is also true for non-NaN floating point values. I'm not sure but off the top of my head, I think `==` needs to ignore the difference between positive and negative zero.)
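To make the bit-level argument concrete, here is a small JavaScript sketch that uses a DataView to build two double-precision NaNs with different payloads. The bit patterns are chosen purely for illustration, and the bits are read back with `getBigUint64` so the check does not depend on whether the engine preserves NaN payloads in ordinary numbers (ECMAScript lets engines canonicalize them). It also checks the signed-zero hunch: +0 and -0 have different bits but compare equal.

```javascript
// Two float64 NaNs: exponent all ones, quiet bit set, different payloads.
const buf = new ArrayBuffer(16);
const view = new DataView(buf);
view.setBigUint64(0, 0x7ff8000000000001n); // quiet NaN, payload 1
view.setBigUint64(8, 0x7ff8000000000002n); // quiet NaN, payload 2

const a = view.getFloat64(0);
const b = view.getFloat64(8);

// The stored bit patterns differ...
const bitsDiffer = view.getBigUint64(0) !== view.getBigUint64(8);
// ...but both read back as NaN, and NaN is unequal even to itself.
console.log(bitsDiffer, Number.isNaN(a), Number.isNaN(b), a === a, a === b);
// true true true false false

// Signed zeros: different bits, yet === treats them as equal.
const zeros = new DataView(new ArrayBuffer(16));
zeros.setFloat64(0, 0);
zeros.setFloat64(8, -0);
console.log(zeros.getBigUint64(0) === zeros.getBigUint64(8)); // false
console.log(0 === -0, Object.is(0, -0));                      // true false
```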
Something like:
// Optimize special case
if (x == y) return 1;
else return x / y;
Because NaN is defined as a number and two equal numbers divided by themselves equal 1
> two equal numbers divided by themselves equal 1
That's not true. For example: 0 == 0, but 0/0 != 1.
(See also +Infinity, -Infinity, and -0.)
If you're going to nitpick this comment, you should note that infinity isn't on the number line and infinity != infinity, and dividing by zero is undefined
We're commenting on an article about IEEE 754 floating point values. Following the IEEE 754 standard, we have:
>> isNaN(Infinity)
false
>> Infinity == Infinity
true
>> Infinity / Infinity == 1
false
>> isNaN(0)
false
>> 0 == 0
true
>> 0 / 0 == 1
false
Also, you say NaN ("not a number") is "defined as a number" but Infinity is not. I would think every IEEE 754 value is either "a number" or "not a number". But apparently you believe NaN is both and Infinity is neither? And you say 0 / 0 is "undefined" but the standard requires it to be NaN, which you say is "defined".
It doesn't really matter if NaN is technically a number or not. I find the standard "NaN == NaN is true" to be potentially reasonable (though I do prefer the standard "NaN == NaN is false"). Regardless of what you choose, NaN/NaN = 1 is entirely unacceptable.
The D language default initializes floating point values to NaN. AFAIK, D is the only language that does that.
The rationale is that if the programmer forgets to initialize a float, and it defaults to 0.0, he may never realize that the result of his calculation is in error. But with NaN initialization, the result will be NaN and he'll know to look at the inputs to see what was not initialized.
It causes some spirited discussion now and then.
In the same spirit, the `char` type default initializes to 0xFF, which is never a valid UTF-8 code unit.
It's the same idea for pointers, which default initialize to null.
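JavaScript does not default-initialize numbers to NaN (an unassigned `let` is `undefined`, and typed arrays start at 0), but D's rationale can be sketched by filling a buffer with NaN explicitly. The `sumReadings` helper is hypothetical; the point is that a slot nobody wrote to poisons the result instead of silently contributing 0.

```javascript
// Hypothetical helper: adds up every slot of the array.
function sumReadings(values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Zero default (what Float64Array gives you): the forgotten slot is invisible.
const zeroed = new Float64Array(3);
zeroed[0] = 1.5;
zeroed[1] = 2.5;                    // slot 2 was never written
console.log(sumReadings(zeroed));   // 4, plausible but wrong

// NaN default, in the spirit of D: the forgotten slot shows up in the result.
const poisoned = new Float64Array(3).fill(NaN);
poisoned[0] = 1.5;
poisoned[1] = 2.5;                  // slot 2 was never written
console.log(sumReadings(poisoned)); // NaN, so go check the inputs
```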
> That’s also the reason NaN !== NaN. If NaN behaved like a number and had a value equal to itself, well, you could accidentally do math with it: NaN / NaN would result in 1,
So, by that logic, if 0 behaved like a number and had a value equal to itself, well, you could accidentally do math with it: 0 / 0 would result in 1...
But as it turns out, 0 behaves like a number, has a value equal to itself, you can do math with it, and 0/0 results in NaN.
Try subtraction. But also, not all calculations are purely using mathematical operations. You might calculate two numbers from two different code paths and compare them.
Shouldn't an operator on incompatible types return undefined? ;)
Equality on things that it doesn't make sense to compare returning false seems wrong to me. That operation isn't defined to begin with.
By shipping with undefined, JavaScript could have been the only language whose type system makes sense... alas!
My understanding is that the reasoning behind all this is:
- In 1985 there were a ton of different hardware floating-point implementations with incompatible instructions, making it a nightmare to write floating-point code once that worked on multiple machines
- To address the compatibility problem, IEEE came up with a hardware standard that could do error handling using only CPU registers (no software, since it's a hardware standard)
- With that design constraint, they (reasonably imo) chose to handle errors by making them "poisonous": once you have a NaN, all operations on it fail, including equality, so the error state propagates rather than potentially being accidentally "un-errored" by another operation, which would lead you into undefined-behavior territory
- The standard solved the problem when hardware manufacturers adopted it
- The downstream consequence for software is that if your programming language does anything other than these exact floating-point semantics, the cost is losing hardware acceleration, which makes your floating-point operations much slower
This reminds me of an interesting approach a student had to detecting NaNs for an assignment. The task was to count no-data values (-999) in a file. Pandas (the Python library) has its own missing-value type, pd.NA, and when it is used in a comparison the result is NA instead of True or False. So the student changed -999 to NA on import with Pandas and looped over the values, checking each one against itself with an if statement. If the value was NA, the if statement would throw an exception (what could poor if do with NA?), which the student caught, and in the handler incremented the NaN count.
NaN is just an encoding for "undefined operation".
As specified by the standard since its beginning, there are 2 methods for handling undefined operations:
1. Generate a dedicated exception.
2. Return the special value NaN.
The default is to return NaN because this means less work for the programmer, who does not have to write an exception handler, and also because on older CPUs it was expensive to add enough hardware to ensure that exceptions could be handled without slowing down all programs, regardless of whether they generated exceptions or not. On modern CPUs with speculative execution this is not really a problem, because they must be able to discard any executed instruction anyway while running at full speed. Therefore, enabling additional reasons for discarding previously executed instructions, e.g. exceptional conditions, just reuses the speculative execution mechanism.
Whoever does not want to handle NaNs must enable the exception for undefined operations and handle that. In that case no NaNs will ever be generated. Enabling this exception may be needed in any case when one sees unexpected NaNs, for debugging the program.
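JavaScript does not expose the IEEE 754 exception flags or traps, so option 1 cannot be switched on there. A rough software approximation, assuming you are willing to wrap the operations you care about, is to check each result and throw where the hardware would have trapped. The `checked` wrapper below is hypothetical, not a standard API.

```javascript
// Hypothetical wrapper: run a numeric operation and throw instead of
// returning a freshly created NaN, roughly emulating an "invalid operation" trap.
function checked(op, ...args) {
  const result = op(...args);
  // NaN inputs just propagate (quiet NaNs do not signal "invalid"),
  // so only a NaN produced by this operation itself throws.
  if (Number.isNaN(result) && !args.some(Number.isNaN)) {
    throw new RangeError(`invalid operation: ${op.name}(${args.join(", ")})`);
  }
  return result;
}

const div = (x, y) => x / y;

console.log(checked(div, 1, 4));   // 0.25
try {
  checked(div, 0, 0);              // 0/0 is an invalid operation
} catch (e) {
  console.log(e.message);          // invalid operation: div(0, 0)
}
```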
This is a matter of choice, not something with an objectively correct answer. Every possible answer has trade offs. I think consistency with the underlying standard defining NaN probably has better tradeoffs in general, and more specific answers can always be built on top of that.
That said, I don’t think undefined in JS has the colloquial meaning you’re using here. The tradeoffs would be potentially much more confusing and error prone for that reason alone.
It might be more “correct” (logically; standard aside) to throw, as others suggest. But that would have considerable ergonomic tradeoffs that might make code implementing simple math incredibly hard to understand in practice.
A language with better error handling ergonomics overall might fare better though.
>A language with better error handling ergonomics overall might fare better though.
So what always trips me up about JavaScript is that if you make a mistake, it silently propagates nonsense through the program. There's no way to configure it to even warn you about it. (There's "use strict", and there should be "use stricter!")
And this aspect of the language is somehow considered sacred, load-bearing infrastructure that may never be altered. (Even though, with "use strict", we already demonstrated that we have a mechanism for fixing things without breaking them!)
I think the existence of TS might unfortunately be an unhelpful influence on JS's soundness, because now there's even less pressure to fix it than there was before.
To some extent you’ve answered this yourself: TypeScript (and/or linting) is the way to be warned about this. Aside from the points in sibling comment (also correct), adding these kinds of runtime checks would have performance implications that I don’t think could be taken lightly. But it’s not really necessary: static analysis tools designed for this are already great, you just have to use them!
> And this aspect of the language is somehow considered sacred, load-bearing infrastructure that may never be altered. (Even though, with "use strict", we already demonstrated that we have a mechanism for fixing things without breaking them!)
There are many things we could do which wouldn't break the web but which we choose not to do because they would be costly to implement/maintain and would expand the attack surface of JS engines.
> Shouldn't an operator on incompatible types return undefined? ;)
NaN is a value of the Number type; I think there are some problems with deciding that Number is not compatible with Number for equality.
We just need another value in the boolean type called NaB, and then NaN == NaN can return NaB.
To complement this, also if/then/else should get a new branch called otherwise that is taken when the if clause evaluates to NaB.
JavaScript also has TypeError, which would be more appropriate here. Unfortunately, undefined has never been used well, and it has caused much more pain than it has provided interesting use cases.
It should return false, right? They are different types of thing, so they can’t be the same thing.
Or, maybe we could say that our variables just represent some ideal things, and if the ideal things they represent are equal, it is reasonable to call the variables equal. 1.0d0, 1.0, 1, and maybe “1” could be equal.
"return undefined" is incoherent in almost every language, and IEEE754 predates JavaScript by a decade.
>Shouldn't an operator on incompatible types return undefined? ;)
Please no, js devs rely too much on boolean collapse for that. Undefined would pass as falsy in many places, causing hard to debug issues.
Besides, conceptually speaking if two things are too different to be compared, doesn’t that tell you that they’re very unequal?
Interesting. So, having a comparison between incomparable types result in false -- what we have now -- is functionally equivalent, in an if-statement, to having the undefined evaluate to false... with the difference that the type coercion is currently one level lower (inside the == operator itself).
It kind of sounds like we need more type coercion because we already have too much type coercion!
I'm not sure what an ergonomic solution would look like though.
Lately I'm more in favour of "makes sense but is a little awkward to read and write" (but becomes effortless once you internalize it because it actually makes sense) over "convenient but not really designed so falls apart once you leave the happy path, and requires you to memorize a long list of exceptions and gotchas."
NaNs aren't always equal to each other in their bit representation either: most of the bits form a "payload" whose meaning the spec does not define, so it can be anything. I believe the payload is actually used by some JavaScript engines (e.g. SpiderMonkey and JavaScriptCore) to encode more information in NaNs (NaN-boxing).
Also remember that NaN is represented in multiple ways bitwise:
https://en.wikipedia.org/wiki/NaN
Also you even have different kinds of NaN (signalling vs quiet)
Equality is a very slippery mathematical relationship. This observation formed the genesis of modern Category Theory [0].
NaN is an error monad.
[0] https://www.ams.org/journals/tran/1945-058-00/S0002-9947-194...
In the error monad NaN = NaN (or Nothing = Nothing or None = None, depending on your terminology) because mathematical equality is an equivalence relation. There are many foundational debates about equality, but whether or not it is an equivalence is never the question.
The root of the problem, completely overlooked by OP, is that IEEE 754 comparison is not an equivalence relation. It's a partial equivalence relation (PER). It does have its utility, but these things can be weird and they are definitely not interchangeable with actual equivalence relations. Actual, sane comparison of floating-point values got standardized eventually, but probably too late: https://en.wikipedia.org/wiki/IEEE_754#Total-ordering_predic.... It's actually kinda nuts that the partial relation is the one that you get by default (no, your sorting function on float arrays does not sort it).
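For comparison, the totalOrder predicate can be approximated in JavaScript with a well-known bit trick: reinterpret each double as a signed 64-bit integer and flip the magnitude bits of negative values, so plain integer order follows totalOrder (-NaN < -Infinity < ... < -0 < +0 < ... < +Infinity < +NaN). This is a sketch; `totalOrderKey` and `totalOrderCompare` are hypothetical names, and because engines may choose any NaN bit pattern when writing NaN into a DataView, which end NaN lands on depends on the sign bit the engine actually stores.

```javascript
// Map a double to a BigInt whose ordinary ordering follows IEEE 754 totalOrder.
const scratch = new DataView(new ArrayBuffer(8));

function totalOrderKey(x) {
  scratch.setFloat64(0, x);
  let key = scratch.getBigInt64(0);          // raw bits as a signed int64
  if (key < 0n) key ^= 0x7fffffffffffffffn;  // negative: reverse magnitude order
  return key;
}

function totalOrderCompare(a, b) {
  const ka = totalOrderKey(a);
  const kb = totalOrderKey(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

const xs = [3, NaN, 0, -Infinity, -0, 1];
// A naive comparator like (a, b) => a - b returns NaN whenever NaN is involved,
// which leaves the resulting order implementation-defined.
console.log([...xs].sort(totalOrderCompare));
// e.g. [ -Infinity, -0, 0, 1, 3, NaN ] when the engine stores NaN with a clear sign bit
```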
NaN comes from parsing results or from operations involving Infinity. I personally end up using Number.isFinite() more, since it returns false for both NaN and Infinity, when I need a real (haha) numeric answer.
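A minimal sketch of that habit, assuming you want `null` rather than a non-finite number when a value is not usable; `toFiniteOrNull` is a hypothetical helper. Note that `Number.isFinite` does not coerce its argument, unlike the global `isFinite`.

```javascript
// Hypothetical helper: accept only real, finite numbers.
function toFiniteOrNull(value) {
  return Number.isFinite(value) ? value : null;
}

console.log(toFiniteOrNull(Number.parseFloat("12.5"))); // 12.5
console.log(toFiniteOrNull(Number.parseFloat("abc")));  // null (parse gave NaN)
console.log(toFiniteOrNull(1 / 0));                     // null (Infinity)
console.log(toFiniteOrNull("42"));                      // null: no string coercion
```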
I sometimes wonder whether a floating point format really needs inf, -inf and NaN, or whether a single "non-finite" value capturing all of those would be sufficient.
Not at all sufficient. NaN typically means that something has gone wrong—e.g. your precision requirements exceed that of the floating point representation you've selected, you've done a nonsensical operation. inf and -inf might be perfectly acceptable results depending on your application and needs.
console.log(new Array(16).join("wat"-1) + " Batman!")
Opened Web Inspector in Safari and pasted the above. (I knew what to expect but not how it worked, and was trying to figure out what subtracting 1 from a string (ASCII?) would give you. Very related to this post, though.)
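Step by step, the snippet works like this in any conforming engine: there is no ASCII arithmetic involved. The `-` operator converts its string operand to a number, "wat" does not parse as one, so the expression is NaN - 1, which is NaN. `join` then stringifies that NaN and places it between 16 empty slots, giving 15 copies of "NaN".

```javascript
const step1 = Number("wat");       // NaN: "wat" is not numeric
const step2 = "wat" - 1;           // NaN: NaN - 1 stays NaN
const step3 = String(step2);       // "NaN"
const batman = new Array(16).join(step2) + " Batman!";

console.log(step1, step2, step3);  // NaN NaN NaN
console.log(batman);               // NaNNaN...NaN Batman! (15 copies of "NaN")
```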
Indeed, this shtick was funnier 13 years ago. https://www.destroyallsoftware.com/talks/wat
JavaScript is a quirky, badly-designed language and I think that is common knowledge at this point.
A similar issue occurs in SQL, where NULL != NULL. [0] In both cases, our typical "equals" abstraction has become too leaky, and we're left trying to grapple with managing different kinds of "equality" at the same time.
Consider the difference between:
1. "Box A contains a cursed object that the human mind cannot comprehend without being driven to madness. Does Box B also contain one? ... Yes."
2. "Is the cursed object in Box A the same as the one in Box B? ... It... uh..." <screaming begins>
Note that this is not the same as stuff like "1"==1.0, because we're not mixing types here. Both operands are the same type, our problem is determining their "value", and how we encode uncertainty or a lack of knowledge.
SQL more elegantly introduces ternary logic in this case, where any comparison with NULL is itself NULL. This is sadly not possible in most languages where a comparison operator must always return a (non-nullable) boolean value.
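JavaScript's `==` always returns a boolean, but the SQL-style alternative can be sketched with `null` standing in for UNKNOWN. Everything here is hypothetical (`sqlEq`, `sqlAnd`), and `null` plays the role of SQL NULL only for the sake of the example.

```javascript
// Three-valued logic in the style of SQL: true, false, or null (UNKNOWN).
function sqlEq(a, b) {
  if (a === null || b === null) return null; // any comparison with NULL is UNKNOWN
  return a === b;
}

// AND under SQL rules: FALSE wins, then UNKNOWN, then TRUE.
function sqlAnd(p, q) {
  if (p === false || q === false) return false;
  if (p === null || q === null) return null;
  return true;
}

console.log(sqlEq(null, null));         // null, not true
console.log(sqlEq(1, null));            // null
console.log(sqlAnd(sqlEq(1, 2), null)); // false: FALSE AND UNKNOWN
console.log(sqlAnd(sqlEq(1, 1), null)); // null: TRUE AND UNKNOWN
```

A WHERE clause keeps a row only when its condition is TRUE, so rows whose condition is FALSE or UNKNOWN are both filtered out.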
Another use for NaNs: suppose you have an array of sensors. Given enough sensors, you're pretty much guaranteed that some of them will have failed, so the array needs to keep working, even if degraded.
A failed sensor can indicate this by submitting a NaN reading. Then any subsequent operations on the array data will show which results depended on the failed sensor, because those results will be NaN. Just defaulting to zero on failure would hide the fact that it failed, and the end results would not be obviously wrong.
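A sketch of that idea, assuming one reading per sensor and a failed sensor reporting NaN; the readings and the `pairAverages` helper are made up for illustration. Only the results that actually depend on the failed sensor come out as NaN, whereas a zero default produces a number nobody would question.

```javascript
// Hypothetical readings; the sensor at index 3 has failed and reports NaN.
const readings = [20.1, 20.4, 19.9, NaN, 20.2, 20.0];

// Hypothetical derived quantity: the average of each adjacent pair of sensors.
function pairAverages(values) {
  const out = [];
  for (let i = 0; i + 1 < values.length; i += 2) {
    out.push((values[i] + values[i + 1]) / 2);
  }
  return out;
}

const averages = pairAverages(readings);
console.log(averages.map((v) => (Number.isNaN(v) ? "depends on failed sensor" : v.toFixed(2))));
// [ '20.25', 'depends on failed sensor', '20.10' ]

// Defaulting the failed reading to 0 instead yields a plausible-looking number.
const zeroFilled = readings.map((v) => (Number.isNaN(v) ? 0 : v));
console.log(pairAverages(zeroFilled)[1].toFixed(2)); // 9.95, and nothing flags it
```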
My gut reaction is that both NaN == NaN and NaN != NaN should be false; or, to put it another way, NaN != NaN returning True was a surprise to me.
Does Numpy do the same? That’s where I usually meet NaN.
For the built-in float type in Python, the behaviour is a bit funny:
>>> nan=float('nan')
>>> a=[nan]
>>> nan in a
True
>>> any(nan==x for x in a)
False
(Because the `in` operator assumes that identity implies equality...)
Well no, the in operator is just defined to produce results equivalent to any(nan is x or nan == x for x in a); it is counterintuitive to the extent that people assume identity implies equality, but the operator doesn't assume that identity implies equality: it is defined as returning True if either condition is satisfied. [0]
Well, more precisely, this is how the operator behaves for most built in collections; other types can define how it behaves for them by implementing a __contains__() method with the desired semantics.
[0] https://docs.python.org/3/reference/expressions.html#members...
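JavaScript has a similar split between membership tests, which may be useful for comparison. `indexOf` uses strict equality, so it never finds NaN, while `includes` uses the SameValueZero algorithm, which is value-based rather than identity-based and treats NaN as equal to itself (and +0 as equal to -0).

```javascript
const arr = [NaN, -0];

console.log(arr.indexOf(NaN));                // -1: strict equality, NaN !== NaN
console.log(arr.includes(NaN));               // true: SameValueZero matches NaN
console.log(arr.some((v) => v === NaN));      // false: like Python's any(nan == x ...)
console.log(arr.includes(0), arr.indexOf(0)); // true 1: both treat -0 and +0 alike
```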
In a perfect world, in my opinion, they would be treated as incompatible, and the equality operation should return False in both cases.
But equality is a very complicated concept.
I guess if the values are incomparable, then != preserves PEM (the principle of excluded middle).
But Boolean algebra is a lattice, with two or more binary operators, where two of those are meet and join, linked by the absorption laws.
X == not not X being PEM, we lose that in NP vs co-NP and in vector clocks etc…
But that is just my view.
Yes, in numpy we also have that `np.float64(nan) != np.float64(nan)` evaluates to true.
tl;dr:
- NaN is a floating point number, and NaN != NaN by definition in the IEEE 754-2019 floating point number standard, regardless of the programming language, there's nothing JavaScript-specific here.
- In JS, the global isNaN(v) returns true for NaN and for anything that converts to NaN (Number.isNaN(v) is true only for NaN itself). And in JS, s * n and n * s return NaN for any non-empty string s and any number n ("" * n returns 0). (EDIT: WRONG, see below)
> And in JS, s * n and n * s return NaN for any non empty string s and any number n ("" * n returns 0).
No? It is easy to verify that `"3" * 4` evaluates to 12. The full answer is that * converts its operands into primitives (with a hint of being number), and any string that can be parsed as a number converts to that number. Otherwise it converts to NaN.
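The coercion rules are easy to check in a console. This sketch also contrasts the global `isNaN`, which converts its argument to a number first, with `Number.isNaN`, which is true only for the NaN value itself.

```javascript
console.log("3" * 4);    // 12: "3" parses as a number
console.log("abc" * 4);  // NaN: "abc" does not
console.log("" * 4);     // 0: the empty string converts to 0
console.log(" 7 " * 2);  // 14: surrounding whitespace is ignored

console.log(isNaN("abc"));        // true: "abc" coerces to NaN first
console.log(Number.isNaN("abc")); // false: a string, not the NaN value
console.log(Number.isNaN(NaN));   // true
```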
I always thought of NaN as more of the concept of not-a-number the way that infinity in math is not a specific value but the concept of some unbounded largest possible value.
Therefore, trying to do math with either (for example: NaN/NaN or inf./inf.) was to try to pin them down to something tangible and no longer conceptual — therefore disallowed.
You can use some extension of the reals, e.g. the affinely extended real line (separate +inf and -inf, often useful for programmers) or the projectively extended real line (+inf = -inf).
This is not about infinity in math not being a _specific_ value; it certainly can be one (the actual infinite as opposed to the potential infinite).
It's simply about design and foresight, in my humble opinion.
Random thought: to see if something equals NaN, can't you just check whether the stringified form of the number equals "NaN"?
after all, the usual WTF lists for JS usually have a stringified NaN somewhere as part of the fun.
It can usually be implemented like this. No need for strings.
fun isNan(n: Double) = n != n
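Both ideas work for actual numbers, but the string check also matches the string "NaN" itself, while the self-inequality check relies only on the IEEE 754 rule that NaN is the one value unequal to itself. A small JavaScript sketch (the helper names are hypothetical; in practice `Number.isNaN` already does this job):

```javascript
// Hypothetical helpers for illustration only.
const isNaNByString = (v) => String(v) === "NaN";
const isNaNBySelf = (v) => v !== v;

for (const v of [NaN, 0 / 0, 42, "NaN", undefined]) {
  console.log(v, isNaNByString(v), isNaNBySelf(v), Number.isNaN(v));
}
// NaN true true true
// NaN true true true
// 42 false false false
// NaN true false false   <- the string "NaN" fools the string check
// undefined false false false
```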
Well, there is also one weird quirk I assumed would be included in this article:
because a <= b is defined as !(a > b)
then:
5 < NaN // false
5 == NaN // false
5 <= NaN // true
Edit: my bad, this does not work with NaN, but you can try `0 <= null`
IEEE 754 specifically prohibits that definition, and JavaScript indeed evaluates `5 <= NaN` to false.
Yep, my memory was incorrect here and I didn't have access to a computer, but it is true with `0 <= null`.
This is because null coerces to 0 in JS so this is effectively 0 <= 0. NaN is already a `number` so no coercion happens.
Note that == has special rules, so 0 == null does NOT coerce to 0 == 0. If using == null, it only equals undefined and itself.
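These rules can be checked directly. Relational operators with a NaN operand always return false, so `a <= b` and `!(a > b)` disagree there, while `null` is converted to 0 for relational comparisons but not for `==`.

```javascript
console.log(5 < NaN, 5 == NaN, 5 <= NaN); // false false false
console.log(!(5 > NaN));                  // true: so <= is not !(>) when NaN is involved

console.log(0 <= null, 0 >= null);        // true true: null converts to 0 here
console.log(0 == null);                   // false: == does not convert null
console.log(null == undefined);           // true: == null matches only null and undefined
```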
Tangentially related: one of my favourite things about JavaScript is that it has so many different ways for the computer to “say no” (in the sense of “computer says no”): false, null, undefined, NaN, boolean coercion of 0/“”, throwing errors, ...
It’s common to see groaning about double-equal vs triple-equal comparison, and eye-rolling directed at absurdly large tables (e.g. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guid...), but I think it’s genuinely great that we have the ability to distinguish between concepts like “explicitly not present” and “absent”.
Slightly off topic: I hate that TypeScript bundles NaN under the number type, so the return type of parseInt is just number.
See also: Tom7's "NaN gates and Flip FLOPS": https://www.youtube.com/watch?v=5TFDG-y-EHs
Imagine that society calls the people who have to work with these toys during office hours, engineers
What's wrong with that? No engineering is 100% strict; there is always ambiguity at the edges
> typeof NaN -> "number"
NaN should have been NaVN, not a valid number.
What's the difference between something that's "not a number" and something that's "a number but not a valid one"?
I'm now remembering the differences between "games" and "numbers" in Surreal numbers. :D