Learn about SQL three-valued logic before it bites you
twitter.comWhen you filter out a value using !=, it's easy to forget that NULL won't come along for the ride since it's the database's way of saying 'value unknown'. To ensure NULL values don't slip through the net, you've got to explicitly fish for them with IS NULL or broaden your WHERE clause to catch everything with OR Val IS NULL. It’s SQL’s subtle reminder that NULL requires special attention.
I don't find this behaviour crazy or unexpected at all.
NULL doesn't refer to any specific 'missing value' state, instead it represents a larger group of possible values of which we don't know the actual value.
So in a pile of values 'foo' and 'bar', NULL could mean either of them, so there is no way of knowing if a NULL row is foo or if it is bar. We can't even know if two NULLS are equal.
This reads like you have worked with SQL a lot and has a skewed perspective. As a newcomer to SQL, this could be very surprising given that this diverges from every language that Im aware of having nulls.
no resultsSELECT * FROM Foobar WHERE Val = NULL;I'm admittedly still learning SQL, but I'm very surprised by this.
The query should be:
SELECT * FROM Foobar WHERE Val IS NULL;Which is why these comparisons should be errors, rather than producing gibberish results. Trinary logic stuffed into Boolean operations is just psychotic
Yeah, I agree. The focus here should be the principle of least surprise so your database works correctly. Throw an error to ensure that unintuitive edge cases are handled explicitly. This is SQL after all. It’s used for business purposes, not to demonstrate elegant type systems built around relational algebra.
Is NULL = 'crazy', True?
If no one would expect NULL != 'crazy' to be True.
That's just intuitive, Null is not the string 'crazy'.
Everyone is surprised by this when they realize it.
The crazy thing is that this person was using DBs for 15 years. Have they only ever used != with non-null columns before? Hard to imagine.
How can it be unexpected behaviour when it is the behaviour you want!
You must not qualify a boolean question for a sgbd like Schroedinger's cat.
A boolean condition in Sql has 3 values : - true - false - unknown ( aka NULL)
For every "unknown" value, the sgbd will NOT decide for you if it's true or false, you have to decide, because it trust you to know what you're doing.
> For every "unknown" value, the sgbd will NOT decide for you if it's true or false
But it did decide. It could either return the row, or not return the row - there's no middle ground. By not returning the row, it implied that null is not not 'crazy'.
If it actually wanted not to decide, it could return an error.
Rather than make you deal with those pesky errors, we could suppress the errors and only process the records we know how to decide.
Win-win?
The definition and use of markers for missing values belongs in the application, not the database. My 0,02€.
It is called Structured Query Language, it allows for complex logic that can be processed close to the data to speed up operations and reduce complexity and increase reliability of client applications. Sure, if you require Google scale of connectivity, it might not work but 99.9999% of people do not and their applications would be simpler, cleaner, more performant and more reliable if they learned and used their database correctly instead of re-implementing all that functionality (but badly) in their application layer.
Agree. Either that or then it's clear that NULL and empty have a different meaning, which in turn means the query should respect that.
While I’m sure NULLs nary be found in the ivory tower, you should know they are pervasive in the rest of the kingdom amongst us common folk.
The comment you're replying to acknowledges this and suggests we plan accordingly.
Even from a “planning” perspective, that comment is about as unconstructive as they come.
How does it suggest we actually go about modeling missing values in the database?
I can think of a dozen ways to go about it and each one has its own tradeoffs and consequences.
When it comes down to it, is actually it worth it? Just so you don’t have to consider NULL scenarios for columns that are marked as nullable?
It seems like a silly overcorrection to me.