Ask HN: Do You Use Enum for Yes, No, Unset?
I've seen different arguments on this topic and thought I'd ask HN if you all have a standard for this.
Say you have a form that's filled out and saved to a DB. One required field is a dropdown with Yes and No. But the field begins as blank to show that a value hasn't been selected yet, ensuring that the User pays attention to the field.
Do you use a Nullable Bit (DB) and Nullable Bool (Code)? OR an Enum: Yes, No, Unset?
You wouldn't use a boolean to represent states like say New, Processing, Done. But do you consider NULL a separate "state"?
Thanks.

The "ternary boolean" strikes me as a smell because it has to do with how you initialize your code/state and enforce pre- and post-conditions. If you allow users to complete the form without choosing T/F on your field, what happens? If you fail or prevent the submission, then I would not write the schema to capture the value as a nullable field. If you do accept the form without the user selecting T/F, then you are defaulting to either T or F, and the field should likewise not be nullable. If the value is really nullable and you are using the presence of null to decide anything, then it seems reasonable to enumerate the state space explicitly with an enum, so that the next dev riding by on a horse doesn't mistake your ternary boolean for a simple null. You can make null work in any of those cases. It just seems like a headache and a potential hazard, though it could also just be personal aesthetics on my part.

You need a third state to know that the user has not made a choice. If you don't want to make the choice for the user (for instance, you don't want to default to F or T), then the third state tells you exactly that the user still has to make a choice.

Agreed. Sorry if I wasn't clear, but I consider that to fall into the "value is really nullable and you are using the presence of null to decide anything..." scenario. In that case it is important to the domain, so model it explicitly so the next dev doesn't trip over the special meaning of null. Again, probably personal judgment and aesthetics.

If it isn't selected, the browser literally won't send anything for that field (undefined in JS parlance). Your code/language of choice has to decide what to do with that state. If you specify true/false, the page will reload with that state instead of "undefined."

For some languages it is exactly as you describe. For lower-level languages (as in closer to the raw HTTP protocol) like PHP, you have to make an explicit choice before rendering the response.
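A minimal TypeScript sketch of the first comment's suggestion follows. It names the unset state explicitly while the form is a draft and refuses to produce a stored record until that state is resolved. The names `Answer`, `FormDraft`, `SubmittedForm`, `SubmitResult`, and `submit` are hypothetical and do not come from the thread.

```typescript
// Hypothetical names throughout; an illustration, not a prescribed schema.

// The draft explicitly enumerates all three states the dropdown can be in.
type Answer = "unset" | "yes" | "no";

interface FormDraft {
  agreesToTerms: Answer;
}

// The stored record is not nullable: submission fails while the answer is unset.
interface SubmittedForm {
  agreesToTerms: boolean;
}

type SubmitResult =
  | { ok: true; form: SubmittedForm }
  | { ok: false; error: string };

function submit(draft: FormDraft): SubmitResult {
  switch (draft.agreesToTerms) {
    case "yes":
      return { ok: true, form: { agreesToTerms: true } };
    case "no":
      return { ok: true, form: { agreesToTerms: false } };
    case "unset":
      // Refuse the submission instead of silently defaulting to true or false.
      return { ok: false, error: "Please choose Yes or No." };
  }
}
```

Suppose a fourth state were later added to `Answer` without a matching `case`. The end of `submit` would then become reachable, and with `strict` enabled the compiler should reject the implicit `undefined` return. That is the "next dev" protection the comment is after.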
I would say using null as a third state is only OK iff it's part of the domain logic and the third state literally means no choice was made, i.e. nothing, i.e. null.

I've seen some terrible implementations in Java abuse Boolean as a tri-state value: true, false, null. This always caused bugs because you'd end up with terrible code patterns. For one, what does "unset" mean? Does it equal false, because it wasn't enabled? What's the default? I think "unset" is a state to be generally avoided because it's bound to create problems for you down the line. Personally, I prefer enums for anything that's not an actual boolean. A boolean should, in my opinion, always be a yes/no variable. Other states should probably be more specific. I've seen people use booleans and other types to represent limited options (e.g. animal.IsCat to determine whether an animal is a cat or a dog). I'd go so far as to create enums with only two or even just one option if expansion is expected soon enough, with an optional state encoded in the type system as well. If your database can't store optional/maybe values, I'd add an enum variable. Of course I'd make an exception if you're terribly resource constrained: if you need to save every bit, document the hell out of it and optimize your code any way you want.

> For one, what does "unset" mean? Does it equal false, because it wasn't enabled? What's the default?

This is a common strawman used to argue against null. Null only means one thing: null. It doesn't convey any further meaning. It is a "not set" or "not initialised" state. It isn't a default, it isn't false, and it isn't "not enabled." If you have a user interface tied to this variable, null means not yet set. This is entirely logical. The only people getting their knickers in a bunch are those who don't want to admit that there is uncertainty, and that context is required to determine what null means in your application. My advice: don't give null any meaning other than null, i.e. not set.
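The disagreement above is easy to see in a short TypeScript sketch with hypothetical function names. `describeLoose` lets `null` fall silently into the false branch, which is the bug pattern the Java comment describes. `describeExplicit` gives null exactly one meaning, "not set yet," as the reply argues.

```typescript
// Hypothetical helpers contrasting implicit and explicit handling of a nullable boolean.

// Truthiness check: null is silently treated the same as false.
function describeLoose(enabled: boolean | null): string {
  return enabled ? "enabled" : "disabled";
}

// Null is handled first and given its single meaning: "not set yet".
function describeExplicit(enabled: boolean | null): string {
  if (enabled === null) {
    return "not set yet";
  }
  return enabled ? "enabled" : "disabled";
}

console.log(describeLoose(null));    // prints: disabled
console.log(describeExplicit(null)); // prints: not set yet
```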
I think people are conceptually fine with null. What they're annoyed with is the inability to know whether null is a possible value for a variable in many programming languages, e.g. Java.

I understand this, but basic code hygiene of null checking at boundaries and edges makes it a non-issue. You shouldn't be setting nullable variables to null so far away from where they're used that they become a problem. E.g. if you're passing around an object with a possibly null value attached, you should instead set it to a default empty state: empty string, zero, false, etc. It's only in rare circumstances that you should need to worry about null values. I will, however, concede that non-nullable types are super useful. I just don't think languages should remove null entirely. E.g. some Rust projects have so many Option types that null would have been easier to code around.

I know this is down to inexperience, but at that point you've lost me: inexperience with null types is the only thing that makes null types a big problem.

Indeed. Optional values are literally the same as null, but you know that it _can_ be null.

There are two common "nulls" in this situation: the "unset" one and the "unknown" one. Unset implies no one has ever set the value. Unknown means someone has set the value to unknown, which is sometimes an important distinction. In JavaScript this is easy: undefined can represent unset, and null can represent unknown. This is very useful for updating the state as well: undefined means do not change the value, while null means update the value to unknown. In relational DBs I have used codes (enums) for yes/no/unknown that also allow a null value for unset. If I had a choice, I'd prefer JavaScript's data model of a boolean that can also be undefined or null. In general a nullable boolean has worked better for a tri-state than enums, but as I said, many times I need a quad-state.
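The undefined-versus-null update convention from the last comment can be sketched as a patch function. `Survey`, `SurveyPatch`, and `applyPatch` are hypothetical names. The sketch assumes `exactOptionalPropertyTypes` is off (the compiler default), so an optional property may be either absent or explicitly `undefined`.

```typescript
// Hypothetical record: `answer` is true/false, or null once someone marks it "unknown".
interface Survey {
  answer: boolean | null;
}

// In a patch: undefined (or a missing key) means "leave unchanged",
// while null means "set the value to unknown".
interface SurveyPatch {
  answer?: boolean | null;
}

function applyPatch(current: Survey, patch: SurveyPatch): Survey {
  return {
    answer: patch.answer === undefined ? current.answer : patch.answer,
  };
}

const current: Survey = { answer: true };
console.log(applyPatch(current, {}).answer);               // prints: true (unchanged)
console.log(applyPatch(current, { answer: null }).answer); // prints: null (now unknown)
console.log(applyPatch(current, { answer: false }).answer); // prints: false
```

A single `boolean | null` column cannot hold all four states at rest. The convention only works because "unset" lives in the patch (the absent key) rather than in the stored record.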
I don't have general advice, but you do have three states to represent there: not acted upon by the user, yes, and no. That's what enums are for. I think "we'll pass in a pointer to the actual data, and if it's unset the pointer won't point to a valid memory location" is a weird pattern that the industry should probably stop using. So no *bool type here. Database nulls are maybe reasonable, but they can mean many things that an explicit enum doesn't: "this column was added after the fact and we didn't want to set a default value," "this column is optional," etc. The more ambiguous the meaning, the more mistakes can be made. (I think someone is going to say, "don't write the form to the database until it's valid," but you have to have some way to save work in progress between website visits or when "my Internet died," right? If you just dump a JSON blob in local storage, you still have the same representation problem: the field has three possible values, and you can't pretend it only has two just because there is a built-in type with two possible values.)

I work in a slightly different context but have dealt with a similar problem. I build Unity3D apps, and I write a lot of UI for the editor that affects how things are serialized. I've stopped serializing enums after being burned too many times when people:
- Add entries somewhere other than the bottom
- Rename entries
- Remove entries
- Rearrange entries

So instead I made a kind of data-driven enum that you create through the editor. Each entry (called a key) in the "enum" is backed by a GUID, and you can associate a name with each key. When you want to save info like on/off/unset, you create an enum and then declare a key in your data model. The UI shows that key as a drop-down with the mapped names as the choices, but in the end the associated GUID is what gets recorded. Not only does this largely solve the add, rename, and rearrange problems, it solves some other persistent issues as well. I've gotten designers to pick up this tool so they stop using ints and strings to signal events. This way, they define their signal once as a data-driven enum and then see it as a choice throughout the app, rather than as something they have to type in and match by hand. It also leads engineers to write components that are more configurable for designers. Instead of checking a literal token like:

  if (currentEvent.appState == MyAppStateEnum.Start)
      doSomething();

they declare the Key and check its value:

  // Serialized field; the editor shows it as a drop-down of key names
  public Key appPhaseToDoSomething;

  // ...later, in a method: compares the recorded key, not a compiled token
  if (currentEvent.appState == appPhaseToDoSomething)
      doSomething();

Since appPhaseToDoSomething is exposed as a drop-down in the editor, a designer can change the phase in which something happens without an engineer. And engineers basically have to write in this modular way, because there's no token to check.

It also means there is no way to guarantee that you have covered all possible values of the enum, whereas if an enum value were removed in your older example, you would get a compile error.

Covered in what sense? As in tested? I can easily open the data-driven enum and push all values through some switching code if desired. To alert the user about removed values, I have code that checks key values against the definition and errors on deleted values. Using tokenized enum values is convenient for the programmer, but again, I want to discourage it, because it often blocks the non-coding designers on my team when they need to extend a component to handle more app states. Compiler token checking is nice, but IMO data-driven behavior is nicer. We don't avoid strings and ints or any other kind of data just because they don't have a compile-time tokenized representation.

Do you load this entire table into memory or do a DB/cache lookup every time?

I load all the names at once. If this ever became a problem I'd segment them with Bloom filters or something like that. Note that in my applications the names are not known at runtime, because comparing the GUIDs is sufficient and these drop-downs are shown only to the designer, not the user. Were I employing this system in a more distributed fashion, I might reconsider packing the names in with the GUIDs, with the understanding that they are frozen after build. I'm also considering a way to let an engineer limit the scope of a key, so that the designer is presented with a subset of the key options rather than all of them when setting the value.
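Here is a rough TypeScript sketch of the data-driven enum idea described above. It assumes that each entry is simply a stable GUID plus a display name. `DataEnum`, `KeyEntry`, the `lightState` field, and the GUID strings are hypothetical. Saved data stores only the GUID, so renaming or reordering entries does not change what was persisted, and a deleted entry can be detected instead of being silently remapped to a different ordinal.

```typescript
// Hypothetical sketch: enum entries identified by a stable GUID, with the name as metadata.
interface KeyEntry {
  guid: string;
  name: string;
}

class DataEnum {
  // Map insertion order is used only for display, never for identity.
  private readonly entries = new Map<string, KeyEntry>();

  constructor(initial: KeyEntry[]) {
    for (const entry of initial) {
      this.entries.set(entry.guid, { ...entry });
    }
  }

  // Renaming changes only the label; previously saved GUIDs stay valid.
  rename(guid: string, newName: string): void {
    const entry = this.entries.get(guid);
    if (entry === undefined) {
      throw new Error(`Unknown key ${guid}`);
    }
    entry.name = newName;
  }

  remove(guid: string): void {
    this.entries.delete(guid);
  }

  // Lets tooling flag saved values that point at deleted entries.
  isDefined(guid: string): boolean {
    return this.entries.has(guid);
  }

  nameOf(guid: string): string | undefined {
    return this.entries.get(guid)?.name;
  }

  // Names to show in an editor drop-down.
  choices(): string[] {
    return Array.from(this.entries.values(), (e) => e.name);
  }
}

// Hypothetical GUIDs; a real tool would generate them.
const toggle = new DataEnum([
  { guid: "3f2a9c1e-0001", name: "On" },
  { guid: "3f2a9c1e-0002", name: "Off" },
  { guid: "3f2a9c1e-0003", name: "Unset" },
]);

// Persisted data stores the GUID, never the name or an ordinal.
const saved = { lightState: "3f2a9c1e-0001" };

toggle.rename("3f2a9c1e-0001", "Enabled");
console.log(toggle.nameOf(saved.lightState)); // prints: Enabled
```

The trade-off raised in the reply still applies. Because the entries are data rather than compiled tokens, checks like `isDefined` run at edit time or runtime, not as compiler errors.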
I'd probably avoid using NULL in the DB to convey a borderline value, for the simple reason that NULL has weird semantics in SQL and can be pretty confusing. If you're not using NULLs in the DB, it's weird to translate an enum into a nullable Boolean. In the end, there is very little downside to using enums in this case. It's by far the least confusing option.

Yes/no/unspecified is a good match for something like `Option<bool>` or `bool?`. Their usage covers every case and nothing needs to be elucidated. That said, in a language that supports null but lacks a compile-time mechanism to detect potential null dereferences (such as the one modern C# has), I might use an enum.

JS/TS dev here. I tend to stick with a nullable boolean. For me, "no value" is usually pretty much equivalent to "null" in general, and the data type I choose depends only on the non-null possibilities.

Yes, I'd have an enum with Yes, No and Unset. Many languages have a typed "no value" value that is composable with other types: Maybe<T>, Option<T>, Nullable<T>. In other situations I might have an Option<Yes, No>, i.e. when I want a non-null value to indicate that it isn't Yes or No. The reason I don't want a null, in general, is that it's the billion-dollar mistake: https://hackernoon.com/null-the-billion-dollar-mistake-8t5z3... But if the user can actually choose "Unset", then that's a choice, too. "Unset" might, for example, have implications in the future that other null-ish answers don't have. As for SQL: I might go for a NULL for performance, knowing that if another neither-yes-nor-no option with different implications came along, I might have to migrate the NULLs.

When the final record should have either Yes or No, but there is an intermediate state where the answer could be missing, you want something like a Maybe<Bool> that you can map to a simple Bool as part of validation.
Ideally you'd use a functor-style parameter for this, for example in Haskell:

  data Record f = Record { …, someField ∷ f Bool, … }
  type PartialRecord = Record Maybe
  type CompleteRecord = Record Identity

  getCompletedRecord ∷ PartialRecord → Maybe CompleteRecord
  getCompletedRecord (Record { …, someField, … }) =
    Record <$> … <*> (Identity <$> someField) <*> …

This way you can define your record type once and handle all the missing fields in a uniform way, and functions can easily indicate whether they expect a partial record full of Maybes or a complete record with no potentially missing fields. You can use the same type definition for other things, too, by substituting different functors in place of Maybe or Identity. For example, `Record ToString`, where `ToString a` is a newtype over `(a → String)`, could be a record of functions describing how to render each field as a string.

In the DB you would need to store incomplete records in a separate table, since they have different validation rules. Queries against the main table should then be able to assume that all the records are complete.

I use the convention that a 'null' value is a value that was not explicitly set by the user. So in C#, I would always use a nullable boolean in this case (choices yes and no). If the field is a required value, I would annotate the field as such. When using the field in an ORM, the required annotation would lead to a not-null column in the database. When using the field in a DTO, the required annotation could be used to validate incoming input. If the user should have the choices "yes", "no", and "unknown", I would use a nullable enum to express this. A 'null' value means that the value was not explicitly set, while the enum value "unknown" indicates that the user explicitly chose "unknown". So, in both cases null is not a separate state. It simply indicates that no explicit value was given for the field.

TypeScript is nice because its null type is explicit. So if a type claims to be boolean, it really is boolean (either true or false, nothing else). For a nullable variable of type T (including boolean), the type must be explicit about it:

  let choice: T | null;

Another nice alternative is an Option type, like Rust has. I think it would be something like Option<bool>.

But those (a null, or a None, respectively) would be the right choice only if, internally, at the code level, the Unset case must be handled as some kind of extraordinary scenario. If it was possible for users to make a conscious decision to leave it unset, i.e. if Unset is part of the valid choices offered by the user-facing API, then I'd encode it as a proper possible state, in an enum.

And it's terrible, because there's both `null` and `undefined`, which have largely the same meaning but aren't equal. Depending on what you're interacting with, you may need to use one over the other, or coerce. This is especially common with developers from other languages who don't even know about the gotcha.

I've never found a use for null where undefined wouldn't be simpler to use, especially with the ?? operator to set defaults along the way.

This is a very curious and fun way to put it: https://stackoverflow.com/a/57249968

In essence, `undefined` is non-existence, and `null` is existence with a non- (or "invalid", or "unknown") value. Although using them like that depends on the needs of the application. If "invalid" or "unknown" was an expected user choice, I would still rather encode it as an enum than as a nullable boolean, as mentioned in my comment above.

This is slightly different, and funny, but you reminded me of https://thedailywtf.com/articles/What_Is_Truth_0x3f_

I've used UNSET in enums before, to describe things in a one-big-object state in workflows before the relevant step has run. I don't think I've used it to describe true or false, though. Possibly as a third option, but then the names for the options end up well described, not just true and false. Looking back, I'd much prefer to make lots of types describing the object along the way. In TypeScript that would be really easy to describe, too: just &-ing each step's result onto the main object along the way.

For SQL there is usually an optimization where nullable columns are simply a bit set in a null bitmap; the same goes for boolean/bit columns. This does sound like a good use of a nullable bit. In the future you may want to distinguish things further, such as by adding a new column so you can tell whether a value was left unset by the user or is null because the user never saw the field. In that case an enum makes sense.

Personally I think the logic maps directly to a nullable bit. In some cases I encode the enum in the field name, like `isThisOrElseThat BOOL`. If you have clear opposites it often doesn't make sense to do so. But it can be helpful if the two choices are not clear opposites (either A or B), or if the field is nullable and NULL isn't equivalent to false. It is also annoying to use, though. So if the DB supports an enumeration type with reasonable overhead, that's a good choice, too (but that's not always the case, e.g. SQLite).

In the code, it's considered something of a smell to use a bool at all. Look up "Boolean blindness." Basically, instead of a bool field saying whether the person wants the deluxe upgrade, you'd have an enum whose values are Regular | Deluxe. How you would represent that in a DB would depend on how your implementation works.

Neither? If the user responses are in a table of (field_id, value_selected, ...), then you just use an outer join against the available field_ids for the form and get NULLs back automatically for the unpopulated selections. That way, if you (say) add new selections to forms that are partially complete, reasonable things happen without any further logic. Why record data to say you don't have data?

The Yes/No/Unset enum seems intriguing to me: in Rust terms it is an Option<bool>. Usually this means I have to call unwrap_or, but depending on the context the default can be true or false.

In C# I would definitely use the nullable "bool?". Then you can use the null-coalescing operator to default to either true or false: "value ?? false".

I use integer surrogate keys everywhere. So 1 yes, 2 no, 3 unset, 4 yes but no every other Friday, 5 use this option for customer x, 6 no with notes, 7 zebra, 8...

I'm pretty sure I've worked on this codebase...

With some 4.2 billion integers to choose from, you can pretty much put everything in one enum.

Genius!

I would just use NULL.

You could be using Objective-C with boxed BOOLs (NSNumber), so you have @YES, @NO, and nil :D

In JS/TS, I use `undefined` for unset, then `true` and `false`.
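Several comments above can be combined into one TypeScript sketch: the Haskell `PartialRecord`/`CompleteRecord` split, a `Maybe<Bool>` mapped to `Bool` during validation, and the JS/TS convention of `undefined` for unset. The built-in `Partial<T>` mapped type serves here as a rough stand-in for `Record Maybe`, though it is not a functor parameter. `Preferences`, `DraftPreferences`, `complete`, `wantsNewsletter`, and the field names are hypothetical. A `??` fallback is included for the case where a default really is acceptable.

```typescript
// Hypothetical complete record: every field must be a real boolean.
interface Preferences {
  newsletter: boolean;
  deluxeUpgrade: boolean;
}

// Draft state: any field may still be undefined (unset), loosely like `Record Maybe`.
type DraftPreferences = Partial<Preferences>;

// Validation turns a draft into a complete record, or lists the fields still unset.
function complete(draft: DraftPreferences): Preferences | string[] {
  const { newsletter, deluxeUpgrade } = draft;
  if (newsletter === undefined || deluxeUpgrade === undefined) {
    const missing: string[] = [];
    if (newsletter === undefined) missing.push("newsletter");
    if (deluxeUpgrade === undefined) missing.push("deluxeUpgrade");
    return missing;
  }
  // Both fields are narrowed to boolean here.
  return { newsletter, deluxeUpgrade };
}

// Where defaulting is acceptable, an unset answer can be resolved at the point of use.
function wantsNewsletter(draft: DraftPreferences): boolean {
  return draft.newsletter ?? false;
}
```

Code that takes a `Preferences` never has to consider the unset case, which matches the point about queries against the "complete" table. The `??` default stays local to the one place that has decided `false` is a safe interpretation.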