Tests aren’t enough: Case study after adding type hints to urllib3
sethmlarson.dev

I love static typing/type hints if for only 1 thing - code maintenance.
Even code I wrote six months ago.
Not having to dig six functions deep to figure out whether "person" is a string or an object, and if it's an object what attributes it has, is huge. Not to mention that some clever people decide - hey, if you pass a string I'll look up the person object - so you can pass an object or a string. That creates all sorts of convoluted code paths: someone else looking at "person" only ever saw one type, so now their function doesn't work on both types.
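A minimal Python sketch of the pattern being described (Person, lookup_person, and the registry are hypothetical, purely for illustration):

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str
        active: bool = True

    _REGISTRY = {"alice": Person("alice")}

    def lookup_person(name: str) -> Person:
        return _REGISTRY[name]

    # the "clever" dual-type parameter: accepts a Person or a name string
    def deactivate(person):
        if isinstance(person, str):
            person = lookup_person(person)
        person.active = False

    deactivate("alice")        # works
    deactivate(Person("bob"))  # also works - until new code assumes one shape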
I hate having to waste time figuring out the type of every variable and hold it in my head every single time I read a piece of code.
The main argument for dynamic typing is speed in prototyping but I find that's opposite for me. I'm much more comfortable rapid prototyping and ripping stuff apart when I have a strongly static typed environment telling me what I just broke.
Doing radical refactoring often involves just making those changes and then fixing all the IDE or compiler errors until it runs again.
Hm, growing somewhat experienced, I find myself adapting an old quote more and more: Sufficiently advanced static typing is indistinguishable from dynamic typing.
Now, I know, it's not true. It's entirely possible to build weird things in python that are provably impossible to typecheck statically. But modern language servers and their type inference capabilities in rust, terraform, or even straight up python are very impressive.
> Sufficiently advanced static typing is indistinguishable from dynamic typing.
Static typing type checks are compile time, dynamic typing doesn't. I don't see how these two could be indistinguishable, in one you can't run the program with type errors, in the other you can.
This is true if you're focused on the compiler but if you include the adjacent question of what is reported to you in your editor while working it's a bit blurrier. If I write some Rust code and pass a string into something which expects an integer, I get a hard error preventing compilation. If I do the same thing in Python, however, and there's a prominent error displayed in my editor before I even run the code, how different is that from the perspective of anyone who isn't watching my screen? In either case the bug was caught before the code even ran and I likely have type-aware autocompletion to reduce the odds of making that mistake in the first place.
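A concrete sketch of that editor experience (hypothetical snippet; the exact wording of the diagnostic depends on the checker):

    def greet(count: int) -> str:
        return "hi " * count

    greet("three")
    # a checker such as mypy flags this before the code ever runs, roughly:
    #   error: Argument 1 to "greet" has incompatible type "str"; expected "int"

The mistake is surfaced in the editor at edit time, which is the sense in which the experience resembles a compile error.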
That's not to say that there aren't quite reasonable questions about how effective the different approaches are or how easy it is to fix the error, how complete the checks are, how deep the checks go, etc. A lot of how you feel about that is going to be subjective based on the languages you use and the quality of the code you work with — Rust has an advanced type system and great developer ergonomics providing unusually helpful error messages, Python has weaker typing but also a culture about simplicity which discourages some classes of bugs, Java has a lot of mushy-typed code where people got tired of language / compiler drawbacks and came up with ways to improve ergonomics at the expense of defeating the type checker, etc.
Both your examples are the same static typing. One just has better IDE integration than the other.
In the first case, there would be a hard compiler error — rustc would refuse to compile the code until I fixed it.
In the second case, Python would allow the code to run but would potentially produce a runtime TypeError when it reached that point, depending on exactly what the code does. It might also run fine (e.g. I'm just passing that variable to json.dump()) or produce unexpected output (e.g. I'm passing that variable to print(), which worked for int and str, but then someone called it with None and I didn't want "None" in the output).
The point was that while those differ in how they're implemented, the experience can be fairly similar when you're in the middle of the code-test cycle. My example wasn't the most complicated dynamic typing scenario but it's an example of why this works pretty well: most Python code isn't highly dynamic or dynamic everywhere — typically there are a few places which might be challenging for analysis but there's also a LOT of code which only ever works with a single input type. If your IDE provides feedback on all of that code, you're going to avoid a fair number of other bugs and free up time for the hard parts.
I believe the author was talking about the act of writing the software. Modern type inference means that you can mostly code without needing to write down the types in many cases. This line of code is the same in JavaScript or C#:
    var instance = new SomeClass();

In function definitions, where you are definitely going to need to provide parameter types, it's extremely common to document those types in a docblock in a dynamically typed language. At least I always did. So making the types part of the definition is not a significant difference while writing the code.

C# even has target-typed new expressions, which might be useful as well: instead of var instance = new SomeClass(); you write SomeClass instance = new(); - this might be preferable so that you have all types on the left.
Please write a type for the following function:
    def compose(start, *args):
        def helper(x):
            for func in reversed(args):
                x = func(x)
            return start(x)
        return helper

There is no mainstream typed language which can write a fully general type for the vararg compose function. TypeScript is probably the one that comes closest, but last I checked it still was unable to write a sufficiently powerful array type. You can write a type for a version of compose with a fixed number of arguments, but not for one working over an arbitrary number of arguments.

Vararg functions also have limited use. Especially considering that most of the time, your args will all have the same type, and therefore could just be passed in an array or similar. The one mainstream exception I know of is print functions, and we have ways to statically check those.
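For what it's worth, here is a sketch of the fixed-arity version that does type-check (assuming Python's typing module; only the two- and three-function overloads are written out):

    from typing import Callable, TypeVar, overload

    A = TypeVar("A")
    B = TypeVar("B")
    C = TypeVar("C")
    D = TypeVar("D")

    @overload
    def compose(start: Callable[[B], C], f1: Callable[[A], B]) -> Callable[[A], C]: ...
    @overload
    def compose(start: Callable[[C], D], f1: Callable[[B], C],
                f2: Callable[[A], B]) -> Callable[[A], D]: ...
    def compose(start, *args):
        # untyped implementation; call sites are checked against the overloads
        def helper(x):
            for func in reversed(args):
                x = func(x)
            return start(x)
        return helper

Every additional arity needs its own overload, which is exactly the limitation being described: the fully general vararg case has no finite type.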
Your toy example, even generalised, has no practical use. If I can write this:

    compose(f, f1, f2, f3)

Then I can write that instead (Haskell):

    f . f3 . f2 . f1

Or this (F#):

    f1 |- f2 |- f3 |- f

And now we've reduced the problem to a simple function composition, which is very easy to define (OCaml):

    let (|-) f g = fun x -> g (f x)
    (|-) : ('a -> 'b) -> ('b -> 'c) -> ('a -> 'c)

This generalises to any fold where the programmer would provide the list statically (as they always would for a vararg function): instead of trying to type the whole thing, just define & type the underlying binary operation.

This still technically reduces the generality of the given function since you are specifying that each function cannot have multiple overloads.
let f be an overload set matching the signatures {a -> b, i -> j}
let g be an overload set matching the signatures {b -> c, j -> k}

compose(g, f) could be given an a to return a c, or an i to return a k.
> This still technically reduces the generality of the given function
My point was that we are almost never hurt by that reduction.
> you are specifying that each function cannot have multiple overloads
Haskell has type classes, and if we restrict ourselves to local type inference it's fairly easy to have C++ style overloads without even that. So no, I'm not specifying such a thing.
> Vararg functions also have limited use.
This is only because people are using statically typed language that place arbitrary restrictions on such functions and make them harder to use. In dynamically typed languages, vararg functions are widely used and enable patterns that are pretty nice.
Perhaps. But then I want to know what those patterns are, what are their actual benefits compared to not using them, and most of all I want to know if such benefits outweigh the significant costs that comes with the lack of static analysis¹.
[1] The need to test much more, the need for better, more accurate documentation, the higher cost of refactoring, even the higher prototyping times (I prototype faster with a REPL that has static typing, because I don't have to debug type errors).
> I prototype faster with a REPL that has static typing, because I don't have to debug type errors
On the other hand, because Common Lisp has resumable exceptions and on-the-fly redefinition of just about everything, I prototype significantly faster in CL because I can just let the debugger stay open until I fix the issue and then hit “continue”.
I no longer believe the “static faster for development than dynamic” thesis because I think a lot depends on how the programmer thinks about programming and which tools are available.
Yeah, I've heard about image based programming, that enables editing your program as you run it. It scares the shit out of me.
See, if I have an unexpected error, that's because I fucked up my program. And because of that, my runtime state is likely screwed as well. So not only do I have to correct my error, I have to correct its consequences before I restart the program. I can't just resume its execution and hope for the best, I need to know that whatever state I keep is not rotten.
On the other hand, that way of doing things is not exclusive to dynamic languages. There are things called "dynamically loaded libraries" that you can use even in C. Game programmers routinely recompile & reload specific DLLs just so they can correct their mistakes without restarting the whole game. And those who have written in-game editors have a very powerful stop-debug-restart cycle. On top of a statically typed language.
C++ is an extremely mainstream language that can write a fully general version of compose with variable arguments.
https://godbolt.org/z/h7n8Y7qf1
Like sure, you can't write out a type for the entire overload set. Overload sets don't have types, but functions do. However, I don't think you'd ever actually want to write out the type of the compose function. Instead, I think it would be more reasonable to request that every intermediate function call is type-checked with fully specified types. In C++ this is the case.
Great implementation of what the parent comment asked for. Not only that, but the compiler managed to remove all the abstractions.
(very minor nitpick: I'd pick `auto&& x` over `auto x`)
> There is no mainstream typed language which can write a fully general type for the vararg compose function.
True. A more interesting question might be what level of static safety and performance benefits you'd be willing to sacrifice to be able to write functions like this.
Personally, I don't find the kind of code I can't fit into static types particularly appealing, but I find the code navigation, error checking, and optimizations of static types to be priceless.
As I said, there are infinitely many possible programs that cannot be type checked statically. This follows trivially from the halting problem - lambda(p) = if p.halts() then A() else B().
However, the intersection of the programs I encounter in practice with the set of programs that can be statically checked is rather large.
You are probably thinking of Gödel's incompleteness theorems. I do not think that you can trivially derive it from the halting problem.
He just did derive it from the halting problem. In words: It's impossible to statically check the type of a function that takes a program P as its input, and returns the integer 1 as its output if P halts, and returns the string "1" if it does not halt. It would require solving the halting problem, which is undecidable.
The theorems predate Turing's theorem. As for the halting problem, it is solvable with an oracle, unlike the incompleteness theorems.
How does that follow? The input type would be vague, e.g. ByteArray or Program or something.
It's the return type that isn't decidable. You can't statically check, in the arbitrary case, whether the function returns an int or a string.
No, but that isn't required for a static type system. Most languages would just unify to the top type.
Yes, but it is not required that a type system has a singular top type. It is entirely valid for a type system to have any number of types which are not in any sub- or supertype relationship.
And once there are two types T1 and T2 which are neither subtypes nor supertypes of each other, and two expressions A1 and A2 of types T1 and T2, and a statically undecidable expression p, then statically typing "if p then A1 else A2" is a problem.
And yes, I agree: Most if not all practical type systems will not accept an if-else statement if they cannot unify both branches of the conditional into a single type. Which makes sense. Because you have to act on the result of the expression, and then it needs to have some common type.
But on a purely theoretical basis, it is entirely possible and valid to have an undecidable type system. Which, btw, happens for a lot of languages: https://3fx.ch/typing-is-hard.html . There are C++ programs which are provably impossible to type-check at compile time.
Could you elaborate on that? It sounds like I may have an overly simplified understanding of the topic here—wouldn't be the first time.
Static type systems don't imply fully dependent types. In Kotlin:
    sealed class Program
    class HaltingProgram : Program()
    class InfiniteProgram : Program()

    fun checkIsHalting(p: ByteArray) =
        if (halts(p)) HaltingProgram() else InfiniteProgram()

This program will type check just fine. The inferred return type will be Program because that's the nearest shared ancestor type of both possible return types. Good luck implementing the halts() function of course, but that's not the type system's problem.
This is really a fairly trivial exercise in Haskell, provided you pass the arguments as a heterogeneous list—which is semantically equivalent to a variable argument list. Here is my implementation: https://gist.github.com/nybble41/c459c6927a3bad8ec350d227193...
Here I defined a simple `Pipeline` GADT for the argument list, which is just a list of functions with some extra type constraints to ensure that they can be composed. You could do the same thing with a more general type like HList but the type signature for the `compose` function would be much more verbose since you would need to define the relationships between each pair of adjacent function types through explicit constraints involving type families, whereas the `Pipeline` type handles that internally.
Perhaps you don't consider Haskell "mainstream" enough?
I took a stab at it, there's not enough information to figure out anything more specific:
    from typing import Any, Callable

    def compose(start: Callable[[Any], Any], *args: Callable[[Any], Any]) -> Callable[[Any], Any]:
        def helper(x: Any) -> Any:
            for func in reversed(args):
                x = func(x)
            return start(x)
        return helper

Sure, that's probably as close as you can get, but ideally it would be possible to write a type which guarantees the input functions are compatible, as well as knowing what the type is of the returned function.
I think you can get close implementing it as a macro in typed racket, expanding the type out based on how many arguments you give it. But then it's not a first class function until you expand it.
Found this implementation which also provides pre-expanded forms that are first class functions for specific lengths of arguments docs: https://docs.racket-lang.org/typed-compose/index.html implementation: https://git.marvid.fr/scolobb/typed-compose/src/branch/maste...
I think you can in rust as long as args is a slice. Rust doesn't have varargs except for c interop. A slice or Vec of function pointers is the idiomatic way to do the same thing.
Something like:
    fn compose<T, X>(
        start: Box<dyn Fn(T) -> X>,
        args: Vec<Box<dyn Fn(T) -> T>>,
    ) -> impl Fn(T) -> X {
        move |x: T| {
            let mut x = x;
            for func in args.iter().rev() {
                x = func(x);
            }
            start(x)
        }
    }

This only works if all of the intermediate functions take and return the same type. However, you can write a compose macro which operates as expected.
I don't understand your argument. Could you please explain it?
Type systems simply have matured a lot.
It wasn't too long ago that you either had very clumsy type systems - C, Java. These type systems were more of a chore than anything else. The generics transition in Java especially was just tedious: you had to type cast a lot of stuff, and the compiler would still yell at you, and things would still crash.

Or you had very powerful and advanced type systems - Haskell, and C++ with templates, for example. However, these type systems were just impenetrable. C++ template errors before clang's error messages are something. They are certainly an error message. But fixing those without a close delta of what happened? Pfsh. Nah.
In those days, dynamic typing was great. You could shed the chore of stupid types, and avoid the really arcane work of making really strong types work.
However, type systems have matured. Today, you can slap a few type annotations on a python function and a modern type inference engine can give you type prediction, accurate tab-completion and errro detection. In something like rust, you define a couple of types in important locations and everything else is inferred.
This in turn gives you the benefit of both: You care about types in a few key locations, but everything else is as simple as a dynamically typed language. And that's when statically typed languages can end up looking almost - or entirely - like a dynamically typed language. Except with less error potential.
> a modern type inference engine can give you type prediction, accurate tab-completion and errro detection (emphasis mine)
"errros" are my nemesis in languages which automatically create a new symbol with every typo!
I would argue that the really big benefit of dynamic typing is that it enables a really nice interactive interpreter shell experience. I think it's also important from a prototyping standpoint that Python's static typing model does a lot of inference -- you don't have to add an explicit type annotation on every single variable.
This is possible with statically typed languages too, Haskell being one example where it's encouraged to use the REPL to work towards a solution and/or quickly test ideas without having to write unit tests or full program tests. OCaml and friends fall within this category too, and none require extensive type annotation thanks to type inference.
I feel that too much of the focus is on static languages like C/C++, where types become a chore, and that static typing gets judged on those rather than on the plenty of languages with ML-style type inference.
> languages like C/C++ where types become a chore
It's been a long while since I used those languages, but I remember the chore part wasn't so much of typing Int or String and more so having to care if it's an Int, Short, or Long or if the float is single or double precision. I believe that those micro-optimizations are no longer popular, but manually thinking low-level is not something I enjoy.
> I would argue that the really big benefit of dynamic typing is that it enables a really nice interactive interpreter shell experience.
I use the Python REPL quite often, and have non trivial experience with Lua’s. But the best experience I’ve ever got was with OCaml: I type the expression, or function definition, without giving type annotations, and I get the type of the result in the response.
You wouldn’t believe the number of bugs I caught just by looking at the type. Before I even start testing the function. And that’s when I don’t have an outright type error, which a dynamic language wouldn’t have caught — not before I start testing anyway.
When I'm prototyping I tend to go inside out in a layered fashion - some days I am really feeling the data layer - other days I like to work closer to the fringes. To this end type hinting serves as a quick and dirty code contract before all my pieces are in place. I can splat out a bunch of low level definitions that I know I'm going to need and then come back the next day to add in struts - remembering my choices easily as I go.
I know this isn't the approach of choice for most folks but hey - I'm working with ADHD so I've got to make some allowances for some neurodiversity.
It depends on the language but I find I'm also far more productive with strong types.
When I use a dynamic language I get no errors in dev; I need to run/invoke the program to see if it works. It may appear to work fine because I haven't executed a specific code path, which is why dynamic-language projects need extremely high test coverage. With dynamic languages I am delaying my feedback loop; I may get some visual output quicker, but that doesn't mean my program is correct.
With a strongly typed language and utilizing types you use the compiler to guide you. The compiler says hey, this isn't correct, fix it, you go fix the error and recompile and repeat.
I've used Elm before and it's the only time I had a complex Javascript UI just compile and work first time. It's like a wow, did that just happen.
With Typescript it's not quite to the level of Elm but find my experience working with React etc far more productive. Typescript says hey, that's wrong, I expect ... you gave ..., you work through the errors and when it runs generally there's less silly mistakes than when I just use Javascript.
I'm learning Rust, and the compiler error messages have greatly helped. When you compile it says hey, you tried to do ..., maybe you want ... instead. If I'm not too sure what the suggestion means, I try it, and 9 times out of 10 it works, compiles, program runs.
With types you generally get better IDE auto complete support etc.
Now I'm using Python for my day job. My experience has been painful: discovering what arguments functions take, passing in wrong values, needing to run slow test suites, finding errors at runtime. Yes, you can use type hints, and I do, but I find them far less reliable.
I guess I'm not a very good programmer so learnt to lean on a compiler to do the hard work for me, and if you have good type support you can lean on types more to get the compiler to help you more.
In Haskell I can write complex logic by writing out the types and ADTs. I've written whole programs this way, with tests to verify the logic, before writing any implementation: just write the types, the function signatures, etc. Once that is done you implement the functions, hit compile, then boom, you're shocked it just worked the first time running. I find this incredibly efficient for prototyping ideas.
The biggest reason why I like type hints is that they force me to reflect on the data type I want to use before implementing my code.

Last week, I could've used either a dataframe, a list of lists, a list of tuples, a dict of tuples, a dict of lists (this was a bad idea that did not survive more than 2s in my head) or a list of dicts. I started coding with a dataframe in mind (I guess I wanted to show off my numpy/pandas skills to my devops colleagues), but adding type hints to my prototypes shut down the idea pretty quickly: lots of complexity for nothing.
> when I have a strongly static typed environment telling me what I just broke.
Yes, I'm a total scatterbrain. Types let me remind myself later that I did in fact forget what I'm doing and what I did. It lets past-me protect future-me.
I used to have loads of fun abusing Eclipse's real-time type checker, imagining software live with typed interfaces.

The IDE was my logical buddy, and every idea's possibility was rapidly shown with it. And if I need to massage things a bit, I go faster because I know what's missing.
The only time I liked eclipse/java :)
Dynamic typing was great before I knew anything about programming. I'm talking like, at a middle school level. Fewer "Silly" errors.
After university, the opposite became true. No difficult-to-diagnose undefined behavior because of ambiguity in typing.
I agree with your point on prototyping. I've never been more productive than when I have the (Scala) compiler acting as a second set of eyes, essentially looking over my shoulder, checking my business logic.
I think the only reason dynamic typing can speed up prototyping is that it allows you to make certain type errors that you may never encounter at runtime while prototyping.
Agree. I usually think types first and quickly sketch the whole application without writing any code. So, when I start writing code it just works end-to-end.
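A tiny sketch of what "types first" can look like in Python (all names are hypothetical; the bodies come later):

    from dataclasses import dataclass
    from typing import Protocol

    @dataclass
    class User:
        name: str
        email: str

    class UserRepo(Protocol):
        def by_name(self, name: str) -> User | None: ...
        def save(self, user: User) -> None: ...

    # the signatures pin down the data flow before any logic exists
    def register(repo: UserRepo, name: str, email: str) -> User:
        raise NotImplementedError

    def notify(user: User, message: str) -> None:
        raise NotImplementedError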
The main argument for dynamic typing, in particular in the context of object-oriented programming, is decoupling. It is always the receiving object's responsibility to handle whatever it gets.

If you write dynamic OO languages with a static mentality in mind, i.e. you try to enforce some sort of global type expectation before the program runs, then obviously static languages are better, because you're trying to write static code.
Benefiting from dynamic languages means ditching that mindset altogether.
> I hate having to waste time figuring out the type of every variable and hold it in my head every single time I read a piece of code.
If a codebase doesn't have static types, it damn well better be set up to be highly grep-able. Including dependencies and frameworks.
This is why Rails pisses me off so much. No static types to help you out, and you can't grep (can barely google, even!) methods and properties that aren't defined anywhere until runtime. Is this from core? Is it from some 3rd party gem? Well fuck me, this file doesn't even tell me which gems it's relying on, so it could be literally anything in the entire goddamn dependency tree.
> ... grepable ...
This is so important.
It is also the reason why I like global variables. They are accused of making a spaghetti mess but ... in my experience the opposite is true.
Fancy patterns are way worse to reverse engineer than simple flat long functions accessing globals. Easy to debug too!
I agree with that. I despise all the DI things where I can't "goto" to the definition of the actual dependency that was injected, but only to the interface. So frustrating. It makes understanding what is going on so difficult for me.
That's some Rails stupidity there, not a dynamic language problem. Autoloading symbols by name is straight up dumb.
As for greppable though...then you may as well be using a static language. The point of a dynamic language is to be dynamic, ie you can do those things at runtime.
> The point of a dynamic language is to be dynamic, ie you can do those things at runtime.
With Rails you have the option of pry-rails, and you can get a list of descendants of important parent classes like ActiveRecord with this: https://apidock.com/rails/Class/descendants
With the combination of vim, rspec, pry, fzf, and ripgrep, it's possible to become quite comfortable refactoring pure Ruby and Ruby+Rails code. But it does take some time to learn how to navigate the Rails runtime code generation magic. The more magic the code, the more you might have to use a debugger to break on method definition, but Ruby's dynamicism lets you do that.
On the topic of frameworks with a lot of magic, having used both Rails and Spring Boot (with Java and Kotlin), I'll take Rails any day. It was way easier to introspect Rails codegen magic with Pry, than Spring's codegen magic with IntelliJ. With Spring Boot, even with Kotlin, we had the burden of semi-manual typing, but lost a lot of the benefits because a lot of DB interaction and API payload handling was still only runtime checked.
This is absolutely how I feel. I've mentioned previously taking over a project, and just not knowing the type of anything took me months to overcome.
Also, type hints really help your IDE, even catching errors before you run tests.
There's also a visual cue that you are doing something wrong: if a function returns four levels of Union[Tuple[List[int]], Optional[str], ...], then you are doing something too complex and the function should be broken up.
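For instance, a hypothetical sketch of the refactor that kind of signature usually suggests:

    from dataclasses import dataclass
    from typing import Optional

    # hard to read, and every caller has to unpack it correctly:
    #   def load(raw: str) -> Optional[Tuple[List[int], Optional[str]]]: ...

    # clearer: name the structure the function actually returns
    @dataclass
    class LoadResult:
        values: list[int]
        warning: Optional[str] = None

    def load(raw: str) -> Optional[LoadResult]:
        ...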
I learned the same thing on a project that was using Java 1 non-generics. Not exactly untyped to typed but an analogous experience. Everyone I asked said that it was too big to do. I started anyway by enabling the warnings for nongeneric use. I turned down the reporting limit to 1000 (I think) so as not to be discouraged. After months and months of incremental work alongside my main work, I got under the 1000 warnings. It got a bit trickier after that. In the end, there was exactly 1 bug, where an object.toString was being added to a dropdown box and we'd see it from time to time as Class@hexhash. What I learned then is that it isn't strictly about the bugs, it's the confident way you can navigate the codebase and understand and add in consistent ways. Now I add types to all my Ruby and it's seems normal again.
This is doubly true as experienced programmers argue that designing the data structures is the hardest part of coding. Code follows semi-automatically.
I'd add data flows as another level above data structures. It helps to think about how data flows into, through, and out of a system, then it's more clear how the data needs to be packaged, and from there, the code follows semi-automatically.
Tangentially related: I think it'd be cool if there was a development environment that combined a node-based dataflow editor with normal text editing, so pure plumbing could be implemented visually, but embedded within (and translated to) textual code.
> some clever people decide - hey, if you pass a string I'll look up the person object - so you can pass an object or a string - which makes all sorts of convoluted code paths
Do you have hints on how to avoid being one of those 10x clever programmers while programming a prototype? I find that I am most likely to write functions like that when there's some variables that I don't want to pass 5 layers down the call stack and then, in your example, would accept either a string (in which case those variables use their default values) or the Person object, where the variables are pulled from the Person's attributes.
I don't really, but I guess I could say that I have developed in statically typed languages and dynamically typed languages (professionally) for over a decade and I've always found that using the "power" of dynamic languages always ends up causing (me) more frustration in the long run- basically classes of bugs or time wasted that simply doesn't occur with statically typed languages. So for me, I tend to spend a little more time up front to try not to waste (my) time in the future.
> I find that I am most likely to write functions like that when there's some variables that I don't want to pass 5 layers down the call stack
I agree for a prototype, there are some tradeoffs to be made. However, very often prototypes can end up becoming production. Temporary decisions often become permanent ones. Just something to keep in mind.
I've been working in Clojure for the last few years, and what I learned is that the trick is to reverse the data dependencies. Instead of your function asking "what is a 'person', and what attributes does it have if it's an object?", you have your function declaring "I take a person as a map with keys :name and :age". And it is the caller who needs to ask itself: "What am I supposed to provide to this function?"
This is a very different mindset, but once you adopt this style, the lack of static types isn't as big an issue.
The reason you can do this in a dynamic language is that you can very easily adapt one structure to another, so it's okay if not all your functions work directly on the same shared structures.
It also has the advantage that this style really favors making modular independent granular components that can be reused easily, because they aren't coupled to an application's shared domain structures, but to their own set of structures, creating a natural sub-domain.
There are other aspects to make this style work well, like keeping call-stacks shallow, and having a well defined domain model at the edge of your app with good querying capabilities for it.
Concretely it means say you need to add some feature X to the code, you might think, ok this existing function is one place where I could add the behavior, but for my new feature I need to have :age of "person", but I don't know if the "person" argument of this existing function would contain :age or not. Dammit, I wish I had static types to tell me.
Well, in this scenario, you don't add the behavior to that function. Instead, in my style you would have:

    A -> B
    A -> C

instead of:

    A -> B -> C

That means that if the right place for your logic is after B, you don't do:

    A -> B -> B' -> C

and hope that the "person" passed to B had the :age key which is needed by B'. Instead you would do:

    A -> B
    A -> B'
    A -> C

And when you implement B', you don't even care about "person"; you can just say you need person-age, or that you need a Person object with key :age (which you don't care if it is the Person object shared in other places or not).

Finally, you modify A, where A was the function that creates the Person object in the first place; it has direct access to your actual database/payload, so finding whatever data you need is trivial in it.
I never understood this argument. In what kind of shop are you working where passing a string named person to a method expecting an object is tolerated? Or even passing different types that don't share a common interface?
This would never fly in a code review in any of the companies I've worked for.
I've seen essentially this code in so many organically grown codebases (when they grew up without types). It's usually close to the UI, because someone had to quickly add an alternate path to support some new user interaction:

    function find_user(person) {
      if (typeof person === "string") {
        return query_by_name(person);
      } else {
        return query_by_name(person.name);
      }
    }

and yeah, we all know it's kinda messy, but also that logic has to live somewhere and we need this feature asap, so it passes code review. I wrote a test for it, ship it.

I came very close to writing almost this exact code just the other day (except it was username or user id for me), but came to my senses. It's just so tempting in a dynamic language...
In a static language, you either can't do it, have to really go out of your way to do it, or at least do function overloading (which is a bit cleaner)
Calling a function like this “ensureUser” is pretty idiomatic and useful, in lisp-style code bases. I think it’s a pattern related to “parse, don’t validate” in static-type lands: rather than _checking_ what the type is and throwing an error, you define a function that knows how to turn various representations of your type into its canonical shape.
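A rough Python sketch of that "ensure" idea (Person and the accepted representations are hypothetical):

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str

    def ensure_person(value) -> Person:
        # normalize the representations we accept into the canonical type, once
        if isinstance(value, Person):
            return value
        if isinstance(value, str):
            return Person(name=value)
        raise TypeError(f"cannot make a Person from {value!r}")

Everything past that boundary then deals only in Person, rather than re-checking shapes at every layer.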
Sounds like a brilliant case for multiple-dispatch.
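Python's standard library only offers single dispatch, but functools.singledispatch gives the flavor (a sketch; User and query_by_name are hypothetical stand-ins for the thread's example):

    from dataclasses import dataclass
    from functools import singledispatch

    @dataclass
    class User:
        name: str

    def query_by_name(name: str) -> User:  # hypothetical lookup
        return User(name)

    @singledispatch
    def find_user(person) -> User:
        raise TypeError(f"unsupported type: {type(person).__name__}")

    @find_user.register
    def _(person: str) -> User:
        return query_by_name(person)

    @find_user.register
    def _(person: User) -> User:
        return query_by_name(person.name)

The by-type split is at least explicit and extensible here, rather than an if/else buried in one function body.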
Right, so we have:

    function find_user(person: string)

and also:

    function find_user(person: Object)

How long before someone writes this?

    find_user(person: { name: "dave" })

Meanwhile, someone else, not suspecting that they'll be handed a weird half-formed `User` object, adds `person.id` somewhere in the body of the Object version of `find_user`, and now we have a weird edge case where, very rarely, `find_user` panics because the user object we're handed doesn't have an id??? Great, I just lost an hour trying to dig that out of the logs, and the users are starting to think of the product as flaky because the bug had been in prod for over a month before we finally believed them enough to look into it.

Just. Use. Types. Multiple dispatch won't save you on its own. You NEED compile-time types.
Somebody downvoted you, I'm guessing because they think this is a silly example and have never actually seen something like this. I have, in a production code base.
Multiple dispatch and compile-time types are not exclusive at all.
I'm saying the problem isn't solved by multiple dispatch alone, but it is solved by compile time types alone. You can use both together, of course.
The issue of whether the language is interpreted or compiled (which would distinguish compile-time types from strong types) is in my opinion completely orthogonal to the issue of how dispatch works. Strong types and multiple dispatch fix the issues I see, even in an interpreted language.
This was probably just a silly example for a quick explanation.
But all it takes is a method that expects an integer id receiving a string representation of said id, through some obscure path in the code that, notwithstanding the 100% line coverage the team is so proud of, was never exercised in tests - because nobody can have 100% branch coverage.

In C++ you're only ever one missing "explicit" away from introducing such problems.
Suppose I call fire(bob). Programmers from other languages might reason that since fire is a function which takes a Person, bob must be a Person. Not in C++. In C++ the compiler is allowed to go, oh, bob is a string and I can see that there's a constructor for Person which takes a string as its only argument, therefore, I can just make a Person from this string bob and use that Person then throw it away.
To "fix" the inevitable cascade of misery caused by this "feature" C++ then introduces more syntax, an "explicit" keyword which means "Only use this when I actually ask you to" rather than as a sane person might, requiring an implicit keyword to flag any places you actually want this behaviour to just silently happen.
This way, hapless, lazy or short-sighted programmers cause the maximum amount of harm, very on-brand for C++. See also const.
If only there was a way to enforce these parameter types automatically
I personally love it, and wish every library worked this way. My argument is why go out of my way to make it not work, when it would be easy to make it work. This is because I think of modules/packages as user facing programs that are easy to tie together, instead of simple building blocks.
What I really wish existed was a built in way to cast and validate, or normalize and validate. I never care if something is a string. I care that if I wrap it in str(), or use it in a fstring, the result matches a regex. Or if I run a handful of functions one of them returns what I need.
The only benefit I can see of type hints on their own is it makes it easy to change a callable's signature, but I think that's best avoided to begin with.
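Something like this hypothetical helper is presumably what the "cast and validate" wish above amounts to (a sketch; the name and regex are made up):

    import re

    def normalized_str(value, pattern: str) -> str:
        # coerce to str first, then validate the result - not the input's type
        s = str(value)
        if not re.fullmatch(pattern, s):
            raise ValueError(f"{s!r} does not match {pattern!r}")
        return s

    # accepts anything str()-able, e.g. an ipaddress.IPv4Address:
    # host = normalized_str(addr, r"\d{1,3}(\.\d{1,3}){3}")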
> why go out of my way to make it not work, when it would be easy to make it work
The problem with the DWIM approach to APIs is that when you go out of your way to "do something reasonable" with absolutely any kind of argument type, leaving the caller's intent implicit, you will sometimes run into combinations that "work" in unexpected—and often unwanted—ways.
For example, say you have a function which returns either a Person object or, in very rare cases, an error string. Moreover, you fail to check for the error string, and pass the result into another function which expects a Person object but will also take a name and look up the corresponding Person object in a table. Now if the first function fails you're left trying to look up an error string as a name, with no obvious signs (such as a type mismatch error) to show that anything is amiss.
It's important to make the intent explicit, and not just let the function guess. One option compatible with both statically- and dynamically-typed languages is to provide two functions, one requiring a Person object and another taking a name string. This is still perfectly ergonomic for the user and mitigates most of the potential for confusion.
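In Python terms, that two-function suggestion might look roughly like this (all names hypothetical):

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str

    _PEOPLE: dict[str, Person] = {"dave": Person("dave")}

    def lookup_person(name: str) -> Person:
        return _PEOPLE[name]

    def fire(person: Person) -> None:
        ...  # acts on a Person; the caller's intent is unambiguous

    def fire_by_name(name: str) -> None:
        fire(lookup_person(name))  # the string -> Person step is explicit and named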
> For example, say you have a function which returns either a Person object or, in very rare cases, an error string. Moreover, you fail to check for the error string, and pass the result into another function which expects a Person object but will also take a name and look up the corresponding Person object in a table. Now if the first function fails you're left trying to look up an error string as a name, with no obvious signs (such as a type mismatch error) to show that anything is amiss.
Well I only ever return one type from a function, I'm not a total madman. Sometimes I'll do one type or a None, if I'm trying to replicate the functionality of dict.get(). Any error string would be within an Exception, so that wouldn't be an issue, but even in your example it would show a stack trace to the function looking up the user, and would be much more valuable to troubleshoot than a type mismatch.
> One option compatible with both statically- and dynamically-typed languages is to provide two functions, one requiring a Person object and another taking a name string. This is still perfectly ergonomic for the user and mitigates most of the potential for confusion.
In practice that is usually what I end up doing, but with a 3rd function that takes either and returns a Person object. In this particular case I would probably make the function be a method on the Person object, and have a class method to look up the Person.
Here is the scenario that annoyed me enough to turn me off static typing. I had a class that stored the IP address of a network device as an ipaddr.IPAddress object (now ipaddress in the standard library) and there were various subclasses for specific device types. One of the device types needed an SDK, and the init for the SDK class looked something like this
    def __init__(self, host, port=1234, scheme='https'):
        if not isinstance(host, str):
            raise TypeError('invalid host')
        self.url = f"{scheme}://{host}:{port}"

If they didn't check the type it would have worked fine. Just like every other library we were using to connect to devices. So after a bit of frustration we changed our base class:

    def original__init__(self, ip_address):
        self.ip_address = ipaddr.IPAddress(ip_address)

    def new__init__(self, ip_address):
        ipaddr.IPAddress(ip_address)  # just to validate
        self.ip_address = ip_address

and all was well with the world, but there was a dumb mistake waiting for us. A year or two later, after upgrading to 2.7, we started passing around unicode objects instead of strings to get ready for 3.x, as was the style at the time. Again that SDK broke, and only that SDK, because it insisted on checking the type. Sure, it was our mistake this time for not making the original fix cast to str right before passing it to the SDK, but it was annoying and should have been unnecessary.

I understand that type hints are much better in this regard because they would only show an error in your tooling. But that brings me to another point.
I write my packages/classes/modules mostly to be used in a web app, or as scripts that run on a schedule. However, I also need to be able to write one-offs very quickly. When that happens, my code that was previously a library for different applications now becomes an application itself. Using the REPL, a Jupyter notebook, or bpython, I need to quickly get something done. In these scenarios I don't want to waste time remembering how to normalize the data being given to me, especially if the code that provides such niceties is tucked away at a higher level for end users of the web app.
Like I said, I tend to just make a lookup function, and then have everything else be methods on the object. But that doesn't really help when it's parameters to a function. I really don't know what would make it better. Perhaps some kind of mix between function overloading and interfaces from other languages, and the magic *_validate() methods that Django uses. Maybe instead of type hints for return values we need value hints, that give an idea of what actual objects might look like. Then tooling could take into account if it would still work after validation and normalization. Of course it could be that there is no elegant and reliable way to do what I really want, but I can dream.
> Well I only ever return one type from a function, I'm not a total madman.
I'm sure your APIs are sane (at least to you). It's all the other developers you have to watch out for.
> … even in your example it would show a stack trace to the function looking up the user, and would be much more valuable to troubleshoot than a type mismatch.
A type mismatch would be caught earlier (even in a dynamic language) and the runtime exception should report the specific objects involved, so you still get the string which caused the problem.
> Here is the scenario that annoyed me enough to turn me off static typing.
To begin with, this example has nothing to do with static typing. It involves a runtime check. In this case I would agree that the type check is too strict. Some languages have an interface or protocol for "string-like" objects (e.g. the to_str method in Ruby), and it would be better to use that rather than checking specifically for an instance of str. Objects which shouldn't be treated as strings just don't implement the protocol. Python has the __str__ magic method, but unfortunately it's not very useful in this regard since all objects implement it, even ones that are nothing like strings. It's more like Ruby's to_s method, used for formatting and debugging rather than as an indication that you have an actual string. The best recommendation I've seen for checking for "string-like" objects in Python is something like `str(x) == x`, though the extra comparison adds some overhead.
Of course that doesn't really help you since you were trying to pass an arbitrary non-string-like object (IPAddress) to a function expecting a string; the looser `str(x) == x` check would also have failed. The call might have "just worked" without the condition, or it might have failed spectacularly. In assuming that it would work without the type check you're depending on the implementation using string interpolation rather than, say, concatenating the strings with the + operator, which requires actual strings and not IPAddress objects since the + operator doesn't do implicit conversion like f-strings would. Static typing would have helped to limit these dependencies on unstable implementation details, letting you know that you need to fix the issue at the call site by passing `str(self.ip_address)` for the host parameter.
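A small sketch of that looser check and the call-site fix (DeviceSDK is hypothetical; the `str(x) == x` trick is the one mentioned above):

    def is_string_like(x: object) -> bool:
        try:
            return str(x) == x  # true only for values that round-trip as strings
        except Exception:
            return False

    # the dependable fix at the call site: convert explicitly, rather than
    # relying on the SDK happening to use f-string coercion internally
    # sdk = DeviceSDK(host=str(self.ip_address))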
We have tests, and static types, because developers are people and people make mistakes.
You can't say "we simply don't allow bugs!" because it's a lie. Why rely on another person manually checking for silly mistakes when the computer can do it for you?
You'd think. But I've seen many many many examples of this pattern in production JS code.
> I hate having to waste time figuring out the type of every variable and hold it in my head every single time I read a piece of code.
For the same reason, I’m not a fan of type-inferring variable declarations.
I'm okay with "var thing = new FooBarBazThingyWithALongName()" because I don't need to see the type name twice there.
In an IDE you can get the type annotation displayed over every inferred var type, but I don't like requiring an IDE to see that information, and I like it showing up in 'less' as well.
I agree it's redundant if the type name occurs twice in the same statement. However, further evolution of the code often causes the instantiation to be moved elsewhere, and I wouldn't have confidence that the one doing that change then also changes `var` back to the type name. Instead, it would be nice to have syntax avoiding the duplication in the fashion of `FooBarBazThingyWithALongName thingy = new(...constructor parameters...);`.
C# 9 now has "target-typed new expressions" https://www.thomasclaudiushuber.com/2020/09/08/c-9-0-target-...
Ideally linting tools on PRs show the refactored code as a violation, and it should be easy to rip a cleanup refactoring across the files before submitting a PR to avoid that.
Yeah, type inference is obviously nice when writing, but it can be annoying when you go back and read.
I think the best experience is having a language server annotate the inferred types (like how rust-analyzer does it.) But even then, it can become hard to read code on GitHub or somewhere where tools are not available. Granted that's becoming less and less of a problem, and even GitHub allows using some VS Code extensions now.
With good IDE support, writing types isn’t that much of a burden. Either write a function call first and use "assign return value to new variable", or use autocompletion where you only type the initials of a multi-word type name. Plus IDE refactoring actions when a parameter or return type needs to be changed.
I would argue that any function that branches on argument type is straight up doing dynamic typing wrong. Well, branching may not be the right word. Something resembling pattern matching is fine, but like you say, having a function that takes a string for lookup OR the object is just a disaster, particularly when you start stacking function calls. Dynamic types should more closely resemble things that all share an interface, not totally different representations of that data based on the shape of your code.
Javascript is by far the worst offender here with its ignoring extra arguments. Javascript functions that totally change effective type signatures based on number of args are the devil's work.
I'd argue that if the types a function accepts are not easily definable, then you're doing dynamic typing wrong.
I've only been in the industry for ~15 years, but it still feels like every year, some ecosystem discovers the value of something that another ecosystem has taken for granted for decades - type-checking, immutability, unidirectional data-flow, AOT-compilation, closures, pure functions, you name it. I'm glad we seem to be converging on a set of best practices as an industry, but sometimes I wish we were spending less time rediscovering the wheel and more time building on top of and adding to the actual state of the art.
I've been alive long enough to see that most things are useful, and all things are oversold.
More, what works for the nice easy things to build pretty much gets thrown out the window for complicated things that have constraints most efforts don't have. This isn't just a software thing. Building a little shed outside? It would be silly to use the same rigor that goes into a high rise. And it would be crazy to build the high rise with the materials engineering that goes into a little shed.
The metaphor doesn't really work, as in software lots of high rises start as a little shed.
Quite the contrary, in my opinion. Lots of what makes various parts of the physical world hard lies in the infrastructure surrounding them. This is very similar to the complexity in software setups.
Take game systems, as an example; far far more effort will be spent in the art and general asset management than is true for many business software setups. Which is why many of the business best practices haven't necessarily moved over to games.
Similarly, look at the general practices around building and maintaining bridges in physical world. We call all bridges by the same name, but reality basically dictates that what works in some locations cannot and will not work in others.
Now, you are right that we can grow large software out of smaller in ways that the physical can't do. But, it is a common fallacy to stall out a project by trying to be at google's scale from the start. Ironic, in many ways, as not even google was built to be at their scale from the start.
I'm not sure what you're trying to say. Something like "the software world is complex, and so is the physical world, so comparing the two makes sense"? If that's what you meant, you're right, but the problem is that comparing the two doesn't lead to better insight in those. If that's not what you meant, then sorry, I didn't understand.
I think that in general we should stop using so many metaphors in the software world. There's no need to go looking for a shed. If we had to statically type and test every shell command we typed, we would lose lots of productivity. On the other hand, maintaining those very large scripts that started as a single line, and are now used for deploying all of our applications, and tend to fail in surprising ways, would be easier.
The other problem with metaphors is that they are also hard to refute. I've never built a shed, nor worked on a high rise. I don't see why that experience would be relevant to building software, or necessary in a discussion about static typing.
I'm claiming that most of the complications that actually influence many of the intrinsic choices of both will be dictated by external factors. Static typing being an intrinsic fact of software, I couldn't tell you which choice most of the software I use made.
Calling for lack of metaphor is interesting. In many ways, our industry is nothing but metaphors, so it is surprising for me to see them called down.
I agree that no metaphor is perfect. But, by that same logic, I would argue that no specified type is perfect. Especially if done so in a taxonomy that does not admit exceptions. (And again, I'm not against types.)
> Calling for lack of metaphor is interesting. In many ways, our industry is nothing but metaphors, so it is surprising for me to see them called down.
I don't think that's true. There are lots of metaphors because people love using metaphors, but they're not inherent to our industry. Abstraction is, but abstraction and metaphors are different. Metaphors seem to mostly come from blog-post type content, where people want to give you an intuition for something in less than 10 minutes. There's a really good article about this, in the context of monad tutorials, which are some of the most prominent victims of these metaphors: https://byorgey.wordpress.com/2009/01/12/abstraction-intuiti....
> I agree that no metaphor is perfect. But, by that same logic, I would argue that no specified type is perfect. Especially if done so in a taxonomy that does not admit exceptions. (And again, I'm not against types.)
I would call types an abstraction rather than a metaphor, though I agree with you that they are not perfect, in that all abstractions trade precision and exhaustiveness for speed. There are interesting alternatives to this with property-based checking and whatever clojure.spec is, and type systems themselves are getting better, but we're still not at perfection. And even then, I don't think we will ever reach it. The "best" type systems currently all seem to have some structural parts and some nominal parts, so there's no silver bullet.
I mostly use types to avoid stupid mistakes (I make lots of typos, and Typescript helps a lot here), and to improve developer tooling. I'd like to try some approach with DDD and types, but my current company isn't big on DDD, so I can't really judge it. I also like using unit and integration tests. All of these make me feel safer when doing changes. But some people are fine with catching errors in production and quickly fixing them.
What I meant in our industry being nothing but metaphor is basically me staring at so many OO taxonomies. Even if you ignore deep OO trees (and I think you should), it is hard not to see the way we define most data and simulations as anything other than a very formal metaphor.
That said, I was not trying to say that abstractions and types are directly metaphor. I agree with your points. My argument there was that, like metaphors, types/abstractions are never perfect.
I use types to avoid type errors. Which is a big class of error, to be sure. But they do little to help with logic errors, in my experience. And they are flat detrimental if they require pulling in more and more formalism to cover cases that are of increasingly limited ROI.
If anything, I think our industry would do well to embrace many of the modelling domains that allow use of SAT solvers to find answers. And I don't think I've ever seen a strongly typed one of those that wasn't hard to follow. (I am interested in counter examples.)
> Take game systems, as an example; far far more effort will be spent in the art and general asset management than is true for many business software setups. Which is why many of the business best practices haven't necessarily moved over to games.
You're right that game development involves a lot of asset stuff that other business software doesn't have to worry about as much. (And, conversely, a lot of business software has to worry about large mutable datasets much more than most games.)
But I don't think that has much bearing on why some business software practices haven't made their way to games. I think the reasons are mostly:
* Games are structurally different from business software, so the patterns that work for the latter aren't always great for the former. MVC makes sense when the "UI" is a relatively thin layer insulated from the "logic". In most games, the "UI" (rendering, animation, VFX, audio, etc.) is huge and more deeply coupled to the game state.
* A lot of enterprise software practices are about maximizing developer productivity in a single codebase over a very long period of time at the expense of runtime performance. Game codebases often have a shorter lifespan and can't afford to sacrifice runtime speed for developer speed.
* Game developers can be insular and are often either oblivious to what's going on outside of games or think it's beneath them and not applicable to their "real" code.
I can't speak with any authority on why practices differ between the different environments. So, to that end, I should have offered it as /a/ reason, as I don't think it is the sole one. I can't shake the feeling that it contributes, though. I mainly meant it as a counter to the implicit "devs at office jobs are too lazy to learn different ways."
For other examples, I would dip into major logistical simulations/optimizations, which basically drop into linear algebra as soon as they can, where much of the idea of typing is thrown out so that we can solve equations and constraints, with no real tracking of which is which at different locations. There is a translation in/out, but once in, things are effectively a "matrix of values."
(As an amusing aside, I love that I get to message with the authors of books I'm reading on places like this. I have your Crafting Interpreters. Working through it at a glacial pace. Plan to pick up the other one next. Kudos and thanks!)
You're welcome! :)
Not that I disagree, but I feel like you are overselling the simplicity of [building a shed](https://en.wiktionary.org/wiki/bikeshedding).
My point is that all methods and techniques probably have worked for someone doing something.
And I should have leaned in on how much is still left to the implementation in terms of the "shed": from the weather to what is being stored. It isn't like there is a universal shed design that will make everyone happy.
Nor is this saying that some things aren't truly valuable. Just recognize that in some places they don't help as much as you would like. This isn't saying they are bad or worthless. Just acknowledging that they are oversold.
Programming is a sufficiently complex field that we can find examples where opposite approaches are best: it depends on context whether you need more or fewer types.
I think the problem is to figure out what the best practices actually are.
What we are observing here is "the market fixing it".
The process is messy and redundant, but effective.
I think the limiting factor in the case of python getting type hints was that it was never designed with type safety in mind, and that it took a while to establish consensus on a good type-hinting system.
I don't think it's a matter of reinventing the wheel, in this case, more a matter of bolting something like a wheel on a system which didn't start with wheels.
Yes, and when that ecosystem discovers these obvious facts, the discovery is always described as a "journey" in the accompanying blog post. Having sense and good taste at the beginning of a project doesn't warrant a blog post but slowly stumbling over isolated aspects of good taste, now that's a journey.
Honestly will never go back to languages without type checking, it prevents so many bugs and is a huge help in understanding code you haven’t worked with previously.
> Honestly will never go back to languages without type checking, it prevents so many bugs and is a huge help in understanding code you haven’t worked with previously.
I see static types as one of the most powerful communication tools around, as far as code goes. I can't relate at all to people complaining that they waste time. They must work very differently from how I do, is all I can figure. It's that, or they don't realize how much time they're losing to communication-related tasks, or refactoring, or writing (and maintaining!) extra or more verbose tests, or having even one more bug per year make it to production, or whatever, that'd be saved by static types, so aren't correctly accounting for the time savings. One of the two.
Consider these 4 possible combinations for programming languages:
(1) Low-level, static types
(2) Low-level, dynamic types
(3) High-level, static types
(4) High-level, dynamic types
For whatever reason, historically #1 and #4 have been most popular. C, C++, Pascal, Ada, and Java are #1. Python, JavaScript, Perl, and BASIC are #4.
There haven't been a lot of #2 or #3 languages. Some #3 languages (TypeScript and Python with types) have come along, but relatively recently.
A person who experiences only #1 and #4 might notice that they can whip up programs faster in a #4 language than in a #1 language, then falsely attribute that difference to the absence of static types, whereas the real reason is working at a different level of abstraction.
I don't really see how Java is closer to C than it is to Python in terms of what level of abstraction it's working on, could you elaborate?
It's a matter of opinion whether Java's level is closer to C or Python. But I can name a bunch of high-level features in Python that aren't in Java:
List literals, dictionary literals, tuples, bigint literals, byte strings, f-strings, sequence unpacking assignment, named parameters (kwargs), decorators, closures (functions within functions), metaclasses, generators, async, list/set/dict/generator comprehensions, multiple inheritance, natural JSON support, ...
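For illustration, here's a quick sketch exercising a few of those in one place (names invented for the example):

    first, *rest = [3, 1, 4, 1, 5]              # sequence unpacking assignment
    squares = {n: n * n for n in rest}          # dict comprehension
    print(f"first={first}, squares={squares}")  # f-string

    def tag(name, **attrs):                     # named parameters via kwargs
        body = " ".join(f'{k}="{v}"' for k, v in attrs.items())
        return f"<{name} {body}>"

    def countdown(n):                           # generator
        while n > 0:
            yield n
            n -= 1

    print(tag("img", src="x.png", alt="demo"))  # <img src="x.png" alt="demo">
    print(list(countdown(3)))                   # [3, 2, 1]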
C# has most of this stuff and I think Java has at least a few of these but I wouldn't say any of these are really much of a differentiator.
However, Java does support dynamic typing for every object, even if it is a bit cumbersome, so putting it at the same level as C++ isn't accurate either.
And at what level of abstraction does ISO C work on?
"C Is Not a Low-level Language, Your computer is not a fast PDP-11."
That would be a good point if you actually could write the final code for CPUs, but you can't, since the CPU internals do that for you. So from an application programmer's perspective, machine code is as low as it gets, and C maps really well to machine code, so C is a low-level language.
> So from an application programmer's perspective, machine code is as low as it gets, and C maps really well to machine code, so C is a low-level language.
C only "maps really well" to PDP-11 style machine code. If you want SIMD, parallel algorithms, heterogeneous programming, memory hierarchies/domains, etc then ISO C is completely useless.
Only if that machine is an 8- or 16-bit CPU, with code compiled at -O0.
And even then, it's impossible to write something like malloc() without using either an external Assembler or language extensions.
#3 languages have been around since the 80's (SML) and 90's (Haskell)
Exactly. IMO we still don't have a good #3 language - in particular, all of the popular languages that can be compiled into native code are in category #1.
I'd say that F#, Haskell, OCaml and such languages fall under #3.
I'd say Rust and Swift are in the #3 category. Really, C++ is pretty damn high-level, just without a sound type system. But it's static.
I can remember that in the beginning it felt cumbersome to me because I wasn't all too familiar with that language's type system and so I had all kinds of errors thrown at me.
But it's something you just have to get used to, and now that I understand it much better I feel more productive and have more confidence in my code. And the communication aspect is definitely a great help too. No handwritten documentation can be this consistent, completely independent of who touched the code (though to be fair, it's still difficult to get the naming right).
I can't imagine the people calling it a waste of time got over the hump in the beginning. To me it's obviously a timesaver. It does a tedious, difficult (for humans) task and does it quickly & with perfect accuracy. Beforehand, worrying about all the type signatures and interfaces felt like 4D Sudoku across various modules; now I can concentrate on the interesting parts.
I've been a types advocate for years, but it wasn't until working with Typescript that I started experiencing some of the downsides...
To me, ideally, types are supposed to be a benefit not only in safety, but in understanding the intent of a piece of code more quickly. For an api or library interface, review the types to see what its intentions are.
But there's something about the typescript type system, with all the picks and keyof and typeof... sometimes it just feels like it's way too easy to go overboard, to the point that it occludes meaning. I understand struggling with types if you're struggling with figuring out exactly what your boundary does and does not allow, but when you're struggling with types just because you're struggling with the kabillion different ways that some other typescript programmer chose to use the utility types... there are times when I feel like even Scala is easier.
The problem is that typescript exists to type existing JS, and existing JS was written without thinking about types. "Fresh" TS might be better in that regard.
> with all the picks and keyof and typeof...
It depends a lot on the codebase, of course, but the typescript codebases that I've seen so far and considered "well-maintained" didn't really use keyof and typeof all that much. The only way I can imagine how one ends up with lots of those keywords is when you start with a dynamic language approach, and then tell the compiler afterwards what that type might be, instead of defining the type beforehand - might that be the issue?
Your criticism boils down to "it's possible to overcomplicate things". Sure if you completely remove static typing then you can't overcomplicate static typing. But is that really an argument against static typing?
No, and I am still in favor of static typing. But I also don't think it's purely up to team discipline. Something about typescript (or more aptly, javascript) incentivizes crazy typing more than other languages. Luckily, the last few years appear to have brought more emphasis on designing languages to take these kinds of incentives into account, so I still think the future is bright.
> Something about typescript (or more aptly, javascript) incentivizes crazy typing more than other languages.
Definitely not more than other languages. Check out template metaprogramming in C++ or some of the OTT generics in Rust.
> is a huge help in understanding code you haven’t worked with previously
This is huge for me. As someone who takes on already-completed projects, it's a huge help with debugging and understanding what's going on without requiring you to know the whole system forwards and backwards. Sure, you still need to build a mental map of the general code flow, but you can look at a single function and clearly see the obvious inputs and outputs. Combine that with a stack trace and you can debug that method as a single unit and then start to look at where it's called and what its downstream effects are. You don't need to start from the very beginning of the call and then follow it through, keeping mental track of what is available and in what form, when and where.
It is kind of ridiculous not to have types. I think in the old days handling types felt too heavy for scripting languages, but now with type inference and stuff I don't think it is any longer.
I feel the same way but wonder if I'm right when the majority of jobs are JS and Python.
Both of those languages' communities have essentially admitted that not having type checking was a mistake and are trying to patch it with TypeScript and mypy.
I agree with all the benefits of mypy cited in this article. For me, most important thing for the long-term health of a codebase is its readability/maintainability, and mypy static typing makes such a huge difference for that in large Python codebases. I'm really excited to see large libraries doing this migration.
I'll add for folks thinking about this transition that we took a pretty different strategy for converting Zulip to be type-checked: https://blog.zulip.com/2016/10/13/static-types-in-python-oh-...
The post is from 2016 and thus a bit stale in terms of the names of mypy options and the like, but the incremental approach we took involved only using mypy's native exclude tooling, and might be useful for some projects thinking about doing this transition.
One particular convention that I think many other projects may find useful is how we do `type: ignore` in comments in the Zulip codebase, which is to have a second comment on the line explaining why we needed a `type: ignore`, like so:
* # type: ignore[type-var] # https://github.com/python/typeshed/issues/4234
* # type: ignore[attr-defined] # private member missing from stubs
* # type: ignore[assignment] # Apparent mypy bug with Optional[int] setter.
* # type: ignore[misc] # This is an undocumented internal API
We've found this to be a lot more readable than using the commit message to record why we needed a `type: ignore`, and in particular it makes the work of removing these over time feel a lot more manageable to have the information organized this way.
(And we can have a linter enforce that `type: ignore` always comes with such a comment).
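For anyone curious, that lint check can be tiny. A rough sketch of the idea (not Zulip's actual linter):

    import re
    import sys

    # Accept only: `# type: ignore[code]  # explanation` -- an error code plus a reason.
    OK = re.compile(r"#\s*type:\s*ignore\[[\w, -]+\]\s*#\s*\S")

    def check(path):
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                if "type: ignore" in line and not OK.search(line):
                    yield f"{path}:{lineno}: `type: ignore` needs an error code and a reason"

    if __name__ == "__main__":
        problems = [msg for path in sys.argv[1:] for msg in check(path)]
        print("\n".join(problems))
        sys.exit(1 if problems else 0)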
I really like this documented type ignore strategy and will start incorporating it in our codebase. Thanks for sharing.
I've seen a lot of push back on adding type checking to Python but we had a similar case at my company where we tried it out on a new project and the clarity and readability of the code was immediately beneficial to the entire team. Perhaps it's something well suited to larger codebases.
I want type checking on pretty much anything that will ever exceed about two screenfuls of code. If I can't keep the whole thing in my head at once, I want the computer to do it for me. That's the point, right? Making computers do stuff for us so we don't have to?
I kind of think of them as a giant set of unit tests. The compiler/linter etc. can check every variable and every function call to make sure you didn't mix up your types, which _will_ blow up at runtime if you got them wrong.
So rather than write them all by hand, just get your tools to do it.
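For instance, one annotated signature gives the checker assertions you'd otherwise approximate with several hand-written tests (function invented for illustration):

    def total_cents(prices: list[int]) -> int:
        return sum(prices)

    total_cents([199, 500])    # OK
    total_cents("199")         # mypy: incompatible type "str"; expected "list[int]"
    total_cents([1.99, 5.0])   # mypy: list item 0 has incompatible type "float"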
I see them as documentation of what I think something means or is, which the computer can check for accuracy (more or less), both as I write and as the codebase changes.
That legibility to the computer is what makes them much better than documenting the same thing some other way. Are they out of date? Were they wrong to begin with? The computer will tell me, no action needed on my part. I need to look up something in the context of what I'm reading right now—oh, look, the computer just told me exactly what I needed.
It's $current_year and there's still debate whether checking stuff at compilation time is better than at runtime?
People who don't think types are a good thing need to work in a statically typed language for a year or two and then see what a difference it makes in reality. Unproductive Java bureaucracy != static typing.
I think the people debating it never tried it seriously.
I’ve done everything from Haskell to Java and I still strongly prefer Clojure and Common Lisp-style dynamic types.
I have to agree, I've done over 5 years of C# and then went to ruby and never looked back. Static type checking raises the floor on incompetence, but also lowers the ceiling on excellence. I have to admit I don't have experience with the extremes which would be Haskell and Clojure.
The amount of cruft I had to type in C# just to get shit done... It's all implicit in ruby thank god for that.
I never EVER have to check the type of a variable at runtime. I always know its type just by looking at its name. Is it enforced in ruby? Of course not. Ruby assumes I'm an adult and that I know what I'm doing.
> Static type checking raises the floor on incompetence, but also lowers the ceiling on excellence.
At 40 years old, I've seen enough of my own incompetence that I'll gladly accept things that can mitigate it. As for excellence, I suppose static typing would have prevented a handful of clever hacks that I did in Python and Lua when I was in my 20s, 12+ years ago. Truthfully though, my memory of that period has faded enough that I'm not sure, and I doubt that any of those hacks were crucial for the products that I was developing at that time. Yes, a type system as primitive as Java's at that time would have felt like a straitjacket. The same might have also been true for C#. But modern static type systems are much more flexible, and I don't think I've rejected a language based on its static type system in the past several years. (I've recently done a project in Elixir, but that was despite its dynamic typing, not because of it.)
> I always know its type just by looking at its name. Is it enforced in ruby? Of course not. Ruby assumes I'm an adult and that I know what I'm doing.
TIL taking notes of things you want to be reminded of in the future is for children and the incompetent.
How in the world does type checking lower the ceiling on excellence?
I"m guessing by rejecting perfectly valid and correct programs that are unable to be type checked. There is a large space of "false negative" programs that a type checker will reject, but that could be perfectly correct. E.g. compare Python-esque duck typing with nominal typing.
You can typically get that by using Any as your type if you so wish, escaping the type system for those rare circumstances.
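In Python's flavor of this, roughly, Any opts a value out of checking entirely, while Protocol lets the checker handle duck typing structurally. A minimal sketch:

    from typing import Any, Protocol

    def log_anything(x: Any) -> None:
        print(x.quack())             # Any: the checker stays silent here, right or wrong

    class Quacker(Protocol):         # structural type: anything with a matching
        def quack(self) -> str: ...  # .quack() method qualifies, no inheritance needed

    def log_duck(d: Quacker) -> None:
        print(d.quack())             # duck typing in spirit, but still checked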
Also C# suffers a similar issue with what I call "unproductive Java bureaucracy", since it's basically Microsoft Java. Bureaucracy is not static typing. You can also have full dynamic dispatch and still have static typing too.
Tbh I'm conflating multiple things, I've heard a lot of good things about Haskell.
But in C# for example, if the system was not designed with dependency injection and everything being an interface it's very hard to build a test harness since you can't mock anything. Which means everything has to be tested manually. (I haven't done any C# in a long time, maybe it's not the case anymore)
So you have to create an interface and classes for every implementation of every type in the system just so you can change its type dynamically. By the time you're done with all the cruft, you've forgotten what you were about to code.
I'm infinitely more productive in Ruby compared to C#. But I can understand dynamic languages not being welcoming to juniors, since they can code themselves into pitfalls that will bite them later.
> But in C# for example, if the system was not designed with dependency injection and everything being an interface it's very hard to build a test harness since you can't mock anything. Which means everything has to be tested manually.
I wouldn't have put it like that, but I think I know what you mean. Mocks for unit testing do require that you have defined an interface to implement, which means that every class that you want to be mocked out needs to have an interface extracted. It is extra work. Overall I think the tradeoff is worth it, myself, especially if your IDE can automate extracting an interface. But it is dumb work that a smarter language/type system could avoid.
> [...] which means that every class that you want to be mocked out needs to have an interface extracted. It is extra work.
If you define the interface first, then you can simply copy-paste that into the class definition and off you go implementing it. Hardly any more work at all.
https://docs.microsoft.com/en-us/visualstudio/test/isolating...
This and other libraries can work around method modifiers, so you can mock without extracting an interface.
I'm guessing maybe sometimes type checkers make you jump through hoops to pass, and the OP finds that distracting? To me, though, the benefits of type checking far outweigh the cost.
>I always know its type just by looking at its name.
Do you ever feel the names are getting too verbose and it would be great to have tooling that would allow you to get that information on mouse-over instead of having it make your lines almost unreadable?
I mean, there's a reason mathematicians have decided to keep variable names short instead of having the names contain all the context.
> lowers the ceiling on excellence
There is zero real world evidence for that statement. The smartest developers I have ever worked with love types. The not-so-smart ones couldn’t figure out how to use types well and their code was a buggy mess. Not evidence of anything of course but certainly a sample point.
> also lowers the ceiling on excellence
> I never EVER have to check the type of a variable at runtime.
> I always know its type just by looking at its name
I guess you've only ever written web backends and menial things like that?
I'm curious, after having done a significant amount of Haskell, I have flipped that opinion. The biggest difference is how the types help make things explicit and clear.
(although, IMO, I think purity makes a very large impact here too)
I have a small side project in Clojure [1] and I always miss type checking when working on it. Not by much, because it's a small project, but I am tired of "iseq is not a function" errors.
I sort of think there are two mindsets behind this debate: people that miss the guard rails of a static type system and people that enjoy the experience of iterating quickly in a dynamically typed language. I don't really want to say everyone should pick one side or the other, just that my experience doesn't bear out the claim that "statically typed languages produce more maintainable code". And, the little bit of empirical evidence for this proposition is largely inconclusive: https://danluu.com/empirical-pl/
>and people that enjoy the experience of iterating quickly in a dynamically typed language
Programmers spend more time reading code than writing it. So I personally prefer that the devs on the team spend more time typing the code, or use a bit more brain energy thinking about types, so that later we can all read the code, understand it, and edit it faster.
Dynamic works great for write-only scripts.
People always say this about reading code and it’s just never matched my experience working in either sort of codebase: one difference (comparing lisps and, say, Typescript or Java) is that lisps just have fewer lines to read. So, any assistance you get from the types is counteracted by having to read more code.
But, additionally, I just don’t find it true to my experience that it’s easier to read and understand a dynamically typed codebase vs. a statically typed one. Especially when you have a lisp-like environment that makes accurate jump-to-definition possible.
EDIT: I think I just tend to think about codebases in terms of operations rather than types. And, consequently, when I build a codebase around compositions of functions, the way I think about it isn’t very different in either paradigm.
Maybe it depends on what you work on.
From my experience, things go like this:
1. We have a simple problem; I implement a simple, elegant solution.
2. Some new feature is added, which means there are some special cases now. Most of the time someone else on the team adds this new feature, so 4-5 different places are modified: functions need to get more parameters and some IFs are added in those 4-5 places.
3. Later a new feature is added again, with a few more extra special cases. The dev will again add some more function parameters here and there and more IFs, but fails to find all the places that might need to be modified.
4. Things are now a big mess, and I have to fix it. I now have to read all the code: my own old code that was changed with different exceptions, and the other developer's code. I spend a lot of time reading, understanding how stuff works now and why the new code does what it does. Then I spend the time abstracting a solution again, covering all the special cases. After I have a solution in mind comes the refactoring, and with static types or type hints it is much easier to find where stuff is used, so you know what to modify.
I am sure on some projects, where maybe there are only one or a few devs who all write quality code and there are no new requirements that need to be implemented ASAP, the code could stay more readable, but this is the exception unfortunately.
Is it possible to elaborate in a comment? Honestly I probably wouldn’t take the time to read a lengthy article, but if there’s some elevator pitch then I’m all ears.
What makes CL/Clojure really work is that your editor (emacs usually, but there are other options now) connects to the live program and has access to the entire runtime environment. So, you can do a lot of the things other languages need static types for via introspection (e.g. autocomplete: CL just asks the running program what functions are available that match the current pattern and returns a list).
Secondly, since I’ve learned statically typed languages, I already have a mental model for how they make you structure your code, except dynamically typed languages make patterns easy that would require something like dependent types to check (see how complicated Typescript is, because it has to be able to model JS idioms). My experience is that a lot of the value of static types isn’t in the checking but in the modeling aspect: if you follow the general patterns you’d use in Haskell (represent algorithms like “apply a function to each member of the list” as functions), you reduce the amount of thought it takes to see the program is correct by splitting it up. For example, if I have this pattern in my imperative codebase:
    let result = []
    for (let idx = 0; idx <= input.length; idx++) {
      result.push(input[idx]+1);
    }
    return result

I have at least three things mixed up together: accessing each member of a list (and there's an easy-to-miss off-by-one error in this implementation), transforming that member, and building up a result. If I translate this to a functional style, it's easier to see that the implementation is correct:

    const inc = v => v+1
    . . .
    return list.map(inc)

Looking at this code, I can break down correctness into three questions: is list.map implemented correctly? Is inc (the transformation) implemented correctly? And, assuming both are correct, are these two functions combined in the correct way? Types definitely can help here, but my experience is that 90% of the benefit isn't the _checking_, it's the code structure you end up with as a result. [1]
Now, if this is true, why do I prefer dynamically typed languages? Well, it comes down to two things: I find the "live programming" model of CL/Clojure more productive and roughly equal to types when it comes to checking correctness (and I don't think it's just me; I've seen various papers, etc. that claim Haskell and Clojure have roughly equal defect rates); and I find the patterns I like in CL/Clojure/JavaScript require much more sophisticated type checkers to actually validate, and such type checkers have a huge up-front learning cost and still add a lot of boilerplate that exists mainly to convince the type checker that you know what you're doing.
Finally, in a language with macros, you can roll your own static guarantees: one project I worked on was doing a bunch of calculations inside a database. We hit an edge case where the DB's idea of a week didn't match our requirements. As a result, I wrote a code generator that generated Clojure functions and DB queries simultaneously. In this situation, if you assume the code generator is correct, you have a compile-time guarantee that the Clojure versions of the queries are equivalent to the calculations being done inside the DB.
[1]: This page surveys a bunch of studies on the question of dynamic v. static types and finds the evidence in favor of static types to be surprisingly small https://danluu.com/empirical-pl/
> This page surveys a bunch of studies on the question of dynamic v. static types and finds the evidence in favor of static types to be surprisingly small
Most of the studies seem to be rather poor though, so difficult to draw any solid conclusions from them. Almost all seem to drown in noise, or have flawed setups.
From personal experience, with a statically typed language I can jump into an unknown codebase and make non-trivial modifications much, much faster than if it's a dynamically typed codebase.
I've wasted soooo many hours doing print(dir(x)) in Python it's far beyond funny.
On the flip side, over the years I've helped countless people with their C/C++/Delphi code in minutes, frequently using libraries and APIs I've never seen before.
Yeah, the evidence here is mostly anecdotal but, while we're trading anecdotes, I think you have to distinguish Smalltalk/Clojure/Common Lisp from other dynamic languages. Most dynamic languages essentially work like statically-typed languages without typechecking: you put code in a file and then run it all at once (or run unit tests) and see what happens. The languages I mention actually bring your development environment to runtime (twisted manhole and pry are the closest things I can think of here), so you don't have to run the whole thing; you can just run the parts you care about and see what they do.
That being said, my experience isn’t the same: I’ve been able to make helpful changes to dynamically-typed codebases in roughly the same amount of time as to static codebases. I’ve never really identified what it is about how I approach code that makes a difference here, but I think it is because I think about changes in terms of operational equivalence (e.g. l.map(a).map(b) === l.map(compose(b, a)) ) rather than in terms of data types.
This is actual JavaScript code, from one of my projects:
    function processAudioData(data, callback) {
        // dump audio data to WAV file
    }

It's implementing a callback from a library. Even reading the source code of the library I had problems figuring this one out. Had it been, say, C# code I'm pretty certain I would have had it done in seconds.
How do you solve this in seconds? I'm genuinely curious as this is something I often struggle with when having to use say Python or JavaScript.
I could see a statically typed language that would give you a live reflection system and macros. I think it's more that, if you have to choose, you'd rather have those than static types.
But I think it is possible to have all 3, it just doesn't exist in any popular language that I am aware of.
The problem is that the “live programming” aspect violates a fundamental assumption of a lot of static type systems: the “closed world” assumption that all the relevant types are known at compile-time. If you can dynamically extend/redefine the types on the fly, your type-system guarantees start getting weaker anyways. Instead, you need a system of contracts or something like Racket has.
Also, if you have macros, you can always just embed a Haskell into your language for the parts where you want that sort of guarantee: https://coalton-lang.github.io/
This is a really useful overview from which I gained some new insights. I appreciate you taking the time.
Lisps are the only languages where I feel like dynamic typing works well.
After using TypeScript for even a little bit I find it painful to go back to JavaScript for anything more complex than white-boarding.
A lot of people grew up with Java -- especially early Java -- as their primary language. It was taught heavily in schools.
I think it soured a lot of people on static typing and exceptions, because Java is/was terrible at both of those things.
That's not really the debate in Python :)
Almost every Python user now has to "deal" with type annotations. It's tempting to gradually add type annotations; they're nice documentation.
But it also rubs me the wrong way to have annotations that are never checked(!). In many codebases you might just have "casual" style type annotations, and nothing ever asserts that they hold. That nags at me a bit.
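A tiny example of what that nagging feeling is about; on their own, annotations do nothing when the code runs:

    def double(x: int) -> int:
        return x * 2

    print(double("ha"))  # prints "haha": nothing enforces the annotation at runtime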
Never checked? They're statically checked.
Also, tooling like https://pydantic-docs.helpmanual.io/ can do runtime checking for important parts of your app, or you can use https://github.com/agronholm/typeguard to enforce all types at runtime (although I haven't measured the performance impact; probably something to try somewhere other than production?).
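If I remember typeguard's API correctly, turning an annotation into a runtime check looks roughly like this:

    from typeguard import typechecked

    @typechecked
    def double(x: int) -> int:
        return x * 2

    double("ha")  # raises a type-check error at call time instead of returning "haha"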
They are statically checked, if you run a type checker. Which many don't.
That's a good point. If they're never checked, then they're just like incorrect/outdated comments. They sort of get at this idea in the article, and sort of describe a compromise for it. They have a list of files that they've completely annotated, and only those files are checked by mypy. So in their case, they know which annotations to ignore, and which they can rely on.
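One common way to express that kind of allowlist, if a project is using mypy's config file (paths invented here; the article's actual setup may differ):

    [mypy]
    # Check only files that are already fully annotated; grow the list over time.
    files = src/mylib/response.py, src/mylib/exceptions.py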
I think it's well suited to anything, really: the amount of casual problem solving and inference you can get from some simple types is pretty big in my experience, and Python's approach of letting you optionally buy into it is really nice.
Just wish Python's typing was better. But it's impossible to type hint the crazy "pythonic" code out there. Like the kwargs used to do a Django query.
Even though I had previously learned some rudimentary C, C++, and Java, I really came-of-age with Python. Now, having written (and maintained) nontrivial code bases in statically typed languages including D and Rust (and dabbling in others with contributions in C, OCaml, etc.), I am never going back — except perhaps in a few cases when a library like PyTorch or Pandas has no good substitute.
(edit: corrected "Linda's" to "Pandas" heh, mobile kbd)
Having spent a decade with Python and more recently a few years with C# I still can't quite put my feelings into words but here's an attempt:
"The benefits of explicit typing are obvious and clear but they downsides are subtle and hard to communicate"
I still think typing in general is a net win but I'm not sure whether static typing is. You find yourself writing code that just wouldn't be neccesary in a dynamic language - and I don't just mean the direct code you write to declare and cast types. There are more subtle costs.
I need to spend time with good type inference in a language with modern typing and dynamic features to sort out how I feel about this.
Types are effectively assertions about the values they represent, and statically-typed code constitutes proofs that the assertions actually hold at runtime. The static typing forces you to be sufficiently rigorous in those proofs, which may require additional code as you mention. Without static typing, one has to rely on the "proofs" in one’s head to be correct (which humans aren’t really good at), instead of having the compiler double-check one’s reasoning.
I think this falls into the category of "The benefits of explicit typing are obvious and clear". It's the other side of the equation that I'm intrigued by and struggling most to formulate.
Lack of type checking was a hot thing for a while. It made you "move faster". It was actually sold as an advantage. Until we realized that after moving faster you grind to a halt because now you have a massive codebase, with hundreds or thousands of files, and everything takes forever, and every change requires multiple rounds of testing.
I believe it really has to do with the size and complexity of modern projects. With a half-decent IDE you could sort of use non-type-checked Python in 2012, but times have changed, and now we are talking about statically checking Python and Ruby. And JavaScript, of course, now has it in the form of TypeScript.
I think it's interesting that PEP 484 says ([1]): "the authors have no desire to ever make type hints mandatory, even by convention," while the opening of this article says "type hints have grown from a nice-to-have to an expectation for popular packages." Things don't always work out the way the PEP authors expect.
It’s the most demoralizing aspect as even just typing the standard library online documentation and examples (such as Emil autoname example) would be extremely valuable.
I can’t understand programming without types - it’s just so weird…
Python is not without types; its types are dynamic.
Sorry; to be specific, I find it so weird to program without defining what I want going in and out of a function.
Oh look, they're finally discovering that strong typing is actually a benefit, and using a language without it is a huge step in the wrong direction.
I wrote Python code for 4 years, then moved to Go. I really appreciate typed languages, as they prevent so many bugs that I was just used to handling.
I think mandatory type hints in method signatures and optional type hints at assignment are a good compromise.
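Concretely, that compromise might look like this (a made-up function):

    def parse_port(raw: str) -> int:  # signature: always annotated
        stripped = raw.strip()        # locals: types inferred, no annotation required
        port: int = int(stripped)     # but you may annotate an assignment when it helps
        return port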
But if I had to pick between a language without any type hints/inference and a verbosely, strictly typed language, I would much rather use the strictly typed language.
It would be really interesting to see some examples of the logic errors that were found that couldn't be found by tests. This seems to have been a very robust library. What kind of problems did you find? From what is mentioned in the article, it really doesn't sound like the investment of hundreds of hours from multiple people has actually been worth it.
Tangential: did anybody find success with typed Model.objects methods with Django?
It is extremely funny to me watching Silicon Valley types slowly (very slowly) re-invent everything we knew about programming languages decades ago.
Well, to the hipster culture, what we were doing wasn't cool.