Type Inference That Sticks
jaredforsyth.comImagine debugging a type inference problem when there's all this hidden state
No, I'm sorry, but type annotations are where the conversation happens. It's very very important in my view that type inference is a pure - ideally easy to mentally model - function of the source code you're looking at. Type annotations let you negotiate with and probe that function, and then once you understand what it's doing and have gotten it in a good place you can remove all the ones that aren't needed
Adding (hidden!) state to that process sounds like my worst nightmare as a programmer
So I think "easy to mentally model" gets harder and harder to achieve as the type system becomes more powerful. If your conclusion is that we keep the type system limited in power, that's valid! but not what I'm exploring.
Another point that I failed to make in the post (thanks for the pushback!) is that type inference algorithms leave you high & dry in the presence of a type error. Then you're left with lots of hidden state (the internal steps the algorithm took before arriving at the error) and the final conclusion, which, if you're lucky, points you to two places in the code where the inferred types are in disagreement.
With my proposed system, all types are annotated all the time (to be shown or hidden via editor flag, or on hover), and the annotations are updated in response to actions taken within the editor. The "algorithm" becomes extremely simple, with almost no intermediate steps.
Of course, proof will be in the pudding, if I can actually achieve a pleasant editing experience :)
> all types are annotated all the time (to be shown or hidden via editor flag, or on hover), and the annotations are updated in response to actions taken within the editor
So there are two possible reasons I can see for automatically "annotating" all the types:
- Caching that info to easily show in the editor (which is a good feature, but many editors already do this without having to modify the source; it happens entirely editor-side)
- Using the "current" inferred type as a jumping-off point for determining the "next" inferred type
The second was my original interpretation, and what sounded so distressing as a user
Consider this situation:
1. Your program and the inferred types are in one state
2. You modify some code which changes an inferred type
3. You change the (visible) code back to what it was previously, but now the inferred type is different because it was partially based on the previous inferred type
This is what sounds like a nightmare, assuming I understand correctly that it's possible on the described system. The inferred types are now a state machine, where it matters not just what code is on screen, but how it got there.
Hm so it is a state machine, but (my hope is that) all state transitions are simple and direct (and observable!) outcomes of user action. If the transitions get at all complex or unobservable, I'll probably call it a failed experiment.
It might also end up being the case that users keep types "visible" at all times, only turning them off in certain situations. I can also imagine a flag to "only hide inferred types that are primitives" or something similar.
Fair enough, and it's certainly an interesting experiment, don't want to discourage experimentation
But I will say I think the desire to show the inferred types is orthogonal to the desire to persist them on disk, in the source-of-truth code files. If the state machine really is what you want to experiment with then have fun, but if your desire is just to give the user more visibility into inference, I think there are simpler ways to go about that
> Another point that I failed to make in the post (thanks for the pushback!) is that type inference algorithms leave you high & dry in the presence of a type error. Then you're left with lots of hidden state (the internal steps the algorithm took before arriving at the error) and the final conclusion, which, if you're lucky, points you to two places in the code where the inferred types are in disagreement.
How much state gets hidden is 100% on the quality of the implementation, isn’t it? There’s nothing that forbids an implementation from dumping that hidden state in whatever form it wants to (as an example, compare C++ diagnostics of different compilers or different versions of a compiler)
> The "algorithm" becomes extremely simple, with almost no intermediate steps.
I don’t see how it can become simple merely be the way things are presented. If you write f(foo,qbar,baz,quux) in a language where overload resolution is complex and the compiler has 20 overloads to pick from, but can’t find a ‘best’ one, it still has to show you what overloads it considered, how it ranked them and why, and which ones were left.
So, if you want type errors to be extremely simple, I think you need a simple language (type-wise)
Although I personally think type inference is the killer feature that makes static typing accessible (almost always auto), I've seen hostility to it, in particular among C++ and C# programmers. The point of this project, as you say, is to be as explicit as possible in the code, while at the same time presenting a view of the code that is less cluttered.
My initial impression is that if I were to use a system like this, I would want any type annotation that differs from the default inference to be visible. The only choice left, then, is when to display redundant annotations, which is where a lot of the disagreement around type inference is, even without this scheme.
Type inference occasionally bites one's hand when pushing data across an FFI boundary. In one instance I was writing some graphics code for a project and for some reason the colors in the rendering model were coming out all wrong on the screen (TL;DR it ended up looking like two of the color channels were missing from some texture maps) and it turns out that Rust inferred the type of the buffer as a vector of f64s instead of a vector of f32s. Putting the type in specifically fixed the problem promptly.
Lesson learned: Sometimes type inference fails, and always annotate your types at FFI boundaries!
Yeah, it's important to annotate anything that interacts with the outside world, because that code/API isn't available for the compiler to reason about before run-time. Same goes with HTTP endpoints (on both the client and the server)
While I can imagine worlds which improve upon this (working in Agda and Coq is a great way to see a glimpse of it) I honestly feel like my dominant way of working is conversational with the compiler.
I tend to write code for a bit, and then ask it one of two questions (1) what is the type of this thing or (2) is there any part of this that looks wrong?
In particular (2) is obviously type checking (which is great to run in a fast background loop like `cargo watch -x check`) and (1) is some system for "please insert an annotation on this variable with your current inferred type". This happens near constantly, and a very common "move" I make typing is to isolate some section and give it a temporary variable name just so I can receive that annotation.
The flow there could be better, to be sure.
The proposal here is kind of fascinating with regard to question (1), making it less interactive and more reactive. It reminds me of, say, working with Observable, where you make edits throughout your document and watch the dependents react.
To that end, being able to draw circles around fragments of code and have a persistent monitor that reactively outputs either "error: [reason]" or the type of that variable as I edit the code seems great.
Though I do feel a little uncomfortable with that being literally embedded in the code. Code being reactively edited by a program makes me itch with fear of file corruption.
> While I can imagine worlds which improve upon this (working in Agda and Coq is a great way to see a glimpse of it) I honestly feel like my dominant way of working is conversational with the compiler.
Agda programmer here. Not sure what you mean here. Although Agda does save a binary cache file to minimize type-checking latency (.agdai, "agda interface") with compressed type information, it does not have any kind of hidden state. This is very different than what the OP suggests.
Other than this, I agree with you. When I write code in a typed language I have a conversation with the compiler. I keep asking, is this ok, what do you think about this, what type is this... Which is why I find OP misguided. You can have a conversation with the compiler without hidden state.
Sorry, not referring to the whole post wrt Agda, just the vision of interactivity that proof search and tactics give you.
I think JetBrains actually does something like this. When writing Rust or Kotlin or C++ (these are the ones I have used) the editor will annotate the inferred types with the actual type and you can see it in the editor (with a slightly different font to differentiate).
VS Code also supports inlay hints for both inferred types, and parameter names https://code.visualstudio.com/docs/typescript/typescript-edi...
I think it's turned off by default for most languages though - probably considered unnecessary, as you can always just hover over the variable. Also, unlike the JetBrains IDEs, I don't think you can Ctrl+Click on such a hint to navigate to its definition in VS Code (like you can in normal code).
You can if the LSP enables it, I believe. In rust-analyzer, it's definitely possible to do this, for example, but I think that was a new feature added somewhere in the last few releases.
Regular Visual Studio has this now too, at least for C++.
Yeah, it is definitely nice to have inferred types surfaced to you, but in these systems they are still re-computed on every textual edit, as opposed to being "sticky" (persisted in the source tree).
Lots of IDEs do that for lots of languages these days, and it's wonderful, but I don't think that's the kind of thing they're talking about here in the article
Yeah they're talking about recording the type inferences in the source code. Interesting idea but I feel like it might fall down the same way all "not just text" ideas fall down - the whole coding world is built around plain text. Git, diff, GitHub, IDEs, etc.
Some inferred types in Rust can be pages long. Have fun with those merge conflicts!
I've seen a few experiments like this that store extra information behind the scenes. My first question is always how does it work with source control? Dealing with diffs/merges of that underlying serialization format would get fatiguing quick. I don't think the world will move over to something like this until there's a good answer there.
Source control does look very different in a projectional language, as git diffs no longer make much sense (viewing a pull-request with the source tree's JSON blob is essentially useless).
Unison is the language that's gone farthest with this, as far as I know; their solution to diffs & merges is to handle everything from within their CLI, bypassing git entirely. I imagine I'll do something similar.
I don't know anything about how you're storing the source, but if the data elided from view could be stored in files next to the source that were more readable than JSON, you could potentially take advantage of the whole git ecosystem still. People could put code up for review and the reviewers could even confirm if the additional (normally out of sight) information made sense to them.
FWIW, I'm totally on board with the idea of focusing on the interactive experience, but history has shown it to be very difficult for language environments not based on files to cross the chasm into more common use. (If you don't care about lots of users, then ignore this entire comment! Plus, it's always _possible_ that something will cross over, but we've got decades of people trying...)
Anyhow, I'm really excited to see where you go with this!
Oh hi kevin! So if there's a possibility that the "extra data" would get out of sync with the "source code" (e.g. via a git merge, or someone just editing the source code as plain text), then it unfortunately breaks the guarantees that this system needs :( I guess I could store a sha of each and fail loudly if I detect that one was edited "outside of the IDE" but again I think that would break the assumptions that git makes about diffs &c. I've tried to go down the "text but nicer" route before, and it always comes up short; so now I'm trying "text is not a supported representation" we'll see how far I get.
Hey Jared :)
Yeah, good point that text is a potentially fragile approach when combined with git. I think the Squeak Smalltalk folks figured out a way to retrofit git support, so maybe it's possible to go all-in on the representation you want and work out git support later if it seems useful enough.
If I may, I’d suggest instead of shunning git, which is widely preferred as people’s revision control, perhaps consider writing a diff plugin for git to render out the diffs nicely, and include that tool with the compiler/interpreter install.
This hits on the biggest problem with projectional languages, and why text has dominated: they don't interoperate well with any existing developer tooling.
So the language implementors have to write everything the developer needs: code editor, source control, code review, code browser, etc
You're completely isolated from the wider ecosystem of development, and the benefits so far don't seem worth it. As described here, you save on the reference resolution phase of linking, which is nice, it should be faster and eliminate some classes of error.
The nice part about text as a common format is that it's common. If every language has to implement every tool then it's an N * M problem. When tools can be reused across languages they become N + M.
A user of a projectional language is giving up git, Github, their favorite code editor, and depending on language-specific tooling to replace all of that. And even when the lang-specific tooling is sufficient, it needs to be relearned and recustomized (think all the configuration just in a code editor: prefs, theme, extensions like VIM keybinds and Copilot, etc).
Is the benefit worth the cost?
> bypassing git entirely. I imagine I'll do something similar.
So I'm assuming your language will have some sort of custom source code management system? If so, what do you do when a user wants to store a non-code file together with the source code, e.g. how web applications might have CSS files, fonts and image assets in the repo?
So the "repo" in this case is a database, where currently the only things stored are denormalized source tree nodes. I could imagine extending the database to support arbitrary assets, probably also addressed by the hash of their contents (same as top-level language definitions are).
What about actual comments in the source code? Or type annotations if the language supports them...
This goes directly at one of the paradoxes of typing.
On the one hand, one of the main benefits of types, and the only one for which there is empirical evidence, is the (automatically checked) documentation effect. For this, the types have to be visible.
On the other hand, not having to type in the types all the time and having them inferred instead seems like a major convenience. For this type types cannot be visible.
Compared to that, the inefficiency of having to do the work over and over seems like a relatively minor issue, though compile times are getting ever more problematic.
So I really like the idea of having types that can be inferred and persisted, though I'd prefer the language to make some room for having "provisional" type annotations, so basically using all sources for getting the types: the user, inference, the running program (dynamic languages can find out about actual types at runtime).
You already can have both, at least in certain contexts: In my IDE, when I write a function call, I can press Alt+Return to have the IDE assign the return value to a new explicitly-typed variable, with the type inferred from the call. Likewise, when for example the return type of the function changes, so that the type of the variable doesn’t match anymore, the IDE highlights that, and again I can press Alt+Return to have the declared type of the variable be updated to match. The point is, in those contexts I never have to actually type the variable type myself.
This is similar to what TFA proposes, and it doesn’t require storage of type information separately from the source code, or in some AST representation. The type information is simply “stored” in the plain-text source code itself, as type declarations/annotations.
I like this approach better than having the IDE merely render inferred type information alongside the source code, because (a) when the type information is part of the actual source code, then it is always visible, e.g. in diffs and source control tools, and (b) like TFA suggests it gives opportunity to interactively resolve type conflicts, and making those decisions “sticky”. With the right IDE tooling, you still wouldn’t need to type the types yourself mist of the time.
> In my IDE...
Sounds great! And very close to what I was thinking about.
What's your IDE?
> The type information is simply “stored” in the plain-text source code itself, as type declarations/annotations.
Yes, I think that's the right way to do it. I do believe that there needs to be some way to tell the difference between these three cases:
1. I put the type information there, that's really what I intended
2. The system put the type information there and I okayed it
3. This is what the system currently thinks, no human interaction
However, I believe this just from thinking about it, without having used such a system, so I may be completely wrong.
My background is also dynamically typed systems like Smalltalk, so the state of having no static type information available is also acceptable.
> What's your IDE?
NetBeans. Since Oracle dropped it to Apache, not a lot of work is done on it anymore, but I still cling to it due to a couple of things that IMO it does better than IntelliJ or Eclipse.
Regarding type inference, I prefer to be able to see the types of all variables right away, in all contexts, and therefore I’m not a fan of inferred variable types. It seems to me that those who favor inferred types mostly want to avoid typing (as in “pressing keys”) and problems with refactoring, as opposed to not wanting the types to be visible at all (because they are mostly fine when IDEs do display them). But I believe that can be adequately addressed by appropriate IDE support for inserting/updating type information in the source code itself.
An editor can allow seeing types without having them in the code.
If i got a weird typing issue and i have no clue about it, then i know my design is mostly wrong. That's one of usefulness of a static type checker for me.
This is really awesome, such a useful direction to pursue. Look forward to seeing where you take things