I recently quit a job I enjoyed with people I liked to independently research AI software design. This is a career pivot, so I want to lay out my thesis and research areas for this specific work.
Thesis
AI can write good small programs and bad large programs, but it still struggles to write large, high-quality software.
More controversially, I think that much of the software industry is missing obvious wins by treating AI simply as an oracle, rather than as a new software concept that can be modeled with existing software design techniques, even if that means building new models.
I’m betting that there is enough open design space that, as a software design obsessive without a PhD but with meaningful experience on large, real-world software, I can contribute to that specific problem.
Shoutout to @ninthhostage on twitter for using some AI tool to make a comic of this tweet of mine
Research Areas
This is a subset of the areas I’m exploring.
AI-first design patterns
This research all started with a simple pattern I stumbled upon that I’ve been calling “determinism invalidation”. The idea is that AI can generate enormous amounts of vanilla code, but that code often goes stale when the environment changes. The technique adds a simple “gate” that runs at build/compile time and checks an environmental condition (has a version changed, has a file been updated, is the UK still in the EU) to determine whether the script that comes after it is still valid. If valid, the script is executed/deployed; otherwise, AI steps in and rewrites the script.
It’s so simple that it hardly counts as a pattern, but it allows AI to be integrated into a software stack, while all runtime code execution is deterministic. I’m very interested to explore more patterns like this.
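To make the pattern concrete, here’s a minimal, runnable sketch in Python. All the names are hypothetical, and the AI call is stubbed out; the shape of the gate is what matters:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def regenerate_with_ai(schema: str) -> str:
    """Placeholder: in practice this would prompt a model to rewrite the
    script against the new environment. Stubbed so the sketch is runnable."""
    return f"# script generated for schema: {schema}\nprint('parsing logs')\n"

def fingerprint(*conditions: str) -> str:
    """Hash the environmental conditions the generated script depends on
    (a schema version, a dependency version, a feature flag...)."""
    return hashlib.sha256("\n".join(conditions).encode()).hexdigest()

def gate(schema: str, dep_version: str, script: Path, stamp: Path) -> Path:
    """Build-time gate: keep the existing script if the environment is
    unchanged; otherwise let AI rewrite it and record the new fingerprint."""
    fp = fingerprint(schema, dep_version)
    if not (stamp.exists() and json.loads(stamp.read_text())["fp"] == fp):
        script.write_text(regenerate_with_ai(schema))
        stamp.write_text(json.dumps({"fp": fp}))
    return script  # from here on, execution is fully deterministic

# At build time:
tmp = Path(tempfile.mkdtemp())
script = gate("v2-json-logs", "somelib 3.1.0",
              tmp / "parse_logs.py", tmp / "parse_logs.stamp")
```

The point is that the model only ever runs at build time; nothing non-deterministic survives into the runtime path.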
How to get AI to write high quality software
AI can write code reliably, but getting it to build high-quality software systems is still an unsolved problem. AI thrives when goals are specific, unambiguous, and easily testable. Real-world software is often about building powerful, flexible tools that strike the right balance between competing priorities. We need techniques and tools that bridge this gap.
This is my current area of focus and I have projects/writing I plan to share shortly.
Rebuilding the AI stack by modeling AI from first principles
My somewhat controversial belief is that basically all current AI tooling, from CLIs to skill files to OpenClaw to multi-agent orchestration systems, is answering the question “how should AI be modeled?” at the wrong abstraction level. I also believe this problem is solvable, and I’m exploring whether my instincts here are correct or whether I’m oversimplifying.
From a modeling standpoint, there isn’t enough emphasis on modeling what AI or an agent can do within a software system, and then building a system that is maximally flexible for agents while providing the control and security boundaries needed to make them reliable production tools. Each existing solution addresses a single part of this broader problem without trying to solve the whole thing. I think the whole-system approach is under-explored, but potentially extremely powerful.
From an execution standpoint, this leads to a lot of systems that use LLMs to (hopefully) call deterministic code. I believe that is backwards.
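A toy sketch of that inversion (Python, with the model call stubbed and every name hypothetical): in the common shape, the model owns control flow and deterministic code is reduced to “tools”; in the inverted shape, a deterministic program owns control flow and calls the model only at explicit, bounded points.

```python
def llm(prompt: str) -> str:
    """Stub for a model call; in practice, an API request."""
    return f"summary of: {prompt[:20]}"

# Common shape: the LLM drives, and deterministic code is a set of "tools"
# it may or may not call. Control flow lives inside the model's choices.
def agent_loop(task: str, tools: dict) -> None:
    while True:
        action = llm(f"pick a tool for: {task}")  # the model decides what runs
        ...  # dispatch to a tool, feed the result back, hope it terminates

# Inverted shape: deterministic code drives, and the model is invoked only
# at fixed, bounded points. Every other step is testable and repeatable.
def pipeline(records: list[str]) -> dict:
    cleaned = [r.strip().lower() for r in records]      # deterministic
    summary = llm("summarize: " + "; ".join(cleaned))   # bounded AI step
    return {"count": len(cleaned), "summary": summary}  # deterministic
```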
This is very broad, so I’ll share more in snippets going forward.
What comes after monoliths?
Most large, multi-generational software tools, whether open source or commercial, are criticized by new users for being overly complicated and by existing users for being overly restrictive. These issues, and many others, exist ultimately because software has been expensive and labor-intensive to build. When an existing tool needs a feature that doesn’t fit the existing data model, it has made sense to add complexity rather than rewrite half the codebase. That makes the tool more useful, but also lower quality. Most popular tools have been through many generations of this process, so (I believe) most software we use is much, much worse than what is possible today by doing ground-up rewrites using AI.
This observation is obvious, but the interesting question then is “what does it look like to build a large system in the future?”. One approach could be that teams/companies essentially select the exact features they want from an open source feature list/product spec, and the tools/stack they want, and then AI “grows” a specialized tool. If needs or requirements change meaningfully, the tool is regrown.
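Purely as a speculative sketch of what that might look like (every name here is invented), the input to such a system could be a small declarative spec rather than a codebase:

```python
# Speculative sketch: a team declares features and stack, and an AI build
# step "grows" the whole tool from this spec. All names are invented.
spec = {
    "product": "issue-tracker",
    "features": [            # picked from a shared feature list / product spec
        "kanban-board",
        "markdown-comments",
        # "time-tracking",   # left out: no complexity paid for unused features
    ],
    "stack": {"language": "typescript", "db": "postgres"},
}

def needs_regrow(old_spec: dict, new_spec: dict) -> bool:
    """When requirements change meaningfully, the tool is regrown from
    scratch rather than patched around the old data model."""
    return old_spec != new_spec
```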