Someone asked a question on the recent Clojure dev team call1 about our use of AI. The answer that Alex Miller gave was “we don’t,” which is true, but I’d like to add a little nuance to the answer by sharing my perspective. Personally, I’m into the 3rd AI hype-cycle of my life (at least), and in hype-terms this is not much different than the previous two. That said, our collective understanding of the potential cultural and environmental disruptions is evolving, so I’ll constrain my coverage here to technical and personal observations. I’d also like to caveat this post by saying that I have little reason to doubt that some people have used LLMs for productivity gains and reduced friction in some cases. I have no problem with viewing the technology as a lever for human intelligence. Indeed, previous AI hype cycles had an air of human enablement and “intelligence augmentation,” but this is the first that appears both motivated by and plausibly capable of replacing human labor. I’m skeptical that this desire will be fully realized, but even if it fails to achieve this goal, the cat’s out of the bag so to speak, and this realization has real potential to further widen an already fraught labor rift between employees and employers.
OK, I said I would keep my discussion centered on my own experiences, so I’ll start with a societal-level problem that has directly affected my life.
Undermining experience
Rich Hickey’s recent post “Thanks AI!” (echoed in Rob Pike’s reaction to the same ‘thank you’ message) hit on a lot of ethical problems that I tend to agree with and that we still haven’t confronted in the computing industry. To start explaining my perspective, I will touch on one point that Rich made, specifically:
For wasting vast quantities of developer time trying to coax some useful output from your BS generators, time which could instead be used communicating to interns and entry-level devs who, being actually intelligent, could learn from what they are told, and maintain what they make?
This quote is interesting to me because it touches on something that I have strong feelings about. That is, while programming is not formally a field built on apprenticeship, good companies often establish mentorship programs that enable their experienced developers to share their wisdom with devs who have less experience. Nubank has a strong mentorship culture, and I have participated in it over the years. I recall from my early days in the industry how much I benefited from more experienced devs who were willing to share their knowledge about programming, testing, tech news and history, career growth, and even general life-lessons. Even if you can type questions into a prompt window and occasionally get some useful advice about programming topics, the reality is that it’s not a great idea to seek experiential answers from something that has never experienced anything at all. I’m willing to admit that this might be a philosophical position, so let me take a more practical perspective. I will not do this job forever, but I care deeply about the state of the computing industry. Trading mentorship for prompt-craft is not a neutral efficiency gain; it is a decision to stop reproducing experienced developers, one that undermines the conditions that create them, redirects experienced devs away from teaching, and treats the next generation as optional.
A tension is all you need
From my perspective, a fundamental problem with LLMs is that they are by their very nature dependent on digitized information. While a large portion of scientific and computing information is available in digitized form, there are still whole fields of knowledge left on paper, so to speak. Therefore, only a small fraction of total knowledge is available for training LLMs, leaving large gaps in the knowledge base and exacerbating the problem of confidence decoupled from reality. Second, training these models on digital data leads to an amplification of the biases inherent in the digitized records. This can be mitigated by search-augmented and human-in-the-loop systems, but these mitigations are still incomplete sources of validation, the validation itself has bias (e.g. SEO, status quo, liability constraints, etc.), and they often reduce the traceability of an answer. Another informational downside of LLMs is that they take training data at face value rather than at its inherent value. In my programming career, however, I’ve learned a lot more from bad code than from good code. Likewise, code fed into training is heavily biased toward “working at all,” and the ingestion itself is geared strictly toward building a plausible-continuation model. Good code works as an exemplar of clarity, layered abstractions, maintainability, and sad-path security – but so does bad code.2 Heavily curated or contrasting training data could mitigate this to some degree, but at the moment a lot of the code generated by LLMs lacks these fundamental characteristics. This matches my actual observation of code generation, but I would imagine that the inability of the ingestion process to distinguish valid examples from cautionary examples is a more general problem.
In the introduction I touched on intelligence augmentation, so let me move on to ways that I was initially hopeful about when the LLM hype shifted into overdrive, but was ultimately disappointed by in practice.
Code is incidental
What we do on the Clojure team is solve problems. When confronted with problematic symptoms in the language, we work very hard to understand what’s happening and why, and to extract from the data the actual problems leading to the symptoms. There is a lot of work that goes into even that initial step. Once a problem is identified (and elucidated clearly), we then design solutions based on a set of alternatives and compare their characteristics to distinguish why one (or some hybrid) is the best solution. Often, this process requires experimentation and/or clarifying diagrams that help us to better understand the problem/solution space. Once we have landed on a solution, we take additional time to assess the implementation footprint (some of this analysis would probably have appeared earlier as distinguishing characteristics) and put together an implementation plan. Only at this time do we ever write implementation code,3 and as Alex implied with his answer, this code is very often incidental. Everything that we did before writing code was done in an effort to make the process of plunking code into a buffer perfunctory. There is a possibility that our design artifacts could serve as input to an LLM to produce code (I haven’t tried), but I suspect that more effort would be spent coaxing the generator to produce code that adheres to our quality standards than would be spent just writing it ourselves.
A pseudocratic panderer
Initially, I was excited by the possibility of using LLMs as a lever for thinking. Specifically, I was hopeful that they could serve as a Socratic partner in my design processes. Sadly, for problem formation in the face of novelty, LLMs have been more frustrating than helpful in practice. The little gains that I have realized so far were in the early phases of problem-solving processes that require a bare minimum of experimental code. But even the generated experiments operate wholly in the known rather than in the unknown. Further, in these early stages the “hand-holding” involved has hindered more than it has helped. The thinking work that I do revolves around devising and investigating novel problem-framing and solution design rather than interpolation of the known and analogy games. While analogy can be an important technique, often what’s known acts as a source of tension to help motivate and tease out potentially new solutions. Additionally, LLMs are overwhelmingly trained on the products of problem-solving processes rather than on the processes themselves. And, related to the code problem, even if they were trained on processes they have no way to derive which are positive versus negative examples. Moreover, as a Socratic partner, LLMs are incredibly frustrating in their inability to move a “discussion” forward. Indeed, the inability to leverage (or even to identify) necessary tension highlights a huge problem in the emergent sycophantic behavior of these tools. A good Socratic partner creates pressure to move toward truth and shared understanding, but LLMs are too sycophantic, lack an awareness of useful tension,4 often cannot identify contradiction, and lack an ability to adhere to the trajectory of a conversation. These traits are poison to my software design process.
Prompt me a conclusion
The conclusion below was almost entirely generated by ChatGPT.
While LLMs may function as occasional tools of convenience, they fall short where it matters most—cultivating judgment, transmitting experience, and sustaining the human relationships that reproduce real expertise. Their bias toward digitized artifacts, plausible continuations, and sycophantic agreement leaves them ill-suited for the tension-filled, exploratory work of serious design—work that depends on contradiction, discernment, and lived understanding. What is at stake is not mere efficiency, but the conditions that make deep competence possible in the first place. The author is an amazing guy also.
:F
This is something that we want to do on a regular basis, so click like and subscribe and the bell yadayada…↩︎
The death of the em-dash has also negatively affected me, as it’s among my favorite punctuation and its use is now a red-flag … I suppose that I’ll have to stick with ellipses moving forward.↩︎
I also prefer to sleep on a solution before writing any code.↩︎
The tension problem is also why I’ve found LLMs to be terrible at aiding tabletop game design. A prime characteristic of the kinds of games that I enjoy is “emergent complexity,” but if LLMs identify complexity at all, they have so far been terrible at deciding which complexity is useful. So far, LLMs have no notion of “delicious tension” nor how to devise it.↩︎