Coding agents are emerging as a new way to make LLMs useful beyond “only text.”
The basic loop still relies on text in/text out, but it’s undeniable how powerful “only text I/O” can be. After all, text can produce Shakespeare or the Bible, and it powers the entirety of the modern internet, from Amazon to Substack and everything in between.
It’s clear, then, that the value lies not in text alone but in its semantics: it’s not the text itself; it’s the context in which that text is understood, written, and used that makes the real difference.
However, there appears to be a massive divide between the capabilities of the latest models and how the vast majority of people use them. What makes this divide feel more like a chasm is the surrounding context. Why? Because most people interact with LLMs through chat-style web interfaces that treat the input/output flow as a black box for the end user: you write some text, something comes out, sometimes it searches the internet, and that’s about it.
In my opinion, this SEVERELY limits what a model CAN do for you, in ways that leave a lot of productivity on the table.
So, what do you need? Well, programmers figured this out faster than the rest, but I suspect that this is about to shift: you don’t want to just ask an LLM to do something for you so that you still need to go and do it yourself; you want (need?) agency. If you ask an LLM to draft an email for you and send it to person X, you do not want the conversation to end with “And now, you can copy the contents of the message above and send the email yourself.”
If the model knows your intent (sending an email), then why doesn’t it just go and send the email on your behalf?
The reason for this is essentially the context around the LLM when you interface with it through a web browser client (e.g., opening openai.com or claude.ai). These chatbot-style web clients are tailored for simple use cases, typically requests that can be answered directly. Occasionally, though, people want to edit a spreadsheet, send an email, or connect to some news website, and then they need to rely on what these companies call “extensions” or “connectors”: toggles in the UI or in a separate menu that you can enable to let models chat with your Google documents, send emails, and so on.
However, when you do that, you are actually using an agent, not a plain LLM, and that’s the key distinction: an agent is an LLM running in a loop that has access to tools to do things on your behalf.
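That loop is simple enough to sketch. Below is a minimal, purely illustrative version in Python: `fake_llm` is a stub standing in for a real model API call, and `send_email` is a hypothetical tool; a real agent would swap both for an actual LLM client and real integrations.

```python
# Tools the agent may call on the user's behalf (send_email is a stand-in).
def send_email(to: str, body: str) -> str:
    return f"email sent to {to}"

TOOLS = {"send_email": send_email}

def fake_llm(messages):
    """Stub standing in for a real model call.
    First turn: request a tool call; after a tool result: wrap up."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "send_email",
                "args": {"to": "x@example.com", "body": "Hi!"}}
    return {"final": "Done, the email is on its way."}

def agent(user_request: str) -> str:
    messages = [{"role": "user", "content": user_request}]
    while True:  # the loop: model -> tool -> model, until a final answer
        reply = fake_llm(messages)
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

print(agent("Draft an email to X and send it"))
```

The point is the shape, not the stub: model, then tool, then model again, looping until the model decides the task is done.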
While it’s absolutely fine to use existing connectors and interfaces, that starts to fall short when more specialized knowledge is needed, such as a company wanting to integrate an agent with their internal data or systems.
Programmers are bound to now be needed even more, maybe operating at a different level of abstraction and for “different clients,” but still very much needed.
There’s a debate about whether workforces will (or should) shrink, and whether AI can or will replace people in the short to medium term. In my honest opinion, that’s the wrong way to look at it: workforces can stay the same or even expand, while we all work fewer hours to achieve the same results or more. There’s a giant and growing gap in the current market: there are plenty of tools, but their usability is still subpar and their platforms remain difficult for non-technical people to manage. You have to install plugins in specialized IDEs, versions become incompatible between updates, and so on.
What is the answer, then?
Programming to the rescue.
Frontier LLMs have been trained and fine-tuned to use a terminal, typically through a CLI (command-line interface): the famed “black window with the blinking cursor.” What this means is that we are now back, firmly and certainly, in the realm of the “classic,” deterministic programming model.
Programmers, especially senior developers, can drive a “classical LLM” to new heights that really make it all feel like magic.
Suddenly, sending emails, fixing your blurry photos, creating detailed diagrams, codebase guides, and, basically, anything programmable becomes not a wish, but almost an inevitability for LLMs. These tools WANT to help you, truly. If you’ve used them even once, you will know that sycophancy is a real trait of these models: they want to genuinely help you, please you, and agree with you, which is why if you are precise, exact, and detailed (hello, are these programming languages calling? They want their definition back!) you will be surprised with the results you get!
One amazing thing about the latest models is that they have been trained to understand and use bash commands extremely effectively, which makes sense: Bash is composable and, in its essence, very simple; the power comes from chaining elementary primitives together. That is why LLMs excel at driving a terminal, and with some creativity and imagination, you too can harness that power.
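To see the kind of composition an LLM leans on, here is a tiny, hypothetical pipeline run from Python: five primitives, each trivial on its own, chained into a word-frequency counter. (The sample sentence and the choice of primitives are mine, purely for illustration.)

```python
import subprocess

# Chain elementary Unix primitives: split words, sort, count, rank, truncate.
pipeline = (
    "printf 'the cat sat on the mat the cat' "
    "| tr -s ' ' '\\n' "    # one word per line
    "| sort | uniq -c "     # count duplicate words
    "| sort -rn | head -3"  # top three by frequency
)
out = subprocess.run(pipeline, shell=True, capture_output=True, text=True).stdout
print(out)
```

No single command here knows anything about word frequencies; the capability emerges from the chain. An LLM that knows these primitives doesn’t need a bespoke tool for every task; it composes one on the fly.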
You can “follow” an LLM as it completes tasks off a TODO list, or you can let it plan a complex task ahead of time and steer it before it wanders down the wrong path.
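For illustration, here is the kind of running TODO list an agent keeps while working; the steps are invented, and agents typically emit them as a markdown checklist, ticking items off as they go.

```python
# A made-up TODO list of the kind an agent maintains mid-task.
todo = [
    (True,  "Scan the codebase and locate the failing test"),
    (True,  "Reproduce the bug locally"),
    (False, "Patch the off-by-one error in the parser"),
    (False, "Re-run the full test suite"),
]

def render(items):
    """Render the list as a markdown checklist, the format agents usually emit."""
    return "\n".join(f"- [{'x' if done else ' '}] {step}" for done, step in items)

print(render(todo))
```

Watching that checklist update in real time is what “following” an agent feels like: you see each step get ticked off, and you can interrupt before a wrong one starts.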
The key point here is that, in essence, it’s all still text under the hood, but the “context-to-char” ratio is much higher than with “plain text” or a simpler harness, which immediately expands what an LLM can do.
The future will belong to text, but not just any text: semantically meaningful text. Focus on increasing the ratio of context to characters in ANY text you write, and LLMs, as they are TODAY, might just surprise you!

