Agent Harnesses are Just Shells


When you put it this way, you realize that an agent harness is just a shell. Or, more precisely, it is:

  • A shell for running commands and controlling processes
  • Except it's "collaborative" (both the user and the model can use it) with a chat channel between them
  • And the stuff that would require a curses app, like editing files, is offered natively, since models can't use curses apps right now
  • Plus there's some random extra stuff like lightweight permissions, TODOs, and undo for file editing.

To me this is actually a pretty exciting vision of where agent harnesses could end up as the product matures. It could be a kind of "collaborative shell", where the user and the model are totally symmetrical, able to execute the same tools and commands, with the results (plus chat prompts) going into a shared context.
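One way to picture that shared context is as a single ordered event log that both participants append to, with chat and commands interleaved. Here's a minimal sketch; the Event and SharedContext names are my own illustration, not any real harness's API:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    actor: str  # "user" or "model" -- the two are symmetrical
    kind: str   # "chat" or "command"
    text: str   # a chat message, or a command plus its output

@dataclass
class SharedContext:
    events: list[Event] = field(default_factory=list)

    def append(self, actor: str, kind: str, text: str) -> None:
        self.events.append(Event(actor, kind, text))

    def transcript(self) -> str:
        # Everything -- chat and commands, from either side -- lands in
        # one interleaved transcript that both participants see.
        return "\n".join(f"[{e.actor}/{e.kind}] {e.text}" for e in self.events)

ctx = SharedContext()
ctx.append("user", "chat", "please list the files")
ctx.append("model", "command", "ls -> README.md src/")
ctx.append("user", "command", "ls -> README.md src/")  # same tool, same result
print(ctx.transcript())
```

The point of the symmetry is that there's only one append path: the harness doesn't care whether an event came from the human or the model.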

Ideally, both user and model could get the same UI treatment, maybe just distinguished by colors. I think this would really help teach people how to use the shell, a skill I see college students increasingly struggle to learn. Basically you could prompt the AI to do stuff, watch what it does, and then do the same stuff manually and have it work.

Plus, there are some nice UI affordances in existing agent harnesses that shells could adopt. Imagine if, when the user runs edit <filename>, the shell diffed the file before and after to show a little inline diff. Naturally, edit <filename> would open a curses GUI for human users, while models would get a textual API. (For humans, it could dispatch to the user's preferred text editor, like Vim or Emacs, via the usual $EDITOR convention, but I increasingly find that the students I teach, at least initially, want a really simple Nano-style text editor.) Or think about the undo_edit feature, which I believe takes a reference to an earlier edit and reverses it. Humans could use that too; that would be nice!
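That kind of undo is easy to sketch if you assume each edit gets an id and the harness keeps a snapshot of the file from just before the edit. Everything here is hypothetical (the EditHistory class, the dict standing in for the filesystem), just to show the shape of it:

```python
class EditHistory:
    """Keeps a before-snapshot per edit so any edit can be reversed by id."""

    def __init__(self) -> None:
        self._snapshots: dict[int, tuple[str, str]] = {}  # id -> (path, old text)
        self._next_id = 0

    def record(self, path: str, old_text: str) -> int:
        """Call this before writing the new contents; returns the edit id."""
        edit_id = self._next_id
        self._next_id += 1
        self._snapshots[edit_id] = (path, old_text)
        return edit_id

    def undo(self, edit_id: int, files: dict[str, str]) -> None:
        """Reverse an earlier edit by restoring its snapshot."""
        path, old_text = self._snapshots.pop(edit_id)
        files[path] = old_text

# A dict stands in for the real filesystem to keep the sketch self-contained.
files = {"greet.py": "print('hi')"}
history = EditHistory()
eid = history.record("greet.py", files["greet.py"])
files["greet.py"] = "print('hello')"   # the edit itself
history.undo(eid, files)               # undo_edit(eid) -- back to the original
```

Nothing here is model-specific, which is exactly the point: a human at the prompt could call undo with an edit id just as easily as the model can.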

Or consider permissions. With the AI model, it can be useful to restrict it to only touch files in the current directory, or to have it ask the user before doing dangerous stuff. But that's useful for humans too, just to prevent mistakes, especially for students. Naturally, over time people gain confidence and want less hand-holding, but then I've also gained confidence in the models over time and hold their hands less too.
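A minimal sketch of that kind of guard, applying the same policy regardless of which actor issued the request (the function names and the "dangerous" prefix list are my own illustration):

```python
import os

DANGEROUS = ("rm", "git push", "sudo")  # illustrative, not exhaustive

def allowed_write(path: str, root: str) -> bool:
    """Only permit writes that resolve to inside the working directory."""
    target = os.path.realpath(os.path.join(root, path))
    return target == root or target.startswith(root + os.sep)

def needs_confirmation(command: str) -> bool:
    """Dangerous commands require a yes/no from the other participant --
    whether the actor is the model or a student at the keyboard."""
    return any(command.startswith(d) for d in DANGEROUS)

root = os.path.realpath(".")
print(allowed_write("src/main.py", root))     # inside the sandbox: allowed
print(allowed_write("../secrets.txt", root))  # escapes the sandbox: blocked
print(needs_confirmation("rm -rf build"))     # dangerous: ask first
```

Resolving the path with realpath before checking the prefix matters; a naive string check would let ../-style paths escape the sandbox.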

And one idea I've learned from the Zed guys is that collaborating with the model and collaborating with other humans is kinda similar and involves the same software substructure. I think that's true in this case too. A "collaborative shell" that involves two humans is kind of a niche product, but for doing deploys or debugging over Zoom or a few similar use cases it could be nice. If we get it as a byproduct of agent harnesses that would be great!