The Workshop Has Changed

Sunday Field Notes | May 24, 2026

Returning to software after building an institution, and finding that AI has made the old craft faster, stranger, and more dependent on judgment.

Portrait-style image of Rick Rubin in a recording studio.

The AGENTS.md file was open in one pane of the IDE. The test runner was in another, red but not catastrophically red. A planning note sat beside both of them, plain and procedural: inspect the boundary, write the missing test, keep the public interface stable, do not touch the data loader unless the failure proves the loader owns the bug. The instruction file was plainer still: “Discover current repo facts before editing. Prefer inspecting files, commands, and tests over guessing.” The repo had enough shape to resist casual edits. The assistant had enough context to be useful. I had enough history to distrust both of us.

This was supposed to be familiar ground, and in some ways it was.

For much of my adult life, software was the workshop. I spent two decades learning the craft: how to hold a system in my head, how to distinguish a local fix from an architectural lie, how to hear when an abstraction had become a self-esteem project. I read Uncle Bob Martin and Kent Beck with the appetite of a convert, got massively into test-driven development, and wandered from Perl to Java to C# to R in graduate school to C, C++, assembly, and Python for Army cyber work. Eventually I arrived at the only enlightened position language wars have produced: different tools are good for different jobs. I wrote C++ Crash Course because I loved the precision and danger of the work. The compiler had a moral quality. Tests did not care what I intended. Good documentation was a form of mercy for the future.

Then I spent years building an institution.

Building an institution expands the surface area of a day until the surface area is the job. Being CEO sits next to confrontation, disappointment, emotion, risk, and other people’s fear. For the emotionally attuned, that adjacency is weather. After hiring a CEO for Shift5, I came back to software with the strange relief of a person returning from war to a house that is still standing. The bench shrank the world. I needed that.

The bench was there. The tools were there. The rituals were there: read the docs, make the change, run the tests, review the diff, merge only what you understand. Some of the old hardships were gone, or at least diminished. Some of the tools were better. Some were so much better that they were disorienting. You reach for the familiar chisel and discover that someone has installed warp drive in the drawer.

The first time I felt this in a serious way was at a hackweek. I was building a seq2seq model for simulation and going back and forth with a major foundation model in what now feels like an earlier posture. The product problem underneath it was concrete. We often needed to help customers make better decisions from noisy operating environments: find clear signal in telemetry, infer failure modes, and move faster with more precision. But the scientific method needs data. Sometimes a customer has operating data; sometimes only a proxy; sometimes the product slice has to be demonstrated before the real environment is available.

The aspiration was to simulate close-enough-looking time-series data: patch plausible failure modes into traces, revise covariates so the result resembled operational reality, and create enough of a slice to unblock demos, early modeling, and product integration. Heuristics were too brittle. Classical statistical models were often the wrong tool. We wanted generative models because the bottleneck was making plausible worlds in which the product could begin to learn.
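To make "patching a failure mode into a trace" concrete, here is the crudest possible version. It is a hand-written heuristic of exactly the kind the paragraph above calls too brittle, not the generative approach we were after. The function names, the sine-wave baseline, and the linear drift are all assumptions chosen for illustration.

```python
import math
import random


def baseline_trace(n, period=50, noise=0.1, seed=0):
    """Hypothetical healthy signal: a sine wave plus Gaussian noise."""
    rng = random.Random(seed)  # seeded so the trace is reproducible
    return [
        math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise)
        for t in range(n)
    ]


def patch_drift(trace, start, slope):
    """Return a copy of the trace with a linear drift added after `start`.

    Samples at or before `start` are unchanged; each later sample is
    offset by slope * (t - start).
    """
    return [x + slope * max(0, t - start) for t, x in enumerate(trace)]


healthy = baseline_trace(200)
faulty = patch_drift(healthy, start=120, slope=0.05)
```

The gap between this and a useful simulator is the point: a real failure mode would also move the covariates, and a fixed drift has no way to express that.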

In that setting, the model was extremely useful. Not omniscient. Not magical. Just practically useful. I would ask for a snippet, a module explanation, a tensor-shape sanity check, or a pointer into documentation. It generated boilerplate I did not want to type. It read module docs without getting bored. It reminded me of idioms I knew but did not have loaded in memory. It did not replace judgment. It reduced the tax on mechanics.

The interaction changed when the assistants moved into the IDE. I was no longer consulting a model from outside the work. I was talking about what I wanted to build while staying inside the workshop. The repo, tests, and instructions were present. The assistant could read context, propose changes, ask for permission, discover that my request was underspecified, and wait with unnatural patience while I figured out what I actually meant.

That was when I noticed something unpleasant: I was often the bottleneck.

The system still made mistakes. Sometimes it was confidently wrong in the old familiar way, where fluency impersonates understanding. Sometimes it wanted to touch too much. Sometimes it would satisfy the prompt while betraying the architecture: reach across a boundary, widen a public interface to make a local case easier, or move responsibility into the wrong layer. It could be locally competent and globally destructive.

This is not hypothetical. If you do not articulate the acceptance test, build the harnesses, and explain the what and the why, you can get something that looks clean right up until the process loads. The friendly version fails spectacularly in front of you. The worse version runs quietly and fills a database with garbage embeddings of documents you cared enough to index. Goodhart’s Law is waiting in the repo: a system can obey the local objective while violating the thing it was supposed to protect.
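An acceptance test for the quiet failure can be small. The sketch below is a minimal example, assuming embeddings arrive as plain lists of floats; `check_embeddings`, its thresholds, and the specific checks are hypothetical and would need to match whatever the real pipeline produces.

```python
import math


def check_embeddings(vectors, expected_dim, min_norm=1e-6):
    """Return a list of problems; an empty list means the batch passes."""
    problems = []
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            problems.append(
                f"row {i}: dimension {len(vec)}, expected {expected_dim}"
            )
            continue
        if any(math.isnan(x) or math.isinf(x) for x in vec):
            problems.append(f"row {i}: non-finite value")
            continue
        if math.sqrt(sum(x * x for x in vec)) < min_norm:
            problems.append(f"row {i}: near-zero vector")
    # Identical vectors for different documents are one way a pipeline
    # can fail without raising anything.
    if len(vectors) > 1 and all(v == vectors[0] for v in vectors[1:]):
        problems.append("all rows identical")
    return problems
```

Run before the write, a non-empty result becomes a reason to stop rather than a row in the database. It does not prove the embeddings are good. It only catches a few of the ways they can be obviously bad.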

But more often than I expected, the slow part was me. Me deciding. Me constraining. Me reviewing. Me remembering the shape of the thing I was trying to build. Me noticing that a fast implementation of a bad idea is still a bad idea, now with better indentation.

The question is not whether software craft has been abolished. It has not. LLMs did not make the old disciplines obsolete. They made them more load-bearing.

This is easy to miss if the conversation starts at the level of replacement or dismissal. Both frames are too blunt for the bench. The old disciplines still matter, but they are now operating under a new tempo. A vague instruction can produce a large diff. A weak premise can become twenty files. A half-understood system can acquire apparent progress before the human has earned confidence.

In that environment, the constraints and checks become the important part.

The AGENTS.md file is not mystical. It is an onboarding memo for a nonhuman collaborator. It tells the system how to behave in this repo, what conventions matter, which commands are expected, what boundaries should not be crossed casually, and where the sharp edges live. “Do not touch unrelated changes” is not bureaucracy. It is a guardrail against the seductive helpfulness of something that can edit faster than you can inspect.

I had left institution-building to return to craft, and the craft immediately demanded institutions in miniature.

Planning conventions matter for the same reason. So do tests, linting, architecture, pull requests, small merges, and review discipline. My working thesis is deliberately modest: because these systems are trained on human-produced artifacts, they behave, often enough, as if they have absorbed many of the patterns we use to coordinate human work. Not perfectly. Not because they are people. But often enough to make the old boring virtues newly useful.

The collaborator is still not human, and that matters. It is tempting to anthropomorphize anything that answers in complete paragraphs and appears to understand intent. It is also tempting to flatten the thing into a trivial tool because the alternative is uncomfortable. Neither posture is serious enough. These systems can participate in human-shaped workflows, but they still need hard contact with reality. The test suite is where persuasion gets checked.

The compiler still matters. The tests still matter. The diff still matters. The ability to name what you want matters more, not less. What changed is the speed at which bad naming becomes code.

That speed is thrilling. It is also dangerous in exactly the way abundance is dangerous. It removes excuses before it grants wisdom. At low speed, sloppiness sometimes has time to be corrected by embarrassment. At machine speed, sloppiness scales before shame can catch it. You can produce more surface area than you can inspect. You can create more optionality than you can govern. You can confuse motion for evidence because the motion is now so cheap.

This is where Rick Rubin helps me. Not as a saint of taste, and not because software is music. Rubin’s useful provocation is plainer: he knows what he likes and what he does not like, and he is willing to be responsible for that distinction. The producer function is reduction and refusal. It is hearing what is necessary when abundance has made almost everything available. Software has correctness conditions, customers, and architectures that can collapse under attractive choices. But the pressure rhymes. When generation becomes cheap, selection becomes governance.

That does not make taste a hierarchy of human worth. We should be careful about ranking what coats someone else’s brain in serotonin, gives meaning, or makes a life bearable. People love what they love. The point is not superior taste. The point is that abundance makes selection unavoidable. A repo can hold only so many directions before it becomes sediment. A week can hold only so many experiments before it becomes evasion.

This is the part of building this way that makes me most uneasy. When I am exposed to someone building something interesting, I have a million ideas. This is not a compliment. It is a failure mode. For most of adulthood, I have learned to prune, unevenly but seriously: to say no not because an idea is bad, but because a life made of promising fragments is still a failed architecture. Without that discipline, I would have a museum of half-finished prototypes and a nervous system full of dangling references.

Sometimes I have even obeyed this lesson.

Now I am being forced to unlearn part of it. Scarcity discipline was not a superstition. It saved me from becoming a person with infinite starts and no finishes. But the old ratios have changed. Some ideas that once deserved immediate pruning now deserve a cheap prototype. Some mechanical barriers that once functioned as selection pressure have disappeared, and not all of those barriers were wise.

The problem is that the human is still human.

You cannot context-switch creative spark between ten live ideas. You can open ten branches, scaffold ten prototypes, ask ten assistants to explore ten surfaces, and feel for a moment as if optionality has become free. It has not. The cost has moved. It now appears in attention, taste, review, integration, sleep, and the low-grade shame of not remembering why one of your own projects exists.

It can feel, strangely, like being CEO again. One of the exhausting facts about leadership is the bioaccumulation of hard problems. The company’s hardest issues do not usually arrive as one grand strategic memo. They arrive every thirty minutes, already filtered by other people’s inability or lack of authority to solve them. By the time the problem reaches you, it is not just a problem. It is a distillate. Too many accelerated projects can recreate that feeling inside a single workshop: every branch returns carrying concentrated ambiguity, and every prototype asks to be judged.

This is not a reason to retreat. It is a reason to become more serious about the work: more explicit about constraints, more willing to kill, more suspicious of apparent progress, more attentive to the difference between a prototype that teaches and a prototype that flatters novelty. Warp drive does not remove navigation. It makes navigation less optional.

Product work adds another humiliation.

My taste has been wrong many times. Software is hard. People are harder. Product-market fit is effectively impossible to know in advance except in the fraudulent memoir tense, where every successful company later discovers that its path was obvious. I am usually best when solving a problem I have myself, because then the problem and the solution architect occupy the same gray matter. Feedback is immediate. The irritant is real. The taste is grounded in use.

But I am also weird.

Many of my problems are not normal people’s problems. The fact that I sincerely want to build a workstation around comically overpowered hardware absolutely not intended to coexist peacefully inside a desktop case (solving power-delivery problems for dual 350-watt server CPUs and an RTX 5090, designing custom water-cooling loops, and 3D-printing replacement mounting hardware because almost none of the original parts were meant to fit together) is not, by itself, market validation. It may be a beautiful problem. It may be my problem. It may even teach useful things about thermal constraints, packaging, fabrication, and the dignity of obsessive specificity. But it is not proof that anyone else should care.

The new tools make this more dangerous because they dignify the first mile. They let a personal itch acquire files, tests, diagrams, a README, and maybe even a decent interface before the harder question has been asked with sufficient cruelty: who else bleeds here? Not who can be made to admire the cleverness. Not who might say “interesting” in a conversation. Who has the problem strongly enough that the solution matters when novelty decays?

The statistician George Box’s old warning belongs here: all models are wrong, but some are useful. A prototype is a model of a future product. It can be useful. It is not, by itself, evidence of demand. At most, it is a hypothesis sharpened enough to test. The danger is confusing a cheap artifact for a posterior update.

This is why selection is not decoration. It is governance. Rubin’s taste, at its best, is not mere preference. It is a willingness to choose, cut, and stand behind the cut. That does not make the choice correct. It makes the chooser accountable.

In the old workshop, scarcity did some of this governance for me. Time was slow. Boilerplate was expensive. Documentation had to be read by a tired human. A prototype asked for enough effort that many bad ideas died before they acquired names. That regime wasted good possibilities, but it also killed weak ones before they could become pets.

The new workshop is kinder to possibility and crueler to judgment.

I do not pretend to know where this settles. Part of the difficulty is that it does not stay still long enough to settle. As soon as I find a local equilibrium, the floor shifts. The model handles more context. The old constraint turns into plumbing. The discipline that felt advanced last month becomes the price of admission.

Yesterday’s discipline becomes today’s assumption; today’s advantage becomes tomorrow’s chore. What I know from the bench is narrower.

The useful part is not that the machine is always right. It is not. The useful part is that the machine is often fast enough to move the burden from typing to deciding. That sounds glamorous only if you have not had to decide all day. Deciding is where the craft hides. It is where architecture, taste, discipline, and humility become the same activity viewed from different angles.

Coming back to software after building an institution feels less like a technical upgrade than a return to a changed country. The roads are better. The tools are stranger. Some old hardships were just hardship and deserve no nostalgia. I do not miss typing boilerplate or hunting documentation by hand when the answer was mechanically available all along.

But I also do not trust speed just because it feels like liberation.

The workshop is familiar enough to welcome me and changed enough to unsettle me. There is still a keyboard. Still an IDE. Still a repo with a memory longer than mine. Still tests, docs, plans, lint, architecture, review, merge. Still the old pleasure of making something real.

Only now the room answers faster than I can think.

So I keep the planning file open. I run the tests. I read the diff slowly.