My week with opencode


And this is the law I will maintain,

Until my dying day, sir

That whatsoever king shall reign,

Still I'll be the Vicar of Bray, sir!

What I write in this introduction may well shock some of my more sensitive readers: in the interests of investigation, I have spent the last week working on a number of projects with the help of opencode, an LLM-based coding framework. I've done this for a number of reasons, detailed below: this article is a summary of my observations.

Why is a notorious LLM critic doing this?

The first question that you might ask when reading this is, likely as not, "why the hell is Iris writing this? Has she gone soft in the head?". There is something to this: I have been pretty vocal in my criticisms of LLMs and LLM-generated code in the past, and I'm still extremely sceptical of the whole enterprise for a bunch of reasons. I have also not shifted an inch on LLM-generated text and 'art': it's terrible, it remains terrible and I really see no way to improve it. Why, then, have I shifted my opinion on code models?

The reasoning is as follows. For a considerable period of time, the biggest criticism of LLM code tools from an engineering angle was along the lines of "where are all the small, weird projects?". If the claims that LLM boosters were making about LLM-assisted coding were true, there should have been a profusion of people all over the place writing small programs to make their lives easier one way or another, and at least some of these programs should have been genuinely useful. For most of 2024 and 2025, we simply saw nothing of the kind, so I was distinctly uninterested in the tools.

Moving into 2026, however, I started to see actually useful projects built with LLM assistance pop up. rv is perhaps the prototypical example: it's a package manager for the R language that is much better than basically any other tooling available, at least from a software engineering perspective. It was also written with the assistance of Claude Code and generally has the marks of GenAI all over it: I really wish they hadn't used a Ghiblified generated image as their icon. Moreover, I'm not clear that it would have been written without Claude being available at all: the overlap of "people who use R frequently" and "people who want package management tooling written in Rust that works from the command line" is small enough that writing the package manager and getting it to the point where it's production-worthy manually might well have been impractical.

Similarly, I started seeing intelligent, level-headed people whom I respect use LLM tooling to write various small projects that make their life easier. These aren't huge projects or anything: they're things like gym tracking apps, data processing scripts and apps to help in the kitchen. Nonetheless, they're still useful and it appears that LLMs make them very quick to build. Tellingly, while everyone whom I've seen making the LLM tools work for them is a) very smart and b) works in a field where clarity and rigour are of foremost importance, they aren't all engineers: it turns out that say, an analytic philosopher or an IR analyst can get pretty good results from the tools in question. Given that all of this was directly observable, I thought that it would be pretty intellectually dishonest to not at least try some of the new coding tools.

Of course, me being me, I was unwilling to completely abandon any sense of principle, so I didn't want to use most of the existing tools (there's no level of "useful" that makes Sam Altman's conduct remotely OK, for example). To that end I chose to use opencode, an open-source LLM coding framework that works with a wide variety of language models. I tried it with two models: Zhipu AI's GLM 4.6 (this one was cloud hosted) and the Flash version of the same model, which I ran locally via Ollama. While I'd like to run the full GLM 4.6 model locally, I don't currently have enough RAM on my machine, and as we might note, buying more is currently difficult and expensive. The local model did well enough, but it's noticeably worse performance-wise than the full model, so for the work I did I mostly stuck with the hosted GLM 4.6. That said, in light of power consumption and reliance on foreign services, I'm of the view that a local model is the more ethical approach if you can use one.

I tried the tooling on three different projects: the first was a rudimentary social media automation tool that I'd written last year but had lost the motivation to develop further: my goal there was to see how well opencode did building on an existing codebase. Secondly, I used the tooling to write a basic front-end for the social media automation tool: this was to see how well opencode would handle a greenfield project with varying levels of direction. Finally, I pointed the tool at my website and asked it to clean up and implement a bunch of small things that had been bugging me for a while but that I hadn't had time to deal with: that was mostly for fun and to make my website look more like what the cool kids are doing (stupid, I know, but visual language is important and I need to make money).

I didn't really have a formal framework for testing the tooling capability: I just tried things ad-hoc to see what worked and what didn't. A more rigorous treatment would probably call for a more formal assessment framework, but that would take more time and thought. In any case, without making this drag on for too long, here are my thoughts:

Impressions

The good

It might surprise some of you to know that there were some good elements to the tool, and I wouldn't be entirely honest if I said that the whole thing was unrelentingly bad: if nothing else, we need to understand why the proponents of the tooling like it and what they see in it, because there clearly is real value there under all the shit. Here, then, are what I think of as the good points of using opencode:

Modern LLM tools can consistently produce decent code

While LLM-assisted coding tools were pretty awful in 2023 - 2025, in 2026 code-specific models can quite consistently produce useful code. The first thing I tried doing with opencode was getting it to write a listener function for an RSS feed that would push a message to a Redis queue whenever the RSS feed updated with a new article. And over two prompts, it produced a working listener (I was cautious and asked it to print to stdout first, then modified it after the fact). The code was readable and made sense, and the tool generally made sensible decisions. Similarly, I was able to get the tool to write a method to effectively perform reads and writes to a PostgreSQL database with very little trouble.
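
To give a sense of the shape of the task (rather than to reproduce what opencode wrote, which I'm not going to paste here), a listener like this can be sketched in a few dozen lines of Python. The feed URL, queue name and polling interval below are placeholders of my own, and it assumes the feedparser and redis packages:

```python
# A minimal sketch, not the code opencode produced: the feed URL, queue name
# and polling interval are placeholders.
import json
import time

import feedparser
import redis

FEED_URL = "https://example.com/feed.xml"  # placeholder feed URL
QUEUE_KEY = "posts:incoming"               # placeholder Redis list used as a queue
POLL_SECONDS = 300


def push_new_entries(client: redis.Redis, seen: set[str]) -> None:
    """Push any feed entries we haven't seen before onto the Redis queue."""
    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        entry_id = entry.get("id") or entry.get("link")
        if entry_id and entry_id not in seen:
            seen.add(entry_id)
            client.rpush(QUEUE_KEY, json.dumps({
                "id": entry_id,
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
            }))


def main() -> None:
    client = redis.Redis(decode_responses=True)  # localhost:6379 by default
    seen: set[str] = set()
    while True:
        push_new_entries(client, seen)
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```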

The code isn't amazing, but for tools like this it's absolutely equal to what a motivated and fairly clever junior developer can do. Would I ask the tool to write high-performance computing or statistics code? Almost certainly not; definitely not with the state of the models at the moment. But for basic automation applications like "post an update to social media whenever you see something new in the RSS feed (which is probably how you saw this post if my analytics tell me true)", it's pretty effective.

The tools are great for boilerplate

In general, what we software engineers want to do when we get into the field is solve interesting problems with code and work on interesting core logic, which makes sense. What we often wind up doing, however, is writing an awful lot of boilerplate that is neither difficult, nor complex, nor compelling: things like form validation, exception handling and logging. These all have a certain pattern to them, but they're just different enough that you need to customise the logic a little bit each time. This means that, in practice, a lot of development time gets eaten up by, say, writing two hundred slightly different bits of exception handling logic.
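
To make the pattern concrete, here's a hedged sketch of what I mean, with hypothetical functions rather than anything from my actual project: two handlers with the same basic shape but slightly different decisions about logging and recovery.

```python
# Illustrative only: two of the "slightly different" handlers I mean. The
# functions are hypothetical stand-ins, not code from my actual project.
import logging
import urllib.request

logger = logging.getLogger(__name__)


def fetch_feed(url: str) -> bytes | None:
    """One flavour: log and swallow network errors, returning None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except TimeoutError:
        logger.warning("feed fetch timed out: %s", url)
        return None
    except OSError as exc:
        logger.error("feed fetch failed for %s: %s", url, exc)
        return None


def parse_post(raw: str) -> dict | None:
    """Another flavour: same shape, slightly different idea of what's recoverable."""
    try:
        title, body = raw.split("\n", 1)
        return {"title": title.strip(), "body": body.strip()}
    except ValueError:
        logger.warning("malformed post, skipping: %r", raw[:50])
        return None
```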

Opencode with GLM 4.6 handles this kind of thing remarkably well, consistently handling exceptions, logging and form validation to an acceptable standard. I think this makes sense for an LLM: they're fairly simple patterns that are easy for a transformer architecture to learn, the logic that modifying them relies on isn't too convoluted and they're quite consistent across applications: all in all, this kind of thing is perfect for a machine learning model.

I don't think we can really say that there's much virtue inherent in manually writing two hundred slightly different exception handlers: perhaps it's a good way to learn how to write production code as a junior engineer, but for a working intermediate-to-senior level engineer it's boring, demotivating and generally a waste of time. If a current or future tool can consistently do this kind of "dot the i's and cross the t's" task that is often a real drag on developer time and morale, I think that would be a good thing.

They're a great morale booster for solo developers

Being honest, though all of the previous things mentioned are useful, the single biggest bit of value that I got from opencode is that it significantly reduces the activation energy it takes to actually start writing something. Especially when my mental health is in a bit of a slump, being able to ask opencode to write a basic feature for me makes it a lot easier to get into the flow of writing code than when I'm doing everything manually. The fact that the tool often speeds up feature development to the point where you can get a useful feature working in a day rather than a week also helps you feel as though you're making progress. I'd guess that getting a feature to the point where it's production-ready would likely take about as much time as it currently does (you end up doing a lot of manual QA work when writing stuff with opencode), but for personal projects what it produces is often good enough. In the week I've been doing this, I've managed to get to the point where my social media automation platform does almost all of what I want it to do without issue, which is a significant quality-of-life improvement for me. For small shops and solo devs, then, I could definitely see something like this being a useful tool to have in their pockets.

The bad

As much as these are real benefits, there's enough bad stuff in the current coding tools that I really don't think that they're anywhere near ready for production use. Here are the major issues:

The tools are terrible at writing greenfield code

When I first attempted to write the front-end for my social media automation tool, I simply scaffolded a basic SvelteKit application and began asking for (fairly specifically defined) features. It didn't work well at all: it produced a large volume of boilerplate code, to be fair, but approximately none of it did what I wanted it to, and this was with fairly specific requirements. In practice, it seems, the tools require a lot of scaffolding to start getting good results. This makes a certain amount of sense: if you present the model with two interfaces and ask it to write glue code to communicate between them, deducing how to connect them is a simple question of inferring syntax, but if you attempt to make the model actually design an interface by itself, it struggles like you wouldn't believe.

For my next attempt (in Nuxt), I first wrote a data model in SQL that defined what data was being stored, along with installing a UI framework (Vuetify) and writing two page views: one for listing all of my social media posts and another for creating and editing new posts. With that scaffolding in place, opencode was finally able to add new features relatively consistently, though even then I've experienced considerable issues.

Realistically speaking, to get good results out of opencode you need to specify quite a lot beyond simply the requirements and the framework to use: you need to make deliberate choices about your data model, your interfaces and the libraries that you wish to use for additional functionality. Linus Torvalds' old adage about bad programmers worrying about code while good programmers worry about data structures and their relationships is, in this case, truer than it ever was before. In short, to get opencode to work for you, you basically have to a) already be an extremely capable programmer and b) have already written most of the project before opencode becomes even a little useful. While small amounts of LLM code seem to do alright on the whole, the more of your project becomes LLM-generated, the more the tools begin to struggle, which means that if you want code that's at all adequate, you'll still be writing most of it yourself.

The tools struggle with DevOps/infrastructure tasks and uncommon languages

As many of you will know, while I know how to write decent application software, I honestly find the whole thing quite boring: I'm a data/infrastructure girlie, and I care much less about what software's actually being run than about how it's run and on what platform. Just about the only application programming work that I really enjoy is high-performance computing and some kinds of front-end web development.

Given that, it's interesting to note that opencode really struggles with basically all of the tools I use for this kind of thing. It can write passable SQL, but that's about it: my attempt to get it to generate OpenTofu scripts for a PostgreSQL server was an unmitigated disaster: I have to assume that there just isn't enough Terraform or Hetzner in the corpus for the LLM to fit to. There's a chance that it might do better with Azure or AWS, but if you can only use a tool for two cloud providers, that is quite frankly pretty fucking useless. Dockerfiles and compose files are just as much of a disaster: opencode will consistently choose base images that are outdated or not fit for purpose (one noteworthy example was when it used an Alpine base image for a uv project, not realising that it didn't include certain system dependencies important for some of the packages I was using), it fails to reason effectively about system dependencies in general, and all in all it just isn't as good as it needs to be to deliver DevOps code. The shell scripts that it writes are somewhat better, but still very odd, and given how close the shell is to the system, there's no way that I'm willingly running a shell script that an LLM generated outside of a sandboxed environment. CI/CD scripts are just as bad: the model really just doesn't seem to have a grasp on them at all.

Considering just how much of the difficulty in actually making use of real software is in building and deploying applications and maintaining the infrastructure backing them rather than simply writing whatever Node app you want, this is what we would call in technical terms a massive fucking problem. Of course, I can say that I'll only use opencode for application code and not use it to touch anything DevOps or infrastructure related at all, but believing that other people will show the same restraint strains credulity to its limits, and quite probably past them. In itself, this means that we really have to treat the use of opencode and similar tools with considerable suspicion, because while the worst that bad application code can do is introduce security breaches, bad systems code can run up massive bills or completely nuke your deployment.

The tests aren't adequate

Opencode, so far as I've gone with it, really struggles with writing decent unit tests. It isn't particularly good at inferring what functionality needs to be tested, will often cheat to get tests to pass without actually checking the logic exhaustively and can't write mocks for API calls like, at all. Given that unit tests are one of those things that it's really important to have if you're letting LLMs anywhere near your code, this means that you spend most of your time writing unit tests rather than actually producing code. While this is generally good XP practice, it somewhat strains credibility to believe that your average developer who uses a coding tool like this for development is suddenly going to drop the tool and write all of their unit tests manually.
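
For contrast, this is roughly the kind of mocked test I ended up writing by hand; the function under test and the endpoint are hypothetical stand-ins rather than my actual code, and it uses nothing beyond the standard library:

```python
# A sketch of the sort of mocked test the model struggled to produce.
# post_to_mastodon and the endpoint URL are hypothetical stand-ins.
import unittest
import urllib.request
from unittest.mock import MagicMock, patch


def post_to_mastodon(status: str) -> int:
    """Hypothetical wrapper around a social media API call."""
    req = urllib.request.Request(
        "https://example.social/api/v1/statuses",
        data=status.encode(),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


class TestPostToMastodon(unittest.TestCase):
    @patch("urllib.request.urlopen")
    def test_posts_status_without_hitting_the_network(self, mock_urlopen):
        # The mocked response stands in for the real API.
        fake_resp = MagicMock()
        fake_resp.status = 200
        mock_urlopen.return_value.__enter__.return_value = fake_resp

        self.assertEqual(post_to_mastodon("hello fediverse"), 200)

        # Check what would actually have been sent over the wire.
        sent_request = mock_urlopen.call_args.args[0]
        self.assertEqual(sent_request.data, b"hello fediverse")


if __name__ == "__main__":
    unittest.main()
```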

All of this means, in short, that writing production code is not going to be meaningfully sped up by opencode: in fact, it might well slow it down rather than speeding it up, because you really need unit tests if you're aiming to produce decent code in 2026. And it gets even worse: because of this, LLM-generated code requires a lot more manual QA work than code written the old-fashioned way does. Personally, I think that getting rid of manual QA and testing was a bad idea and a false economy in the first instance, but with the increasing prevalence of LLM code, this is going to become an even bigger problem. I've been running my social media automation tool for a week now, and while it doesn't seem to be buggy in any big way, I'm constantly finding one smaller bug or another that I have to correct. Proving correctness rigorously is simply going to take a very long time whether I like it or not.

It's easy to get into weird states

The most personally annoying issue I've found with LLM-assisted coding is that even when you strictly enforce architectural standards and make sure that the model only gets small, controlled chunks of functionality to look at at any given time, code written with the model can still easily get into states that the model can't easily get its way out of. If it doesn't have a strong enough model of a particular interface that you overlooked, for example (and being a human, you will overlook shit), you can easily wind up with a whole class or a bunch of functions that just facially don't do what the model claims they do. They might look superficially plausible if you don't look at them too hard, but under the hood they have approximately zero relation to what they claim to do. One example of this is the front-end I tried to produce, which wound up really weird and somewhat clunky in ways that I suspect will take quite a lot of manual work to unpick.

Undoing this basically means deleting the whole thing, writing most of the class or method manually and then trying again: arguably this is more of a pain than just writing the method yourself in the first place. And of course, as mentioned earlier, a lot of users of these tools are just not going to bother. My personal experience of this phenomenon was that it's about as annoying as dealing with a junior co-worker with zero common sense, a lack of object permanence and an inability to understand instructions: as anyone who's been in that situation will know, a couple of hours of this will leave you feeling actively homicidal towards the entity in question. In the end, you'll either write the damned thing yourself or give up and leave the thing broken, and in any serious situation, neither of these outcomes is acceptable.

Feel like helping me keep my morals and contributing to a useful and important line of investigation? You can support me in doing so via Patreon, Liberapay or through a one-off donation through Stripe. Help me in my project of applying cultural criticism techniques to software, which is probably also helping with the political situation in some roundabout way (certainly more than donating to Chuck Schumer's campaign fund).

You can also sign up to receive email notifications and my monthly (somewhat more marketing-y) newsletter below. Strong reader counts help me a lot, and subscribing to regular notifications really helps me build a strong readership.


The ugly

Finally, there's a category of things I observed about the code results that aren't bad as such (in the sense that they don't directly impede the use of the tool in practice), but that are weird, ugly or annoying. Here they are:

The tools are unbelievably verbose

One of the chief virtues when it comes to writing code is concision. In general, you want to achieve outcomes with the minimum possible amount of code. opencode does not do this: its bias, as might be expected from a generative model, is always to generate more code rather than to remove code that's unnecessary. This means that it's extremely easy to end up with an application that's much larger and more complex than it needs to be, and it's almost impossible to get the thing to actually tone it down and generate only what's necessary. This necessitates a lot of reading code to confirm that it does what you expect it to, as well as going through and deleting a lot of superfluous shit fairly often.

This behaviour is more or less robust to anything that I tried to do to get it to stop, and it represents a serious issue. After all, the more code it generates, the more I have to review and the more likely a bug is to slip past, which means bugs, security risks, slow loads and a whole lot of other weirdness.

Code style tends towards the average

One thing I've noticed with the generated code is that it consistently tends to a style that I'd call, for want of a better descriptor, "Bay Area Standard". It's difficult to pin down exactly what about it makes me think that, but the variable names that it uses, the structure of the code and the general approach to the problem very much put me in mind of "generic Y Combinator startup". In particular, the model's obsession with emoji for everything, including CLI logging, weirds me out a little, and I'm a little curious about where it picked that up.

Similarly, when I asked the model to rework some of the CSS for my website, the design it produced ended up being a lot closer to your default Silicon Valley startup's page (or your average none-too-thoughtful developer's page) than what I had before. It looks better, for some value of better, and a lot slicker, but even when starting with some fairly odd choices the model tends to regress to the mean. I'm probably going to keep the new design as I think, somewhat cynically, that coming across more normie might make me seem less threatening to the kinds of people who actually have money to spend these days (principles, alas, don't pay the bills), but if you want to do work that's at all unique or creative, there's no real option but to keep LLMs as far away from your work as possible.

Average code isn't exactly bad, but my feeling is that excessive use of LLM tools is liable to lead to a monoculture in code style, which very much is. As ecology teaches us, monocultures are fundamentally vulnerable: all it takes is one disease or one piece of malware, and before you know it, the whole thing's gone to shit. Beyond that, it's very much not how I like to write code, so it annoys the crap out of me. And stop with the fucking emoji!

You'd better get used to using git properly

As alluded to earlier, getting decent results out of these coding tools requires that you follow best practice basically everywhere else: architecture, interfaces, tests, documentation... if you slip up on even one thing, the model will take it and find some way to fuck up a perfectly clear instruction. Even when you do get everything right, it will still fuck things up a bunch of the time. Now, obviously following best practice is generally a good thing, but experienced developers develop a certain level of judgement: we'll let things slide because either we know they can be sacrificed or we know that they won't be a problem until we can come back and fix them. LLM-assisted coding admits no such flexibility, and as such, for an experienced developer, it can be a real pain in the behind to get anything done at all.

While all of this is annoying, the most infuriating part was using git. It is true that frequent small commits are in general good practice, but the degree to which I had to take this to an extreme was ludicrous. Especially for layout changes, I had to work with very small prompts, check the results exhaustively and then commit the change if I was happy with it and restore it otherwise. I think I called git restore more times this week than I have in the three years before that. All things told, I think we can safely say that this just isn't a good pattern: too many commits is just as annoying to deal with as too few, and having to snapshot the state of the system every damned time you do anything with it is a real headache.

More generally, what I got from this is that LLM-assisted coding is only more flexible and more chill than doing the thing manually if you don't care about results at all. The moment you start caring about a specific output rather than something vaguely output-shaped, it all of a sudden becomes a whole lot more rigid and finicky than just writing the thing manually. And that's quite the opposite of what LLM assistants promise.

The ethics of the thing

I have to admit, there is a part of my brain that sees the potential of this tooling and wants to support it. After all, there are a lot of people out there fighting the good fight against fascism who could use their own software to do a lot, but don't have the resources to write it. One developer working with LLM assistance on a volunteer basis could write a lot of software like that, and if people don't mind it being rough around the edges, it could be useful. With proper model choice, I find it hard to say that that is in and of itself an unethical thing to do.

The tools also have some applications in IndieWeb and digital sovereignty spaces that I can't quite write off. After all, an LLM-coded application could plausibly go a long way towards getting people off American services, or even towards helping people set up a personal website who wouldn't otherwise have been able to. These don't seem like such terrible things.

Unfortunately, the reality is that on a social level consistent use of these tools probably isn't workable. For a start, using any LLM-generated code in production or client work seems to me like a major ethical breach. We simply can't guarantee the correctness and stability of it well enough to do any such thing. If any such code does wind up in production work, it'd likely have to be manually reviewed, tested and refactored to such an extent that by the end of the process the whole thing might as well have been written manually. However, there is about zero chance that the majority of users will actually bother to do any of that, which means that the creators of the tool are releasing it in the near certainty that it will be misused, which is in itself an ethical breach: by analogy, if a company sells arms to Russia knowing that they'll be used to commit war crimes, they are morally liable for the use to which those weapons are put.

There are also social side effects: while the coding frameworks are fairly recondite and a lot more specialised than the usual chat LLM interfaces, they do, in the end, rely on the same models with the same flaws and ghoulish issues that we've all heard about. We can mitigate this issue by choosing models carefully and running them locally, but only to an extent: in using a Chinese-developed model I am, in some sense, giving support to the PRC, and I have little love for Daddy Xi, all things considered.

Finally, using these tools is contributing to the massive social nuisance that is LLM scraping: given that we find ourselves having to put significant effort into protecting ourselves against scrapers, it leaves a bit of a bad taste in the mouth to be benefiting from the product of this nuisance, which certainly runs against the spirit of OSS, if not the letter of it. While I'm not always against the use of unethically built tools that are meant for evil (if I was, I'd probably not be wanting people to ship as many weapons to Ukraine as is humanly possible) I think that in this case use of the tools introduces significant moral hazard. If the tools become significantly more powerful and capable, this might change, but considering just how little value you get for your moral compromise at the moment, I don't think they're currently worth the cost at all.

In the end, the conditions to make use of the tools relatively morally acceptable are onerous enough that it is, on the whole, probably not worth it. You need an expert engineer who's willing to test and document everything meticulously, a strong architecture, lots of unit tests and a fair amount of the codebase already written. You also need an application that is highly useful while not being critical in the sense that accuracy is paramount, and you need a strong disaster recovery plan. Finally, you need to not have the resources available to you to do it the manual way. This is a set of conditions that almost never come about: the closest situation I can see to this being the case is with certain activist groups, and even then it's questionable.

Conclusions

So, what do I think now that the week is done? I can see why the tools are so seductive, and I have to admit that a couple of times I was struck by the tool having capabilities that I did not expect it to have. In the end, however, the tools are troublesome enough and questionable enough morally that I'd be unwilling to ever release code from them to the public, leaving them as, more or less, a somewhat unethical toy that one might use to write quick utility programs but nothing beyond that.

I'll probably keep a pet local model around to provide input whenever brain-rotted slop is required (the world seems to have an endless demand for it), but beyond that I can't see myself using the tools unless either my ethical concerns are allayed or the tools become so capable that not using them would properly kill my career (given that a) my career is already pretty dead and b) not going along with the hype is not a valid reason to give in, I don't see this happening). All in all, getting the thing to work adequately requires writing enough code manually that it's not really much of a time saving.

Socially, I can't see any company that uses this kind of coding tool to any great extent producing high-quality code: you can already see this with AWS, NVIDIA and Microsoft beginning to suffer the early stages of LLM blight in their outputs. Things break, they're inefficient and they don't work as expected. Of course, this isn't going to stop it from causing job losses, but I strongly suspect that if the political situation globally keeps going as expected, people are soon going to get a short, sharp lesson in why this is unsustainable. Inasmuch as these tools will kill certain industries, I think the likely first targets might actually be the likes of WordPress and Shopify: commercial software that aims to let people build websites with minimal code. A decent web dev with a model can produce a strictly better website very quickly at this point, and given the quality of your average WordPress or Shopify site... well, they're bad enough that the average LLM output might not actually be worse.

Looking further afield, I think that one of the major issues with the current tooling is that chat-type prompts are unavoidably vague, and while the process of writing seed code for the model to build on works well enough, it's also not something that a lot of people might think to try in the first place. A first step towards making these models more usable might well be to train them on an intermediate language somewhere between natural language and code: that would allow behaviour and desired results to be specified more precisely without having to rely on actually just writing most of the code yourself. This would be a long-term project, though, and who knows how well it would work? Tackling this from the other end, a large part of what I found the model actually useful for was generating boilerplate, which raises the question of whether we could design a language that cuts down on the amount of boilerplate we need to write: something that makes exception handling and form validation as easy as defining a function, for example, would significantly reduce the need for coding models and allow us to write better software in the process.

All in all, while I'm no longer as sceptical of coding models as I was, the supposed revolution that coding agents bring has singularly failed to materialise, while the negative effects are still very much there. For now, then, I remain unconvinced.