Introduction
I included the current month in the title of this article because we truly live in interesting times, and the rate of progress in the LLM space, including the open weights models that you can run locally on "consumer" hardware, is truly phenomenal! The fact that I am writing this only a few days after May 1, aka International Workers' Day, is also not lost on me.
In this article, somewhat like a field report, I would like to show what is currently possible in agentic software development with Dart using reasonably sized local models paired with a good agent harness (which can be just as important as the model).
The cast
So let's get down to specifics: what exactly have I been using? On the software side I have used:
- Model: Qwen 3.6 35B 4bit quant
- Context size: 132k
- Inference engine: Ollama
- Agent: pi-agent
- OS: Fedora 43
On the hardware side I am fortunate to have a Framework AMD Strix Halo machine with 128GB, so I could easily run both the model and the agent on the same machine.
The show
The task I decided to give my agent+model combo was to migrate my website from the long-in-the-tooth DSG to picosite. Both are Dart based static site generators. DSG is a project whose maintenance I took over after its original author stopped; I maintained it for some years before deciding to stop about 3 years ago (though I did make a small update to it late last year). Meanwhile picosite is a new SSG I have written from scratch, starting in mid 2024, originally to write the user manual for the picoTracker project.
Now, the reason I chose this particular task is that converting from an existing form to a new one should be something LLMs excel at, and it was also one that I knew would involve hours, if not days, of drudge work for me. While both SSGs are Dart based, I built picosite without any intention of keeping backwards compatibility, so although they share some similarities, there was enough difference between them to require a lot of work to migrate. I also knew that picosite didn't yet have some of the more advanced DSG features that my website content made use of.
But not content (sic) with just a straight migration, I decided that once that was done I also wanted a new design for the website, so I added that to the work the minion (my current preferred term for agentic LLMs) would need to do.
Finally, while I have been very happy with my existing publishing setup of keeping the content in a GitHub repo and having the SSG run on Netlify to generate and then host the website, I wanted to improve my workflow slightly by being able to preview the site. So I added one more task: creating a GH Actions workflow to also publish the content via GH Pages.
The process
To carry out the work I used my now preferred workflow of asking the agent to first create a markdown document with a detailed implementation plan. This plan covered all the pages, templates, and configuration files that needed to be migrated, along with the new design requirements, but not the GitHub Actions workflow for publishing via GH Pages, which I added later as an ad hoc task.
The agent got to work, and the plan itself was surprisingly solid, covering most of the cases I had in mind. There were a few minor refinements along the way where I had to point out something the agent had missed, mostly around some of the more unusual features of my existing site that didn't yet have obvious equivalents in picosite, but nothing major. The agent was able to incorporate these refinements into the plan, and the overall process felt very much like giving guidance to a knowledgeable colleague, one who in this case happens to be powered by a local LLM and so sometimes takes a while to start replying.
The result
The result was excellent: the migration was a complete success. All of my blog posts, the new design, and the new GH Actions workflow came out working nicely. The only minor hiccup was the LLM's poor initial choice of a third-party GH Action, but that was easily fixed. The agent handled the tweaks required to go from DSG's templates to picosite's templating system and migrated all the markdown content, along the way fixing missing or incorrect frontmatter that I had not even noticed previously. It even managed to get the new design looking good after a few attempts, which was more about my lack of clarity on what I wanted for the design than anything else.
I was particularly impressed by how well the agent handled the less obvious parts of the migration, such as the custom data sources and the way DSG handled folder listings. While picosite didn't have direct equivalents for all of these at the time, the agent was able to work around the differences, add the missing functionality to picosite, and produce a site that was not only functionally equivalent but better structured than the old one.
All told, what would have taken me days of tedious drudge work was completed in a fraction of the time, and the quality of the output was genuinely impressive. The agent didn't just copy and paste; it understood the structure of both systems and made sensible decisions about how to bridge the gap between them. Importantly, I wrote essentially zero lines of code. Of course that doesn't mean this was a completely hands-off process on my side: I reviewed the work throughout and looked at small, specific pieces of code. But as I mentioned above, the process was very much like assigning work to a colleague and then providing guidance and feedback on the work in progress.
But why not Model X?
Of course there are other contenders for the crown of best local open weights LLM, with Google's Gemma4 dense and MoE models having just recently been released. While I haven't had a chance to evaluate them thoroughly, my initial tests compared to Qwen 3.6 were disappointing enough that I chose Qwen 3.6 over Gemma4 for this experiment. I do, however, plan to do more tests comparing these 2 very similar (on paper) models.
But wait, there's more!
Things are changing fast, so it's quite likely there will be new, more capable open weights models released by the time you read this, even if you are reading it only days after I publish it! For example, the "dense" 27B version of Qwen 3.6 was released recently, but I had not noticed it when I did the experiment documented here, so I plan to try it out soon. Likewise, having got llama.cpp compiled and working well on my Framework Strix Halo machine, I have started using it with the 8bit quantisation of Qwen 3.6 35B. Ollama, while super easy to use, only makes the 4bit quantisation available. On paper the 4bit quant should be only marginally worse than the 8bit one, but given I have more than enough RAM to run the 8bit and its speed seems perfectly acceptable, I want to get the best performance possible out of the model for the complex, long running agent tasks I'm currently using it for.
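As a rough, back-of-the-envelope illustration of why the 8bit quant fits comfortably in 128GB, the weights alone take about (parameters × bits per weight ÷ 8) bytes. The small sketch below assumes the nominal 35B parameter count and deliberately ignores the KV cache needed for the 132k context, runtime overhead, and the extra scaling data that real quantisation formats store alongside the weights, so actual memory use will be somewhat higher. The function name is purely illustrative.

```python
def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in GB (10^9 bytes).

    Ignores KV cache, runtime overhead and per-block quantisation scales.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = billions of bytes (GB)
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    # Nominal 35B parameters at 4 and 8 bits per weight (assumed, not measured)
    for label, bits in [("4bit", 4), ("8bit", 8)]:
        print(f"{label}: ~{approx_weight_gb(35, bits):.1f} GB of weights")
```

Even allowing for the overhead it leaves out, roughly 35GB of weights for the 8bit quant should still leave plenty of headroom on a 128GB machine.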
Concluding Thoughts
This experiment is of course no news to people, myself included, who use the latest frontier cloud models like Gemini, Claude and GPT via their respective agent harnesses! But what is significant to me is that for the first time since I began using agentic harnesses in early 2025 (yes, just 12 months ago!), I am able to run a local model on my fairly modest mini-ITX desktop that can drive an agent harness to carry out fairly complex and long running tasks! All while sipping at most ~140W of power. It certainly doesn't run at the speed or quality of the cloud models, but the fact that it's mine, and not a service that can suffer unpredictable performance degradation, have its price increased by any amount, or even be cancelled on me at any time, makes my local LLMs, as the ads say, "priceless"!
I'll have more articles coming up in the near future documenting my further exploration of local LLMs with agentic harnesses, so do subscribe if you want to be notified.
Until then, I wish you all a happy and safe today and tomorrow in these very uncertain times.