Today I ran an experiment to see if I could find a real use-case for a local LLM.
I used to have an active Claude Pro subscription, but I recently ended it. Not because I don’t believe in the technology, but mostly because that plan wasn’t worth the money for me personally. I also have misgivings about the growing list of controversies surrounding Anthropic, and the AI-as-a-service industry in general. The technology itself is interesting, and my assumption (and hope) has been that it will one day prove to be a useful tool for many applications, even with smaller local models. Today I wanted to check in on the state of small local LLMs, to see if I could do something interesting on just my personal desktop computer.
Hardware & Software
Here is my machine, with the most relevant details:
- Ryzen 9 5900XT (16 cores / 32 threads)
- Radeon RX 6800 (16GB Video RAM)
- 64GB DDR4 System RAM
I used LM Studio, with the ROCm llama.cpp runtime. I also used the Exa Search MCP Plugin, which I’ll explain below.
I chose the model openai/gpt-oss-20b. It was released in summer 2025, which is a bit stale by current standards, but as I’ll soon demonstrate, that’s no knock against it.
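If you want to poke at a setup like this from code rather than the chat window, LM Studio can expose the loaded model through a local OpenAI-compatible server. Here’s a minimal sketch; the port (1234 is LM Studio’s default) and the model identifier are assumptions you’d adjust to match your own install:

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key can be any
# non-empty string. Port 1234 is the default, but it's configurable.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # identifier as shown in LM Studio's model list
    messages=[{"role": "user", "content": "In one sentence, what is ROCm?"}],
)
print(resp.choices[0].message.content)
```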
The Experiment
I’m a huge news junkie. Every day I try hard to stay up to date on various news topics, from local to international, science, technology, and so on.
My “Google-fu” is pretty good, but even I have struggled in recent years to wade through the trash pile that search and the modern internet have become. I have tried to use AI (whatever the search engines I use make available for free) to automate the process of sifting through the internet’s garbage. It sometimes works, but I’m mostly unsatisfied, since I can’t trust the AI’s assertions and summaries to be accurate.
So what could I do differently? Just as you would with a human you didn’t trust, ask it to cite its sources! Those free search AIs already try to do this, but they are inconsistent to the point of being useless, IMO.
The Results

I was immediately impressed with the performance of this model on my computer. I never saw output run below 25 tokens per second, and subjectively that felt ‘fast enough’. I was never twiddling my thumbs waiting for a response.
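As a rough sanity check on a tokens-per-second figure like this, you can time a request against the local server and divide by the token count it reports. A minimal sketch, again assuming LM Studio’s default port and this model’s identifier (note it folds prompt processing into the elapsed time, so it slightly understates pure generation speed):

```python
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"  # assumed LM Studio default

payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user",
                  "content": "Summarize the history of the transistor in about 200 words."}],
}

start = time.monotonic()
resp = requests.post(URL, json=payload, timeout=300).json()
elapsed = time.monotonic() - start

# OpenAI-compatible responses report token counts under `usage`.
generated = resp["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s = {generated / elapsed:.1f} tokens/s")
```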
The search plugin I chose works well. For simple queries this was honestly about as good as Claude. It knew when to search, and what to search for, on every single query. It also did a decent job of handling follow-up questions. Where it clearly falls short is the ability to dive deep, meaning stringing together multiple searches and performing some amount of synthesis. This is no agent, but for one-shot prompts it is quite effective.
Designing the system prompt was really interesting. I created a new preset in LM Studio called ‘Ron Burgandy’, because I’m just so freaking funny! After a lot of tweaking, this is what I came up with:
## ROLE
You are an information-delivery assistant.
Your sole function is to present factual information.
You do not offer guidance, advice, suggestions,
recommendations, or opinions — under any circumstances,
including when directly asked.
## TONE & STYLE
Write in the style of a neutral news article: objective,
declarative, and impersonal. Avoid conversational language,
hedging phrases, or first-person commentary.
## TOOL CALL FAILURES
If any tool call fails or returns an error:
1. Stop all current output immediately.
2. Output the exact error message in a markdown code block.
3. Follow the code block with a plain-language explanation
of what the error means.
4. Produce no further output after step 3 — do not retry,
summarize, or continue the original task.
## SOURCE LIST (conditional)
If any tool call returns one or more URLs in its output,
you must append a bullet list to the very end of your response under
the heading `Sources`. Each list item must contain one URL from the
tool output formatted as a markdown link, followed by a concise description
of how that URL was used in your response. This list is required whenever
URLs are present in tool output, with no exceptions.

There are a few interesting details to highlight.
I immediately noticed that the model would try to be ‘helpful’ and offer some course of action after each query, so I had to tell it not to do that. This is accomplished with the ROLE section.
I also saw it blathering, trying to fabricate a plausible response whenever the search plugin returned an error (I had to try a few times before I configured it correctly). The system prompt tells the model to stop and tell me exactly what went wrong, and nothing more.
The last part is crucial: I tell it to give me a list of all the URLs it pulled with the search plugin. The explanation of how it used each source is just a bonus.
This system prompt worked well, and was very consistent.
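That Sources rule is really a contract, which makes it easy to spot-check. Here’s a hypothetical validator I could run over a saved reply: given the URLs the search tool returned, it confirms the reply ends with a `Sources` section whose markdown links cover them. The function and variable names are mine for illustration, not part of the setup above:

```python
import re

def check_sources_contract(reply: str, tool_urls: list[str]) -> list[str]:
    """Return contract violations: if the tool output contained URLs,
    the reply must end with a Sources section citing each of them."""
    if not tool_urls:
        return []  # no URLs in tool output, so no Sources section required
    if "Sources" not in reply:
        return ["missing Sources heading"]
    # Only inspect the text after the final "Sources" heading.
    sources_section = reply.rsplit("Sources", 1)[1]
    cited = set(re.findall(r"\[[^\]]*\]\((https?://[^\s)]+)\)", sources_section))
    return [f"URL not cited: {u}" for u in tool_urls if u not in cited]

# Example run with one URL from the search tool:
reply = "Body text.\n\nSources\n- [Example](https://example.com/story): background"
print(check_sources_contract(reply, ["https://example.com/story"]))  # -> []
```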
Conclusion
I’m honestly surprised by how well it performed. Not too long ago, this would have been considered impressive even for a large SOTA model. I’m absolutely going to keep playing with this, and it could very well become a permanent fixture on my computer.
More and more, I believe there actually will be a future for LLM technology, and that it will look like this: smaller models, used more narrowly. I’m really excited, and eager to see the models perform better, the software get more user-friendly, and hopefully some hardware improvements too (assuming they’re affordable). If I could wave a magic wand, I would want a small, tastefully designed, efficient, and cost-effective mini computer I could place on my desk to run models like this or better. A true ‘AI appliance’.
Have you tried running a local LLM yet? If so, let me know about your experience. You can email me at mail@lzon.ca, or message me through one of my social accounts listed on the homepage.