I started with a pretty simple question: can I run one of these models in the browser?
The motivation was not lofty. I wanted AI features on my site. I just didn't want to pull out my wallet to cover the tokens. Instead, let's put the reader's GPU to work!
My friendly clanker pointed me toward ONNX and Transformers.js. That turned out to be the right trail almost immediately. The browser-side story is much better than I expected. There is real tooling here now, not just demos held together with wishful thinking.
The first phase was exactly what it should have been: kick the tires on a random model and see if anything useful happens. A lot of language models will technically run, but many of the smaller ones don't do anything interesting. Some don't do anything at all! The useful ones get big fast, and that matters when this is happening inside a browser tab. No one expects a browser to be saturating VRAM and GPU compute.
I started digging through ONNX models, trying to find something useful, and eventually found TranslateGemma in ONNX form. "Translation could be a cool feature to add to my personal site," I thought, stroking my chin. Also, credit where it's due: Google has a surprisingly good spread of open models right now, including a bunch of use-case-specific ones that feel much more practical than another generic "do everything" model. If you want to browse around, Gemmaverse and Google Cloud's docs are worth a look. Thanks, Google.
It was really easy to start:
```ts
import { pipeline } from "@huggingface/transformers"

const generator = await pipeline(
  "text-generation",
  "onnx-community/translategemma-text-4b-it-ONNX",
  {
    dtype: "q4",
    device: "webgpu",
  },
)
```

That was the moment the idea stopped feeling speculative. The browser loaded a multi-gigabyte model and ran it admirably on WebGPU. My computer didn't even start take-off procedures; the fan was silent.
The next step was feeding it the right shape of input:
```ts
const messages = [
  {
    role: "user" as const,
    content: [
      {
        type: "text",
        source_lang_code: "en",
        target_lang_code: "es-ES",
        text,
      },
    ],
  },
]

const result = await generator(messages, { max_new_tokens: 512 })
```

Once that worked, getting from "a model can translate a string" to "this page has translation" was mostly an exercise in not wrecking the DOM.
The first version was naive on purpose. Walk the text nodes inside <main>, skip obvious bad targets like <code> and <pre>, batch a few strings together, translate them, and write the results back in place. That was enough to prove the feature.
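The walker itself is plain DOM code, but the batching step is easy to sketch as a pure helper. This is my own sketch with my own names, not the site's actual implementation: group the extracted strings so each call to the model stays under a rough character budget.

```typescript
// Hypothetical sketch: batch extracted strings for translation so
// each request to the model stays under a rough character budget.
function batchStrings(strings: string[], maxChars = 1000): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let size = 0;
  for (const s of strings) {
    // Start a new batch when adding this string would blow the budget.
    if (current.length > 0 && size + s.length > maxChars) {
      batches.push(current);
      current = [];
      size = 0;
    }
    current.push(s);
    size += s.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch then becomes one request to the generator, and the results get written back to the nodes they came from.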
It was not enough to make it good.
The annoying part was formatting. Translating raw text nodes works fine until you hit inline styling. If a sentence is split across text nodes because one phrase is bold, a naive walker will cheerfully translate the pieces separately and serve the sentence as soup.
The fix was to keep the source markdown around for the parts of the page that were authored as markdown, and translate the full markdown string instead of the rendered fragments. In the app I wrapped those pieces with a data-md marker so the translation pass could treat them as a single unit:
```tsx
<span data-md={source}>
  <Block content={source} components={components} />
</span>
```

That changed the system from "translate whatever text nodes happen to exist" to "translate the thing the author actually wrote." Much better.
I also had to preserve a few proper nouns. Company names, my own name, that kind of thing. The cheap and effective approach was placeholder substitution:
```ts
const { masked, slots } = insertPlaceholders(source)
const rawTranslation = await translate(masked, targetLang)
const translated = restorePlaceholders(rawTranslation, slots)
```

There was a similar lesson with section headers. The sidebar nav and the section title on the page might both say "Experience", but they should not be translated independently and come back slightly different. Also, I thought it would be cool to have them change simultaneously. So I added a shared data-section-title key, translated those once, then fanned the result out to every matching node.
By that point the feature was usable but looked like 1995, with the DOM jumping around as text swapped in. It was time for some UX work. Despite my stale skills in this department, agents made it super easy!
The model is about a 2.9 GB download. That is not something you ambush a reader with. So the button became a small sequence instead of a single action: explain what it does, warn about the download, confirm intent, show download progress, then show that translation is ready.
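The download-progress step can be driven by the `progress_callback` option that Transformers.js accepts when building a pipeline. The formatter below is my own sketch, and the exact event shape (`status`, `file`, `loaded`, `total`) is an assumption based on the library's per-file progress events:

```typescript
// Hypothetical sketch: turn per-file download progress events into a
// short label the translate button can display.
type ProgressEvent = {
  status: string;
  file?: string;
  loaded?: number;
  total?: number;
};

function progressLabel(e: ProgressEvent): string {
  if (e.status === "progress" && e.loaded != null && e.total != null && e.total > 0) {
    const pct = Math.round((e.loaded / e.total) * 100);
    return `Downloading ${e.file ?? "model"}... ${pct}%`;
  }
  if (e.status === "ready") return "Translation ready";
  return "Preparing...";
}
```

Wiring it up is just passing `progress_callback: (e) => setLabel(progressLabel(e))` alongside `dtype` and `device` in the pipeline options.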
That part happened in a bunch of small steps, with my agent doing the boring part of the climb. Kick the tires on any model. Get translation working. Tighten the DOM handling. Add progress. Add restore. Make the button better. Make the translation update components as they're translated. Add some visual indication of translation in progress and "stream" the text in when it's done (see Streamdown).
That last part mattered more than I expected.
The page now translates in a rough order that matches how people read it: shared section titles first, then the higher-level structure, then the rest of the body. While that happens, the UI pulses and settles as each chunk completes. None of that changes the underlying capability, but it changes the feel of the feature from "experimental widget" to "this thing knows what it is doing."
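That read-order scheduling can be sketched as a stable sort over translation units by a coarse priority. The `kind` names here are mine, not the actual code:

```typescript
// Hypothetical sketch: translate shared section titles first, then
// structural headings, then body copy, roughly matching read order.
type Unit = { text: string; kind: "section-title" | "heading" | "body" };

const PRIORITY: Record<Unit["kind"], number> = {
  "section-title": 0,
  heading: 1,
  body: 2,
};

function scheduleUnits(units: Unit[]): Unit[] {
  // Array.prototype.sort is stable, so document order is preserved
  // within each priority tier.
  return [...units].sort((a, b) => PRIORITY[a.kind] - PRIORITY[b.kind]);
}
```

Feeding batches to the model in this order is what makes the page appear to translate top-down instead of in whatever order the DOM walker happened to visit nodes.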
I came away from this with two strong impressions.
First, Hugging Face is pretty cool. I had never used their software or models before, and the combination of model hosting, docs, and browser tooling made this much easier than I expected. They are doing good work.
Second, this was exactly the sort of project where having an agent around was genuinely useful. Not because it invented the idea, and not because it wrote some magical finished system in one shot. It helped compress the boring middle. I could go from "I wonder if this is possible" to "here is a shippable translation feature" and then spend my time iterating on the parts that actually benefit from taste.
That is probably my favorite part of the whole thing. The path was not one giant leap. It was a series of pretty normal steps:
- Try running any small model.
- Find a model that is actually worth running.
- Make it work in the browser.
- Make it work on the page.
- Make it not break formatting.
- Make it feel good.
Nothing mystical. A decent model, a good browser runtime, and enough iteration to sand off the dumb parts. That is most of how anything ships.
Give it a try by clicking the globe icon on the top right!
There's also an interactive version.