Embedding Tiny Language Models in Flink SQL
May 20th, 2026
I gave a talk at Current yesterday about how to embed a tiny language model inside your Flink SQL pipeline.
I used a fun mix of demos to show what I think are the main approaches available for using generative AI with Kafka events from a Flink SQL job. Some demos were definitely more sensible than others!
These are the slides I used, and what I’d planned to say.
In this session, I’ll be talking about your options for running language models for Flink SQL jobs.
I’ll cover:
- your options for where you run them, in relation to Flink
- what sorts of choices you have for the models you run
- how to use them – the sorts of prompts and settings we’d want for Flink
- how to keep an eye on them, to check that they’re working well
- and finally, some thoughts on when it’s a good idea to do any of this
Tags: apacheflink, apachekafka, flink, kafka
Posted in code | No Comments »
Instrumenting a Kafka Connect connector with metrics
May 2nd, 2026
Metrics can provide operational insight into Kafka Connect connectors, helping users work out how to configure them better. With simple updates, a Kafka Connect connector can be instrumented to emit useful metrics that make this possible.
A couple of years ago, I created a simple skeleton Connect connector project to help developers at a hackathon create their first Kafka connector.
I’ve updated the source connector from that sample to emit metrics. In this post, I’ll walk through what I did, as an example of how to add metrics to your own Kafka connector.
Tags: apachekafka, kafka
Posted in code | No Comments »
How to create a Scratch extension
April 27th, 2026
A few years ago, I ran a workshop about how to create custom Scratch blocks.

I made a template repository, based on the Scratch Team repos, but with a skeleton extension and some extra scripts and automation to handle building and publishing it. I included step-by-step instructions for building different types of Scratch extensions, including Scratch blocks based on web APIs, and Scratch blocks based on JavaScript modules from npm.
Tags: scratch
Posted in code | No Comments »
22 years at IBM
April 26th, 2026
Looking back at my career so far, and what this could mean for what comes next…
Tags: career
Posted in ibm | 2 Comments »
What do people use to access Machine Learning for Kids?
April 23rd, 2026
I use Cabin for analytics on Machine Learning for Kids. (If you’re not familiar with them, their blog post on how to do analytics in a way that prioritizes user privacy is worth a read – the approach is simple but elegant. And you can see a demo of what a Cabin dashboard looks like.)
I thought it might be interesting to share what Cabin tells me about who has used Machine Learning for Kids over the last seven days.
What Operating Systems are people using?
| Operating System | Uniques |
| Windows | 404,873 |
| iOS | 132,971 |
| macOS | 67,848 |
| Android | 55,176 |
| Mac OS | 35,743 |
| Chrome OS | 23,536 |
| Linux | 21,852 |
| Ubuntu | 10,780 |
| Chromium OS | 8,484 |
| HarmonyOS | 408 |
| Raspbian | 31 |
| OpenHarmony | 17 |
| PlayStation | 13 |
| Tizen | 10 |
| android | 3 |
At work, I’m mostly surrounded by MacBooks and don’t often see a Windows computer. It’s easy to assume that is normal, so this is a reminder that I’m in a bit of a bubble. Windows is still dominant.
It’s interesting to see “macOS” and “Mac OS” listed separately. (I was tempted to combine them, but I decided to leave the data I get from Cabin as-is.)
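To put rough proportions on those numbers, here is a minimal Python sketch that works out each share from the table above. It assumes that the sum of the rows is a fair stand-in for the total number of unique visitors (Cabin may count things differently), and the `share` helper is just something made up for this illustration. It leaves the Cabin rows as they are, and only adds “macOS” and “Mac OS” together at the point of calculating a combined share.

```python
# Unique visitors by operating system, copied from the table above.
# "Android" and "android" are kept as separate rows, as Cabin reports them.
uniques = {
    "Windows": 404_873,
    "iOS": 132_971,
    "macOS": 67_848,
    "Android": 55_176,
    "Mac OS": 35_743,
    "Chrome OS": 23_536,
    "Linux": 21_852,
    "Ubuntu": 10_780,
    "Chromium OS": 8_484,
    "HarmonyOS": 408,
    "Raspbian": 31,
    "OpenHarmony": 17,
    "PlayStation": 13,
    "Tizen": 10,
    "android": 3,
}


def share(counts, *names):
    """Percentage of all uniques accounted for by the named rows.

    Assumption: the sum of every row is treated as the overall total.
    """
    total = sum(counts.values())
    return 100 * sum(counts[name] for name in names) / total


total = sum(uniques.values())
windows_share = share(uniques, "Windows")
# Combine the two Mac rows only for this calculation.
mac_share = share(uniques, "macOS", "Mac OS")

print(f"total uniques: {total:,}")
print(f"Windows: {windows_share:.1f}%")
print(f"macOS + Mac OS: {mac_share:.1f}%")
```

On these figures, Windows accounts for a little over half of the uniques, and the two Mac rows together are still well behind it.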
My favourite part of looking at this is wondering who the thirteen people are who visited my site from a PlayStation???
Tags: mlforkids, mlforkids-tech
Posted in misc | 1 Comment »
“How many Kafka events will Flink process per second?”
April 11th, 2026
I’m often asked this. The specific question varies, but it’s typically some variation of asking how quickly a single CPU of Flink processes events from a Kafka topic.
Why “per CPU”? Maybe because enterprise software is typically charged per CPU? Maybe because I tend to talk to people who run everything in Kubernetes, who think of running software in terms of requests / limits? Not sure, but the question tends to be framed from the perspective of asking how much processing they can expect to get from a CPU.
I try to avoid doing the engineer thing of answering “it depends”… but… it really does depend!
That is the motivation behind this post: to give me something I can point at as an illustration of the degree to which Flink’s performance varies (and a taste of the range of interrelated factors that influence it).
Tags: apacheflink, flink
Posted in code | 2 Comments »
Paying for image hosting
April 6th, 2026
I don’t take my blog very seriously. It’s a place where I leave myself reminders of things I figured out how to do, or share things I’ve done that won’t fit in a Bluesky post. But even so, I get annoyed that my blog has often been offline.
My site’s host provider has a monthly bandwidth limit. When I hit that limit, my site is taken offline, replaced with an error page saying that I’ve exceeded my quota.
I’ll write a blog post that includes images, and if too many people look at the page too many times, the whole site goes offline for the rest of the month. (To be fair, it’s never a single post that does that – I don’t get that many hits! More often it’s when I’ve written a few posts in a month, and the last one pushes me over the line.) Normally I end up offline for just a day or two, but it has been over a week before.
Last month, I finally decided to do something about it. I started looking at moving the images I use in blog posts somewhere else that wouldn’t count against my bandwidth limit. My blog isn’t serious enough for me to be willing to spend a lot on it, but I don’t mind paying something to make the worry about image bandwidth go away.
I searched for image hosting services, and started reading about services such as postimages.org ($14.99 a month), imgbb.com ($12.99 a month), sirv ($19 a month), and imagekit.io ($9 a month). Every service I found felt too limited, too expensive, or both.
Posted in misc | 1 Comment »
