How I built an ad-blocker that uses AI to identify ads


A few months ago, a friend of mine living abroad asked me if I could build a take-home assignment he had failed to solve for a company. The assignment asks you to build an ad blocker powered by AI. More specifically, the end state is a browser extension with options to block or highlight ads detected on a page. I took a hard look and decided to have a crack at it.

Defining the problem

We all know ads are nasty. They come in different sizes and formats: text, images, videos, GIFs, iframes, and so on. I suspected that video processing would be a hard problem, so I stuck to identifying ad text and ad images. Another problem I quickly ran into after building a basic ad-blocking system: ads are not static. Apart from being injected at page load, scripts can inject ads at any time after the page has loaded, so my solution needed to accommodate that too. A third problem I faced along the way is performance. We need low latency while rendering the page, so an asynchronous design would be highly welcome. This ties heavily into the model deployment strategy.

Let’s formulate our requirements

  1. Detect ad text and ad images
  2. Block them
  3. The entire workflow needs to run when a page is loaded. So page loading performance should not be heavily impacted
  4. The interface should be a browser extension

Initial Ideas on Architecture

  1. Send the entire page to a backend that purges ads, and replace the rendered HTML with the new HTML in the response
    • This would work if all ads were loaded the first time a page renders, but as we saw above, scripts can inject ads at any time during the lifecycle of a webpage. So this won’t work effectively
  2. Send individual pieces of text and images to the backend for ad detection (see the sketch after this list)
    • Also, set up a mutation observer to watch for new images or text content loaded into the DOM
    • We can also include metadata such as class lists, image alt text, and other attributes
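
For the second idea, a mutation observer is the standard way to catch content injected after the initial load. Here's a minimal sketch of what that could look like; `classifyNode` is a hypothetical helper standing in for the backend call:

```javascript
// Watch the DOM for images injected after the initial page load.
// `classifyNode` is a hypothetical helper that would ship the data
// (plus metadata like class list and alt text) to the backend.
const observer = new MutationObserver((mutations) => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      if (!(node instanceof Element)) continue;
      // The added node may itself be an <img>, or contain some.
      const images = node.matches('img')
        ? [node]
        : [...node.querySelectorAll('img')];
      for (const img of images) {
        classifyNode({
          src: img.src,
          alt: img.alt,
          classes: [...img.classList],
        });
      }
    }
  }
});

// childList + subtree catches nodes added anywhere in the document.
observer.observe(document.documentElement, { childList: true, subtree: true });
```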

(Meanwhile, something sprang to mind. A quick detour…)

I use ad blockers quite a lot, so I looked into the one I use: uBlock Origin. The way uBlock and other ad blockers work is quite simple: they have a community-maintained list of the domains ads are served from, called EasyList, and they block every request to those domains during the lifetime of a webpage. However, because ad networks can deliver ad content through different domains, this doesn’t work for us; we shouldn’t rely on static domain blocking. After all, that’s why we’re using AI, right?

But this detour turned out to be quite useful, because I discovered that instead of analysing content in the DOM, we can analyse network requests and cancel the ones carrying ad-related content. Wow! So I went and explored this direction, before I hit a major roadblock that changed the game: Chrome disabled the API that allows extension developers to block network requests programmatically. Screw you, Google!

I had no option but to switch to Mozilla Firefox. The upside of this detour is that I discovered an even better flow: intercept network requests, send them to the backend, and cancel them if the server says they carry ad content. Of course, this strains our low-latency requirement, since there’s a round trip to the backend before the requested content is rendered into the DOM, but it’s more stable than scraping data off the DOM and removing it later.
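
In Firefox, this flow maps onto the blocking `webRequest` API (which needs the `webRequest`, `webRequestBlocking`, and host permissions in the manifest). Here's a minimal sketch, assuming a hypothetical backend endpoint at `http://localhost:8000/classify`:

```javascript
// Hypothetical backend call: POST the URL, get back { isAd: boolean }.
async function isAdUrl(url) {
  const res = await fetch('http://localhost:8000/classify', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url }),
  });
  const { isAd } = await res.json();
  return isAd;
}

browser.webRequest.onBeforeRequest.addListener(
  async (details) => {
    // Firefox lets a blocking listener return a Promise, so we can
    // await the backend before deciding. { cancel: true } drops the request.
    if (await isAdUrl(details.url)) {
      return { cancel: true };
    }
    return {};
  },
  { urls: ['<all_urls>'], types: ['image', 'script', 'sub_frame'] },
  ['blocking']
);
```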

This is the core of the problem, isn’t it?

Simple keyword matching doesn’t work; we need some sort of semantic matching. Let me preface this by saying that I spent a lot of time on image ads. That’s what we will focus on, as they provide the maximum gains for our final product.

I researched online and came across two interesting papers on this topic.

The first is a system researched and developed by Brave called PERCIVAL. To put it succinctly, they embedded a deep learning model in the browser rendering pipeline that classifies images and blocks them if they are ads. As you can guess, it was an instant no for my project, because I don’t have the resources or capability to do this.

The second: a model from OpenAI called CLIP, or Contrastive Language-Image Pre-training. Essentially, you give it an image and a set of labels, and the model tells you how closely the image matches each label as a percentage. The higher the percentage, the more likely the image is to match what the label described. So I can just keep a set of labels such as “ad”, “sponsored”, “advertisement”, etc., and pass them along with the image downloaded from the network request to the CLIP model. Based on the response, if any of these labels has a confidence of, say, 75%, I block the request.
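
Here's a sketch of that idea using transformers.js (the library the post lands on later); `Xenova/clip-vit-base-patch32` is one browser-ready CLIP checkpoint, and the label strings and 75% threshold are just the example values from above:

```javascript
import { pipeline } from '@xenova/transformers';

// Zero-shot image classification with a browser-ready CLIP checkpoint.
const clip = await pipeline(
  'zero-shot-image-classification',
  'Xenova/clip-vit-base-patch32'
);

// One non-ad label gives CLIP a "none of the above" escape hatch.
const LABELS = ['an advertisement', 'sponsored content', 'regular page content'];

async function isAdImage(imageUrl) {
  const results = await clip(imageUrl, LABELS); // [{ label, score }, ...]
  // Block if any ad-like label clears the example 75% threshold.
  return results.some(
    (r) => r.label !== 'regular page content' && r.score > 0.75
  );
}
```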

I can do the same thing with a pretrained BERT-based classification model for text content. Specifically, in my final code, I intercepted network requests and passed the request URLs to a preloaded Xenova/mobilebert-uncased-mnli model that determines whether a URL is ad-related. For example, a URL containing the word “ad” would be blocked.
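
A hedged sketch of that URL check with transformers.js; the label strings here are my own illustrative choices, not the ones from the actual extension:

```javascript
import { pipeline } from '@xenova/transformers';

// Zero-shot text classification over the raw request URL.
const urlClassifier = await pipeline(
  'zero-shot-classification',
  'Xenova/mobilebert-uncased-mnli'
);

async function isAdRequest(url) {
  const { labels, scores } = await urlClassifier(url, [
    'advertisement',
    'normal content',
  ]);
  // labels/scores come back sorted, highest score first.
  return labels[0] === 'advertisement' && scores[0] > 0.75;
}
```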

So, I set out to build this before I hit a problem yet again…

The Challenge of deploying ML models

To install the CLIP model in Python, I had to download the transformers library, which in turn downloaded close to 100 GB of CUDA and PyTorch dependencies. (Don’t quote me on the exact number, but my Ubuntu machine ran out of disk space while installing them.) They took forever to install, and the latency of sending a request to the backend and waiting was quite bad even when I pre-loaded the model during startup. I thought there was no way this thing could be deployed. By this point I was getting frustrated and almost quit the project; not to mention, I was using Claude Code at the time and it was rate-limiting horribly and hallucinating very badly. I put the project aside for a while before picking it up again.

The Breakthrough

This time, I came across a library and ML runtime that runs pretrained models directly in the browser without needing a backend. Not to mention, the binary is very small; I think it’s about 1.3 MB or so. It’s called transformers.js, powered by ONNX Runtime, which can use your CPU and GPU through WASM. This was the lifeline of the project, and that’s what I used.

[Screenshot: the extension running on MSN.com]

I tested all of my strategies on MSN.com.

On the right side, you can see the network requests being blocked by my extension, with the following text:

Blocked by AI based Ad Blocker

You can check out the code at https://github.com/AdityaSanthosh/perceptual-ad-blocker-extension.

The code is just a representation of my final solution. It doesn’t work flawlessly, as I no longer have enough time to spend on this project. Maybe I will revisit it in the future.

Architecture of the extension

The extension’s background script loads the model upon initialisation. The content script running in each tab sends requests to the background script for classification and waits for the response.

Load the model once, use it for every tab: lower latency.
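
A minimal sketch of that load-once pattern, reusing the mobilebert pipeline from earlier (the message shapes are my own, not the extension's actual protocol):

```javascript
// background.js — load the model once at extension startup.
import { pipeline } from '@xenova/transformers';

const classifierPromise = pipeline(
  'zero-shot-classification',
  'Xenova/mobilebert-uncased-mnli'
);

browser.runtime.onMessage.addListener(async (message) => {
  if (message.type !== 'classify') return;
  const classifier = await classifierPromise; // resolved after first load
  const { labels, scores } = await classifier(message.text, message.labels);
  // In Firefox, returning a Promise from the listener sends the reply.
  return { label: labels[0], score: scores[0] };
});
```

```javascript
// content.js — runs per tab; delegates classification to the background script.
async function classifyCurrentPage() {
  return browser.runtime.sendMessage({
    type: 'classify',
    text: location.href,
    labels: ['advertisement', 'normal content'],
  });
}
```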

Next Steps:

If you, the reader, are interested:

  1. You can go the AI route and identify the right model, or fine-tune a model for this use case. Do some testing and publish the results
  2. Improve the code
  3. Polish the ux and final look of the extension

If you’ve read this far, thank you. It means a lot.