AdFlush

dl.acm.org

276 points by grac3 2 years ago · 111 comments

pradn 2 years ago

What's fascinating here is AdFlush is a classical feature engineering approach: define a bunch of features on the data manually, and then use ML to figure out the most useful / impactful ones. This is not the "throw terabytes of data and see what happens" approach we see with LLMs. It's a bit funny to even point this out because I don't recall the last time a feature-engineered ML project made it to the HN front page.

Features can be brittle, but they are understandable. The paper's appendix [1] lists the 27 features that will likely make a request/resource "ad-related". These include interesting ones like JS AST depth, average JS identifier length, the "bracket to dot notation ratio in JS", and a number of graph measures for the graph of scripts.

And contrary to what comments in this thread are saying, they do compare against a blocklist-based adblocker: uBlock Origin. That's in section 5.5. They say they outperform uBlock Origin. But even they say they don't reduce overall page load time because their algorithm is expensive.

[1]: https://dl.acm.org/doi/pdf/10.1145/3589334.3645698
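To make two of the features named above concrete, here is a rough sketch of how average identifier length and a bracket-to-dot ratio could be computed. It is an assumption-heavy illustration: the function names are hypothetical, and it uses regexes over raw source text rather than a parsed AST, so it also counts words inside strings and comments.

```python
import re

# Hypothetical, simplified feature extractors. A real implementation would
# walk a parsed JS AST; these regexes are only a crude stand-in.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
DOT_ACCESS = re.compile(r"\.[A-Za-z_$]")             # e.g. obj.prop
BRACKET_ACCESS = re.compile(r"[A-Za-z0-9_$\])]\[")   # e.g. obj["prop"]

JS_KEYWORDS = {"var", "let", "const", "function", "return",
               "if", "else", "for", "while", "new"}


def average_identifier_length(source: str) -> float:
    """Mean length of identifier-like tokens, skipping a few keywords."""
    names = [t for t in IDENTIFIER.findall(source) if t not in JS_KEYWORDS]
    return sum(len(n) for n in names) / len(names) if names else 0.0


def bracket_to_dot_ratio(source: str) -> float:
    """Count of bracket-style accesses divided by dot-style accesses."""
    brackets = len(BRACKET_ACCESS.findall(source))
    dots = len(DOT_ACCESS.findall(source))
    return brackets / dots if dots else float(brackets)


sample = 'var a = window["document"]; a.body.title;'
print(average_identifier_length(sample), bracket_to_dot_ratio(sample))
```

Even this crude version hints at the cost discussed in the thread: every script body has to be fetched and scanned before a decision can be made.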

  • tofof 2 years ago

    More specifically, page load time was 2.7 seconds without adblocker, decreased to 2.1 with uBlock Origin, but increased by roughly 144% (about 2.4x) to 6.6 seconds with AdFlush, or increased to 3.4 seconds with AdFlush retaining prior predictions.

    The superior score was an F1 of 0.86 vs 0.84 for AdFlush vs uBlock Origin, and it's not clear to me that this is a statistically significant difference. They do not claim it is.
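The load-time figures quoted above can be turned into relative changes with simple arithmetic, and F1 itself is just the harmonic mean of precision and recall. The sketch below assumes nothing beyond the numbers in this comment. The precision and recall values passed to `f1_score` are hypothetical, since only the F1 values are quoted here.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


baseline = 2.7  # seconds, no adblocker (figure quoted in the comment above)
for label, seconds in [("uBlock Origin", 2.1),
                       ("AdFlush", 6.6),
                       ("AdFlush, cached predictions", 3.4)]:
    print(f"{label}: {percent_change(baseline, seconds):+.0f}%")

# Hypothetical precision/recall pair, only to show the formula in use.
print(round(f1_score(0.9, 0.8), 3))
```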

    • blacksmith_tb 2 years ago

      That seems to argue for a first pass with a blocklist to filter out the well-known ad providers, and then possibly a followup step with the ML to catch things that are trying harder to slip by? But the extensions would have to cooperate to make that possible.
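A minimal sketch of the two-pass idea described above: a cheap host lookup first, and the expensive classifier only for requests the list does not catch. Everything here is hypothetical (the function name, the threshold, the toy classifier); it is not how AdFlush or uBlock Origin are actually wired together.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit


def make_two_stage_filter(blocked_hosts: Iterable[str],
                          classifier: Callable[[str], float],
                          threshold: float = 0.5) -> Callable[[str], bool]:
    """Return a should_block(url) function: blocklist first, model second."""
    blocked = {h.lower() for h in blocked_hosts}

    def should_block(url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        if host in blocked:
            return True  # cheap path: known ad host, model never runs
        # Expensive path: only reached for hosts the list does not know.
        return classifier(url) >= threshold

    return should_block


# Toy stand-in for an ML model, for illustration only.
def toy_classifier(url: str) -> float:
    return 0.9 if "track" in url else 0.1


should_block = make_two_stage_filter({"ads.example.com"}, toy_classifier)
print(should_block("https://ads.example.com/banner.js"))
print(should_block("https://cdn.example.org/app.js"))
```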

    • pradn 2 years ago

      Thanks for extracting the details. It doesn't seem like they'll be competitive with blocklist-based approaches like uBlock Origin, because their features are fundamentally expensive to compute - parsing JS and such, not just matching URLs against a list of regexes.

      • aembleton 2 years ago

        Seems like it could work in the background to build up new rules for uBlockOrigin to deploy

  • andirk 2 years ago

    I like the strategy of using flags to say "look into this suspicious part of the code" over a hardcoded block list. And also block shitty JS via "JS AST depth, average JS identifier length" etc even if it's not an ad but just bad code.

    For Brave browser users, you can see what hardcoded lists you're using at brave://adblock .

    As for the whole cat and mouse game, how to detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.

    • dylan604 2 years ago

      > how to detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.

      This has been my red line on where I will allow ads vs blocking them. If a site is hosting their own ads, that's acceptable to me. If they are using an ad provider, that is not. The newspaper example is my go to. If you wanted your ad in a paper, you called the paper and took out an ad. Today's equivalent would be every time you opened the paper, a slight delay while it randomly chose the highest bids for the ad space while potentially also inserting something that would slowly eat your hands. That's a nope.

      You are obviously in the camp that feels entitled to be able to read anything at anytime without allowing for a website to earn money by wanting to block all ads regardless of their origin.

      • andirk 2 years ago

        > You are obviously in the camp that feels entitled...

        Not at all. I use Brave and "shield down" websites that I like and generally keep their ad situation under control (incl. 3rd party). But your point of hosting vs 3rd party is a good one and especially because often one 3rd party connects to another.

        Likewise, I "block" annoying parts of websites like Yahoo Fantasy Football's enormous top nav that's not even an ad.

nomilk 2 years ago

AdFlush (F1 Score: 0.98) seems to do better than some other adblockers: AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84), but it begs the question: why not compare to the most popular adblockers: uBlock Origin, Adblock Plus etc.

I think the authors want to compare apples with apples, so they only compare their algorithm to other adblockers that use algorithms, as opposed to those which use crowdsourced lists. The paper somewhat acknowledges this:

> However, manual maintenance of these filter lists requires significant human effort

Seems like one of those tasks where crowdsourcing scales so nicely (only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others) that it makes an algorithmic approach unnecessary.

  • Cthulhu_ 2 years ago

    The filter based adblockers are at risk though, with Google's new extension thingy that - at least a few years ago, I haven't heard from it since - limited the amount of rules. If there's a non-rule based system that is 98% effective then that would circumvent the arbitrary rule limits that Google set.

    • AlexandrB 2 years ago

      My understanding is that under manifest v3[1] only a list of rules is allowed. An algorithmic ad blocker wouldn't be able to work at all.

      [1] https://arstechnica.com/gadgets/2023/11/google-chrome-will-l...

      • GioM 2 years ago

        This is true. Extensions currently (manifest v2) are able to evaluate net requests dynamically, and are able to modify requests according to a dynamic ruleset that the extension can retrieve from some filter list published on the internet.

        Under manifest v3, extensions are not able to dynamically inspect requests, instead, they may only apply rules to net requests. Even worse, there is a limitation of only 5000 rules per extension!! [1]

        Even WORSE worse, under Chrome's manifest v3 rules, the extension cannot load any external code! Meaning that blocklists must be packaged with the extension. [2] Now, one might consider the reading of that link to not affect block lists, it's not a "library" and it's not "code" so long as it's just a list of textual rules.... however, Google considers the following to be a violation: "Building an interpreter to run complex commands fetched from a remote source, even if those commands are fetched as data". [3]

        Sneaky sneaky. An extension update (and hence new app store submission) is required to update filter lists.

        In other words, dynamic net requests are banned, and remotely-updated blocklists are banned as well.

        [1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...

        [2] https://developer.chrome.com/docs/extensions/develop/migrate...

        [3] https://developer.chrome.com/docs/webstore/program-policies/...

        • nolist_policy 2 years ago

          Chrome allows at least 30000 static rules + 30000 dynamic rules[1].

          [1] https://developer.chrome.com/docs/extensions/reference/api/d...

        • uyzstvqs 2 years ago

          If Manifest v3 is really this bad then it's probably still possible to build adblockers by DLL hooking the browser. It should also not affect browsers with built-in adblocking like Brave and Vivaldi.

          • jgalt212 2 years ago

            > it's probably still possible to build adblockers by DLL hooking the browser.

            I like this. or possibly the COM API. but I'm not a Windows expert.

        • zx8080 2 years ago

          How complex is it to revert the manifest changes and bring v2 support back to Chromium? Or is it intentionally made super complex by Google?

          • HWR_14 2 years ago

            Microsoft decided it was prohibitive for them. So probably overly difficult.

            • zx8080 2 years ago

              I would say it just works for them. Considering they show ads in the Windows Start menu now.

    • Centigonal 2 years ago

      If Google's goal is to thwart adblockers by creating limitations on what browser extensions can do, then creating a browser extension that blocks ads within the current set of limitations is a temporary solution at best.

      • avmich 2 years ago

        Google doesn't control the browser, user does.

        • Centigonal 2 years ago

          Google controls the APIs that extension writers can use. They are currently using that control to impose limits on what adblocker extensions can do. [1][2]

          You could download the Chromium source and patch it to change the extensions APIs (or better, just use Firefox), but the majority of users won't do this, and extension writers aren't going to make a version for a patched Chromium browser unless it has significant market share and support.

          [1] https://nordvpn.com/blog/manifest-v3-ad-blockers/

          [2] https://www.eff.org/deeplinks/2021/12/chrome-users-beware-ma...

          • ndriscoll 2 years ago

            You could always provide an extension that loads itself as a .dll/.so. I don't see much difference in friction between adding an extension through google's website vs. download setup.exe from somewhere. Of course like you say, using less user-hostile software is preferable.

            • babypuncher 2 years ago

              Such extensions would be trivially easy for Google to break with Chrome updates. You also cannot distribute an extension like that through any of the usual extension stores.

              Better to just use a browser that actually respects its users.

            • ysavir 2 years ago

              That might work for highly tech savvy people, but that's a very small minority of users. Google will still make ad blocking near-impossible for 99.99% of its users.

        • jonathankoren 2 years ago

          Firefox has 2.9%. Safari has 18.12%. Everything else is Chrome or reskinned Chrome, with Chrome itself being 65.3%.

          Unless you’re running that 20%, Google controls it, and they basically write the standards anymore.

          • avmich 2 years ago

            Oh, of course if you run Google-written software without modifications, you're not really controlling it. So if you want to control it, either go inside and tinker with the code, or - easier? - switch to a non-Google browser.

            I thought this is rather obvious, at least for those worried about experience. Do you think all those who realize they're suffering from ads don't think about using non-Chromium browser?

            • jonathankoren 2 years ago

              I honestly don’t think they think about a nonchromium browser, and if they do think of it, they reject it for unfounded reasons. If they did use a nonchromium browser, Firefox would have a larger market share.

          • tomjen3 2 years ago

            And if adblocking doesn't work on Chrome, Firefox usage will go up.

    • 4ggr0 2 years ago

      I guess that's why uBO Lite exists :) I started using it a couple of months ago instead of uBlock Origin, and still haven't seen any ads since.

      https://github.com/uBlockOrigin/uBOL-home

      • ladzoppelin 2 years ago

        I think eventually there is nothing that can stop certain ads on Chrome once specific APIs are removed, even using manifest 3. Maybe someone could chime in on this as it's really confusing now since Google keeps pushing back the date to remove manifest 2. (This might be outdated info)

        • downrightmike 2 years ago

          We'll create a shim to render the page in the background and use AI to remove ads and then serve the result to the user, at the least. Fuck ads and malvertising

          • specialist 2 years ago

            Yes and: There will be a tipping point where it'll be easier to allow the content rather than blocking the garbage. Dynamic screen scraping, more or less.

        • 4ggr0 2 years ago

          Yeah, it generally does feel like a "Catch me if you can" situation. I'm sure that there will be different ad-blockers once those APIs are removed, as there seems to be a very strong desire from some people not to see ads.

          I hope we'll not end up in a DRM-like system where ads are somehow really baked in and content stops working for lay-people if they try to circumvent ads.

        • efdee 2 years ago

          And that will be the day Chrome dies.

    • Gud 2 years ago

      The day Google starts blocking ad blocking users is the day the exodus starts from Google services.

      • inversetelecine 2 years ago

        I think you're overestimating the number of people who 1) care and 2) use adblocking extensions or any extension for that matter.

        Google knows what will likely happen, and pays people lots of money to know.

        • Spoom 2 years ago

          Without commenting on Google[1], I think this sort of thing is true in the short term but less true in the long term. I expect that, were Chrome to ban ad blockers, technical folks will start to teach non-technical folks in their orbit how to e.g. install Firefox to regain ad-blocking capability. I think it would take some number of years but there would be a pushback in the medium- to long-term.

          1. Googler, opinion solely my own.

        • treyd 2 years ago

          They'd massively alienate a large and motivated subset of their userbase with the ability to build viable alternatives to Google products or at least build more active means to circumvent their platform restrictions.

        • TheNewsIsHere 2 years ago

          I think you are unfortunately correct about this.

          I am consistently blown away when I inadvertently experience the Internet without ad-blocking. It’s absolute garbage.

          I am sad that people are either OK with this or don’t care. For many they don’t know any better, and asking many of those same groups to install and manage plugins is a fraught request.

        • efdee 2 years ago

          Why do you think everybody switched from IE to Chrome? Because their tech friends told them to or did it for them.

          The day Chrome can't sufficiently block ads anymore is the day Chrome dies.

        • rustcleaner 2 years ago

          Do you remember IE exodus to Firefox pre-2010? Yeah Google better watch its hyperback.

          • CatWChainsaw 2 years ago

            They learned from Microsoft's mistake and most browsers run off the Chromium while they have Firefox by the balls with their default search engine deal. Not to mention Firefox is hellbent on snatching defeat from the jaws of victory.

      • bityard 2 years ago

        I don't know what you mean. They are already blocking adblock users on YouTube and there is certainly no exodus happening there. A few people complain about it and get a handful of upvotes on social media from their friends, but it hasn't even come close to rising to "backlash" status.

        • Gud 2 years ago

          Are they? I block ads on YouTube and I’m still allowed to watch videos.

          I suspect they have silently stopped blocking ad blockers.

          I remember there were a lot of reports about this being the case, but there is no way I am not blocking Google.

      • MajimasEyepatch 2 years ago

        I suspect that such a move would draw significant scrutiny from regulators, potentially far outweighing any impacts from users switching browsers on their own.

    • babypuncher 2 years ago

      Real easy problem to solve by just switching back to Firefox

      • shpx 2 years ago

        The first thing you see when you open Firefox is an ad for Amazon and Expedia.

        • dbsmith83 2 years ago

          I don't. Are you talking about 'sponsored shortcuts'? You can turn those off in the settings. It's on the first page you see when you hit the settings button in the top right

    • klaussilveira 2 years ago

      Isn't this the case for a bloom filter (vacuum maybe)? You can have very few rules.
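For readers unfamiliar with the structure mentioned above, this is a small Bloom filter sketch: a fixed-size bit array that answers "definitely not in the set" or "probably in the set". The class name and sizes are hypothetical choices for illustration. Whether a declarative rule engine could evaluate such a structure is a separate question that this sketch does not answer.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


hosts = BloomFilter()
hosts.add("ads.example.com")
print("ads.example.com" in hosts)  # True: added items are always found
```

The memory footprint stays constant no matter how many hosts are added. The trade-off is that a lookup for a host that was never added can occasionally return True, which for an ad blocker means occasionally blocking something legitimate.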

  • RamRodification 2 years ago

    > only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others

    Is it that easy? Sounds very abusable

    • rvnx 2 years ago

      Yes, and some list maintainers accept money to add or remove you from the list (officially, or unofficially through a secondary maintainer, depending on the list), but otherwise it's no different than getting a domain marked as malware or phishing (with a few paid editors on Phishtank or VirusTotal).

      It's easier to get a domain added than removed, and for the "corruption"/"racketeering" part, it's a "win-win" for the adblockers and the list maintainers.

      Adblockers also often pay browsers to be integrated by default (AdGuard, Adblock Plus, etc), and then they negotiate with publishers to whitelist some domains (not necessarily the most obvious, can just be analytics).

      "We offer your domain to be unblocked on xx millions of devices by default, this will create you an uplift of revenue of +yy%"

    • kmlx 2 years ago

      yes, one of my clients was hit by this and i was tasked with solving the situation.

      i had to create a ticket in a repo explaining why blocking a whole domain instead of a single subdomain was actually pretty bad. they approved it and reverted the change.

      finding where exactly i had to open the ticket and what to write was a “down the rabbit hole” experience.

      • pbhjpbhj 2 years ago

        Domains are cheap, don't serve content on an ad domain maybe?

        Sounds like perhaps your task was to ensure a company's ads got through an adblocker?

        • kmlx 2 years ago

          my task was to rectify an issue in one of these crowd sourced lists of ad servers.

          they were blocking a whole domain instead of blocking the ad-serving subdomain.

          the issue was rectified, the main domain was replaced by the ad-serving subdomain.
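The difference between blocking a whole domain and blocking only the ad-serving subdomain can be shown with a small host-matching sketch. It assumes the suffix-matching behaviour common to host-based lists (an entry also covers its subdomains). Real list syntaxes vary, and the hostnames are placeholders.

```python
from typing import Iterable


def is_blocked(hostname: str, blocked_entries: Iterable[str]) -> bool:
    """True if the host, or any parent domain of it, is on the list."""
    entries = {e.lower() for e in blocked_entries}
    labels = hostname.lower().rstrip(".").split(".")
    # Check "img.ads.example.com", then "ads.example.com", "example.com", ...
    return any(".".join(labels[i:]) in entries for i in range(len(labels)))


too_broad = {"example.com"}       # blocks the entire site
targeted = {"ads.example.com"}    # blocks only the ad-serving subdomain

print(is_blocked("www.example.com", too_broad))   # True: content is blocked too
print(is_blocked("www.example.com", targeted))    # False: content still loads
print(is_blocked("ads.example.com", targeted))    # True: ads are blocked
```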

          • hathawsh 2 years ago

            Still, as pbhjpbhj suggested, if I were publishing both content and ads, I would consider publishing the ads on a different domain (not just a subdomain) to reduce technical issues. Domains with ugly names are very cheap.

            • kmlx 2 years ago

              of course, and this is a valid proposal. but that was outside the remit.

        • __jonas 2 years ago

          You could be right but you are definitely jumping to a conclusion here.

          The default lists used by uBlock for example include things like error tracking telemetry, Sentry for example.

          I can see why people want to block that stuff (privacy) but it’s not exactly an “ad”

    • fckgw 2 years ago

      Yes, but the effects of that abuse are observable and easily fixable. If suddenly a whole site goes offline for a bunch of people a change like that is likely to get reversed very quickly.

  • _al_ 2 years ago

    there is an entire section in the paper sub-titled: Comparison with uBlock Origin..

  • 1oooqooq 2 years ago

    practical solutions don't get you published

    • ko27 2 years ago

      "Practical solutions" also leave you vulnerable to cat and mouse games against sites that block or bypass adblockers (even with ublock origin). The end game is to have heuristic/AI adblocking which would directly hook into browser rendering so that it becomes undetectable. Obviously leading browsers do not support this for extensions, but forking Chromium wouldn't be so hard.

      • 1oooqooq 2 years ago

        "doing thing X work and everyone uses it, so bad actors invest time against things X. While thing Y isn't used by anyone so bad actors aren't spending time to work around it, q.e.d. we prove thing Y is better".

        i don't really buy your argument

YmiYugy 2 years ago

Without comparison to the accuracy of crowdsourced blocklists it's not that valuable. Maybe there is a group of hopelessly overworked blocklist maintainers/contributors that I'm not aware of. If so, their cries for help don't seem to make the HN front page. From a user perspective, blocking banner ads feels like a basically solved problem. I think the real pain point here is that for large chunks of the web, there is no distinction between ads and content.

  • JAlexoid 2 years ago

    There will never be a solution to native ads. It's part of the content you choose to consume, that someone produced.

    The only way to avoid native ads is to stop consuming content that relies on ads.

    • nemomarx 2 years ago

      Stuff like sponsor block works pretty well? If the native ad is separable from the rest of it you can just skip ahead, and most of those things are still a sign posted sponsor break for now. I can imagine extensions to do something similar in articles by removing affiliate links, etc.

    • YmiYugy 2 years ago

      I think it depends on what solution space you are willing to explore. There is the possibility for regulatory action that restricts native ads. It seems plausible that a flood of AI content tanks the prices for native ads, so some might pivot to original content + regular ads, which might also become more profitable if regulatory action weakens the oligopolies of that space. Aside from high level market shifts and regulatory action, there is of course also the possibility of technical solutions that can help you to avoid native ads.

    • yjftsjthsd-h 2 years ago

      That really depends on what you mean by "native ads"; if you mean "blog posts that appear legitimate but push a product" then maybe not (although I wouldn't totally rule it out with LLMs), but if you just mean that the ads are inline I have to disagree since ex. SponsorBlock already exists.

    • cess11 2 years ago

      In some jurisdictions advertising has to be named as such, there it will be at least theoretically possible to create filters if the platform is compliant.

    • 93po 2 years ago

      or have LLMs recreate the content without the native ad

    • beefnugs 2 years ago

      That is nonsense, if we know about 10 exact brands by name, then we can block their mentioning anywhere

3abiton 2 years ago

> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98

Neat results, I wonder how it compares to uBO or the different blacklists. I assume it self-updates with newer techniques and can detect certain patterns?

dale_glass 2 years ago

The future is here.

If I recall, in Permutation City there's some part where somebody deals with spam with AI. The user tries to use a simulation to listen to potential spam to filter it, while the spam tries to figure out whether a real person is listening to it and only tries to spam when a real person is there.

Or something along those lines, it's been a long time since I read it.

karaterobot 2 years ago

Blocking image ads seems like a relatively well-solved problem. I mean, speaking as someone who can't stand ads, I don't see very many of them anymore when I'm on desktop.

The harder, more pernicious type of ads are the modals that pop up when your cursor moves toward the back button, or when you scroll down a certain distance on the page. "Wait! Before you go, take a moment to give us your email address!"

Those can be blocked, but by the time you've seen them, they've already done all the damage they can do—which is to say, they've annoyed you.

I wish somebody could come up with a way to detect and stop them. I spent an afternoon trying to come up with reusable techniques to detect these popups, but there are just too many possibilities.

Night_Thastus 2 years ago

Always a joy to see efforts in the ongoing battle against advertisements.

There are few things I feel radical about, and Ads are one of them. I believe they are a drain in several ways:

They waste computational resources and electricity on both ends. They compromise the visual design and layout of webpages. They distract and take mental energy away from the user. They make the internet (and anywhere ads exist) more "ugly" and less aesthetically pleasing - which negatively impacts mental health. They often sell low-quality services/products or outright scams, which harms those least educated and poorest individuals.

Death to advertisement! On billboards! On television! On the internet!

Ads are a parasite on the human mind that need to go away, forever.

  • Terr_ 2 years ago

    Ultimately it's about where we draw the line for hacking other people's brains.

    It's a spectrum: Some level is an unavoidable part of communication ("I like dogs" forces you to think of dogs) some more is considered normal and traditional manipulation ("My food smells nice, that makes you hungry, wanna buy?") and then it goes on into grey-areas, scams, and eventually to potential extremes like "this image induces nausea" or "this sound knocks you out".

  • btbuildem 2 years ago

    They are a scourge and a tell-tale sign that we've grown far beyond excess and into absurd territory where more effort is spent on bending our minds to consume a thing than it took to make the thing in the first place.

  • CatWChainsaw 2 years ago

    Careful, apparently not wanting your mind polluted with psychological manipulation makes you a filthy communist..

  • p3rls 2 years ago

    Death to small media companies! You should have gotten some VC money if you wanted to make products for people, you poor pieces of shit.

tjpnz 2 years ago

I use a combination of UBO, PiHole and AdGuard on my mobile devices. Can't say I've seen an ad in the last year. Is this trying to solve an existing problem or speculating on where things could go in future?

  • rgrmrts 2 years ago

    I’m curious why you’re using 3 separate methods. Do you miss things with just one? AFAIK all 3 use similar block lists and are configurable.

    I’m building a pi-hole type solution for myself and essentially want all the filtering and blocking to happen at my firewall and not on my client (phone, laptop, tablet).

    • bluish29 2 years ago

      I think pi-hole (AdGuard Home) is a useful DNS-level ad blocker that can be used at the network/router level. But it is limited; UBO gives you more flexibility to block cosmetic elements and certain ads that cannot be blocked via DNS. There will be overlap of course, but it is worth it. I agree that AdGuard here seems redundant, and UBO itself recommends against using another ad blocker to avoid interference and websites' adblock detection.

      However you might end up using

      1. pi-hole on router

      2. Adguard as device level DNS

      3. UBO on Firefox (android only)

      It is possible but not recommended and wasteful. 1/2 and 3 is enough.

    • tjpnz 2 years ago

      AdGuard is for things I take off the home network, for example when I'm at work. It's true I could use AdGuard for both scenarios but I do like the additional visibility and configurability Pi-Hole provides.

    • Night_Thastus 2 years ago

      uBlock only works in web browsers. It doesn't work in phone apps, smart TVs, anything integrated into the OS, etc.

      That's why I use uBlock and PiHole, which I deem is enough.

alexcason 2 years ago

Looks like this is the associated repo on GitHub: https://github.com/SKKU-SecLab/AdFlush

infogulch 2 years ago

So AdFlush beats uBlock Origin with a marginal detection rate advantage of 0.86 vs 0.84, at the cost of significant performance overhead: median 2.7s load time (no ad block); 2.2s (uBO); 6.6s (AdFlush clean); 3.4s (AdFlush cached).

I'd like to see a tandem uBO+AdFlush extension that just enables uBO by default, with a "I still see ADs!" button in the extension UI that refreshes with AdFlush enabled and auto-submits any missed ads to a new FlushList filter list.

jarbus 2 years ago

I didn't realize this was an active area of research, love this.

cimnine 2 years ago

So, this begs the question when we'll see ML put in place to avoid AdBlocker detection. Or ads as we know them just disappear from the web and are replaced with other kinds of ML-enabled ads. I imagine deep-fake models used for interchangeable product placement in videos or pictures or so.

h4kor 2 years ago

How does this compare to list based solutions? An overblocking/underblocking comparison would be great

gastonmorixe 2 years ago

Nice! I’d love to know if AI-Ad / tracking / telemetry / etc blocking could be improved for MITM network layer filtering not just the browser.

rpastuszak 2 years ago

Oh boy, that didn't take long. Just last year I made Butter https://butter.sonnet.io as an excuse to talk about this:

> This project is a half-serious, half-assed attempt to demonstrate that in the next few years the process of blocking this type of content could be almost entirely automated. Yes, it would be wasteful from a computational and human potential perspective, and otherwise completely unnecessary, but hey, more money would change hands!

mannycalavera42 2 years ago

https://chromewebstore.google.com/search/adflush

https://imgflip.com/i/8s3nur

Havoc 2 years ago

How realtime is this? Or well enough to not be noticeable while browsing

  • mrbluecoat 2 years ago

    I'd be okay with a hybrid approach: lists for real-time blocking and machine learning for passive analysis to augment the lists over time.
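One way to picture the hybrid described above: the real-time decision consults only the list, while a classifier scores requests off the critical path and promotes hosts to the list after repeated high-confidence hits. This is a hypothetical sketch; the class name, the thresholds, and the toy classifier are all assumptions rather than anything from the paper.

```python
from collections import Counter
from typing import Callable, Iterable
from urllib.parse import urlsplit


class ListAugmenter:
    """List-only blocking in real time; a model grows the list passively."""

    def __init__(self, blocklist: Iterable[str],
                 classifier: Callable[[str], float],
                 threshold: float = 0.9, min_hits: int = 3):
        self.blocklist = {h.lower() for h in blocklist}
        self.classifier = classifier
        self.threshold = threshold
        self.min_hits = min_hits
        self.hits = Counter()

    @staticmethod
    def _host(url: str) -> str:
        return (urlsplit(url).hostname or "").lower()

    def should_block(self, url: str) -> bool:
        # Real-time path: a set lookup, no model involved.
        return self._host(url) in self.blocklist

    def observe(self, url: str) -> None:
        # Passive path: meant to run later, e.g. from a background queue.
        host = self._host(url)
        if not host or host in self.blocklist:
            return
        if self.classifier(url) >= self.threshold:
            self.hits[host] += 1
            if self.hits[host] >= self.min_hits:
                self.blocklist.add(host)


augmenter = ListAugmenter(set(), lambda url: 1.0 if "ad" in url else 0.0)
for _ in range(3):
    augmenter.observe("https://ad.example.net/pixel.gif")
print(augmenter.should_block("https://ad.example.net/pixel.gif"))
```

Requiring several hits before promotion is one simple guard against a single misclassification ending up on the list.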

flakiness 2 years ago

This can be a Copilot+PC's killer feature :-)

seized 2 years ago

> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84).

... Has anyone even heard of these ad blockers before?
