GitHub - vivienhenz24/fuzzy-canary: Poor-man's solution to stopping AI companies from scraping your blog

Fuzzy Canary

AI companies are scraping everyone's sites for training data. If you're self-hosting your blog, there's not much you can do about it, except maybe make them think your site contains content they won't want. Fuzzy Canary plants invisible links (to porn websites...) in your HTML that trigger scrapers' content safeguards.
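To make "invisible links" concrete, here is a rough TypeScript sketch of one way such markup could be built. It is a hypothetical illustration, not what Fuzzy Canary actually emits. The function names, the `display:none` hiding approach, and the example.com URLs are all assumptions.

```typescript
// Hypothetical sketch only: NOT the markup Fuzzy Canary actually generates.
// The URLs passed in below are placeholders.

// Escape characters that would break out of an attribute value or text node.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the links in a container with display:none. That keeps them out of
// rendering, the accessibility tree, and the tab order, but they stay in the
// raw HTML that a scraper downloads and parses.
function buildHiddenLinks(urls: string[]): string {
  const anchors = urls
    .map((url) => `<a href="${escapeHtml(url)}">${escapeHtml(url)}</a>`)
    .join("");
  return `<div style="display:none">${anchors}</div>`;
}

// Example: build a snippet with two placeholder links.
const snippet = buildHiddenLinks(["https://example.com/a", "https://example.com/b"]);
console.log(snippet);
```

A scraper that parses the raw HTML still collects every `href` inside the hidden wrapper, while human visitors never see it. A scraper that renders pages and honors CSS might skip hidden elements, so the library's real technique may well differ from this.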

Getting Started

Installation

npm i @fuzzycanary/core
# or
pnpm add @fuzzycanary/core

Usage

There are two ways to use it: client-side or server-side. Use server-side if you can. It works better because the canary is in the HTML from the start, so scrapers that don't run JavaScript will still see it.

Server-side (recommended):

If you're using a React-based framework (Next.js, Remix, etc.), add the <Canary /> component to your root layout:

// Next.js App Router: app/layout.tsx
// Other React frameworks: your root layout file
import { Canary } from '@fuzzycanary/core/react'

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html>
      <body>
        <Canary />
        {children}
      </body>
    </html>
  )
}

For non-React frameworks, call the getCanaryHtml() utility and insert its output right after your opening <body> tag.
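If your stack renders a full HTML string before sending it, the insertion step comes down to splicing a snippet into that string. Below is a minimal TypeScript sketch, assuming getCanaryHtml() returns the canary as an HTML string. `injectAfterBodyOpen` is a hypothetical helper, not part of the package, and the regex assumes a normal `<body>` tag with no `>` inside attribute values.

```typescript
// Hypothetical helper, not part of @fuzzycanary/core. It splices a snippet
// into a fully rendered HTML page right after the opening <body> tag.
// Assumption: `snippet` is whatever getCanaryHtml() returns (an HTML string).
function injectAfterBodyOpen(html: string, snippet: string): string {
  // Match "<body", any attributes, and the ">" that closes the tag.
  // Case-insensitive; assumes no ">" appears inside attribute values.
  const match = /<body\b[^>]*>/i.exec(html);
  if (match === null) {
    // No <body> tag: return the page unchanged rather than guess a position.
    return html;
  }
  const insertAt = match.index + match[0].length;
  return html.slice(0, insertAt) + snippet + html.slice(insertAt);
}

// Usage with a placeholder snippet standing in for getCanaryHtml():
const page = '<html><body class="post"><h1>Hi</h1></body></html>';
console.log(injectAfterBodyOpen(page, "<!-- canary -->"));
// prints: <html><body class="post"><!-- canary --><h1>Hi</h1></body></html>
```

If your template engine lets you place the call directly after `<body>` in the layout template, that is simpler than post-processing the rendered string.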

Client-side:

If you're building a static site or prefer client-side injection, import the auto-init in your entry file:

// Your main entry file (e.g., main.ts, index.ts, App.tsx)
import '@fuzzycanary/core/auto'

That's it. It will automatically inject the canary when the page loads.
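The auto import handles this for you. To illustrate what "inject the canary when the page loads" usually involves, here is a hypothetical TypeScript sketch. It is not the package's source: `injectOnLoad`, `DocLike`, and `BodyLike` are names made up for this example, and the snippet string stands in for whatever canary markup the library actually produces.

```typescript
// Hypothetical sketch: NOT the source of @fuzzycanary/core/auto.
// DocLike/BodyLike are minimal stand-ins for the DOM types so the sketch
// compiles and runs anywhere; in a browser you would pass `document`.
interface BodyLike {
  insertAdjacentHTML(position: "afterbegin", html: string): void;
}

interface DocLike {
  readyState: string;
  body: BodyLike | null;
  addEventListener(type: "DOMContentLoaded", listener: () => void): void;
}

function injectOnLoad(doc: DocLike, snippet: string): void {
  const inject = (): void => {
    // "afterbegin" inserts the snippet as the first content inside <body>.
    doc.body?.insertAdjacentHTML("afterbegin", snippet);
  };
  if (doc.readyState === "loading") {
    // The document is still being parsed, so <body> may not exist yet.
    doc.addEventListener("DOMContentLoaded", inject);
  } else {
    // Parsing already finished (e.g. a deferred or module script): inject now.
    inject();
  }
}

// Browser usage (placeholder snippet):
// injectOnLoad(document, "<!-- canary -->");
```

Because this only runs in a browser that executes JavaScript, a scraper that just downloads the raw HTML never sees the injected snippet. That is why server-side is the recommended setup.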

Notes on SEO

Fuzzy Canary now injects for every visitor, including crawlers. If you're concerned about how this affects indexing or rankings, consider testing in a staging environment before rolling out to production.