The Great GPT Firewall
This collection is a curated list of websites that use the robots.txt file to restrict access by AI agents, AI crawlers, and GPTs.
It will be updated monthly.
User agents & robots.txt
The robots.txt file lets website owners control and limit the access of crawler user agents to certain areas of their website by specifying rules and directives. The scan looks for rules targeting the following AI-related user agents:
    # OpenAI's web crawler: GPT3.5, GPT4, ChatGPT
    # https://platform.openai.com/docs/bots
    User-agent: GPTBot
    # ChatGPT plugins
    # https://platform.openai.com/docs/bots
    User-agent: ChatGPT-User
    # OpenAI Search bot
    # https://platform.openai.com/docs/bots
    User-agent: OAI-SearchBot
    # Google's web crawler: Bard, VertexAI, Gemini
    # https://blog.google/technology/ai/an-update-on-web-publisher-controls/
    User-agent: Google-Extended
    # Apple's web crawler, dedicated to GenAI projects
    # https://support.apple.com/en-us/119829
    User-agent: Applebot-Extended
    # Claude
    User-agent: anthropic-ai
    # Claude Bot
    User-agent: ClaudeBot
    # Claude web
    User-agent: Claude-Web
    # Amazonbot
    # https://developer.amazon.com/amazonbot
    User-agent: Amazonbot
    # Cohere
    User-agent: Cohere-ai
    # Perplexity
    User-agent: PerplexityBot
    # You
    # https://about.you.com/fr/youbot/
    User-agent: YouBot
    # Common Crawl
    # https://commoncrawl.org/ccbot
    User-agent: CCBot
    # Omgilibot: webz.io
    # https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/
    User-agent: Omgilibot
    User-agent: Omgili
    User-agent: Webzio-Extended
    # Facebook: Llama
    # https://developers.facebook.com/docs/sharing/bot/
    User-agent: FacebookBot
    # Facebook
    # https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
    User-agent: Meta-ExternalAgent
    # ByteDance: Doubao
    User-agent: Bytespider
    # Ai2
    # https://allenai.org/crawler
    User-agent: Ai2bot
    User-agent: Ai2Bot-Dolma
    # Diffbot
    User-agent: Diffbot
    # Huawei
    # https://darkvisitors.com/agents/pangubot
    User-agent: PanguBot
    # Petal Search
    # https://datadome.co/learning-center/how-to-block-petal-bot/
    User-agent: PetalBot
    # Timpibot
    # https://darkvisitors.com/agents/timpibot
    User-agent: Timpibot
    # Censorship area
    Disallow: /
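To see how such rules play out for a given crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not necessarily how this project's scanner works: the `blocked_agents` helper, the `AI_USER_AGENTS` subset, the `SAMPLE` rules and the `example.com` URL are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# A small subset of the user agents listed above, chosen for brevity (assumption).
AI_USER_AGENTS = ["GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot", "CCBot"]


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that may not fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_USER_AGENTS if not parser.can_fetch(ua, url)]


# Hypothetical robots.txt: two AI crawlers are denied, everyone else is allowed.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # ['GPTBot', 'CCBot']
```

Whether a site counts as "blocked" when only some agents are disallowed is a policy choice; the sketch only reports which agents are denied and leaves that decision to the caller.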
Disclaimer
Please note that this blocklist is intended for informational purposes only. Despite the provocative project name, it's fine to disallow web crawling and protect content ownership.
2025-12 update
Category: Press
- Scanned: 66
- โ Passing: 30 %
- ๐ Blocked: 70 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| The Times | ๐ฌ๐ง | ๐ |
| BBC | ๐ฌ๐ง | ๐ |
| The Guardian | ๐ฌ๐ง | ๐ |
| The Economist | ๐ฌ๐ง | โ |
| Financial Times | ๐ฌ๐ง | ๐ |
| The Independent | ๐ฌ๐ง | โ |
| The Telegraph | ๐ฌ๐ง | ๐ |
| Daily Mail | ๐ฌ๐ง | ๐ |
| The Sun | ๐ฌ๐ง | ๐ |
| Daily Mirror | ๐ฌ๐ง | ๐ |
| Daily Express | ๐ฌ๐ง | ๐ |
| Washington Post | ๐บ๐ธ | ๐ |
| USA Today | ๐บ๐ธ | ๐ |
| Fox News | ๐บ๐ธ | โ |
| ABC News | ๐บ๐ธ | ๐ |
| NBC News | ๐บ๐ธ | ๐ |
| CBS News | ๐บ๐ธ | ๐ |
| Los Angeles Times | ๐บ๐ธ | ๐ |
| Chicago Tribune | ๐บ๐ธ | ๐ |
| New York Post | ๐บ๐ธ | ๐ |
| New York Daily News | ๐บ๐ธ | ๐ |
| The New Yorker | ๐บ๐ธ | ๐ |
| Vice | ๐บ๐ธ | โ |
| New York Times | ๐บ๐ธ | ๐ |
| Wall Street Journal | ๐บ๐ธ | ๐ |
| CNN | ๐บ๐ธ | ๐ |
| El País | ๐ช๐ธ | โ |
| Süddeutsche Zeitung | ๐ฉ๐ช | ๐ |
| Der Spiegel | ๐ฉ๐ช | ๐ |
| Corriere della Sera | ๐ฎ๐น | ๐ |
| La Repubblica | ๐ฎ๐น | ๐ |
| Le Monde | ๐ซ๐ท | ๐ |
| Libération | ๐ซ๐ท | ๐ |
| Le Figaro | ๐ซ๐ท | ๐ |
| 20 Minutes | ๐ซ๐ท | ๐ |
| Ouest France | ๐ซ๐ท | ๐ |
| Le Parisien | ๐ซ๐ท | ๐ |
| L'Equipe | ๐ซ๐ท | ๐ |
| Le Point | ๐ซ๐ท | ๐ |
| Marianne | ๐ซ๐ท | ๐ |
| Le Nouvel Observateur | ๐ซ๐ท | ๐ |
| L'Express | ๐ซ๐ท | ๐ |
| France 24 | ๐ซ๐ท | ๐ |
| BFMTV | ๐ซ๐ท | ๐ |
| CNews | ๐ซ๐ท | โ |
| Le Monde Diplomatique | ๐ซ๐ท | โ |
| Mediapart | ๐ซ๐ท | ๐ |
| Courrier International | ๐ซ๐ท | ๐ |
| Brut | ๐ซ๐ท | โ |
| IMDB | ๐ | ๐ |
| Allocine | ๐ซ๐ท | ๐ |
| Fakt | ๐ต๐ฑ | โ |
| Super Express | ๐ต๐ฑ | โ |
| Gazeta Wyborcza | ๐ต๐ฑ | ๐ |
| Rzeczpospolita | ๐ต๐ฑ | โ |
| Dziennik Gazeta Prawna | ๐ต๐ฑ | โ |
| Polityka | ๐ต๐ฑ | โ |
| Newsweek Polska | ๐ต๐ฑ | โ |
| Gość Niedzielny | ๐ต๐ฑ | โ |
| Sieci | ๐ต๐ฑ | โ |
| Do Rzeczy | ๐ต๐ฑ | โ |
| Twój Styl | ๐ต๐ฑ | โ |
| Zwierciadło | ๐ต๐ฑ | โ |
| Wysokie Obcasy Extra | ๐ต๐ฑ | ๐ |
| Pani | ๐ต๐ฑ | โ |
| Elle | ๐ต๐ฑ | ๐ |
Category: Video on demand
- Scanned: 9
- โ Passing: 33 %
- ๐ Blocked: 67 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| Prime Video | ๐ | ๐ |
| Netflix | ๐ | โ |
| Disney+ | ๐ | ๐ |
| Hulu | ๐บ๐ธ | ๐ |
| HBO Max | ๐บ๐ธ | โ |
| Canal+ | ๐ซ๐ท | ๐ |
| FranceTV | ๐ซ๐ท | ๐ |
| TF1 | ๐ซ๐ท | ๐ |
| 6Play | ๐ซ๐ท | โ |
Category: Music
- Scanned: 6
- โ Passing: 67 %
- ๐ Blocked: 33 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| Soundcloud | ๐ | ๐ |
| Youtube | ๐ | โ |
| Apple Music | ๐ | โ |
| Spotify | ๐ | ๐ |
| Deezer | ๐ซ๐ท | โ |
| LastFM | ๐ฌ๐ง | โ |
Category: Podcast
- Scanned: 8
- โ Passing: 75 %
- ๐ Blocked: 25 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| Google Podcasts | ๐ | โ |
| Apple Podcast | ๐ | โ |
| Spotify Podcaster | ๐ | ๐ |
| Buzzsprout | ๐ | โ |
| Podbean | ๐ | โ |
| Acast | ๐ฌ๐ง | โ |
| AudioMeans | ๐ซ๐ท | โ |
| Radio France | ๐ซ๐ท | ๐ |
Category: X
- Scanned: 6
- โ Passing: 67 %
- ๐ Blocked: 33 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| PornHub | ๐ | ๐ |
| YouPorn | ๐ | ๐ |
| Xnxx | ๐ | โ |
| Xvideos | ๐ | โ |
| Xhamster | ๐ | โ |
| OnlyFans | ๐ | โ |
Category: Religion
- Scanned: 5
- โ Passing: 80 %
- ๐ Blocked: 0 %
- โ Unknown: 20 %
| Name | Country | Status |
|---|---|---|
| Bible | ๐บ๐ธ | โ |
| Bible gateway | ๐บ๐ธ | โ |
| Jehovah's Witnesses | ๐บ๐ธ | โ |
| Vatican | ๐ป๐ฆ | โ |
| Islamweb | ๐ | โ |
Category: Social media
- Scanned: 13
- โ Passing: 23 %
- ๐ Blocked: 69 %
- โ Unknown: 8 %
| Name | Country | Status |
|---|---|---|
| ๐ | ๐ | |
| ๐ | ๐ | |
| ๐ | โ | |
| Hacker News | ๐ | โ |
| Lobsters | ๐ | โ |
| ๐ | ๐ | |
| TikTok | ๐ | ๐ |
| ๐ | ๐ | |
| ๐ | ๐ | |
| Quora | ๐ | ๐ |
| VK | ๐ท๐บ | โ |
| TripAdvisor | ๐ | ๐ |
| Yelp | ๐ | ๐ |
Category: Artist
- Scanned: 42
- โ Passing: 76 %
- ๐ Blocked: 17 %
- โ Unknown: 7 %
| Name | Country | Status |
|---|---|---|
| Michael Jackson | ๐บ๐ธ | โ |
| Madonna | ๐บ๐ธ | โ |
| Taylor Swift | ๐บ๐ธ | ๐ |
| Rihanna | ๐บ๐ธ | โ |
| Bruno Mars | ๐บ๐ธ | โ |
| Justin Bieber | ๐บ๐ธ | ๐ |
| Beyoncé | ๐บ๐ธ | โ |
| Katy Perry | ๐บ๐ธ | ๐ |
| Lady Gaga | ๐บ๐ธ | โ |
| Hardwell | ๐บ๐ธ | โ |
| Dimitri Vegas & Like Mike | ๐บ๐ธ | โ |
| Kanye West | ๐บ๐ธ | โ |
| Black Eyed Peas | ๐บ๐ธ | โ |
| Imagine Dragons | ๐บ๐ธ | โ |
| Twenty One Pilots | ๐บ๐ธ | โ |
| Maroon 5 | ๐บ๐ธ | ๐ |
| Selena Gomez | ๐บ๐ธ | โ |
| Usher | ๐บ๐ธ | ๐ |
| Stromae | ๐ง๐ช | โ |
| Aya Nakamura | ๐ซ๐ท | โ |
| Soprano | ๐ซ๐ท | โ |
| Johnny Hallyday | ๐ซ๐ท | โ |
| Grand Corps Malade | ๐ซ๐ท | โ |
| Zaho | ๐ซ๐ท | โ |
| Jean Louis Aubert | ๐ซ๐ท | โ |
| Camelia Jordana | ๐ซ๐ท | โ |
| Indochine | ๐ซ๐ท | โ |
| Tryo | ๐ซ๐ท | โ |
| David Guetta | ๐ซ๐ท | ๐ |
| Mc Solaar | ๐ซ๐ท | โ |
| Zaz | ๐ซ๐ท | โ |
| Christine and the Queens | ๐ซ๐ท | โ |
| Boulevard des Airs | ๐ซ๐ท | โ |
| Calogero | ๐ซ๐ท | ๐ |
| Hoshi | ๐ซ๐ท | โ |
| Avicii | ๐ธ๐ช | โ |
| Adele | ๐ฌ๐ง | โ |
| Calvin Harris | ๐ฌ๐ง | โ |
| Ed Sheeran | ๐ฌ๐ง | โ |
| Arctic Monkeys | ๐ฌ๐ง | โ |
| Coldplay | ๐ฌ๐ง | โ |
| The Weeknd | ๐จ๐ฆ | โ |
Category: Gov
- Scanned: 3
- โ Passing: 100 %
- ๐ Blocked: 0 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| White House | ๐บ๐ธ | โ |
| Elysée | ๐ซ๐ท | โ |
| Europe | ๐ช๐บ | โ |
Category: Science
- Scanned: 28
- โ Passing: 71 %
- ๐ Blocked: 29 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| Google Scholar | ๐ | ๐ |
| Sci-Hub | ๐ | โ |
| PubPeer | ๐ | โ |
| Scopus | ๐ณ๐ฑ | โ |
| Elsevier | ๐ณ๐ฑ | โ |
| ScienceDirect | ๐ณ๐ฑ | โ |
| MDPI | ๐จ๐ญ | โ |
| Springer | ๐ฉ๐ช | โ |
| Wiley | ๐บ๐ธ | โ |
| American Chemical Society | ๐บ๐ธ | โ |
| PubMed | ๐บ๐ธ | โ |
| Academia | ๐บ๐ธ | ๐ |
| Science | ๐บ๐ธ | ๐ |
| ArXiv | ๐บ๐ธ | โ |
| American Physical Society | ๐บ๐ธ | โ |
| Mendeley | ๐ฌ๐ง | โ |
| Nature | ๐ฌ๐ง | ๐ |
| Taylor & Francis | ๐ฌ๐ง | ๐ |
| Oxford University Press | ๐ฌ๐ง | โ |
| Cambridge University Press | ๐ฌ๐ง | ๐ |
| Royal Society of Chemistry | ๐ฌ๐ง | โ |
| ResearchGate | ๐ฉ๐ช | โ |
| BNF | ๐ซ๐ท | โ |
| Cairn | ๐ซ๐ท | โ |
| Persee | ๐ซ๐ท | โ |
| Gallica | ๐ซ๐ท | ๐ |
| HAL | ๐ซ๐ท | ๐ |
| OpenEdition | ๐ซ๐ท | โ |
Category: Dev
- Scanned: 3
- โ Passing: 100 %
- ๐ Blocked: 0 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| Github | ๐ | โ |
| Gitlab | ๐ | โ |
| Stack Overflow | ๐ | โ |
Category: Other content
- Scanned: 19
- โ Passing: 68 %
- ๐ Blocked: 32 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| Wikipedia | ๐ | โ |
| Medium | ๐ | โ |
| Substack | ๐ | โ |
| Common Crawl | ๐ | โ |
| Internet Archive | ๐ | โ |
| Wayback Machine | ๐ | โ |
| Notion | ๐ | ๐ |
| Weather | ๐บ๐ธ | ๐ |
| AccuWeather | ๐บ๐ธ | โ |
| Météo France | ๐ซ๐ท | โ |
| Getty Images | ๐บ๐ธ | โ |
| Shutterstock | ๐บ๐ธ | ๐ |
| Adobe Stock | ๐บ๐ธ | ๐ |
| Unsplash | ๐จ๐ฆ | โ |
| Pexels | ๐ฉ๐ช | โ |
| Pixabay | ๐ฉ๐ช | ๐ |
| Flickr | ๐บ๐ธ | ๐ |
| 500px | ๐จ๐ฆ | โ |
| Giphy | ๐บ๐ธ | โ |
Category: Other
- Scanned: 1
- โ Passing: 100 %
- ๐ Blocked: 0 %
- โ Unknown: 0 %
| Name | Country | Status |
|---|---|---|
| Indeed | ๐บ๐ธ | โ |
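The per-category summary lines (Scanned, Passing, Blocked, Unknown) can be reproduced from a list of per-site statuses. The following is a minimal sketch, not the project's actual code: the `summarize` helper and the status strings `"passing"`, `"blocked"` and `"unknown"` are hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the figures shown above.

```python
from collections import Counter

# Hypothetical status labels; the tables use emoji markers instead.
STATUSES = ("passing", "blocked", "unknown")


def summarize(statuses: list[str]) -> dict[str, int]:
    """Return the scanned count and the whole-percent share of each status."""
    counts = Counter(statuses)
    total = len(statuses)
    summary = {"scanned": total}
    for status in STATUSES:
        # Guard against an empty category to avoid dividing by zero.
        summary[status] = round(100 * counts[status] / total) if total else 0
    return summary


# Example shaped like the "Video on demand" category: 6 blocked, 3 passing.
print(summarize(["blocked"] * 6 + ["passing"] * 3))
```

Because each share is rounded independently, the three percentages may not always sum to exactly 100.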
Contributing
Looking for contributions:
- Enrich website database
- Chinese websites
- New categories
Please open issues!
- Ping me on Twitter @samuelberthe (DMs, mentions, whatever :))
- Fork the project
- Fix open issues or request new features
Don't hesitate ;)
Build
    python -m venv venv
    source ./venv/bin/activate
    pip3 install -r requirements.txt
    python3 scrape.py
    # then copy the last version into readme
Contributors
Show your support
Give a ⭐️ if this project helped you!
License
Copyright © 2024 Samuel Berthe.
This project is MIT licensed.
