Google Books removed all search functions for any books with previews

old.reddit.com

222 points by adamnemecek 5 days ago · 77 comments


abetusk 5 days ago

Anna's Archive [0]:

> The largest truly open library in human history

[0] https://en.wikipedia.org/wiki/Anna%27s_Archive

cft 5 days ago

Mirrors https://open-slum.org/
- bigwheels 5 days ago
  
  Open-slum currently experiencing heavy traffic, but here's an additional mirror: https://open-slum.pages.dev/
- belter 5 days ago
  
  How funny. They have a DMCA Takedown Requests link...

al_borland 5 days ago

It might be time to update the mission statement.

“Our mission is to organize the world’s information and make it universally accessible and useful”

https://about.google/company-info/

tick_tock_tick 5 days ago

Why? It's almost certainly not by choice.
zb3 5 days ago

* for us, advertisers and our AI models
- ern_ave 5 days ago
  
  My guess is that AI training is the main issue.
  Data that you can prove was generated by humans is now exceedingly valuable ...and most of that comes from the days before LLMs. The situation is a bit like how steel manufactured before the nuclear age is valuable.
  - adamnemecekOP 5 days ago
    
    But why would people train on excerpts from Google Books when whole books can be downloaded on libgen and such?
    
    londons_explore 5 days ago
    
    Google books is much bigger than libgen.
    
    asdefghyk 5 days ago
    
    copyright reasons?
    
    direwolf20 5 days ago
    
    Both are a copyright violation

crazygringo 5 days ago

Remember that preview functionality is granted by contract with the publishers. Which is why some books have it and some books don't.

Almost certainly, this is something that publishers requested the removal of, under threat of requiring previews to be removed entirely.

Books that are out of copyright still have full search and display enabled.

So blame publishers, not Google.

abetusk 5 days ago

I will blame overlong copyright terms: 70 years after the author's death, or 95 years after publication, which means most recent work will effectively enter the commons a century or more from now [0].
[0] https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...
- Analemma_ 4 days ago
  
  This is the rare case when Europe is even worse. Metropolis, the 1927 Fritz Lang film, is out of copyright in the United States but will still be in copyright in Germany until 2047: 120 fucking years.
  It’s preposterous, and offensive to anyone’s intelligence to claim that this is about incentivizing production; does anyone seriously believe there is a potential artist out there who would avoid making their magnum opus if it could only be under copyright for 119 years?
  - antonvs 4 days ago
    
    The problem is, copyright law is no longer about artists, if it ever was: it’s about corporations, i.e. maximizing the value corporations can extract from intellectual property.
    This post which was on the front page today is relevant: https://alexwennerberg.com/blog/2026-01-25-slop.html
adamnemecekOP 5 days ago

The previews are still there though, they just don't rank.
- crazygringo 5 days ago
  
  Right, that's what I'm saying. For whatever reason it seems publishers decided they don't want their preview-only books as part of the full-text search across all books. If they decide that, Google has to comply.
  This isn't like web search where web pages are publicly available and so Google can return search results across whatever it wants. For books, it relies on publisher cooperation to both supply book contents for indexing under license and give permissions for preview. If publishers say to turn off search, Google turns off search.
tamarinddreams 5 days ago

Given the argument over LLMs consuming books illegally, I think publishers could be a little concerned that an LLM combining the partial previews of every modern work on a subject might destroy the market for the average book on that subject, with the license to do so having been properly granted via this feature.

Terr_ 4 days ago

Among the less-important things I'd like to send back in time to my past-self:

"The trend in digitized book passages will reverse, and they will become harder and harder to find with time, so clip your own copies of everything you like to quote."

mystraline 5 days ago

That's easy.

Check out Library Genesis, Anna's Archive, and Sci-Hub for content.

Piracy isn't theft if buying isn't ownership.

kevin42 5 days ago

I’m genuinely curious how you feel about LLMs being trained on pirated material. Not being snarky here.
Your comment reflects the old “information wants to be free” ideals that used to dominate places like HN, Slashdot, and Reddit. But since LLMs arrived, a lot of the loudest voices here argue the opposite position when it comes to training data.
I’ve been trying to understand whether people have actually changed their views, or whether it’s mostly a shift in who is speaking up now.
- mystraline 4 days ago
  
  Personally, my opinion doesn't matter. I'm a nobody who doesn't work in AI fields.
  But as a pirate, I specialize in finding hidden, hard-to-find, or otherwise lost sources. They're not making anybody any money, and I absolutely do not sell anything that's not mine (freely given).
  But having every commercial work available for ingestion into an LLM is an amazing way to train an AI. However if you're going to use piracy at scale to train, you should also not be able to sell the LLM or access to it.
  And yeah, that wrecks every corporate LLM strategy. Boo fucking hoo.
  Do creators need paid for content they create? Ideally, yes! Do they deserve iron-fisted control of your hardware (DRM) to enact their demands? Fuck no!
  Ideally, the LLMs would be FLOSS, full weights published, lists of content used to reproduce, etc. We could prune bad content and add more good. But the problem again is that whoever does this must violate copyright, because copyright the way it's implemented is terrible.
  In reality, I like the RIAA's congressional solution. You send a check for how many plays you did to BMI/ASCAP and you're good. That could be extended to books and shows. If that were done, you could have a New-Flix service that literally has every show and movie in existence. You just pay a reasonable cost per month to access the whole of video humanity.
  Alas. Guess I'll have to build it myself.
  - krabat 3 days ago
    
    I agree. please do.
- spongebobstoes 4 days ago
  
  why would that change anything? copyright is still a tax on the whole of society for the benefit of rich people and corporations. it opposes innovation, evolution and progress
  maybe a short copyright would be fine (10 year fixed?) but copyright as-is seems indefensible to me
  - mystraline 4 days ago
    
    > copyright is still a tax on the whole of society for the benefit of rich people and corporations. it opposes innovation, evolution and progress
    The original reason for copyright, patents, and trademarks made sense.
    We want people to create and share. And unlike the old guild solutions from Europe, copyright and patents were a tradeoff to encourage the arts and science.
    But what's a good tradeoff? That's a big copyright question. 17 years? 34 years? Life of author? 75 years? How about individual non-commercial use? Or abandoned works?
    And patents aren't even in scope, but we see similar abuses against the raison d'être of them. Patents were supposed to entail a full reproduction of the invention. Now, it's a game of how incomplete we can make the filing while still getting protection. Or worse yet, really dumb shit has been patented like 1 click or the XOR patent, or that asshole Chakrabarty who patented living organisms.
    There were good reasons for a fair copyright and patent law for furtherance of the art and sciences. That narrative was lost long ago. Now, only the violators can really push ahead. And they can't talk about it.
    (Trademark law has never really had many complaints, aside from trademarking a color. If you buy from XYZ company, you want to buy from them, not a counterfeit. And it relates back to coats of arms, again, representing a family or a charge.)
- gbear605 4 days ago
  
  Personally, I'd like for copyright to be abolished, and then for LLM training to be made illegal for reasons entirely unrelated to copyright.
GorbachevyChase 5 days ago

Ironic that those doing the most to make information open and accessible are the criminals.
- direwolf20 5 days ago
  
  Of course. When it's criminal to make information open and accessible, only criminals will make information open and accessible.
- al_borland 5 days ago
  
  A centuries-old problem. Early translations of the Bible to English were illegal or required licenses.
  William Tyndale was put to death for translating the Bible into English, which would have been an act to make information open and accessible.
  - josephcsible 5 days ago
    
    > William Tyndale was put to death for translating the Bible into English
    That's not what he was put to death for. See https://www.catholic.com/magazine/print-edition/tyndales-her... and https://www.chinakasreflections.com/did-the-roman-catholic-c...
    
    bijant 4 days ago
    
    Sure, just like Aaron Swartz was persecuted for "recklessly damaging a protected computer" and "wire fraud" not for any other reasons at all and btw the State wasn't at all involved in murdering him, he did that to himself, probably because he felt guilty for having damaged so many computers....
adamnemecekOP 5 days ago

None of these does full text search.
- jszymborski 5 days ago
  
  And they are under constant threat by nation states. sci-hub hasn't seen new papers in ages.
- droopyEyelids 5 days ago
  
  zlibrary does
  https://en.wikipedia.org/wiki/Z-Library
  - adamnemecekOP 5 days ago
    
    Huh, the search is not amazing but it will have to do. Thanks! Are there others?
    
    teraflop 5 days ago
    
    The Internet Archive supports full-text search on (AFAIK) its entire scanned book collection, even books that aren't available for borrowing.
    
    adamnemecekOP 5 days ago
    
    This is actually pretty good.
  - clueless 5 days ago
    
    I'd wonder if you'd ever consider putting up a downloadable mirror of their full-text search db?
- greenavocado 5 days ago
  
  Build a local index
  - adamnemecekOP 5 days ago
    
    My problem is finding references I don't know about.

Zathman 5 days ago

I just checked and yes, search inside of books with previews is still possible.

(a) when you search books.google.com and find a book with a preview, it opens their new book viewer - the search is at the bottom of the page. You can also click "View All" to see all references of your search in that book.

(b) if you go to the book homepage (clicking X in the top right of the book viewer if that opened), there's still a "Search Inside Book" next to the "Preview" button under the title.

adamnemecekOP 5 days ago

But you have to know what book you are looking for.

lr0 4 days ago

Anna's Archive or any other book piracy site does not replace Google Books' search functions at all. The search functions of these websites just look inside the PDF text; Google Books helped me many times to find manuscripts or old books that were not OCR'd properly. It's really a big loss.

didip 5 days ago

Google Books could have been a subscription service à la Netflix.

Then it would have been hella useful.

btrettel 4 days ago

Presumably these books can still be searched via HathiTrust: https://www.hathitrust.org/

More on the HathiTrust project: https://en.wikipedia.org/wiki/HathiTrust

Though I don't know how many of the HathiTrust books are the "preview" kind the Reddit post mentions. Maybe none are?

pfdietz 5 days ago

So, if you search for some text that occurs at the end of one chunk, will it then preview a following chunk? And could chaining these chunks give you the entire book?

If so, I could see someone doing this to exfiltrate books.

crazygringo 5 days ago

You're talking about in-book search (TFA is about search across all books), and yes that was indeed once a known technique for extracting whole or nearly whole books.
That's why publishers responded by excluding sections of books from search (it will list the pages but you can't view them), and individual Google accounts became limited in how many extra pages they were ever allowed to see of an individual book beyond the standard preview pages.
But then LibGen, Z-lib, and Anna's Archive became popular and built up their collections...

xorsula1 5 days ago

My guess is they detected being scraped and did this as a preventive measure.

Andrex 5 days ago

My guess is they're cozier with publishers now than 20 years ago when they fought all the way to SCOTUS.
"Hey, remove search?"
"OK, it was costing money anyways."
breppp 5 days ago

my guess is that the copyright landscape changed due to AI training, and these publishers won't let Google use that data anymore
- adamnemecekOP 5 days ago
  
  The books are still there, it seems like the rankings have changed though.
londons_explore 5 days ago

If search gives you a preview with a few surrounding words, it is fairly simple to abuse search with quotation marks to extract bigger and bigger sections of the books, potentially till you have the whole book.

adamnemecekOP 5 days ago

The change happened on or around Jan 21. Overnight the results went from pretty good to absolute trash.

Here are two screenshots taken on Jan 20 and Jan 23 https://bsky.app/profile/adamnemecek.bsky.social/post/3mdbup...

They don't do full-text search anymore, especially for copyrighted books. I wonder if this is not a regression but an intentional move to give them a leg up in the AI race.

toephu2 5 days ago

Yup, it's for AI.
Similarly, a year ago or so ChatGPT could summarize YouTube videos. Google put a stop to that so now only Gemini can summarize YouTube videos.
- AJ007 5 days ago
  
  The YT transcripts are linked to on the YT page itself. If they remove that, it is trivial to use a local STT model to transcribe the video. If they make it impossible to download a video, you could just have a microphone record all of the sound, and so on. Once you have the transcription of anything, summarizing is trivial. I have a local script that does this and I use it all of the time. Also produce diagrams for YT summaries. Hours saved, per day.
jeffbee 5 days ago

It isn't obvious why the left results are preferred over the right results.
- advisedwang 5 days ago
  
  The left results are contemporary, the right are decades old. That includes editions of the same book --- surely the newer edition is going to be preferred by most readers.
  - jeffbee 5 days ago
    
    I guess. That's not immediately clear to me. However, browsing around on Google Books suggests to me that it is the corpus which changed, not the algorithms.
    
    adamnemecekOP 5 days ago
    
    The corpus is still the same: searching for the name of the book will still find it, but the full-text search won't.
  - thaumasiotes 5 days ago
    
    > surely the newer edition is going to be preferred by most readers.
    Why? Where different editions exist, the reader will want to know which one they're getting, but they're unlikely to systematically prefer newer editions.
    But also, Google Books isn't aimed at "readers". You're not supposed to read books through it. It's aimed at searchers. Searchers are even less likely to prefer newer editions.
    
    gjm11 5 days ago
    
    > they're unlikely to systematically prefer newer editions
    That seems wrong to me. Generally when a new edition of something is put out it's (at least nominally) because they've made improvements.
    ("At least nominally" because it may happen that a publisher puts out different editions regularly simply because by doing so they can get people to keep buying them -- e.g., if some university course uses edition E of book B then students may feel that they have to get that specific edition, and the university may feel that they have to ask for the latest edition rather than an earlier one so that students can reliably get hold of it, so if the publisher puts out a new edition every year that's just different for the sake of being different then that may net them a lot of sales. But I don't think it's true for most books with multiple editions that later ones aren't systematically better than earlier ones.)
    
    thaumasiotes 4 days ago
    
    > But I don't think it's true for most books with multiple editions that later ones aren't systematically better than earlier ones.
    Most books with multiple editions are books that have been translated multiple times. It is definitely true that later translations aren't systematically better than earlier ones.

bryanrasmussen 5 days ago

Since I pretty much only use Google Books for public domain books, old magazines, and newspapers I haven't noticed any problem with it. Maybe it's not as dead as this person thinks.

mikestew 5 days ago

This was addressed in the post, I'm sure you just missed it when you read it:
"But a few days ago they removed ALL search functions for any books with previews, which are disproportionately modern books." <emphasis mine>
- bryanrasmussen 4 days ago
  
  Right, my point was that just because what they use it for is now useless doesn't mean my use is, and personally I think mine is more useful.
  - mikestew 4 days ago
    
    Fair enough, thanks for the clarification.
adamnemecekOP 5 days ago

No, the search results went from pretty good to absolute garbage https://bsky.app/profile/adamnemecek.bsky.social/post/3mdbup...

damnitbuilds 4 days ago

Done to satisfy the copyright barons.

Protest this by pirating, until copyright terms are reduced to make copyright once again a net benefit for society.

caplane 3 days ago

It's working again! Must have been a glitch. Very relieved to have it back!

ChrisArchitect 5 days ago

Title is: Google has seemingly entirely removed search functionality from most books on Google Books

pessimizer 5 days ago

Google Books is long dead. If you click on the author's name in one of the results, it will search inauthor:"Author's Name" and this search will return garbage because it chokes on double quotes. This has been true for at least a couple of years; Google Books is not compatible with itself. Changing the double quotes to single quotes fixes it. Also, lately, when you filter only for books that have Full View some results that have Full View get dropped for no intelligible reason.

Nobody is looking at it. I wouldn't be surprised if the preview search was switched off by accident.

For me Books is only useful (and it is very useful) for books out of copyright, 100+ years old. Sometimes they aren't at archive.org.

I hate Google, but I think it's a bit absurd to criticize them on this if somehow it's over AI. The only reason Google created Books may even have been AI, but they were hoping to have the books open to everyone, and the publishers and authors whose full text is being blocked are literally the people who stopped it from happening. Maybe they spoke up about AI, too. I find it hard to even criticize that Google doesn't take care of Books: it has no purpose or profit potential for them anymore, and it's obviously charity that they don't take it down completely.

kingstnap 5 days ago

My guess: Text search and indexing is expensive. And you are getting some kind of AI vector search instead.

Which tends to be kind of poop compared to true text search.

storystarling 5 days ago

I suspect it's actually the opposite. Standard inverted index text search is incredibly cheap and mature. Vector search requires generating embeddings and running approximate nearest neighbor queries, which is significantly more compute intensive than simple keyword matching. If they switched, it wasn't to save on compute costs.
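
To illustrate the point about inverted indexes being cheap, here is a minimal sketch of keyword search over a handful of documents. It is not how Google Books works internally; the sample documents and the names `tokenize`, `build_index`, and `search` are made up for illustration. A query only touches the posting sets of its own terms, with no embeddings or nearest-neighbor lookup involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token in the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # .get() avoids creating empty entries in the defaultdict for unknown tokens.
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample corpus, purely for illustration.
docs = {
    "a": "Copyright term lengths are too long",
    "b": "Full text search across all books",
    "c": "The search term ranks books by copyright date",
}
index = build_index(docs)

print(sorted(search(index, "copyright term")))  # ['a', 'c']
print(sorted(search(index, "books search")))    # ['b', 'c']
```

Real engines add ranking, phrase queries, and compressed postings on top of this, but the core lookup stays a set intersection, which is why plain keyword matching is generally considered the cheaper option.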
