Ask HN: Is there any one book or resource on search engine development & theory?
I'm working on a search engine for a web application I am developing, and I realized I really don't know much about building search engines. I took a bit of AI and expert systems in school but never ran into any books specifically on developing search engines. Do any such books exist? If so, any recommendations?

Greg Linden likes Introduction to Information Retrieval: http://www-csli.stanford.edu/~hinrich/information-retrieval-... (free online).

This article gives a wonderful overview of the challenges: "Why Writing Your Own Search Engine Is Hard"
http://queue.acm.org/detail.cfm?id=988407 (the site is currently down). Google cache:
http://74.125.95.132/search?q=cache:13tlOSQwtjAJ:queue.acm.o...

There are some ACM/IEEE journals that have relevant papers, but you have to ask yourself: is reinventing the wheel what you really want to be doing? Given that there are lots of COTS (commercial off-the-shelf) solutions available, shouldn't you be focusing on the things that are unique to your app? (Needless to say, if your search needs are unique to your app and a COTS solution isn't viable, you might want to bring in someone with relevant expertise.)

Spot on. OP: are you asking how basic tf-idf works, or is there something you can't get Lucene / Solr / Sphinx / tsearch to do easily? Nevertheless, here are some good background materials (search Amazon for "data mining"): http://www.amazon.com/gp/product/1584504609 http://www.amazon.com/Data-Mining-Practical-Techniques-Manag... Also, Collective Intelligence by Satnam Alag is quite good (a lot of Java code to wade through, though).

To be honest, I hadn't even heard of tf-idf before you mentioned it. I am definitely not stepping beyond the bounds of something like Sphinx; I basically want to lay a bit of foundation before I start mucking around with something I know nothing about. I have a couple of e-books on data mining, but I didn't think they were applicable. Are data mining and search closely intertwined?
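
For anyone else who hasn't met tf-idf before: below is a minimal sketch of the idea in plain Python, not how Lucene, Sphinx, or any other engine actually implements it. A term scores higher in a document the more often it appears there (term frequency) and the fewer documents contain it overall (inverse document frequency). The function names (`tokenize`, `build_index`, `tf_idf`, `search`), the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are all assumptions chosen for illustration; real engines use smoothed and length-normalized variants.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    # Per-document term counts, plus the number of documents containing each term.
    term_counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def tf_idf(term, counts, doc_freq, n_docs):
    # tf: share of the document's tokens that are this term.
    # idf: log(N / df), which is 0 for a term that occurs in every document.
    if term not in counts:
        return 0.0
    tf = counts[term] / sum(counts.values())
    idf = math.log(n_docs / doc_freq[term])
    return tf * idf


def search(query, docs):
    # Score each document as the sum of tf-idf over the query terms,
    # then return (doc_index, score) pairs, best first, dropping zero scores.
    term_counts, doc_freq = build_index(docs)
    n_docs = len(docs)
    scored = []
    for i, counts in enumerate(term_counts):
        score = sum(tf_idf(t, counts, doc_freq, n_docs) for t in tokenize(query))
        if score > 0:
            scored.append((i, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "search engines rank documents",
    ]
    for doc_index, score in search("cat", docs):
        print(doc_index, round(score, 4), docs[doc_index])
```

Rebuilding the index on every query keeps the sketch short; an actual engine would build an inverted index once and look up only the documents that contain the query terms.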