Ask HN: How to build full text search at scale?
Data Store - Elasticsearch
Scale - 10 million writes/day (500 GB/day), about 100K search queries per day
Trying to figure out
1 - How to control access to data (multi-tenancy where there are ~100K tenants)
2 - Database design - Indexes and shards, and best practices around mixing different types of documents in a single index.

I recommend writing down exactly what "scale" means for your needs. Number of users? Number of queries? Number of sources? Number of 'result-sets'? Number of documents? Number of text fields?

Elasticsearch is built on top of Lucene, which is a Java library that you can embed in pretty much any JVM application. If you already have a system that can search the MySQL clusters, then I would recommend hooking Lucene into that system instead of standing up another one.

Added more details.

>> How to control access to data (multi-tenancy where there are ~100K tenants)

So basically, you need to run a web server that serves a search page where your users can create and submit queries. The web server receives the query and routes it to the appropriate search handler. The web server should handle access control, and there are several standard approaches for this. Solr and Elasticsearch can both be used in this manner.

>> Database design - Indexes and Shards and best practices around mixing different types of documents in a single index.

Depends a lot on what your users want to get in their search results. A first step would be to identify the primary text fields they want to search in each document type, then create a standard text field in the schema into which each document's primary content gets indexed. You can get fancy by running different document types through different analyzer/tokenizer chains (for example, if they are in different languages), and you can do a lot of 'cheating/preprocessing' here so that the primary search text field has good information in it.

I doubt you will get a useful answer in a comment. Each of your questions is very broad. And you haven't even selected a search engine yet. You didn't specify the scale you're dealing with. You didn't specify the number of reads/writes per second that you expect.
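For what it's worth, the daily volumes quoted in the post can be turned into rough per-second figures. The sketch below assumes traffic is spread evenly over the day and that "GB" means decimal gigabytes; real peak rates will be higher than these averages.

```python
# Back-of-the-envelope conversion of the daily figures from the post
# into average per-second rates. Assumes perfectly uniform traffic.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_day = 10_000_000
bytes_per_day = 500 * 10**9     # 500 GB/day, assuming decimal gigabytes
queries_per_day = 100_000

writes_per_sec = writes_per_day / SECONDS_PER_DAY
queries_per_sec = queries_per_day / SECONDS_PER_DAY
avg_doc_bytes = bytes_per_day / writes_per_day
ingest_mb_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**6

print(f"writes/sec (avg):   {writes_per_sec:.1f}")
print(f"queries/sec (avg):  {queries_per_sec:.2f}")
print(f"avg document size:  {avg_doc_bytes / 1000:.0f} KB")
print(f"ingest MB/sec (avg): {ingest_mb_per_sec:.2f}")
```

On these assumptions the workload is write-heavy: roughly 116 writes per second against about one search per second, with documents averaging around 50 KB.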
Choose one system and learn it well.

Scale - Added details to the post.
Search Engine - Elasticsearch
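Since the engine is Elasticsearch, here is a minimal sketch of the "web server handles access control" idea from the thread: the server builds the query body itself and always attaches a tenant filter, so a user can never search outside their own tenant. The field names `tenant_id` and `content` and the helper `build_tenant_query` are hypothetical, chosen for illustration; `content` stands in for the single primary text field suggested above. Only the query body is shown, not the client call that sends it.

```python
def build_tenant_query(tenant_id: str, user_query: str, size: int = 10) -> dict:
    """Build an Elasticsearch Query DSL body restricted to one tenant.

    tenant_id must come from the authenticated session on the server,
    never from anything the user typed into the search form.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the shared primary text field
                # (hypothetical field name).
                "must": [{"match": {"content": user_query}}],
                # Non-scoring exact filter; assumes tenant_id is mapped
                # as a keyword field (hypothetical field name).
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
    }


body = build_tenant_query("tenant-42", "invoice overdue")
print(body)
```

With around 100K tenants, a shared index with a filter like this is usually easier to operate than one index per tenant. Elasticsearch also accepts a `routing` value on index and search requests, which could be set to the tenant ID so that a tenant's documents land on one shard; whether that helps depends on how unevenly sized the tenants are.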