Ask HN: How to build full text search at scale?

4 points by ratpik 6 years ago · 5 comments · 1 min read


Data Store - Elasticsearch

Scale - 10 million writes/day (500 GB/day), about 100K search queries per day

Trying to figure out:

1 - How to control access to data (multi-tenancy where there are ~100K tenants)

2 - Database design - Indexes and Shards and best practices around mixing different types of documents in a single index.

itronitron 6 years ago

I recommend writing down what exactly scale means for your needs. Number of users? Number of queries? Number of sources? Number of 'result-sets'? Number of documents? Number of text fields?

Elasticsearch is built on top of Lucene, which is a Java API that you can use in pretty much any application. If you already have a system that can search the MySQL clusters, then I would recommend hooking Lucene into that system instead of standing up another one.

  • ratpikOP 6 years ago

    Added more details

    • itronitron 6 years ago

      >> How to control access to data (multi-tenancy where there are ~100K tenants)

      So basically, you need to run a web server that serves a search page in which your users can create and submit queries. The web server receives the query and then routes it to the appropriate search handler. The web server should handle access control, and there are several standard approaches for this.

      Solr and Elasticsearch can both be used in this manner.
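One common way to do access control in the web tier is to wrap whatever the user typed in a filter on their tenant before it reaches the search engine. Below is a minimal sketch of that idea using the Elasticsearch query DSL. The field names `tenant_id` and `content` and the helper `scope_query_to_tenant` are assumptions for illustration, not something from the original post. It also assumes `tenant_id` is mapped as a `keyword` field and that the tenant ID comes from the authenticated session, never from the request itself.

```python
from typing import Any, Dict


def scope_query_to_tenant(user_query: str, tenant_id: str) -> Dict[str, Any]:
    """Build a search body that can only match documents of one tenant.

    Hypothetical helper: `content` and `tenant_id` are assumed field names.
    """
    if not tenant_id:
        # Refuse to build an unscoped query rather than search across tenants.
        raise ValueError("tenant_id is required")
    return {
        "query": {
            "bool": {
                # Scored part: the text the user actually searched for.
                "must": [{"match": {"content": user_query}}],
                # Non-scored part: restricts results to the caller's tenant.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }


body = scope_query_to_tenant("overdue invoice", "tenant-42")
```

The resulting `body` would then be sent to the search endpoint by the web server. With ~100K tenants, a shared index plus a filter like this is probably easier to operate than one index per tenant, though that trade-off depends on how uneven the tenants are in size.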

      >> Database design - Indexes and Shards and best practices around mixing different types of documents in a single index.

      Depends a lot on what your users want to get in their search results. A first step would be to identify the primary text fields they want to search in each document type, then create a standard text field in the schema into which each document's primary content gets indexed. You can get fancy by running different document types through different analyzer/tokenizer chains (for example if they were in different languages) and you can do a lot of 'cheating/preprocessing' here so that the primary search text field has good information in it.
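As a rough sketch of the "standard text field" idea: different document types can be flattened into one shared shape before indexing, with a per-language field when a different analyzer is needed. Everything named here is hypothetical (the `ticket` and `note` document types, their source keys, and the fields `content`, `content_fr`, `doc_type`, `tenant_id`); only the `standard` and `french` analyzers are built into Elasticsearch.

```python
from typing import Any, Dict

# Assumed index definition: one shared index for all document types.
INDEX_BODY: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "tenant_id": {"type": "keyword"},
            "doc_type": {"type": "keyword"},
            # Primary search field shared by every document type.
            "content": {"type": "text", "analyzer": "standard"},
            # Same role, but analyzed with the built-in French analyzer.
            "content_fr": {"type": "text", "analyzer": "french"},
        }
    }
}


def to_indexable(doc_type: str, raw: Dict[str, str], tenant_id: str,
                 lang: str = "en") -> Dict[str, str]:
    """Preprocess a source record into the shared shape (hypothetical types)."""
    if doc_type == "ticket":
        text = f"{raw.get('subject', '')}\n{raw.get('description', '')}"
    elif doc_type == "note":
        text = raw.get("text", "")
    else:
        raise ValueError(f"unknown doc_type: {doc_type}")
    # Route the text to the field whose analyzer matches its language.
    field = "content_fr" if lang == "fr" else "content"
    return {"tenant_id": tenant_id, "doc_type": doc_type, field: text.strip()}


doc = to_indexable("ticket", {"subject": "Login fails", "description": "500 error"}, "tenant-42")
```

Keeping `doc_type` as a keyword field means a search can still be narrowed to one document type with a filter, even though everything lives in a single index.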

bufferoverflow 6 years ago

I doubt you will get a useful answer in a comment. Each of your questions is very broad. And you didn't even select a search engine yet. You didn't specify the scale you're dealing with. You didn't specify the number of reads/writes per second that you expect.

Choose one system and learn it well.

  • ratpikOP 6 years ago

    Scale - Added details to the post.

    Search Engine - Elasticsearch
