Anvitra - Human-like Search for Modern Applications

All of us have faced the issue of search feeling broken. But why is it so? Let's take a look at what we generally search for and examine why the experience often fails us. I might be searching for:

"Blue shirt of cotton make"
"Stock that is making steady progress over the last 5 years"
"Unexplored town in India for Christmas"

The list goes on... Some of these queries demand subjective answers, and some are purely objective. For example, "Blue shirt of cotton make" is objective, while "unexplored town in India for Christmas" is subjective. Let's take a closer look.

Dissecting the Query 🧐

I can quantify an objective query like "Blue shirt of cotton make" using a structured query language. I know you might be thinking, "How on earth are we going to store all products in a database and then search through them?" Put a pin in that; we'll come back to it later. Let us move forward to "unexplored town in India for Christmas." This is a profoundly subjective query, and it's not possible to quantify it easily. For one, people have different preferences and interests. My answer for an unexplored town in India for Christmas might be entirely different from yours. Many folks out there might want to explore desert like places in India for Christmas, whereas I prefer a hilltop area (though a desert sounds fun, too!). Then again, is a water theme park a good Christmas destination because few people go there, or does nobody actually want to go to a water theme park on Christmas? You might think I'm being delusional as some of these statements may not be entirely correct from your perspective. But one thing we can agree upon: the question is subject to interpretation. Interpretation is a hard problem even for humans. Ask my wife how many times I've misinterpreted her statements! So, if it is a hard problem for humans, how can we expect machines to solve it? We don't, period. [silence] I know you might be thinking I should keep going, as "We don't" seems like gaslighting. You're right; technically, we don't, but the point is we can expect machines to simplify it, right? In that case, let's take one more look at the problem.

Are there any parts of the query that can be quantified?

Apparently, yes. We are looking at an initial structured query: What about Christmas? It suggests an obvious subjective filter: 'recommended for' = "Christmas". Are there similar subjective filters? Oh yes, let's list them down:

'recommended_for' = 'christmas'
'expected_travel_time' = 'christmas'
'expected_weather' = 'cold'
'popularity' = 'unexplored'
'no_of_travellers' = 'low'
'no_of_travellers_on_q4' = 'low'

Now this seems to be getting interesting. Even though we are not able to perfectly quantify the request, we are forming a sort of incomplete structured + subjective query that could satisfy the search. We are getting closer, but we are yet not where you might be thinking.... 😜 What about the person asking the question? If it's a dad, how many kids does he have, or how old are they? If it's a couple, what are their ages? If it's young singles, are they female, male, or a mixed group? These questions might point toward a recommendation system, but their answers can add more subjective and even structured filters. For instance, if you were asking this question in 2021, an additional filter might have been applied: 'travel_restrictions' = 'low'. The filter itself looks like a joke considering the environment during the COVID days. But one thing is clear: the more we think and the more these edge cases surface, the better results we get. This reveals a vague blur between getting better results and true interpretation of the question. Let us summarize what we've learned so far: any query you ask can be defined as a structured + subjective query with an open end (cause I'm too lazy to find all the possible edge cases). With this in mind, let's move on to the next part.

How to Execute this Definition? 🛠️

To execute this open-ended structured + subjective query, we need a database engine that can compare data against this query and ingest new data seamlessly. Technically, you can create an SQL query to compare the data against this definition and query an a SQL database. If you aren't a technical person, what I'm saying is that multiple software programs exist that can process queries using a language called SQL even ChatGPT can help you write it. Now the hard part comes:

How do we store the data?
How many fields do we need to store?
What if some fields are not available?

And for each use case, building a custom database is simply not a viable option. If you are wondering how the existing travel, stocks, and e-commerce platforms work? Well, they are powered by the blood and sweat of developers who build APIs and specialized databases for those specific platforms. So, to power each kind of search query, you either need different databases or an ever evolving, complex database. Now we are getting into the meat of the problem:

You can somewhat define each search query as a structured + subjective query with an open end, but this means you'd need to change the database schema for each use case, re-index all the data, and deal with edge cases on the fly.
Or, you can define what kind of search can be performed and build a pre-defined database that can power only that.

Most of the platforms you use take the second approach. That is why you see a lot of clunky filters and keyword-based search bars. They are not meant to liberate you; they are designed to limit you to their pre-defined search space.

Can We Liberate Users? 🚀

Well, that is a truly hard problem. I mean it.

Before jumping into it, I just wanted to share why you might be noticing this problem now. Its answer is there in this blog post.

There are three major layers that power the search experience:

Search Layer: Responsible for interpreting the user's query and retrieving relevant results from the data layer.
Data Layer: Responsible for storing and catering retrieval requests from the search layer.
Application Layer: Responsible for quantifying the subjective materials in each domain.

Let's look at them one by one. At first glance, you can see each layer is interdependent. One cannot fulfill its role without the others. The Search Layer requires the Data and Application layers to construct a meaningful query. The Data Layer needs the Application Layer to ingest data and make it searchable. The Application Layer needs clean data and its attributes to quantify the subjective materials in each domain. I will continue the story of each layer in the future blog posts. But to keep your brain tinkering, let me spill some spoilers.

Spoilers

Search can be thought of as a plug-and-play system of multiple interpretation techniques. The more components you add, the better the query will be formed and the better the subsequent search results. But the more items you add, the more complex the system gets, and latency increases. So, it is always a trade-off between the use case you want to serve and the complexity you are willing to accept. To drop an example, a simple stock ticker search may not need to add contextual based interpretation, whereas a travel search likely will. So, in future blog posts, I will be back with deeper insights into each layer and how they can be used to power a truly liberated search experience.

Let us Deep Dive into the Search Problem

Dissecting the Query 🧐

How to Execute this Definition? 🛠️

Can We Liberate Users? 🚀

Spoilers