The Deep Noodle Blog | The Realities of Semantic Search


The Realities of Semantic Search

A year ago, I started implementing semantic search in the SaaS app I was building. The promise made perfect sense: users type what they mean, and the system understands. Finally, search that works like our brain works.

What I didn't expect was how much trial and error it would take to make it work well.

After countless experiments, prototypes, and UI iterations, I've distilled the key questions and decisions that will help you get the most out of semantic search. The technology is powerful, but success requires understanding nuances that only become clear after you've been working with it for a while.

Here's what I learned through months of experimentation:

  1. People still search with keywords. Design for that reality.
  2. Know when users expect exact matches vs. semantic understanding.
  3. Large datasets create semantic clutter that degrades results.
  4. Choosing the best embedding model may not improve your results.
  5. Hybrid search is where it's at.
  6. AI agents + hybrid search = genuinely impressive results.

If you're implementing semantic search, keeping these tips in mind could save you weeks of trial and error. Keep reading for more details.


People Still Search with Keywords

Here's what I learned: people use search by typing keywords. Basically always.

Why? Two reasons:

  1. We're trained for it. Years of using keyword search products have shaped our habits.
  2. Keywords are efficient. They're the shortest text that uniquely expresses what we want. Nobody wants to type a paragraph when three words will do.

Even if longer phrases would yield better semantic matches, users won't type them. It's too much work. So your beautiful semantic search system gets fed a steady diet of keywords. Exactly what it wasn't optimized for.

Don't get me wrong. Semantic search still performs adequately when fed keywords. It would just perform better if users typed longer phrases that expressed their intent.


Strict Match vs. Semantic Match

A strict match is really a filter. A guarantee. When users search for 2021 financial statements, they don't want 2020 or 2022 results mixed in. But that's what semantic search might give them.

More generally, when users provide multi-part queries, some portions are firmer in their minds than others. Searching for financial statements for PepsiCo? The company name is non-negotiable. But semantic search might "helpfully" include Coca-Cola's financials too.

From a UX perspective, it's often better to let users first filter for PepsiCo, then use semantic search within the filtered dataset. Depending on which semantic search solution you choose, implementing filters like this might be trickier than you'd expect.
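To make the PepsiCo example concrete, here's a minimal sketch of the filter-then-search pattern. The three-dimensional vectors and documents are made up for illustration; in a real system the embeddings come from a model and the metadata filter is pushed down into the search engine rather than applied in Python.

```python
import math

# Toy corpus: each doc carries metadata plus a (pretend) embedding vector.
DOCS = [
    {"id": 1, "company": "PepsiCo",   "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "company": "Coca-Cola", "vec": [0.88, 0.12, 0.0]},
    {"id": 3, "company": "PepsiCo",   "vec": [0.1, 0.9, 0.2]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(query_vec, company, top_k=5):
    # Hard filter first: only documents for the requested company survive,
    # so Coca-Cola can never leak into the results.
    candidates = [d for d in DOCS if d["company"] == company]
    # Then rank semantically within the filtered set.
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:top_k]

results = filtered_search([1.0, 0.0, 0.0], "PepsiCo")
```

The filter is a guarantee; the semantic ranking only decides ordering within it.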


Semantic Clutter

You know that feeling when your closet gets so full you can't find that one shirt? That's semantic search at scale.

Actually, it's more like asking a friend to find your shirt in your own closet. They emerge with an armful of "close enough" options, not knowing which is right. You have to pick the actual shirt from their selection.

This is why semantic search systems often need reranking. The initial results hopefully contain what you want, buried among less relevant matches.

Unfortunately, the problem gets worse as you add documents.

In a custom knowledge base I built for a client—approximately 10,000 documents chunked into 100,000+ searchable items—even specific queries returned dozens of "sort of right" results. Common topics in the dataset formed clusters of semantically similar chunks. Queries could generate hundreds of highly scored results.

You can't just dump all these matches into your RAG pipeline or show them to users. You need another pass to rerank and select the top N relevant matches.

This is doable, but you might not have planned for it. Some search products offer built-in reranking. But here's my favorite approach: use an LLM to pick the top N matches. It's highly effective and easy to implement.

A small model like Claude Haiku keeps costs and latency low while excelling at this nicely constrained task. For example: use semantic search to find the top 25 semantic matches, then ask the LLM to extract the best 5 from that set.
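Here's a minimal sketch of that rerank step. The `complete` callable stands in for whatever client you use to call the model (a Claude Haiku call, for instance); the prompt wording and the comma-separated reply format are my own assumptions, and a production version should parse the model's output more defensively.

```python
def build_rerank_prompt(query, candidates, n=5):
    # candidates: dicts with "id" and "text" from the first-pass semantic search.
    lines = [f"[{c['id']}] {c['text']}" for c in candidates]
    return (
        f"Query: {query}\n\n"
        "Candidate passages:\n" + "\n".join(lines) + "\n\n"
        f"Return the ids of the {n} passages most relevant to the query, "
        "as a comma-separated list and nothing else."
    )

def rerank(query, candidates, complete, n=5):
    # `complete` sends a prompt to your LLM and returns its text reply.
    reply = complete(build_rerank_prompt(query, candidates, n))
    keep = {s.strip() for s in reply.split(",")}
    # Preserve the original candidate order among the kept ids.
    return [c for c in candidates if str(c["id"]) in keep][:n]
```

Feed it the top 25 first-pass matches and ask for the best 5, as described above.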


Choosing an Embedding Model

This area is funny to me now. Your embedding choices significantly impact cost, but accuracy may not change as much as you'd guess.

By cost, I mean:

  • Financial cost to generate embeddings (longer vectors = more expensive)
  • Storage requirements (vectors are large, especially without quantization)
  • Runtime CPU load and query latency

Here is what surprised me: for my datasets, using the larger and more expensive embedding model did not meaningfully change search accuracy from the user's point of view. The results were nearly identical whether using OpenAI's text-embedding-3-small or text-embedding-3-large.

Look at today's OpenAI embedding model pricing:

Model                     Dimensions (Default)    Price Per 1M Tokens
text-embedding-3-small    1536                    $0.02
text-embedding-3-large    3072                    $0.13

That's a 6.5x cost difference, for generation only, for potentially little improvement.
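At those prices, the gap adds up fast on a real corpus. A quick back-of-the-envelope calculation (the corpus size is hypothetical):

```python
# Prices from the table above, in dollars per 1M tokens.
PRICE_PER_M = {"text-embedding-3-small": 0.02, "text-embedding-3-large": 0.13}

def embedding_cost(model, tokens):
    return PRICE_PER_M[model] * tokens / 1_000_000

corpus_tokens = 500_000_000  # hypothetical 500M-token corpus
small = embedding_cost("text-embedding-3-small", corpus_tokens)
large = embedding_cost("text-embedding-3-large", corpus_tokens)
print(f"small: ${small:.2f}, large: ${large:.2f}")
```

And that's generation only; the larger model's 3072-dimension vectors also double your storage and per-query compute.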

Some datasets and applications will benefit from longer vectors. But my point is that you should not assume this will be true in your application. Test with your actual data before committing to higher costs.


Hybrid Search: Why Not Both?

Given all this, am I down on semantic search? No, not really. But let me introduce you to my favorite solution: hybrid search.

Hybrid search runs both keyword and semantic search, then intelligently combines results. It's a "why not both?" idea that actually works brilliantly in practice.
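How the combining works varies by engine. One widely used method is Reciprocal Rank Fusion, which merges the two ranked lists using only ranks, so the incomparable keyword and semantic scores never need to be normalized against each other. This is an illustration of the general technique, not necessarily what any particular product does internally.

```python
def rrf_merge(keyword_results, semantic_results, k=60):
    # Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per
    # document; documents ranked well in both lists accumulate the most
    # score. k=60 is the constant commonly used in the literature.
    scores = {}
    for results in (keyword_results, semantic_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["a", "b", "c"], ["b", "d", "a"])
```

Here "b" wins because it appears near the top of both lists, even though neither list ranked it first.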

The industry is converging on hybrid search from both directions. Traditional keyword-oriented databases like Elasticsearch added semantic capabilities. Vector-first companies like Pinecone added keyword search (starting around 2022).

Quick terminology tip: dense vectors = semantic search, sparse vectors = keyword search. The names come from the vectors themselves: embedding models produce vectors where nearly every dimension is nonzero (dense), while keyword methods assign one dimension per vocabulary term, leaving almost all dimensions zero (sparse).

I have no affiliation here, but I've been genuinely impressed by Meilisearch. They built keyword, semantic, and hybrid search into a fast and incredibly easy-to-use package. Despite being young, it has to be the most performant and reliable search database I've used. It won't fit every use case, but it's worth a look.


AI Agents Enter the Chat

I'll cover this more in future posts, but it deserves mention here.

AI agents differ from traditional software in one key way: they decide what to do. Agents can decide what searches to run, if you arm them with the right tools. Like humans, they'll try multiple queries as needed, but without getting tired or frustrated.

Instead of engineering complex RAG pipelines to pre-fetch context, you can let AI agents gather information dynamically through search. I've watched agent-driven approaches outperform RAG pipelines that I spent weeks tuning.

Give an AI agent hybrid search capabilities and watch the magic happen. Let the agent experiment with different keywords, filters, and query strategies. This approach can even solve reranking "for free" by intelligently selecting which results to explore based on a summary of search result titles and descriptions.
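Arming the agent mostly comes down to describing the search tool well. Here's a sketch of what a hybrid-search tool definition might look like, in the JSON-schema style used by tool-calling APIs such as Anthropic's; the parameter names and descriptions are my own assumptions, not a standard.

```python
# A hybrid-search tool the agent can call repeatedly with refined queries.
# The description matters: it's what teaches the model when and how to search.
HYBRID_SEARCH_TOOL = {
    "name": "hybrid_search",
    "description": (
        "Search the knowledge base using combined keyword and semantic "
        "matching. Returns result titles and short snippets. Call again "
        "with different keywords or filters if the first results miss."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms"},
            "semantic_ratio": {
                "type": "number",
                "description": "0 = pure keyword, 1 = pure semantic",
            },
            "filter": {
                "type": "string",
                "description": "Optional metadata filter, e.g. company or year",
            },
        },
        "required": ["query"],
    },
}
```

Returning titles and snippets (rather than full documents) is what lets the agent cheaply decide which results to explore, which is where the "free" reranking comes from.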


Your Turn: Let's Level Up Together

So where does this leave us?

If you're building search for a SaaS product, here's my advice.

Start with an honest requirements assessment

What queries do your users actually make? If they search for SKUs, part numbers, or similar identifiers, you need keyword or hybrid search. Don't just default to semantic.

Try hybrid search for new projects

Tools like Meilisearch, Pinecone, Weaviate, and Elasticsearch make it straightforward. You'll want the ability to turn a knob to favor semantic or keyword search depending on the situation.
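For instance, Meilisearch exposes that knob as `semanticRatio` in its hybrid search request. This is a sketch of the request body only; the embedder name is a placeholder for one you've configured on the index, and you should check the current docs for the exact shape.

```python
def hybrid_payload(q, semantic_ratio=0.5, embedder="default", limit=10):
    # Body for Meilisearch's search endpoint (POST /indexes/<index>/search).
    # semanticRatio is the knob: 0 = pure keyword, 1 = pure semantic.
    return {
        "q": q,
        "hybrid": {"embedder": embedder, "semanticRatio": semantic_ratio},
        "limit": limit,
    }

payload = hybrid_payload("pepsico financial statements", semantic_ratio=0.7)
```

Being able to tune that ratio per query type (identifiers lean keyword, natural-language questions lean semantic) is exactly the flexibility you want.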

Test embedding models on your data

Don't assume bigger embedding vectors will translate to noticeably better results. Test with your actual data before committing to higher costs.

Evaluate agentic querying

Save your engineering team from building the "perfect" context pipeline. Give AI agents good query tools and let them do the research. Do factor in that agents take longer to return results than a single search call, and tailor your UI design to account for the extra latency.

Share your experiences

This space moves fast and we're all still figuring it out. I'd love to hear what you're building. Everyone will benefit from sharing their experiences.


What search challenges are you facing right now?