Versatility of Exasol with Agentic Engineering


Exasol is an analytical database. It’s built for joins, aggregations, window functions, and the kind of queries that chew through billions of rows before your coffee gets cold.

But Exasol can do much more than that. It is extensible and malleable. I pushed its limits and solved a problem you would not expect an analytical database to solve.

The problem

Semantic search is essential for data-driven systems: it returns results that match what a query means, not just the words it contains.

Semantic search is a data search technique that focuses on the contextual meaning and intent behind a user’s query rather than only matching keywords. For example, the query “How to change the password” would return documents such as “account recovery steps” or “forgotten credentials”, even though those documents don’t contain the words used in the query. Users don’t need to know the exact wording of the document they’re looking for; semantic search handles that for them.

This requires the data to be represented as vectors, known as embeddings. A vector, in this sense, is a fixed-length list of floating-point numbers. An embedding is a vector that encodes a piece of data (text, images, or other modalities) so that semantically similar items end up close to each other in vector space. Embeddings are computed by machine learning models trained specifically for this purpose.
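To make “close to each other” concrete: similarity between embeddings is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch uses made-up four-dimensional vectors (real models emit hundreds of dimensions), and cosine is an assumed metric here; the post doesn’t say which distance the Qdrant collection is configured with.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration (not real model output).
query = [0.9, 0.1, 0.0, 0.2]          # "How to change the password"
doc_recovery = [0.8, 0.2, 0.1, 0.3]   # "account recovery steps"
doc_weather = [0.0, 0.1, 0.9, 0.0]    # "weekend weather forecast"

# The recovery document points in nearly the same direction as the query.
print(cosine_similarity(query, doc_recovery) > cosine_similarity(query, doc_weather))  # True
```

No keyword overlap is needed: the ranking comes entirely from where the model places each text in vector space.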

With all these requirements, this is the kind of problem that usually calls for a whole new architecture living separately from the database. With Exasol’s extensible architecture and features, all of it is possible from within the database.

Discovery

Exasol has three seams. A seam is a place where external code can get inside the database. These seams let users and agents extend what the database can do, and they explain how much an agent can accomplish when you hand it the platform and some docs.

Most databases give you one extensibility hatch. If you’re lucky, it’s a stored-procedure language. If you’re less lucky, it’s a narrow plugin API with a handful of approved hooks. You can do things, but the shape of what you can do is tightly constrained by whatever the original designers pictured. With Exasol you get these three seams; they are all wide.

Virtual Schemas are the most important part of this whole story. The idea: an adapter sits between Exasol and an external data platform, and that external platform now looks like a SQL table. When a query hits a virtual table, Exasol hands the parsed query (columns, filters, limits) to the adapter, and the adapter does whatever it wants: query a database, hit an HTTP API, run a model. It hands rows back. Exasol doesn’t care how the rows were made. It treats them like native data.
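The “parsed query” arrives at the adapter as a JSON pushdown request. The real adapter is Lua; this Python sketch only illustrates the idea of pulling the search phrase and limit out of such a request. The field names (`pushdownRequest`, `predicate_equal`, `literal_string`, `numElements`) reflect my reading of the Virtual Schema API and should be checked against the current spec; `extract_search_request` and its default limit of 10 are hypothetical.

```python
import json


def extract_search_request(request_json: str, default_limit: int = 10) -> dict:
    """Return the table, search phrase, and row limit from a pushdown request.

    Assumed request shape (verify against the Virtual Schema API docs):
    {"type": "pushdown",
     "pushdownRequest": {"from": {"name": ...},
                         "filter": {"type": "predicate_equal",
                                    "left": {"type": "column", "name": "QUERY"},
                                    "right": {"type": "literal_string", "value": ...}},
                         "limit": {"numElements": ...}}}
    """
    request = json.loads(request_json)
    if request.get("type") != "pushdown":
        raise ValueError("not a pushdown request")
    pushdown = request["pushdownRequest"]
    condition = pushdown.get("filter") or {}
    left = condition.get("left", {})
    right = condition.get("right", {})
    # Only the shape "QUERY" = '<phrase>' is accepted in this sketch.
    if not (
        condition.get("type") == "predicate_equal"
        and left.get("type") == "column"
        and left.get("name") == "QUERY"
        and right.get("type") == "literal_string"
    ):
        raise ValueError("expected a filter of the form \"QUERY\" = '<phrase>'")
    limit = (pushdown.get("limit") or {}).get("numElements", default_limit)
    return {"table": pushdown["from"]["name"], "query": right["value"], "limit": limit}


# A hand-written example request matching the user query shown later in this post.
sample = json.dumps({
    "type": "pushdown",
    "pushdownRequest": {
        "type": "select",
        "from": {"type": "table", "name": "BANK_FAILURES"},
        "filter": {
            "type": "predicate_equal",
            "left": {"type": "column", "name": "QUERY"},
            "right": {"type": "literal_string", "value": "banks acquired by JP Morgan"},
        },
        "limit": {"numElements": 5},
    },
})
print(extract_search_request(sample))
```

Rejecting anything other than an equality on `QUERY` keeps the contract narrow: a user can’t accidentally trigger an unbounded scan of the vector store.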

UDFs (user-defined functions) are functions you write and run inside Exasol. Python, Lua, Java, R: pick your poison, write code, and call it inside a SELECT statement. I used Python UDFs to ingest data: read rows, call an embedding model, and push the vectors into Qdrant. Python UDFs come with a full Python runtime, so for most tasks you don’t have to install anything. If you need extra libraries, as we did to run the embedding model, you can customize the SLC (script language container) that sits behind every UDF. Exasol’s documentation explains how to customize SLCs.
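Exasol’s UDF docs describe SET scripts whose `run(ctx)` function walks the input rows with `ctx.next()` and returns results with `ctx.emit()`. A minimal sketch of an ingestion loop on top of that pattern, assuming input columns named `ID` and `TEXT` and numeric IDs. `embed_batch`, `upsert_points`, and `BATCH_SIZE` are hypothetical placeholders for the embedding and Qdrant calls, and the surrounding `CREATE PYTHON3 SET SCRIPT ... EMITS (...)` wrapper is omitted.

```python
BATCH_SIZE = 64  # hypothetical; tune for the embedding model and memory budget


def embed_batch(texts):
    """Hypothetical placeholder: return one vector per text (e.g. via Ollama)."""
    raise NotImplementedError("wire this to the embedding model")


def upsert_points(points):
    """Hypothetical placeholder: send points to Qdrant (PUT /collections/<name>/points)."""
    raise NotImplementedError("wire this to Qdrant")


def to_qdrant_points(ids, texts, vectors):
    # Qdrant point IDs must be unsigned integers or UUIDs; numeric IDs assumed here.
    return [
        {"id": int(i), "vector": list(v), "payload": {"text": t}}
        for i, t, v in zip(ids, texts, vectors)
    ]


def run(ctx, embed=embed_batch, upsert=upsert_points):
    """SET-UDF entry point: Exasol calls run(ctx) once per group of input rows."""
    ids, texts = [], []
    written = 0

    def flush():
        nonlocal written
        if ids:
            upsert(to_qdrant_points(ids, texts, embed(texts)))
            written += len(ids)
            ids.clear()
            texts.clear()

    while True:  # ctx starts positioned on the first row of the group
        ids.append(ctx.ID)
        texts.append(ctx.TEXT)
        if len(ids) >= BATCH_SIZE:
            flush()
        if not ctx.next():
            break
    flush()  # write the final partial batch
    ctx.emit(written)  # one summary row: how many points were written
```

Batching cuts the number of round trips to the model and to Qdrant. The extra keyword parameters exist only so the loop can be exercised outside Exasol with stand-in functions; inside Exasol, `run(ctx)` uses the defaults.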

Script preprocessors are the third seam. They rewrite SQL before it runs, so you can intercept and transform queries right before they execute. I didn’t use them here, but they’re sitting there, fully documented, for whoever wants to build something weirder.

Our Solution

Use Exasol’s Virtual Schemas, pointed at an external vector database (Qdrant in this case), to bring its vector search functionality into Exasol. Use Exasol’s UDFs to create embeddings locally and write them into the vector database. We create embeddings for table columns with an embedding model running on an Ollama instance; Ollama is an open-source server for running open-source machine learning models. Alternatively, you can store embedding models in Exasol’s BucketFS and invoke them from a UDF.
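Calling an embedding model on Ollama is a single HTTP request. A minimal sketch using only the standard library and Ollama’s `/api/embed` endpoint, which accepts a batch of inputs and returns an `embeddings` list. The URL (Ollama’s default port) and the model `nomic-embed-text` (which produces 768-dimensional vectors) are assumptions, not necessarily what this project used.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed deployment)
MODEL = "nomic-embed-text"             # assumed model; it produces 768-dimensional vectors


def build_embed_request(texts, model=MODEL, base_url=OLLAMA_URL):
    """Build a POST to Ollama's /api/embed endpoint for a batch of texts."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(raw, expected_count):
    """Extract one vector per input text from the JSON response body."""
    vectors = json.loads(raw)["embeddings"]
    if len(vectors) != expected_count:
        raise ValueError(f"expected {expected_count} vectors, got {len(vectors)}")
    return vectors


def embed(texts, timeout=30.0):
    """Send the request and return a list of vectors (needs a running Ollama)."""
    texts = list(texts)
    with urllib.request.urlopen(build_embed_request(texts), timeout=timeout) as response:
        return parse_embed_response(response.read(), len(texts))
```

Keeping request building and response parsing separate from the network call makes both testable without a running server, and the same `embed` function could serve both ingestion and query-time embedding.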

Now, I am not a Lua developer, and Lua is the language this adapter is written in. I’d never written a line of Lua before this project. The specific extensibility mechanism I needed, Virtual Schemas, expects adapters written in Lua (or Java, which brings its own deployment overhead). Historically that would have meant one of two things: filing a request with engineering and waiting a quarter, or spending a month learning Lua well enough not to embarrass myself.

I did neither. What I had instead was this:

  • Exasol’s Virtual Schema documentation. Thorough. Annoying to read end-to-end, but complete.
  • Exasol’s UDF documentation. Same story.
  • Claude Code, running on my laptop, ready to take instructions.
  • A Claude skill file that my team created for exactly this kind of project — a distilled set of patterns and conventions for building Virtual Schema adapters.

This is the gist of the whole story: I, a product manager, did this in under a week. I built it with the Exasol Virtual Schema skill, made for AI coding agents, and tested it through the Exasol MCP server with a range of simulated personas.

What the agent built

Here’s the division of labor.

I wrote a description of what I wanted: semantic search against Exasol tables, backed by Qdrant for vector storage, Ollama for local embeddings, and a SQL-native interface. All of this had to be hidden from users, so they could use it as native functionality without drowning in architectural details. Four columns (ID, TEXT, SCORE, QUERY) and a query shape like SELECT … WHERE "QUERY" = 'some phrase': nothing different from a regular SQL query.

Claude Code read Exasol’s Virtual Schema documentation. It read the Virtual Schema creator skill file, which encodes the conventions and the recipe for building adapter code: file layout, required methods, how pushDown contracts work, the places you’re expected to plug in. It wrote the Lua modules: the entry point, the adapter lifecycle class, the metadata reader that presents Qdrant collections as tables, and the query rewriter that embeds the search text and runs the hybrid vector + keyword search. A Qdrant collection is a named set of vectors and their payloads, which is why it maps naturally onto a table in Exasol.
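The metadata reader’s job can be sketched as a mapping from Qdrant’s collection list (`GET /collections` returns `{"result": {"collections": [{"name": ...}]}}`) to table definitions Exasol can show. The column types and sizes, and the exact `schemaMetadata` shape (`tables`, `columns`, `dataType`), are assumptions based on my reading of the Virtual Schema API; the real adapter does this in Lua.

```python
import json

# Assumed column layout, mirroring the four columns described in this post.
COLUMNS = [
    {"name": "ID", "dataType": {"type": "VARCHAR", "size": 2000, "characterSet": "UTF8"}},
    {"name": "TEXT", "dataType": {"type": "VARCHAR", "size": 2000000, "characterSet": "UTF8"}},
    {"name": "SCORE", "dataType": {"type": "DOUBLE"}},
    {"name": "QUERY", "dataType": {"type": "VARCHAR", "size": 2000, "characterSet": "UTF8"}},
]


def collections_to_schema_metadata(list_collections_response: str) -> dict:
    """Expose every Qdrant collection as a virtual table with the same four columns."""
    collections = json.loads(list_collections_response)["result"]["collections"]
    return {
        "tables": [
            # Upper-case names so unquoted SQL identifiers (bank_failures) resolve.
            {"name": c["name"].upper(), "columns": COLUMNS}
            for c in sorted(collections, key=lambda c: c["name"])
        ]
    }
```

Upper-casing matters because Exasol folds unquoted identifiers to upper case, so `vector_schema.bank_failures` in a query looks for a table named `BANK_FAILURES`.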

Same pattern for the Python UDFs that ingest data. The agent read the UDF docs and produced a set of UDFs that work together. The whole ingestion pipeline deploys via SQL CREATE SCRIPT statements.

My role: reviewer and tester. I pointed out things that didn’t work, asked for changes when the behavior was off, and ran the integration tests until they passed. I did not write any Lua. I can now mostly read it, which is a different achievement.

The skill file deserves credit for what it is. It’s the compiled wisdom of someone who’d built these adapters before: the knowledge that would otherwise live in an engineer’s head, transferable only through code review and pair programming. Packaged up, it means the agent isn’t reconstructing the shape of a Virtual Schema from first principles. It starts with the shape and fills in the specifics.

What it looks like

Here’s what a user sees.

SELECT "ID", "TEXT", "SCORE"
FROM vector_schema.bank_failures
WHERE "QUERY" = 'banks acquired by JP Morgan'
LIMIT 5;

Under the hood:

The adapter pulls the query string out of the WHERE clause, ships it to Ollama with an embedding model, gets back a 768-dimensional vector (vector size depends on the embedding model used), sends that vector to Qdrant along with a few compound keyword tokens (“JP” + “Morgan” → “jpmorgan” so that exact-entity mentions also rank high), and Qdrant runs a hybrid search that fuses the rankings using Reciprocal Rank Fusion. The top results come back through the adapter, get wrapped in a SQL VALUES clause, and land in Exasol as rows.
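Two pieces of that pipeline are easy to show in isolation. The post doesn’t spell out the adapter’s tokenization rule, so `compound_tokens` assumes a simple heuristic (join adjacent capitalized words). Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the rankings it appears in; Qdrant performs this fusion server-side, and the local version below, with k = 60 (the constant commonly used with RRF), only illustrates the math.

```python
import re
from collections import defaultdict


def compound_tokens(query: str) -> list[str]:
    """Join adjacent capitalized words ("JP" + "Morgan" -> "jpmorgan").

    Hypothetical heuristic; the adapter's actual rule may differ.
    """
    words = re.findall(r"[A-Za-z0-9]+", query)
    return [
        (a + b).lower()
        for a, b in zip(words, words[1:])
        if a[0].isupper() and b[0].isupper()
    ]


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(compound_tokens("banks acquired by JP Morgan"))  # ['jpmorgan']

dense = ["d3", "d1", "d7"]    # hypothetical vector-search ranking
keyword = ["d1", "d9", "d3"]  # hypothetical keyword ranking
print([doc for doc, _ in reciprocal_rank_fusion([dense, keyword])])  # ['d1', 'd3', 'd9', 'd7']
```

RRF only looks at ranks, not raw scores, which is why it can merge a cosine-similarity list and a keyword list whose scores live on completely different scales. Documents that appear near the top of both lists (`d1`, `d3`) win.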

Those rows are just rows. You can JOIN them against other Exasol tables. You can filter them. You can feed them into a CTE. The vector search is not a sidecar tool – it’s SQL. That’s the part worth staring at.
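What makes them “just rows” is the pushdown response: the adapter answers with a SQL statement, and the hits are inlined into it as a VALUES table. A sketch of that step, assuming each hit is a dict with `id`, `text`, and `score`, and assuming Exasol’s `FROM VALUES (...) AS t(...)` form; `results_to_pushdown_sql` and the column sizes are hypothetical.

```python
def sql_string(value) -> str:
    """Render a value as a SQL string literal, doubling embedded single quotes."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"


COLUMNS = '"ID", "TEXT", "SCORE", "QUERY"'


def results_to_pushdown_sql(hits, query):
    """Turn search hits into a SELECT over an inline VALUES table."""
    if not hits:
        # An empty VALUES list is not valid SQL, so return one typed row and filter it out.
        return (
            "SELECT * FROM VALUES (CAST(NULL AS VARCHAR(2000)), "
            "CAST(NULL AS VARCHAR(2000000)), CAST(NULL AS DOUBLE), "
            f"CAST(NULL AS VARCHAR(2000))) AS t({COLUMNS}) WHERE FALSE"
        )
    rows = ", ".join(
        f"({sql_string(hit['id'])}, {sql_string(hit['text'])}, "
        f"{float(hit['score'])!r}, {sql_string(query)})"
        for hit in hits
    )
    return f"SELECT * FROM VALUES {rows} AS t({COLUMNS})"


# Hypothetical hit, invented for illustration.
print(results_to_pushdown_sql(
    [{"id": 7, "text": "O'Brien Savings Bank", "score": 0.0325}],
    "banks acquired by JP Morgan",
))
```

Echoing the search phrase into the `QUERY` column keeps every returned row consistent with the user’s `WHERE "QUERY" = '...'` clause, and escaping quotes matters because document text flows straight into generated SQL.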

This approach has tradeoffs that I am still working to mitigate, chiefly latency. About eighty percent of a query’s latency is Exasol’s Lua sandbox spinning up on each call: a platform cost, not a search cost. The embedding itself takes about a hundred milliseconds, and the vector search about fifty. For interactive, sub-second search you’d call Qdrant directly. For analytical exploration (scanning results, joining them with metadata, running aggregations) it’s fine.
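The post gives the sandbox’s share and two component timings but not the end-to-end figure. Assuming the eighty percent is measured against total query time T, and that embedding and search both sit inside the remaining twenty percent, the numbers imply a floor:

```latex
\begin{aligned}
t_{\text{embed}} + t_{\text{search}} &\approx 100\ \text{ms} + 50\ \text{ms} = 150\ \text{ms} \\
0.2\,T &\ge 150\ \text{ms} \;\Rightarrow\; T \ge 750\ \text{ms} \\
t_{\text{sandbox}} &= 0.8\,T \ge 600\ \text{ms}
\end{aligned}
```

If embedding and search were the only other costs, the bounds become equalities: about 750 ms per query, roughly 600 ms of it sandbox startup.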

Ingestion takes roughly six to seven minutes per million rows. You can also write a Virtual Schema adapter in Java, but that requires a JVM (Java virtual machine), and you would have to deploy the adapter file to BucketFS: one extra step, which I avoided by writing the adapter in Lua and making plain HTTP calls from it.
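As throughput, six to seven minutes per million rows works out to:

```latex
\frac{10^{6}\ \text{rows}}{420\ \text{s}} \approx 2{,}381\ \text{rows/s}
\quad\text{to}\quad
\frac{10^{6}\ \text{rows}}{360\ \text{s}} \approx 2{,}778\ \text{rows/s}
```

That is roughly 0.36 to 0.42 ms per row. For comparison, one 100 ms embedding call per row, run sequentially, would take about 28 hours for a million rows, which suggests the ingestion UDFs batch rows and/or run in parallel; the post doesn’t say which.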

In a week, I was able to work with vectors and do semantic search in Exasol. And the approach is not limited to vectors. This project is a testament to Exasol’s malleability and versatility.

We are pushing the limits of Exasol with agentic engineering in our Labs, which brings us to exasol-labs.

Exasol-labs

This is where we experiment with Exasol and its features. Almost every project there is the result of heavy agentic engineering and of pushing Exasol to its limits.

This project lives at github.com/exasol-labs.

This space is for experimenting and prototyping. It’s closer to a field notebook – a public home for prototypes built with agentic engineering, and for the agent skills that let anyone else do the same. Some of what’s there will graduate into real features. Some of it will stay where it is, useful or curious or both.

The virtual schema Skill that abstracted all of the complexity and made using virtual schemas a breeze also lives there – along with other Exasol agent skills. Exasol-agent-skills is a repository where you can find all the AI agent skills one requires to work with Exasol. If you’re building with Exasol and you want your coding agent to have a head start, that’s where you look.

Exasol the company is leaning heavily into this mode of working. Not as a side experiment, but as a working hypothesis about how software gets built now. Prototypes first. Agentic AI skills shared publicly. The extensibility surface treated as a first-class way to ship, not a fallback for when the core product can’t move fast enough.

That’s the stance, stated plainly and never sold.

Close

Native support for vectors and embedding models is coming to Exasol soon. It will make semantic search possible natively, without an external vector database.

At the time of writing, Exasol doesn’t store vectors. The engine never changed. Nothing inside the database’s core is different today than it was a month ago.

What changed is that the platform’s seams turned out to be wide enough to accommodate something the original designers almost certainly weren’t picturing. An agent read the docs, followed a skill file, and wrote the solution.

This started with an idea. I pointed an agent at it, stayed in the loop for a week, and came back with a working adapter.

Repo: github.com/exasol-labs/exasol-qdrant-adapter. The database didn’t learn anything new. I just handed it a new conversation partner. Now I can do semantic search in Exasol.