I Talked to 500GB of Retail Data With Zero Domain Knowledge. The AI Designed a Strategy I Didn’t Know to Ask For.
Here is what happened.
A retail chain had 500GB of transactional data stored in Apache Iceberg. Sales records, promotions, product catalogues, store layouts, inventory movements — the full picture of a multi-store retail operation spread across multiple tables with complex relationships.
I connected this data source to an agentic data platform I had built. One connection. No schema mapping, no ETL pipeline, no data dictionary.
The system analysed the metadata and sampled the data. Within minutes, it understood the structure: it mapped the entity-relationship diagram, identified how tables related to each other, inferred business meaning from column names and data patterns, computed high-level statistics, and presented a functional map of the entire dataset — what the data represents in business terms, not just technical terms.
Then the user — me, with no prior knowledge of this specific dataset’s structure — started asking questions.
“Show me sales trends quarter-on-quarter for the last two years.”
The system didn’t just write a SQL query. It identified which tables contained sales data, which contained time dimensions, how they joined, what the correct aggregation logic was, provisioned the compute infrastructure needed to process 500GB of data, executed the query, and returned the results. The user never saw a table name, a join condition, or a WHERE clause.
“Which products performed well?”
The system built on its prior understanding, ran a new analysis, and identified top performers across categories, stores, and time periods.
“Which promotions didn’t perform well?”
Now it correlated promotion data with sales impact — a cross-table analysis that would traditionally require a data analyst to understand the schema, write complex joins, and validate the logic.
“What can be done to improve promotion efficiency?”
This is where the system did something no human analyst could replicate — not only in speed, but in method. It did not write one query. It launched an autonomous reasoning chain — over a hundred rounds of think, query, evaluate, adjust, query again. In each round, the AI understood the business context of the question, mapped it to the dataset, decomposed it into one or more queries, executed them, joined the results with what it already knew, reasoned about what the data meant, and decided what to investigate next. It optimised its own path: using fast approximate queries when it needed directional reasoning, switching to slower pinpointed queries when it needed precise answers.
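The think, query, evaluate, adjust loop described above can be sketched as a toy simulation. Everything here is invented for illustration — the in-memory `SALES` rows and the 0.10 uplift threshold stand in for the real platform, which plans SQL against Trino rather than filtering Python dicts:

```python
# Toy sketch of the autonomous think -> query -> evaluate -> adjust loop.
# The dataset and hypotheses are invented stand-ins for illustration.

SALES = [
    {"promo": "BOGOF", "category": "snacks", "uplift": 0.18},
    {"promo": "BOGOF", "category": "dairy", "uplift": 0.02},
    {"promo": "discount", "category": "snacks", "uplift": 0.05},
    {"promo": "discount", "category": "dairy", "uplift": 0.12},
]

def investigate(max_rounds=100):
    # Think: enumerate hypotheses worth testing
    hypotheses = [(p, c) for p in ("BOGOF", "discount")
                  for c in ("snacks", "dairy")]
    findings = []
    for round_no, (promo, cat) in enumerate(hypotheses):
        if round_no >= max_rounds:
            break
        # "Query": scan the data for this promo/category pair
        rows = [r for r in SALES
                if r["promo"] == promo and r["category"] == cat]
        uplift = sum(r["uplift"] for r in rows) / len(rows)
        # "Evaluate": keep promising signals, discard dead ends
        if uplift >= 0.10:
            findings.append((promo, cat, uplift))
    return findings

print(investigate())  # promising pairs: BOGOF/snacks and discount/dairy
```

The real loop differs in every detail, but the shape is the same: a bounded number of rounds, each one testing a hypothesis, and only promising signals carried forward into the next round's context.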
It analysed promotion effectiveness patterns — not by running a single report but by systematically exploring which promotion types worked for which product categories in which store locations across which seasons, testing dozens of hypotheses, discarding dead ends, and pursuing promising signals deeper. Each round of this reasoning chain — the kind where a human analyst thinks about the business problem, figures out what data to look at, writes the query, validates the output, interprets the result, and decides what to do next — takes a skilled analyst hours. Some take days. Across a hundred rounds, a human team would need weeks.
The AI did it in minutes.
Here is the part that matters most.
At every step of this investigation, the AI didn’t just answer and wait. It guided me. After presenting the sales trends, it suggested what to explore next. After showing promotion performance, it offered to correlate with product categories. After analysing effectiveness patterns, it prompted me: “Your dataset contains product planogram planning information. I can apply the sales trend and promotion analysis we’ve done to optimise the planogram. Would you like me to do that?”
I had designed the agents to present business-contextual next steps at every point in the user’s journey — to guide the user toward deeper insights rather than leaving them to figure out what to ask next. But I am a technologist. I know nothing about retail. I had never seen this dataset before. I was simply following the AI’s guidance on a dataset I happened to have access to.
And the AI led me — a person with zero retail domain expertise — from a cold start on an unfamiliar dataset to a shelf placement strategy with a projected 10% boost to total sales.
I said yes to the AI’s suggestion, and it synthesised everything it had learned across the entire investigation — sales velocity by product and store, promotion responsiveness by category and season, customer purchasing patterns, category adjacency effects, inventory turnover rates — and produced the strategy. It projected +£86,424 in annual revenue from planogram optimisation alone, explained why it should work based on the data, identified which assumptions carried the most risk, and suggested what to investigate next to validate it.
I learned afterwards that planogram optimisation via sales data analysis is a well-established retail methodology — the exact approach that specialised retail consulting firms sell as high-value engagements. Industry benchmarks from the Category Management Association and NielsenIQ research show 10–20% sales uplift from optimised planograms. The AI’s projection was squarely within that range. It had independently arrived at a domain-standard methodology, applied it to this retailer’s actual data, and produced a result consistent with industry benchmarks — in a domain I knew nothing about.
Think about what this means for a real business analyst — someone who actually understands retail, who knows which of the AI’s suggestions to pursue harder and which to redirect, who can add domain context that sharpens the investigation. If a technologist with no domain knowledge arrived at a viable, industry-benchmarked shelf placement strategy by following the AI’s lead, what could a domain expert achieve?
No analyst can do what this system did. Not because analysts lack intelligence — but because no human can hold a hundred rounds of cross-table reasoning in working memory, optimise their own query path in real time, and systematically explore a 500GB dataset without losing context. The AI didn’t just answer faster than a human. It answered in a way that is structurally impossible for a human to replicate.
How an AI Agent Reasons Over Terabytes
The natural reaction to what I described above is scepticism. Language models have context windows. They hallucinate. They cannot hold 500GB of data in memory. How does this actually work?
The answer is three components working in harmony. None of them is remarkable alone. The combination is what produces the retail chain experience.
The infrastructure layer: Making big data agent-friendly
An LLM cannot process 500GB. It cannot even process 500MB. Enterprise data platforms run on distributed query engines like Trino — clusters of machines that parallelise SQL queries across massive datasets. But Trino clusters need to be provisioned, sized, configured, connected to data sources, monitored, and eventually torn down. Traditionally, this is infrastructure work — tickets, specialists, days of lead time.
In this platform, the entire big data infrastructure is exposed as MCP tools — the Model Context Protocol, an open standard for AI tool integration. The agent can provision a Trino cluster, connect data catalogues, execute queries, and retrieve results — all through the same tool interface it uses for everything else. The tools are designed as composable units: the agent assesses data volume and operation complexity, decides what cluster size it needs, provisions it, runs its queries, and releases it. The user never knows infrastructure exists.
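As a rough illustration of what "infrastructure as composable tools" means, here is a sketch of the idea in Python. The tool names, parameters, and the sizing rule are all assumptions for illustration — not the platform's real MCP interface:

```python
# Hypothetical sketch of the cluster lifecycle exposed as MCP-style tools.
# Tool names, parameters, and the sizing heuristic are illustrative only.

TOOLS = {
    "provision_cluster": {
        "description": "Size and start a Trino cluster for a workload",
        "input": {"data_gb": "int", "complexity": "low|medium|high"},
    },
    "execute_query": {
        "description": "Run SQL on the active cluster, results paginated",
        "input": {"sql": "str", "page": "int"},
    },
    "release_cluster": {
        "description": "Tear the cluster down when the work is done",
        "input": {"cluster_id": "str"},
    },
}

def size_for(data_gb: int, complexity: str) -> int:
    """Toy sizing rule: one worker per 100GB, scaled by complexity."""
    factor = {"low": 1, "medium": 2, "high": 4}[complexity]
    return max(1, data_gb // 100) * factor

print(size_for(500, "medium"))  # 10 workers for a 500GB medium workload
```

The point is not the sizing rule, which is invented, but the shape: the agent calls `provision_cluster`, runs its queries, and calls `release_cluster`, using the same tool interface it uses for everything else.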
The critical design decision is in how query results reach the agent. A query against 500GB might return millions of rows. Feeding that into a language model would destroy the context window. Instead, results come back in a paginated format called TOON — the agent sees enough to reason about: column distributions, top values, aggregations, patterns. If it needs to go deeper, it requests the next page or writes a more targeted query. This is how a hundred-round autonomous investigation works without the agent running out of context: it never holds the full dataset. It navigates it.
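The pagination idea can be sketched in a few lines of Python. The summary fields below are illustrative, not the actual TOON format — the point is that the agent receives a bounded, information-dense page instead of raw rows:

```python
# Sketch of the pagination idea: the agent never sees millions of rows,
# only a bounded summary page it can reason about. The summary fields
# here are illustrative, not the actual TOON format.

from collections import Counter

def summarize_page(rows, column, page=0, page_size=5):
    counts = Counter(r[column] for r in rows)
    top = counts.most_common()
    start = page * page_size
    return {
        "total_rows": len(rows),
        "distinct_values": len(counts),
        "top_values": top[start:start + page_size],
    }

# A million-row result collapses to a page the model can hold in context
rows = [{"store": f"S{i % 3}"} for i in range(1000)]
print(summarize_page(rows, "store"))
```

If the first page is not enough, the agent asks for `page=1` or writes a more targeted query — navigation instead of ingestion.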
The reasoning layer: metadata as the agent’s memory
Even with paginated results, an LLM reasoning about raw table names and column IDs cannot make business-level decisions. tbl_promo_xref.promo_cat_cd means nothing to a language model. “Promotion category code linking promotional campaigns to product categories” means everything.
This is what the metadata layer provides. Before a user ever asks a question, a series of agents have already run against the data:
- A discovery agent connects to the raw data source, maps every table and column, and builds an entity-relationship diagram.
- A profiling agent analyses every table — row counts, value distributions, null rates, cardinality — and writes the results to the metadata service.
- An ontology agent reads the profiling output, generates hypotheses about how tables relate (“this column looks like a foreign key to that table”), spawns sub-agents that run actual SQL queries to validate each hypothesis, detects PII, and builds a business ontology of entities, relationships, and classifications.
All of this is written back to the metadata service. By the time the user asks their first question, the platform has already transformed raw data into understood, profiled, ontologically-mapped business knowledge.
Now the agent investigating the user’s question does not need to reason over raw tables. It queries the metadata service for business concepts, semantic measures, and validated relationships. It reasons in business language — “revenue by store by quarter” — rather than navigating SELECT SUM(amt) FROM tbl_ord_det JOIN tbl_str_mst ON…. When it needs to verify something against the actual data, it runs a targeted query through Trino and gets a paginated response. Metadata for thinking. Trino for verifying. That combination is what makes terabyte-scale data investigable by a language model.
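A toy sketch of what "metadata for thinking" looks like in practice: a business measure resolved from a metadata registry into SQL, so the agent reasons about "revenue by store" and only touches raw table names at execution time. The registry contents, table names, and `resolve` helper are all invented for illustration:

```python
# Toy sketch of resolving a business measure into SQL via a metadata
# registry. Table names, columns, and the registry are invented.

MEASURES = {
    "revenue": {
        "sql": "SUM(od.amt)",
        "join": "tbl_ord_det od JOIN tbl_str_mst s ON od.str_id = s.str_id",
    },
}

DIMENSIONS = {"store": "s.str_nm", "quarter": "QUARTER(od.ord_dt)"}

def resolve(measure: str, by: str) -> str:
    """Turn a business-level request into an executable query."""
    m = MEASURES[measure]
    dim = DIMENSIONS[by]
    return (f"SELECT {dim} AS {by}, {m['sql']} AS {measure} "
            f"FROM {m['join']} GROUP BY {dim}")

print(resolve("revenue", "store"))
```

The agent asks for `revenue` by `store`; the validated joins and aggregation logic come from metadata the discovery and ontology agents already deposited, not from the model guessing at schema.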
The agent itself: Plain English as code
This is the part that surprises people most.
Here is the explorer agent — the one that guided me through the retail investigation and produced the planogram strategy. This is the actual system prompt (partial), excerpted from its Markdown file:
# Data Explorer
You are the user's partner for data exploration and business analysis.
Core Principle: You are a BUSINESS AGENT. Think in business terms
(revenue, customers, orders), not infrastructure.

## Goal-Seeking Flow
1. UNDERSTAND - What does the user want?
2. FIND DATA - Search metadata for relevant tables/measures
3. BUILD QUERY - Use measures, join tables correctly
4. EXECUTE - Run via query_executor
5. PRESENT - Show results with insights
6. ITERATE - User asks follow-up → back to step 1
## Rules
1. Think business, not infrastructure
2. Use measures when available
3. Present insights, not just data
4. Guide the user with next steps
5. Ask when uncertain
That is it. That is the agent that took me from zero domain knowledge to a shelf placement strategy with a projected 10% sales uplift. Not a thousand lines of application logic. A set of instructions in plain English that tell the AI how to think about data, how to present what it finds, and when to suggest what to explore next.
Every agent on the platform works this way — the discovery agent, the profiling agent, the ontology agent, the explorer. Each one is a Markdown file written in English paired with a short YAML configuration that specifies which tools to connect:
name: explorer
model: claude-sonnet-4-5
tools:
- query_executor
- search_all
- query_ontology
- list_measures
- resolve_measure
- ask_user

The agent runtime takes the Markdown and the YAML, connects to the specified MCP tools, and executes. No application code sits between the system prompt and the tool calls. I spent more time writing these prompts than on any other part of the system. They are the product, in a way that is hard to accept if you come from traditional software engineering.
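Stripped to its essentials, an "agents as prompts" runtime reduces to very little code. The `AgentSpec` shape and `load_agent` helper below are assumptions for illustration, not the platform's actual implementation:

```python
# Sketch of an "agents as prompts" runtime: read the Markdown system
# prompt and the YAML tool list, hand both to the model loop.
# AgentSpec and load_agent are illustrative, not the real platform code.

from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    name: str
    model: str
    system_prompt: str            # the Markdown file, verbatim
    tools: list = field(default_factory=list)  # MCP tools to connect

def load_agent(markdown: str, config: dict) -> AgentSpec:
    return AgentSpec(
        name=config["name"],
        model=config["model"],
        system_prompt=markdown,
        tools=config["tools"],
    )

spec = load_agent(
    "# Data Explorer\nYou are the user's partner for data exploration.",
    {"name": "explorer", "model": "claude-sonnet-4-5",
     "tools": ["query_executor", "search_all"]},
)
print(spec.name, len(spec.tools))  # explorer 2
```

Everything else — the reasoning, the query planning, the guidance — lives in the prompt and the tools, not in application code.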
This architecture — agents as prompts with composable tools — is not my invention. My design was heavily inspired by my experience as a deep user of Claude Code, and I later discovered that Anthropic’s own guidance on building agents follows the same pattern. Readers who want to understand the design principles more deeply should start there.
The flywheel
These three components (infrastructure, metadata, and agents) do not just coexist. They compound.
The platform scales in capability without scaling in complexity. Once the MCP tools are in place — the infrastructure layer and the reasoning layer — adding a new feature is writing a new system prompt. The explorer did not produce the planogram strategy because someone coded a planogram feature. It produced it because the prompt said “guide the user with next steps” and “present insights, not just data” — and the tools gave the agent everything it needed to follow those instructions wherever the data led.
The discovery agent maps raw schemas and writes them to metadata. The profiling agent analyses distributions and writes them to metadata. The ontology agent builds business relationships and writes them to metadata. The explorer reads all of it and reasons at the business level. And when an investigation succeeds, a codification agent analyses what happened and turns the workflow into a new reusable agent — more English, not more code.
The platform gets smarter with every investigation, not because anyone ships a feature, but because each agent deposits understanding that the next agent builds on.
What the system does not do
I want to be direct about this because it is the single largest barrier to enterprise adoption.
The system does not produce 100% correct, perfectly reproducible output.
With well-structured system prompts, the agent reaches roughly 90% accuracy on business questions. With guardrails — validation steps,
cross-checks, confidence thresholds built into the prompt — it reaches 95–99%. That still leaves a margin. The planogram strategy was consistent with industry benchmarks, but it was also the product of a probabilistic system that could have reasoned slightly differently on a second run.
Ask the same question twice and the agent will take different paths through the data. It will arrive at similar conclusions, but the output format will differ, the intermediate queries will differ, the emphasis may shift. Two analysts given the same dataset will also produce different reports that converge on the same insight. The difference is that nobody expects two analysts to be identical. People do expect software to be deterministic.
Enterprise users trained on traditional BI tools expect the same query to produce the same dashboard every time. An agentic data platform does not work this way. The user’s role shifts — from consumer of reports to collaborator in an investigation. The right way to use this system is not to accept the first answer. It is to probe: ask the agent to validate its assumptions, challenge its reasoning, re-run the analysis with different constraints. The hundred-round investigation I described earlier is exactly this process. The agent is designed to be interrogated.
This is a genuine mindset shift. The technology works. Whether organisations are ready to work with AI as a reasoning partner rather
than a reporting tool — to trade determinism for the ability to answer questions nobody thought to ask — that is the harder problem.
What This Means
Today, the path from raw enterprise data to business insight runs through a bottleneck: someone technical must understand the schema, configure the tools, write the queries, and translate the results. This bottleneck exists regardless of whether the underlying platform is Snowflake, Databricks, or a custom data warehouse. The tools have gotten better over the years. The bottleneck has not moved.
AI-native data platforms — systems where AI operates the infrastructure, infers the semantics, and conducts the investigation autonomously — eliminate it. The user speaks in business language. The system handles everything between the question and the answer. And critically, the system guides the user toward questions they didn’t know to ask.
The foundations are available. MCP for tool discovery and composition. Trino for federated query processing. Temporal for durable workflow orchestration. Frontier language models for reasoning. What is missing is the integrated architecture — a system designed so that these components reinforce each other rather than merely coexist.
I have built one version of that architecture. The retail investigation is what it produced. Whether this specific platform reaches production or whether someone else builds a better one is less important than the category it points to: software where a single system prompt can take a person with no domain knowledge from a cold start on an unfamiliar dataset to a strategy they did not know to ask for.