From Text to SQL: How We Empower Users with AI-Powered Database Interactions

In recent times, AI has gained unprecedented attention with the emergence of ChatGPT. As a distributed database company called PingCAP, we have been contemplating how to harness the power of AI to enhance our database, TiDB , and improve organizational efficiency.

Using nature language to interact with data

AI-generated SQL in Chat2Query

While SQL has been the primary interface to databases, we observed that even experienced database administrators (DBAs) take a considerable amount of time to handle complex queries involving operations like joins and common table expressions (CTEs). Additionally, developers often struggle to write SQL quickly and efficiently.

To address this challenge, we initially considered eliminating the need for users to write SQL altogether. Instead, we aimed to allow users to express their intentions clearly in natural language, enabling us to automatically generate the corresponding SQL and execute it in TiDB.

Thus, we developed Chat2Query, a feature available in TiDB Serverless. Through the above demo, you can experience firsthand how to interact with the database using natural language.

How did we accomplish this? We utilized OpenAI’s Text-to-SQL capability. As demonstrated in the provided example , the process is remarkably straightforward. However, to enable AI to generate accurate SQL, we had to undertake significant efforts.

Our first challenge was creating a prompt that was both accurate and concise. Sometimes, in order to generate precise SQL, we might write lengthy descriptions. However, this approach would be costly and time-consuming. To overcome this challenge, we employed OpenAI’s embedding search, which I will delve into further in the next section.

Another challenge involved generating SQL that was both accurate and efficient to execute. TiDB already provides valuable metadata, such as table statistics and runtime metrics for workloads. Hence, for AI-generated SQL, we could evaluate its quality and, if necessary, provide hints to the AI to improve the generated SQL. I will explain this process in more detail in a subsequent post.

If you want to use your nature language to talk to your data, you can visit TiDB Cloud. Note that you need to agree to the terms to use AI-powered functions.

Increasing the accuracy of the generated SQL through Knowledge base

In the preceding sections, I mentioned how we enhance the accuracy of AI-generated SQL by developing our own knowledge base. How did we accomplish this? Let’s consider the example of OssInsight OssInsight GitHub Data Explorer.

Explore insights in OssInsight

In the GitHub Data Explorer, after you input a question and hit Enter, we perform the following steps:

Search the knowledge base in a Vector Database, leveraging Pinecone.
Utilize Open AI’s embedding search to find relevant knowledge.
Generate the prompt and send it to OpenAI to generate the SQL.
Execute the SQL in TiDB Cloud and return the result.

The most challenging aspect here is interacting with the knowledge base. We have three types of knowledge for OssInsight:

Database information, including table names, column names, and types.
Common sense knowledge, such as “Go is a programming language”
Examples, like “Q: TiDB developers are based in XXX. A: SELECT …”

The generated prompt encompasses these three types of knowledge (with the possibility of adding more later). Through A/B testing, we have found that the resulting SQL is accurate.

Having gained confidence in this approach, we have expanded its usage to enhance organizational performance. For instance, we have built a company knowledge base to facilitate easy access to information for everyone.

Epilogue

AI is a powerful tool, and we have been contemplating how to transform into an AI-powered company.

In the next article, I will discuss how we utilize AI to build AI-powered Data Services in TiDB Cloud and how AI aids us in code reviews, among other topics.