We have just kicked off an open-source project dedicated to crafting production-ready large language model (LLM) prompts for text-to-SQL translation. Our goal? To combine LLMs, our deep knowledge of PostgreSQL databases, and rigorous testing to develop best-in-class text-to-SQL translation.
To take full advantage of LLMs, we are developing production LLM prompts that are thoroughly tested, carefully tuned, and designed with security in mind. This blog post will introduce the project and explore the key principles, challenges, and best practices for creating a production-ready prompt that can be used to reliably generate accurate SQL queries from plain text while avoiding unintended uses and effects. Want to jump right in? Clone the open-source project repository and try out the prompt playground.
Make sure you’re subscribed to the Inner Join newsletter to stay up to date on all the latest project developments.
Text to SQL Translation: The Basics
SQL is the third most used programming language. It is the first language many aspiring developers learn. We just released AI-powered text-to-SQL translation in bit.io to lower the barriers to entry for learning and using SQL, allowing users to focus on the underlying logic of their queries rather than the syntax.
You can use this on bit.io right now. In the bit.io query editor or query API, place #!translate:text on the first line. On the next line, type your plain-language request in question form, such as what record with provider 'aws' has the lowest latency? against our cloud latency data (see our article on cloud data center latency). The request will be translated into a query, which you can then edit or run.
Our text-to-SQL translation function leverages OpenAI’s Codex models to send text and database schema information (the “prompt”) to an OpenAI LLM. The model generates the requested SQL and returns it to the user, who can then edit (if needed) and execute the query.
This might sound simple, like we’re just sending the text to a third-party API and returning the results. Certainly, OpenAI’s LLMs are doing the heavy lifting here. But there is a lot of nuance involved in making sure the models return a usable result, and do so quickly, efficiently, and securely. That’s where we’re focusing our efforts. Decisions such as model choice, hyperparameter values, and prompt content can all have dramatic effects on the quality of the results returned.
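To make this concrete, the kind of request configuration involved can be sketched as follows. This is an illustrative assumption, not our production settings: the model name, hyperparameter values, and stop sequences are placeholders.

```python
# Illustrative defaults for a text-to-SQL completion request.
# Model name and all values are assumptions, not production settings.
DEFAULT_CONFIG = {
    "model": "code-davinci-002",  # a Codex model (assumed)
    "temperature": 0.0,           # deterministic output suits SQL generation
    "max_tokens": 256,            # leave room for the generated query
    "stop": ["--", "\n\n"],       # stop before the model adds commentary (assumed)
}

def build_request(prompt: str, **overrides) -> dict:
    """Combine the default config, the prompt, and any per-call overrides."""
    return {**DEFAULT_CONFIG, "prompt": prompt, **overrides}
```

Even small changes to values like temperature can meaningfully change how deterministic and usable the returned SQL is, which is why these settings deserve systematic testing.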
State of the Art (So Far)
Our text-to-SQL translation functionality works well. We spent a lot of time working on the initial prompt, comparing a few different models, and tweaking the model hyperparameters. So far, most of the improvements to the prompt have been made through trial and error: we read a lot of articles about prompt engineering, wrote a lot of prompts, and iterated on the ones that worked. We also leveraged our existing schema summarization capabilities to pass schema details along with the prompt, allowing the OpenAI models to return SQL with correct identifiers.
The current prompt is fairly simple. It first passes a trio of comments: the first specifies the language (PostgreSQL); the second passes along schema details (schemas, tables, columns, types); and the third specifies the desired output, incorporating the user’s natural language query. The last line of the prompt is SELECT 1;, which indicates that we want the output returned as ready-to-execute SQL rather than as a comment.
-- Language PostgreSQL
-- Table penguins, columns = [species text, island text, bill_length_mm double precision, bill_depth_mm double precision, flipper_length_mm bigint, body_mass_g bigint, sex text, year bigint]
-- A PostgreSQL query to return 1 and a PostgreSQL query for {natural language query}
SELECT 1;
With this prompt, given a well-specified plain-language query, the OpenAI models generally:
- Return working code corresponding to the user’s plain-text query,
- Return Postgres-compatible SQL code, not code from other languages or other SQL varieties, and
- Include correct (but not always correctly formatted; see below) identifiers corresponding to the database schema
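As a rough sketch, assembling a prompt with this structure could look like the following. Note that build_prompt is a hypothetical helper written for illustration, not the project's actual API:

```python
def build_prompt(nl_query: str, schema_comment: str) -> str:
    """Assemble the three comment lines plus the trailing SELECT 1; hint.

    Hypothetical helper for illustration; the real project builds prompts
    in its prompt.py module.
    """
    return "\n".join([
        "-- Language PostgreSQL",
        schema_comment,
        f"-- A PostgreSQL query to return 1 and a PostgreSQL query for {nl_query}",
        "SELECT 1;",
    ])
```

The trailing SELECT 1; matters: without it, the model is more likely to answer with a commented-out query rather than executable SQL.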
Challenges
There are still opportunities for improvement.
One of the primary obstacles is sending a concise prompt to the Codex model while still providing enough schema information. We want to supply enough information about the database schemas to obtain usable queries without sending excessive tokens in the prompt and potentially driving up the cost of using the OpenAI model APIs.
Correctly formatting and quoting some identifiers is another challenge. The output is correct most of the time, but in some cases, tables in schemas other than “public” are formatted incorrectly, and table names with capitalization or special characters are not returned in quotes. These specific nuances of SQL syntax require careful handling to ensure the output is accurate.
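For illustration, PostgreSQL's identifier-quoting rules can be approximated with a small helper like the simplified sketch below. This is an assumption-laden simplification: the real rules also cover reserved keywords and Unicode identifiers, which this ignores.

```python
import re

def quote_ident(name: str) -> str:
    """Quote an identifier unless it is a simple lower-case name.

    Simplified sketch for illustration: ignores reserved keywords,
    which PostgreSQL also requires to be quoted.
    """
    if re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        return name
    # Double any embedded double quotes, then wrap the whole name in quotes
    return '"' + name.replace('"', '""') + '"'
```

A table named Penguins or my table needs quoting, while penguins does not; getting the model to apply this consistently is one of the gaps we're working on.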
Preventing misuse of the text-to-SQL translation function, such as through prompt injection, is another important challenge and is critical for maintaining user trust in the system. Preventing misuse helps us keep our focus narrow and ensures that we don’t incur unnecessary costs or risk exposing a general-purpose code translator tool. For example, it’s currently possible (though inconvenient) to leverage the following approach to generate code in other languages:
#!translate:text
return a string defining a python function for adding two numbers
This query returns:
SELECT 'def add(x, y): return x + y'
There are easier ways to get AI-generated Python code, and there are valid reasons to store code snippets in a database. Simply blocking this usage pattern isn’t the answer, but it is still valuable to anticipate and prepare for possible unintended usage patterns.
Lastly, preventing users from accidentally modifying or deleting data is hugely important. No one should ever execute code generated by an LLM without first reviewing it. But we also want to make it very clear when users might be running queries that could result in data modification or loss.
For example, the (rather poorly-articulated) prompt
#!translate:text
update the table to make it clear that all of the islands in the table are in Antarctica.
Results in the following SQL:
UPDATE penguins SET island = 'Antarctica' WHERE island IS NOT NULLIt’s probably not the user’s intent to overwrite the “island” column with Antarctica. Maybe the intent was to add a “continent” column, or to append “(Antarctica)” to each entry in the island column, though that intent is not clear from the prompt. Regardless, it would be useful to have safeguards in place that prevent users from blindly executing such queries and accidentally altering data.
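One possible safeguard, sketched here as an assumption rather than a shipped feature, is a simple check that flags statements beginning with data-modifying keywords so the user can be warned before execution:

```python
# Hypothetical safeguard sketch, not a shipped feature of the project.
DESTRUCTIVE_KEYWORDS = {"insert", "update", "delete", "drop", "alter", "truncate"}

def needs_warning(sql: str) -> bool:
    """Return True if the query's first keyword can modify or delete data.

    Naive sketch: does not handle CTEs, leading comments, or
    multi-statement input, all of which a real check would need.
    """
    tokens = sql.strip().split(None, 1)
    return bool(tokens) and tokens[0].lower() in DESTRUCTIVE_KEYWORDS
```

A production safeguard would need real SQL parsing, but even a naive first-keyword check catches the UPDATE example above.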
Of course, we’re not pointing out these challenges just for fun. We have a plan to address them.
pg-text-query Open Source Project
We are continuously improving the text-to-SQL translation function. And we want to share what we learn. We are working on an open-source project of prompts, configurations, and tests so we can engage with the community, gather feedback, and broadly share our findings. LLMs have often been used as generalists: they’re great at converting any language into any other and responding to arbitrary text prompts. We want to find out what it takes to make the best specialized text-to-SQL translator.
There are a few key features you can start using right away to begin improving the prompts used for text-to-SQL translation.
Prompt Playground
Clone the repo; install Streamlit with pip install streamlit; and, from the root directory, run streamlit run playground/app.py. This will open an interactive “prompt playground” where you can experiment with different combinations of prompts and schema details.
This is useful for rapidly testing and iterating on different prompt ideas and for building intuition about what works and what doesn’t. You can set the “initialization prompt” (which the end user does not have access to), the user’s plain-language query, and the schema details, and see how these different parts of the prompt interact with each other. You can then generate the SQL and even execute it on a live database. Make sure to carefully review all generated SQL before executing it to ensure you do not accidentally delete or modify data.
Schema detail utilities
The db_schema.py module includes utilities for extracting structured schema data from a Postgres database. It’s useful to provide enough schema information to enable the model to include correct identifiers. Too much schema information, however, may take up a large number of tokens, incurring unnecessary cost and possibly leaving too few tokens for the model to successfully generate the desired SQL.
You can use the module as follows:
import os
from pprint import pprint

import bitdotio
from dotenv import load_dotenv

from pg_text_query import get_db_schema

# Load BITIO_KEY from a local .env file
load_dotenv()

DB_NAME = "bitdotio/palmerpenguins"
b = bitdotio.bitdotio(os.getenv("BITIO_KEY"))

# Extract a structured db schema from Postgres
with b.pooled_cursor(DB_NAME) as cur:
    db_schema = get_db_schema(cur, DB_NAME)

pprint(db_schema)
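To give a sense of the compactness trade-off, serializing one table of structured schema data into the single comment line used in our prompt can be sketched as follows. Here schema_to_comment is a hypothetical helper for illustration, not part of the pg_text_query API:

```python
def schema_to_comment(table, columns):
    """Serialize one table's columns into a single compact prompt comment.

    Hypothetical helper for illustration. `columns` is a list of
    (name, type) pairs, e.g. [("species", "text"), ("year", "bigint")].
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"-- Table {table}, columns = [{cols}]"
```

Each table costs one line of the prompt, so the length of these comment lines gives a rough proxy for the token budget the schema details will consume.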
Prompt and Query Generation
You can generate prompts (based on our prompt engineering work so far) with the prompt.py module, which provides helpers for preparing Postgres query prompts.
from pg_text_query import get_default_prompt

# Construct a prompt that includes text description of query
prompt = get_default_prompt(
    "most common species and island for each island",
    db_schema,
)

# Note: prompt includes extra `SELECT 1` as a naive approach to hinting for
# raw SQL continuation
print(prompt)
which returns the following prompt:
-- Language PostgreSQL
-- Table penguins, columns = [species text, island text, bill_length_mm double precision, bill_depth_mm double precision, flipper_length_mm bigint, body_mass_g bigint, sex text, year bigint]
-- A PostgreSQL query to return 1 and a PostgreSQL query for most common species and island for each island
SELECT 1;
The gen_query.py module is a wrapper around openai.Completion.create that handles sending requests to the OpenAI API.
# Using default OpenAI request config, which can be overridden here w/ kwargs
query = generate_query(prompt)
print(query)
which, from the prompt above, returns:
SELECT species, island, COUNT(*) FROM penguins GROUP BY species, island
Next Steps
This project is in its infancy, but we have a few major directions we’d like to explore:
- The project will include a test suite pairing different types of plain-language requests with their desired SQL outputs, for testing different prompts.
- Using this test suite, we plan to compare different models, hyperparameters, and prompts. Do some achieve higher accuracy than others? Are we able to achieve the same accuracy with a shorter prompt? Can we make use of a faster or more efficient model without sacrificing accuracy?
- We plan to record any avenues for model misuse, as well as mitigation strategies.
- Longer term, we will also fine-tune a model on a large collection of SQL queries and translations.
Conclusion
We’re excited about the potential for text-to-SQL translation to enhance the user experience and eliminate barriers to accessing SQL databases. Keep an eye out for the open-source project aimed at collaboratively making the best text-to-SQL translations possible using OpenAI’s LLMs. For all updates, subscribe to the Inner Join newsletter and join our Discord community. Want to contribute? Fork the pg-text-query repository, try out the prompt playground, and see if you can find better prompts for text-to-SQL translation!