Make ChatGPT Stop Chatting and Start Writing SQL


You are a natural language to SQL code translator. Your role is to translate text to SQL. Return only SQL. Do not include…

Daniel Liden

Edit 03/21/2023: OpenAI is discontinuing the Codex API on 03/23/2023 and recommends switching to the ChatGPT API. This article covers some of the key differences between the two models and shows how to use ChatGPT as a replacement for the Codex model.

Even though ChatGPT needed a longer prompt, it was considerably cheaper than the text-davinci-003 model for text-to-SQL translation.

The OpenAI ChatGPT API was released last week. OpenAI advertises the underlying model, gpt-3.5-turbo, as the “most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003.” OpenAI notes that the model “does equally well on completions when compared with the Davinci model family.”

“Most Capable” and “cheaper”? Naturally, we want to use it to power our text-to-SQL translation engine. We originally built our text-to-SQL translation system using the OpenAI Codex model code-davinci-002.

We don’t want the experience or UI to change: we still want direct text-to-SQL translation, not a chat interface. Unfortunately, it’s not quite as simple as changing the model from code-davinci-002 (Codex) to gpt-3.5-turbo (ChatGPT).

This is the prompt format we use for Codex models (note the language specification and schema details):

-- Language PostgreSQL
-- Table = <table name>, columns = [col1 type1, col2 type2, ...]
-- A PostgreSQL query to return 1 and a PostgreSQL query for <natural language prompt>

SELECT 1;

Sending this prompt to the ChatGPT API with a simple question about the Palmer Penguins database yielded the following results:

The query to return 1 in PostgreSQL is:

```sql
SELECT 1;
```

And the query to count the number of penguins in the "penguins" table is:

```sql
SELECT COUNT(*) FROM penguins;
```

This will return the total number of rows in the "penguins" table, which represents the number of penguins in the table.

All we want is SELECT COUNT(*) FROM penguins;. The other details make for a useful learning resource and great context, but we don’t need them for our translation system — the excessive context actually works counter to our use case. We’re certainly not alone.

In this post, we show how we used the gpt-3.5-turbo model as a replacement for the code-davinci-002 model, allowing us to directly use ChatGPT for code translation.

How Chat Completions Work

We don’t want our text-to-SQL translation system to be a chat-based system — we want users to be able to ask a question, then have it translated to SQL, not go back and forth with the system. To get individual, specific completions, we still need to understand how the chat completion API works.

ChatGPT is expected to “remember” past parts of the conversation. Instead of a simple prompt, then, the ChatGPT API accepts a list of messages, with each message attributed to either the system, the user, or the assistant. You can pass a whole series of messages in each API request, representing the state of the conversation up to that point.

For example, the start of a conversation might look like this:

messages = [
    {"role": "system",
     "content": "you are a helpful assistant"},
    {"role": "user",
     "content": "I need help writing SQL"},
    {"role": "assistant",
     "content": "Sure, I'd be happy to help! What do you need help with specifically? Do you have a specific database and table in mind, or a particular query you're trying to write?"}
]

Each message is a dictionary with two keys: role and content. The role specifies who sent the message (usually the user or the assistant (ChatGPT)). The content is the message itself: either what the user wrote or ChatGPT’s response.

The sample exchange above starts with a system message. The system message gives high-level instructions on how ChatGPT should behave. Notably, OpenAI’s documentation mentions that “gpt-3.5-turbo does not pay strong attention to the system message, and therefore important instructions are often better placed in a user message.”

The next message, with role user, represents the first message from the user. And the following message, from assistant, is ChatGPT’s response. Over the course of the chat session, we would send a growing list of messages so the model was able to respond based on the complete history of the exchange.
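In a real chat application, the client is responsible for growing that message list itself with each exchange. A minimal sketch of that bookkeeping (the `add_exchange` helper is our own name, not part of the API; the dict format matches the messages list shown above):

```python
def add_exchange(messages, user_text, assistant_text):
    """Append one user/assistant turn to the running conversation."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return messages

# Start with the system message, then record each round trip.
history = [{"role": "system", "content": "you are a helpful assistant"}]
add_exchange(history, "I need help writing SQL",
             "Sure, I'd be happy to help!")
# history now holds three messages: system, user, assistant
```

On the next request, the client would send the whole `history` list (plus the new user message), which is how the model appears to "remember" the conversation.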

In this particular application, we don’t need to send any assistant messages. We’re not trying to support a back-and-forth between the user and ChatGPT, and we don’t need the model to have access to earlier translations. We just want SQL.


A very basic request for a SQL translation in Python looks like this:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

system_prompt = "you are a text-to-SQL translator. You write PostgreSQL code based on plain-language prompts."
user_prompt = """
- Language PostgreSQL
- Table = "penguins", columns = [species text, island text, bill_length_mm double precision, bill_depth_mm double precision, flipper_length_mm bigint, body_mass_g bigint, sex text, year bigint]
You are a SQL code translator. Your role is to translate natural language to PostgreSQL. Your only output should be SQL code. Do not include any other text. Only SQL code.
Translate "How many penguins are there?" to a syntactically-correct PostgreSQL query.
"""

# list of dicts specifying roles and content
query = [{"role": "system", "content": system_prompt},
         {"role": "user", "content": user_prompt}]

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
)
print(completion.choices[0].message)

Here, we send one system prompt explaining the purpose of the “chat” session, and one very specific user prompt. The prompt is key: we need to tell ChatGPT exactly what we’re looking for, otherwise we might waste tokens or end up with lengthy and difficult-to-parse output.
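Once the completion comes back, all that's left is pulling the SQL string out of the response. A sketch of that step, with the response shown as a plain dict in the same shape the API returns (the `extract_sql` helper name is our own):

```python
def extract_sql(completion):
    """Return the assistant's message content, stripped of whitespace."""
    return completion["choices"][0]["message"]["content"].strip()

# A hand-built sample mirroring the chat completion response structure.
sample = {
    "choices": [
        {"message": {"role": "assistant",
                     "content": "SELECT COUNT(*) FROM penguins;\n"}}
    ]
}
print(extract_sql(sample))  # SELECT COUNT(*) FROM penguins;
```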

Teaching ChatGPT to Talk SQL

From the ChatGPT response at the beginning of this article, you might notice the clearly-demarcated SQL code blocks. Technically, we could parse the ChatGPT output and pull out the SQL. There are good reasons not to do this, though:

  • It wastes tokens. We’re not going to use the text in between the SQL, so why include it in the first place?
  • If ChatGPT returns multiple blocks of SQL, as it often does, we don’t have a clear direction on which one to select and return to the user.
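For reference, the parsing approach we decided against would look something like the sketch below: pull every ```sql fenced block out of a chatty response. It works, but the second bullet above shows up immediately — two blocks come back, and nothing tells us which one the user actually wants. (The `extract_sql_blocks` name is our own; the fence string is built programmatically so this snippet stays self-contained.)

```python
import re

FENCE = "`" * 3  # triple backtick, built here to keep the snippet readable

def extract_sql_blocks(text):
    """Return the contents of every ```sql ... ``` block in a response."""
    pattern = FENCE + r"sql\s*(.*?)" + FENCE
    return [m.strip() for m in re.findall(pattern, text, re.DOTALL)]

# A response shaped like the ChatGPT output quoted earlier in this post.
response = (
    "The query to return 1 in PostgreSQL is:\n\n"
    + FENCE + "sql\nSELECT 1;\n" + FENCE + "\n\n"
    + "And the query to count the penguins is:\n\n"
    + FENCE + "sql\nSELECT COUNT(*) FROM penguins;\n" + FENCE
)
print(extract_sql_blocks(response))
# ['SELECT 1;', 'SELECT COUNT(*) FROM penguins;']
```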

So how do we do it? We just have to tell ChatGPT exactly what we want. If it can be a pirate, an Excel sheet, or a Linux terminal, surely it can be a SQL translator.

With the following prompt, the model returns nothing but ready-to-execute SQL:

-- Language PostgreSQL
-- Table = "penguins", columns = [species text, island text, bill_length_mm double precision, bill_depth_mm double precision, flipper_length_mm bigint, body_mass_g bigint, sex text, year bigint]
You are a SQL code translator. Your role is to translate natural language to PostgreSQL. Your only output should be SQL code. Do not include any other text. Only SQL code.

Translate "How many penguins are there?" to a syntactically-correct PostgreSQL query.

Here’s a comparison of the ChatGPT prompt (green) with the Codex prompt (red):


You can try this out directly on chat.openai.com.


No extra explanation, no formatting, just SQL.

Is it Worth It?

Right now, the Codex code completion models are in limited beta and are free to use. Notably, they are subject to reduced rate limits: whether on a free trial or paid plan, you can only make 20 Codex model API requests per minute. If you don’t expect to exceed this, Codex is a good choice for now (but stay tuned for some more detailed tests comparing Codex and ChatGPT-based completion quality).

ChatGPT completions cost $0.002 per 1,000 tokens, compared to $0.02 per 1,000 tokens for OpenAI’s flagship text-davinci-003 model. This might not translate to a 90% cost reduction because longer prompts may be necessary to get the desired behavior out of ChatGPT. But in our limited testing, ChatGPT prompts were nowhere near ten times as large as the Codex prompts.

For example, the following prompt for ChatGPT-powered text-to-SQL translation used 142 total tokens: 135 for the prompt and 7 for the completion.

-- Language PostgreSQL
-- Table = "penguins", columns = [species text, island text, bill_length_mm double precision, bill_depth_mm double precision, flipper_length_mm bigint, body_mass_g bigint, sex text, year bigint]
You are a SQL code translator. Your role is to translate natural language to PostgreSQL. Your only output should be SQL code. Do not include any other text. Only SQL code.

Translate "How many penguins are there?" to a syntactically-correct PostgreSQL query.

And here is a Codex prompt to generate the same query:

-- Language PostgreSQL
-- Table = "penguins", columns = [species text, island text, bill_length_mm double precision, bill_depth_mm double precision, flipper_length_mm bigint, body_mass_g bigint, sex text, year bigint]
-- A PostgreSQL query to return 1 and a PostgreSQL query for How many penguins are there?

SELECT 1;

This used 97 total tokens: 87 for the prompt and 10 for the completion.

So the ChatGPT request cost 142 tokens * $0.000002 = $0.000284. Assuming the Codex pricing will be in line with the text-davinci-003 model, the Codex request would cost 97 tokens * $0.00002 = $0.00194: nearly seven times the cost of the ChatGPT completion. And this is without a concerted effort to come up with a concise prompt for the ChatGPT model.
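The arithmetic above can be sketched in a few lines, using the total token counts from the two requests and the per-1,000-token rates quoted in this post (the Codex rate is the assumed text-davinci-003 price, since Codex is not yet priced):

```python
def request_cost(total_tokens, price_per_1k):
    """Cost of one request given total (prompt + completion) tokens."""
    return total_tokens * price_per_1k / 1000

chatgpt = request_cost(142, 0.002)  # gpt-3.5-turbo: $0.002 / 1K tokens
codex = request_cost(97, 0.02)      # assumed text-davinci-003 rate: $0.02 / 1K
print(f"ChatGPT: ${chatgpt:.6f}, Codex (assumed): ${codex:.6f}, "
      f"ratio: {codex / chatgpt:.1f}x")
# → ChatGPT: $0.000284, Codex (assumed): $0.001940, ratio: 6.8x
```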

In summary: with some careful prompting, we can get pure SQL output from ChatGPT. Right now, it’s not cheaper than the Codex models because the Codex models are in a free (but subject to reduced rate limits) limited beta. But if the Codex code-davinci-002 model ends up priced similarly to the text-davinci-003 model, the cost savings associated with using ChatGPT for code completion will be quite significant.

Try it Yourself

We’ve added the option to try ChatGPT SQL completions in our Natural Language to SQL Translation Prompt Playground. Simply choose “ChatGPT” as the model and try out some different prompts.

Try it out: you can now select Codex or ChatGPT models in the prompt comparison playground.

Let us know what you find! What differences did you need to make to the prompts? Were you able to find a more concise prompt that still consistently resulted in SQL output?