Earlier this year, we used SQLGlot to support translation of 18 SQL dialects to Postgres-compatible SQL on bit.io. We’ve taken it a step further: powered by OpenAI’s Codex models, you can translate natural language to SQL.
What is it for?
Why did we add this functionality? Shouldn’t users of a PostgreSQL database be able to, you know, write SQL? According to the 2022 Stack Overflow Developer Survey, SQL is the third most used programming language. It is the first language many aspiring developers learn. Effective text-to-sql translation has the potential to make SQL more accessible to new developers by reducing the learning curve and allowing them to focus on the underlying logic of their queries. In short, we’re introducing this because we think SQL is a powerful tool and we want more people to have access to it.
One of our users is particularly excited to hand the keys to colleagues so they can ask their own questions of a database without needing to pass every single request to the data team, which can be a major bottleneck, even if each request only takes a few minutes.
Making databases easy to use is in our company DNA. Our drag-and-drop file ingestion, easy-to-use web editor, easy sharing, and broad compatibility with third-party tools are all part of this vision. Text-to-sql translation is the next step.
How do you use it?
So how do you do it? Put #!translate:text on the first line and a plain-language prompt on the second, and you’ll get a SQL translation in response.
Get Daniel Liden’s stories in your inbox
Join Medium for free to get updates from this writer.
Suppose you’re looking at the Palmer Penguins database and you want to know the average weight of all the penguins on Biscoe island. Type the following into the query editor:
#!translate:text
Average weight of penguins on BiscoeIn a couple of seconds, this will return the following translation:
SELECT AVG(body_mass_g) FROM penguins WHERE island = ‘Biscoe’;Review the query, then run it. Done.
Note: The prompt doesn’t have to be in English! The following works just fine:
#!translate:text
Peso médio dos pinguins em BiscoeHow does it Work?
- bit.io generates part of a prompt made up of (1) an instruction to return PostgreSQL code and (2) information about the database schema.
- The user supplies the rest of the prompt by specifying what they’d like to learn about the data.
- The complete prompt is then sent to the OpenAI API, which generates the corresponding SQL query.
- The resulting SQL is shown in the query translation pane in the web editor.
At this point, you can modify the query or execute it directly (always inspect the query before you run it).
You can also use this functionality via the query API or with other SQL clients such as psql. These do not enable you to preview the translation before executing, though, so we don’t currently recommend it.
Usage notes and warnings
This technology is still relatively new. It doesn’t always behave as expected. A few points to be aware of:
- Make sure to examine the translation before executing it. We cannot guarantee that the translation will not alter or delete data in unintended ways. If you don’t understand what the translated query does, don’t run it.
- Use this as a research tool, not as a replacement for SQL in production. If you’re trying to write production SQL, use the translator to help you write the SQL, and then use the SQL in your pipeline. Excessive usage of the translation functionality may result in throttling or in deactivation of the feature.
- Translation may not work as intended for complex queries or on databases with many tables/columns/schemas.
- If you’re getting incorrect translations, experiment with different prompts.
- If the translated query doesn’t work, make sure schemas and table names are correctly formatted and quoted. Right now, the translation doesn’t always get this right.
What comes next?
As we mentioned, this technology is new and experimental. We’re happy with the initial results. But we want to make the best text-to-sql translations possible. That means investing some serious effort into systematically testing different prompts and model configurations. We’re kicking off an open-source project aimed at doing just that. And we’ll be updating our text-to-sql translation feature in response to our learnings over the course of this project.
Want to be the first to learn more about our new open-source project? Join our Discord.