We’ve been doing a lot of work on getting LLMs, especially ChatGPT, to talk to Postgres databases. These models are very good at understanding a user’s intent from a prompt and supplying the correct completion. But there are a lot of different tasks someone might want to do with a Postgres database, and sometimes the models need a little help in the form of extra context, hints about pitfalls, specific formatting instructions, etc.
You can do this by using a series of prompts that incrementally get you closer to your desired outcome, but this can take a long time and get expensive fast. We can also try to fit all of this in one big prompt. After all, OpenAI is working on a version of GPT-4 with a 32,000-token context length. A single huge catch-all prompt, however, has serious downsides:
- It’s expensive: every single request uses many prompt tokens
- It’s hard to maintain: how do you determine which part of a giant prompt to update in order to correct unexpected or undesirable results, or to add new functionality?
- LLMs tend to give better responses with clear and specific prompts: specificity isn’t possible with a single catch-all prompt.
We recommend splitting the difference. In this short post, we show how a two-step process of first classifying a task and then supplying a task-specific prompt can yield more reliable results at a lower cost. This highlights the flexibility of LLMs — with no pre-training, and few (if any) examples, we can use ChatGPT as a fast and inexpensive classifier, making it much easier to accomplish specific, narrower tasks.
Divide and Conquer
We’ve been working on a project aimed at allowing users to call our developer API with an LLM. The idea is that, given a user prompt and some information about the API spec, the LLM can return a JSON object with the necessary API parameters, which we can use to make the API request.
When we started this, we sent the LLM the user’s prompt along with a condensed version of the entire API specification and asked it to return a JSON object with the necessary parameters. We made some refinements to this approach by removing any parameters not supplied by the user and sending the API spec in a cleaner format. It worked surprisingly well, but not well enough to be useful in most cases. It would often mess up the format, return invalid JSON, miss necessary parameters, or incorrectly complete parameters.
We decided to try a two-step approach:
- Pre-classify the task: given a user prompt and a list of valid tasks, ask ChatGPT to return the name of the correct task (and nothing else).
- Send a task-specific completion template: given the task name from (1), send ChatGPT a prompt asking it to return the necessary details in a specific template by replacing placeholders in that template.
This greatly simplifies what we’re asking of the LLM at each step. With the “one big prompt” approach, ChatGPT had to:
- Identify the correct task
- Identify the correct contextual information for that task
- Choose what information and format to return based on the task
all from a single prompt.
The classify and complete approach, on the other hand, results in one specific task per prompt:
- Identify the correct task from a list
- Fill in the blanks in a template.
Let’s get into each of these parts in more detail.
Classification Prompt
The classification prompt gives the user’s prompt along with a list of valid tasks and asks the model to return only the name of the task. Here’s an example prompt we use for API call classification:
"""Your task is to determine which of the following tasks the user's prompt falls into.
user prompt: {user_prompt}
tasks:
- execute a query
- create a new database
- update a database
- import data from a URL
- list metadata for all databases
- get metadata about a database
- create an API key for a service account
- create an API key for the current account
- revoke all API keys for a service account
- export data
- delete database
- no task specified
return only the task name. For example:
user: Provide information about my databases
assistant: list metadata for all databases
user: Load data from this URL to my database
assistant: import data from a URL
"""
For example, if a user wants to delete their database called temp_db, they might say: “delete database temp_db.” The model would return delete database and nothing else: at this phase, it doesn’t need to worry about the database name or any other parameters.
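Even with examples in the prompt, the classifier can occasionally return extra text, quotes, or different capitalization, so it's worth normalizing its output before acting on it. Here's a minimal sketch of what that might look like (the task list and the "no task specified" fallback come from the prompt above; the function itself is ours):

```python
VALID_TASKS = [
    "execute a query",
    "create a new database",
    "update a database",
    "import data from a URL",
    "list metadata for all databases",
    "get metadata about a database",
    "create an API key for a service account",
    "create an API key for the current account",
    "revoke all API keys for a service account",
    "export data",
    "delete database",
    "no task specified",
]


def normalize_task(raw: str) -> str:
    """Map the classifier's raw completion onto a known task name.

    Strips surrounding whitespace and quotes and compares
    case-insensitively; anything that still doesn't match falls
    back to "no task specified".
    """
    cleaned = raw.strip().strip('"').strip().lower()
    for task in VALID_TASKS:
        if cleaned == task.lower():
            return task
    return "no task specified"
```

Routing unmatched responses to the "no task specified" category means a garbled classification degrades gracefully instead of triggering the wrong task.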
Task-Specific Completions
The classifier returns a task name: the list of tasks is predictable, and the name is always returned in the same format. As such, we can write a function that takes the task name and carries out the specific action requested (likely involving another request to the LLM):
from typing import Optional

import psycopg2


def generate_parameters(
    user_prompt: str,
    task: str,
    curs: Optional[psycopg2.extensions.cursor] = None,
):
    if task == "execute a query":
        return generate_parameters_execute_query(user_prompt, curs)
    elif task == "create a new database":
        return generate_parameters_create_db(user_prompt)
    elif task == "list metadata for all databases":
        output = {
            "endpoint": "/v2beta/db/",
            "parameters": [],
        }
    ...
This function shows how a few different tasks might be handled: the “execute a query” prompt invokes a function to generate and send the query-specific prompt; it needs a psycopg2 cursor in order to get schema information from the database. The “create a new database” prompt results in a call to a function to generate and send the database creation prompt; it requires no further contextual information. Lastly, the “list metadata for all databases” task does not require any additional requests to the LLM; the endpoint can be called directly with the user’s credentials. These examples show how the classifier allows for different “paths” with different contexts and completion options.
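Because the classifier's output is drawn from a fixed list, the chain of elif branches could equally be written as a dispatch table mapping task names to handler functions, which keeps each path isolated as the task list grows. A sketch of that alternative (the handler bodies are hypothetical stand-ins; only the task names and the `/v2beta/db/` endpoint come from the example above):

```python
from typing import Any, Callable, Dict


def handle_list_metadata(user_prompt: str) -> Dict[str, Any]:
    # No LLM call needed: the endpoint can be hit directly
    # with the user's credentials.
    return {"endpoint": "/v2beta/db/", "parameters": []}


# One entry per classifier task; unlisted tasks fall through
# to the error path in dispatch().
HANDLERS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "list metadata for all databases": handle_list_metadata,
}


def dispatch(task: str, user_prompt: str) -> Dict[str, Any]:
    """Route a classified task to its handler."""
    handler = HANDLERS.get(task)
    if handler is None:
        return {"endpoint": None, "parameters": [],
                "error": f"unsupported task: {task}"}
    return handler(user_prompt)
```

Either style works; the dispatch table just makes it harder for a new task to accidentally interfere with an existing one.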
Here’s the task-specific prompt the generate_parameters_create_db function completes and sends to the LLM:
"""Update the following JSON template by filling in placeholders based on the user's prompt.
The task is to create a new database.
- "name" is the name of the database to be created.
- "is_private" is whether the database is private (true) or public (false). Default to true.
- "is_paid" is whether the database is paid (true) or free (false). Default to false.
- "burst_row_operations_limit" is the burst row operations limit. Leave the placeholder unchanged by default; default to 0 if "is_paid" is true.
template:
{{
"name": {{
"type": "string",
"value": "{{name}}"
}},
"is_private": {{
"type": "boolean",
"value": {{is_private}}
}},
"is_paid": {{
"type": "boolean",
"value": {{is_paid}}
}},
"burst_row_operations_limit": {{
"type": "integer",
"value": "{{burst_row_operations_limit}}"
}}
}}
user prompt: {prompt}
Only return the updated JSON. Do not return any other text. Leave the placeholder unchanged if the user did not provide the necessary information. Return the template unmodified, with no other text, if the prompt did not supply any relevant information.
"""
This prompt is highly specific to the database creation task. It includes a response template with placeholders for the model to fill in. It only requires the LLM to fill in the placeholders based on the user’s prompt; it doesn’t need to identify the task (or determine if there is a valid task).
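Even with these restrictions, the model can occasionally wrap the JSON in extra text or leave a placeholder unfilled (as the prompt instructs it to when information is missing), so validating the completion before using it is worthwhile. A minimal sketch of such a check, assuming the `{placeholder}` convention from the template above (the parsing logic is ours, not from the post):

```python
import json
import re
from typing import Any, Dict, Optional

# Unfilled placeholders look like {name} or {is_private} after
# the template's doubled braces are rendered.
PLACEHOLDER = re.compile(r"\{\w+\}")


def parse_completion(completion: str) -> Optional[Dict[str, Any]]:
    """Return the parsed parameter dict, or None if unusable."""
    # Tolerate stray text before/after the JSON object.
    start, end = completion.find("{"), completion.rfind("}")
    if start == -1 or end == -1:
        return None
    snippet = completion[start:end + 1]
    try:
        params = json.loads(snippet)
    except json.JSONDecodeError:
        return None
    # A surviving placeholder means the user omitted a required detail.
    if PLACEHOLDER.search(snippet):
        return None
    return params
```

A `None` here can be turned into a follow-up question to the user asking for the missing detail, rather than sending an incomplete request to the API.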
This approach addresses all of the challenges associated with large catch-all prompts:
- It uses far fewer tokens than sending the whole API spec with instructions, caveats, notes, and defaults for each separate task.
- It is easy to maintain. If we get bad responses from the “create a database” task, we know the issue is in this specific prompt — there’s no possible interference from other tasks.
- It is clear and specific. We provide precise instructions for what the response should look like, default values, and explanations of the parameters to fill in from the user’s prompt.
Best Practices
We got to this point through a lot of testing and have come up with some best practices for implementing a classify and complete LLM strategy.
- Provide examples to the classifier. This makes the output format clear and helps to prevent unwanted explanatory text.
- Include a category for when there are no matches. A user might enter a prompt that does not correspond to any of the valid choices, and it’s useful to be able to catch and respond to these cases.
- When coming up with classifier categories, be mindful of the terms users will actually include in their prompts. We might think in terms of “revoking” API keys, but a user might ask to “delete” their keys, which could cause some confusion with the “delete database” task.
- If possible, use a fill-in-the-blank approach for the task-specific prompts. ChatGPT is usually quite good at respecting requests for output in specific formats, but it’s not completely reliable.
- Test, test, test! Remember that LLM responses are non-deterministic. It’s hard to predict, for any given prompt, how well the LLM will follow the instructions. Repeating the same prompt might yield different results. You might need to include a lot of specific instructions, warnings, and examples to get the exact results you need — but it’s hard to know until you’ve seen how the model responds to a wide range of different prompts.
- You don’t *always* need to use an LLM. We found that, for some tasks, we didn’t actually need to call the LLM. Getting the task classification was enough, and we could handle the rest with simple Python scripts. This is faster and it cuts down on costs. Much of the magic of LLMs comes from how they can be used with, not instead of, other tools.
With these examples and best practices, you should be able to use the classify and complete approach for more effective and maintainable LLM completions.