How do you get real business value out of ChatGPT?
With all the hype there’s been around OpenAI’s groundbreaking technology, that might sound like an odd question—isn’t a powerful, easy-to-use language model obviously going to generate value? The fact that it’s reached 100 million users faster than any digital application in history certainly speaks to its widespread appeal. But as we tinkered with it, some critical limitations became clear.
One of our defining principles at Smart Design is a focus on user experience and user value, regardless of the technology that enables it. This is why a powerful new technology isn’t enough on its own. To deliver value, it has to address a human need or drive an improved experience.
ChatGPT, for all its promise and appeal, is still a very generic tool. The GPT-3 platform it’s built on is trained on a huge but unfocused data set. For the average business or user, this makes it useful only up to a point—like a personal assistant who’s world-class at looking things up, but knows nothing about you, the specific challenges you face, or even the recent events affecting you.
Any language model is trainable though, and a chat-based assistant that actually knows your business, your industry, your company, or you personally would be game-changing in a way that ChatGPT currently isn’t. That suggests an opportunity: to customize GPT-3 as a platform, using OpenAI’s APIs.
What would that involve, we wondered? How much training data would it take? How much would it cost and how hard would it be?
So we did what any good developer would do, and started running experiments. An obvious place to start was current events, since the latest version (GPT-3.5) is only trained on data through June 2021. Our hypothesis was that we could train GPT-3 on RSS feeds of major news sites, then have it answer questions about recent events.
OpenAI offers a service called fine-tuning, which allows you to customize a model by feeding it prompts and responses that exemplify what you want it to learn. This was our first approach.
Fine-tuning with OpenAI’s GPT model
The first thing we did was install OpenAI’s Python package. We chose to train the model on a topic that required recent information: the 2023 train derailment in Ohio.
pip install --upgrade openai
We also had to set up our OpenAI key, obtained from OpenAI.com. OpenAI offers $18 in free credit, which is far more than what’s needed to run this notebook.
import os
import openai

os.environ['OPENAI_API_KEY'] = "Add OpenAI key here"
openai.api_key = os.environ['OPENAI_API_KEY']
As a baseline, we queried DaVinci, which is OpenAI’s state-of-the-art GPT-3.5 model, to see if it knew where the train derailed.
prompt = "Where did the train carrying hazardous materials derail?"
result = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt
)
print(result["choices"][0]["text"])

The exact location of the train derailment is not available, as different
The result is incorrect because the event occurred after GPT-3.5 was trained.
Our first training attempt was to fine-tune the model by adding specific data about the train derailment in early 2023. This required preparing the data, saving it to a file, and uploading it to OpenAI, as follows:
import json

# from https://en.wikipedia.org/wiki/2023_Ohio_train_derailment
examples = [
    {"prompt": "2023 Ohio train derailment", "completion": "The 2023 Ohio train derailment (also called the East Palestine train derailment) occurred on February 3, 2023, at 8:55 p.m. EST (UTC−5), when a Norfolk Southern freight train carrying hazardous materials derailed in East Palestine, Ohio, United States. The freight train burned for more than two days, and then emergency crews conducted a controlled burn of several railcars at the request of state officials, which released hydrogen chloride and phosgene into the air. As a result, residents within a 1-mile (1.6-kilometer) radius were evacuated, and an emergency response was initiated from agencies in Ohio, Pennsylvania, and West Virginia. The U.S. federal government sent Environmental Protection Agency (EPA) administrator Michael S. Regan to provide assistance on February 16, 2023."}
]

with open("trainingdata.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

file = openai.File.create(file=open("trainingdata.jsonl"), purpose='fine-tune')
From here, we instructed OpenAI to begin fine-tuning a model using DaVinci as a base model, but including the additional information about the 2023 train derailment in Ohio.
fine_tune = openai.FineTune.create(training_file=file['id'], model="davinci")
Using the “follow” console command, we tracked the fine-tuning job’s progress, which took about 30 minutes (note that if the command fails, you can run it again to continue polling).
openai api fine_tunes.follow -i {fine_tune['id']}
With this step complete, we referenced the newly fine-tuned model and ran our previous prompt to see if it did any better.
result = openai.Completion.create(
    model="davinci:ft-personal-2023-02-16-20-32-47",
    prompt=prompt
)
print(result["choices"][0]["text"])

Officials say the train derailed in Nantes Dorian, just west of
No real improvement here! Perhaps more data is needed.
Fine-tuning using more data from RSS feeds
For our second experiment, we decided to fine-tune the model using recent news, then ask it about a current event. We began by installing an RSS parser, downloaded all of the recent news from several major news outlets via their RSS feeds, and used that data to fine-tune the model.
pip install rss-parser
from rss_parser import Parser
from requests import get
import json

rss_urls = [
    "https://rss.nytimes.com/services/xml/rss/nyt/US.xml",
    "https://rss.nytimes.com/services/xml/rss/nyt/World.xml",
    "https://feeds.bbci.co.uk/news/rss.xml?edition=us",
    "https://rss.cnn.com/rss/cnn_world.rss",
    "https://rss.cnn.com/rss/cnn_us.rss",
    "https://feeds.washingtonpost.com/rss/world?itid=lk_inline_manual_36",
    "https://feeds.washingtonpost.com/rss/national?itid=lk_inline_manual_32",
    "https://feeds.a.dj.com/rss/RSSWorldNews.xml",
    "https://feeds.a.dj.com/rss/WSJcomUSBusiness.xml",
    "https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en"
]

prompts = []
for url in rss_urls:
    xml = get(url)
    parser = Parser(xml=xml.content)
    feed = parser.parse()
    for item in feed.feed:
        prompts.append({"prompt": item.title, "completion": item.description})

with open("rss-trainingdata.jsonl", "w") as f:
    for prompt in prompts:
        f.write(json.dumps(prompt) + "\n")
This time we used a tool that OpenAI provides to clean the training data.
openai tools fine_tunes.prepare_data -f rss-trainingdata.jsonl -q
This allowed us to then train a newly fine-tuned model on this much larger data set.
file = openai.File.create(file=open("rss-trainingdata_prepared.jsonl"), purpose='fine-tune')
fine_tune = openai.FineTune.create(training_file=file['id'], model="davinci")

openai api fine_tunes.follow -i {fine_tune['id']}
With that complete, we could compare before (non-fine-tuned) and after (fine-tuned) models on a question about the day’s news. Once again we asked about the recent train derailment.
prompt = "Where did the train carrying hazardous materials derail?"
result = openai.Completion.create(
model="davinci",
prompt=prompt + '\n\n###\n\n'
)
print("Before (non-finetuned) result: " + result['choices'][0]['text'])
result = openai.Completion.create(
model="davinci:ft-personal-2023-02-16-21-29-25",
prompt=prompt + '\n\n###\n\n'
)
print("After (finetuned) result: " + result['choices'][0]['text'])

Before (non-finetuned) result:
Additional Information:
Sound Transit’s emergency closure of

After (finetuned) result:
Backgrounder
In the early hours of February 10, 2019
The results, unfortunately, were still gibberish.
After some additional digging, the issue turned out to be that fine-tuning builds on the base davinci model, a much older version that lacks the instruction-following training found in text-davinci-003 (or ChatGPT). This means fine-tuning is not a good fit for instruction-based problems; it’s better suited to problems like classification and autocompletion.
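To illustrate the distinction, here is a minimal sketch of the kind of task fine-tuning is well suited to: classification, where each completion is a short label rather than an open-ended, instruction-following answer. The sentences, labels, and filename below are invented for illustration; the data uses the same prompt/completion JSONL shape shown earlier.

```python
import json

# Hypothetical classification examples: each completion is a short label,
# the kind of prompt/completion pair fine-tuning handles well.
examples = [
    {"prompt": "The derailment response was swift and effective. ->",
     "completion": " positive"},
    {"prompt": "Residents are still waiting for answers weeks later. ->",
     "completion": " negative"},
]

# Write one JSON object per line -- the JSONL format the fine-tuning
# endpoint expects as training data.
with open("classification-trainingdata.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

print(sum(1 for _ in open("classification-trainingdata.jsonl")))  # 2
```

A fine-tuned model trained on pairs like these learns to complete the label, which is a very different task from answering an arbitrary question.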
To make this work, we need a different approach.
Getting customized results without fine-tuning
For this next experiment, we tried something that seems counterintuitive: we posed a question to GPT-3 and included the answer to the question in the prompt itself, as context.
prompt = "Given that The 2023 Ohio train derailment (also called the East Palestine train derailment) occurred on February 3, 2023, at 8:55 p.m. EST (UTC−5), when a Norfolk Southern freight train carrying hazardous materials derailed in East Palestine, Ohio, United States. Where did the train carrying hazardous materials derail?"

result = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt + '\n\n###\n\n'
)
print(result['choices'][0]['text'])

The train carrying hazardous materials derailed in East Palestine, Ohio, United States.
This approach is not a practical long-term solution to our problem, but it did reveal a valuable insight: if we could search for content that answers the question being asked, and pre-populate the prompt with that content, we could use GPT-3’s instruction-following features on this new information. Fortunately, there are some great tools for building this kind of pipeline, like LangChain.
pip install langchain
Next, we downloaded the same RSS feeds as before, but this time we prefilled the prompt with that data before asking our question about current events.
from langchain.docstore.document import Document
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

documents = []
for url in rss_urls:
    xml = get(url)
    parser = Parser(xml=xml.content)
    feed = parser.parse()
    for item in feed.feed:
        documents.append(Document(
            page_content=item.title + '. ' + item.description
        ))

prompt = "Where did the train carrying hazardous materials derail?"
chain = load_qa_chain(OpenAI(temperature=0))
chain({"input_documents": documents, "question": prompt}, return_only_outputs=True)["output_text"]

InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 17073 tokens (16817 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
We see here that GPT-3 doesn’t support prompts this large. We did, after all, dump the entire contents of the previous day’s news from several major news publications into the prompt: nearly 17,000 tokens’ worth, per the error message. Given the model’s (reasonable) limit of 4097 tokens, we may still be able to make this approach work, but first we need a more selective method of populating the prompt.
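One way to guard against this error is to estimate a prompt’s token count before sending it. Here is a minimal sketch using OpenAI’s rough rule of thumb of about four characters per token for English text (an exact count would require a tokenizer such as tiktoken); the constants mirror the limits reported in the error message, and the helper names are our own:

```python
MAX_CONTEXT = 4097       # context limit of the davinci-family models used here
COMPLETION_BUDGET = 256  # default completion length reserved by the API

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for typical English text.
    # A real tokenizer (e.g. tiktoken) gives exact counts.
    return len(text) // 4

def fits_in_context(prompt: str) -> bool:
    # The prompt plus the reserved completion must fit in the context window.
    return estimate_tokens(prompt) + COMPLETION_BUDGET <= MAX_CONTEXT

print(fits_in_context("Where did the train carrying hazardous materials derail?"))  # True
print(fits_in_context("news article text " * 4000))  # a day's worth of news: False
```

Checking this before each API call makes the failure mode explicit instead of surfacing as an exception from the service.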
Using text embeddings and vector similarity searches to pre-populate a prompt
Instead of dumping all the news from a recent period into the prompt, there should be a way of searching within the recent news data, and only populating the prompt with content that’s appropriate to our query.
A typical full-text search index probably won’t work here, because the exact words of our question are unlikely to appear in the content: news articles mostly consist of statements, while our prompt is a question. What we need is to populate the prompt with content whose meaning is similar to the question, so that the answer is likely to be included.
Instead, we used some cutting-edge technology that can determine which text is similar to other text. OpenAI recently released a Text Embeddings API which can convert a piece of text into a vector that can be compared to other vectors, allowing us to search for similar meanings, not just similar words. For example, the statement “people work” produces a vector that is similar to the vector equivalent of “humans do jobs”, even though none of the words match.
pip install faiss-cpu
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS

search_index = FAISS.from_documents(documents, OpenAIEmbeddings())

prompt = "Where did the train carrying hazardous materials derail?"
chain = load_qa_chain(OpenAI(temperature=0))
chain({"input_documents": search_index.similarity_search(prompt, k=4), "question": prompt}, return_only_outputs=True)["output_text"]

' East Palestine, Ohio.'
It worked! By pairing similarity search with GPT-3, we’re now able to answer questions about the news in the RSS feeds.
Limitations, possibilities, caveats, and final thoughts
About Carter Parks
Carter Parks is a systems architect who has a knack for applying new technologies to the right problems. He brings expertise in machine learning, full stack web and mobile development, and IoT and has worked with clients in sectors ranging from eCommerce to nutrition, finance, and SaaS. Notable clients include Gatorade. When he isn’t coding, you can find him in the outdoors, probably on a long trail run, or playing the piano.