I added context data to the TruthfulQA dataset

huggingface.co

1 point by roh26it a year ago · 1 comment

roh26it (OP) a year ago

Given that this is one of the most downloaded datasets on Hugging Face, I was a little surprised by how dirty it was. It also had very limited information and some incorrect classifications.

For an internal experiment on building a "Truthful Evaluator", we picked up this dataset and tried fine-tuning a model on its 8,000-odd examples.

We realised that it needed two things: (1) cleaning up and (2) some reclassification.

Most importantly, though, it lacked context data. It only had a link pointing to the source, and even that was missing for a few rows.

We scraped the page behind each link in the dataset, matched its content to the question, and narrowed it down to a short context passage to add to the main dataset.
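For anyone curious what the "match and narrow down" step can look like, here is a minimal sketch. It is not our actual pipeline: it assumes the page text has already been scraped into a plain string, and it uses simple word overlap as the matching score. The names `tokenize`, `best_context`, and `min_overlap` are made up for this illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_context(question, page_text, min_overlap=1):
    """Return the paragraph sharing the most word tokens with the question.

    Paragraphs are assumed to be separated by blank lines. Returns None
    when no paragraph shares at least `min_overlap` tokens.
    """
    q_tokens = tokenize(question)
    best, best_score = None, 0
    for para in re.split(r"\n\s*\n", page_text):
        para = para.strip()
        if not para:
            continue
        # Score = number of distinct question words found in the paragraph.
        score = len(q_tokens & tokenize(para))
        if score > best_score:
            best, best_score = para, score
    return best if best_score >= min_overlap else None
```

Word overlap is crude (it ignores stop words and word forms), so a real run would probably need a better scorer and a manual check of the results, but it shows the general shape of the step.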

We are releasing it publicly so that others can avoid the 2-3 days of pain of wrangling this data.
