Exclusive: ‘Iterate through’: Why The Washington Post launched an error-ridden AI product


The Washington Post began releasing AI-generated podcasts even after internal tests found that the technology introduced errors and bias into the publication’s reporting.

More than two-thirds of scripts generated by the feature, dubbed “Your Personal Podcast,” failed a metric intended to determine whether they met the publication’s standards, according to a readout of the tests shared with Semafor.

“Testers were asked to rate the quality of scripts on a pass/fail basis (news used the categorization of publishable vs. not) in order to give us the most comprehensive list of issues to examine,” the company said in its internal review. The review added that when in doubt, testers were told to fail scripts “as a precaution.” Across three rounds of testing, between 68% and 84% of scripts failed.

Four Washington Post staffers also described mistakes in personalized podcasts ranging from minor pronunciation issues to misattributed or fabricated quotes, as Semafor reported Thursday. The tool also sometimes inserts commentary, they said — for instance, by presenting a source’s quotes as the paper’s own position on an issue.

The podcast tool’s prognosis was poor, the review concluded: “Further small prompt changes are unlikely to meaningfully improve outcomes without introducing more risk.”

Still, the company’s product review team recommended moving forward with the release, saying it would continue to “iterate through the remaining issues” with the newsroom and would label the tool as a work-in-progress that could generate errors.

The Post’s chief technology officer, Vineet Khosla, and head of product, Bailey Kattleman, announced the tool on Wednesday, saying in a note to staff that the podcasts are the “ultimate intersection across our critical initiatives of premium experiences, customer choice and AI platform and products.”