researchers, the open-domain QA problem is attractive as it is one of the most challenging in the realm of computer science and artificial intelligence, requiring a synthesis of information retrieval, natural language processing, knowledge representation and reasoning, machine learning, and computer-human interfaces. It has had a long history (Simmons 1970) and saw rapid advancement spurred by system building, experimentation, and government funding in the past decade (Maybury 2004, Strzalkowski and Harabagiu 2006). With QA in mind, we settled on a challenge to build a computer system, called Watson,1 which could compete at the human champion level in real time on the American TV quiz show, Jeopardy. The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise.
Jeopardy! is a well-known TV quiz show that has been airing on television in the United States for more than 25 years (see the Jeopardy! Quiz Show sidebar for more information on the show). It pits three human contestants against one another in a competition that requires answering rich natural language questions over a very broad domain of topics, with penalties for wrong answers. The nature of the three-person competition is such that confidence, precision, and answering speed are of critical importance, with roughly 3 seconds to answer each question. A computer system that could compete at human champion levels at this game would need to produce exact answers to often complex natural language questions with high precision and speed and have a reliable confidence in its answers, such that it could answer roughly 70 percent of the questions asked with greater than 80 percent precision in 3 seconds or less.

Finally, the Jeopardy Challenge represents a unique and compelling AI question similar to the one underlying Deep Blue (Hsu 2002): can a computer system be designed to compete against the best humans at a task thought to require high levels of human intelligence, and if so, what kind of technology, algorithms, and engineering is required? While we believe the Jeopardy Challenge is an extraordinarily demanding task that will greatly advance the field, we appreciate that this challenge alone does not address all aspects of QA and does not by any means close the book on the QA challenge the way that Deep Blue may have for playing chess.
The Jeopardy Challenge
Meeting the Jeopardy Challenge requires advancing and incorporating a variety of QA technologies including parsing, question classification, question decomposition, automatic source acquisition and evaluation, entity and relation detection, logical form generation, and knowledge representation and reasoning.

Winning at Jeopardy requires accurately computing confidence in your answers. The questions and content are ambiguous and noisy, and none of the individual algorithms are perfect. Therefore, each component must produce a confidence in its output, and individual component confidences must be combined to compute the overall confidence of the final answer. The final confidence is used to determine whether the computer system should risk choosing to answer at all. In Jeopardy parlance, this confidence is used to determine whether the computer will “ring in” or “buzz in” for a question. The confidence must be computed during the time the question is read and before the opportunity to buzz in. This is roughly between 1 and 6 seconds, with an average around 3 seconds.

Confidence estimation was very critical to shaping our overall approach in DeepQA. There is no expectation that any component in the system does a perfect job: all components post features of the computation and associated confidences, and we use a hierarchical machine-learning method to combine all these features and decide whether or not there is enough confidence in the final answer to attempt to buzz in and risk getting the question wrong.

In this section we elaborate on the various aspects of the Jeopardy Challenge.
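To make the confidence-combination idea concrete, the following is a minimal sketch of combining component-posted feature values with a logistic model and thresholding the result to decide whether to buzz in. The feature names, weights, bias, and threshold here are invented for illustration; DeepQA's actual hierarchical machine-learning method is far more elaborate.

```python
import math

def combined_confidence(features, weights, bias):
    """Combine component-posted feature values into one overall
    answer confidence with a logistic (sigmoid) model."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def should_buzz(confidence, threshold):
    """Ring in only when the combined confidence clears the risk threshold."""
    return confidence >= threshold

# Hypothetical feature values posted by individual components.
features = {"passage_support": 0.9, "type_match": 1.0, "popularity": 0.2}
# Hypothetical weights; in a trained system these would be learned,
# not set by hand.
weights = {"passage_support": 2.0, "type_match": 1.5, "popularity": 0.3}

conf = combined_confidence(features, weights, bias=-2.0)
print(f"confidence={conf:.3f}, buzz={should_buzz(conf, threshold=0.7)}")
```

The threshold encodes the risk trade-off the text describes: a wrong answer carries a penalty, so the system answers only when the learned combination of evidence is strong enough.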
The Categories
A 30-clue Jeopardy board is organized into six columns. Each column contains five clues and is associated with a category. Categories range from broad subject headings like “history,” “science,” or “politics” to less informative puns like “tutu much,” in which the clues are about ballet, to actual parts of the clue, like “who appointed me to the Supreme Court?” where the clue is the name of a judge, to “anything goes” categories like “potpourri.” Clearly some categories are essential to understanding the clue, some are helpful but not necessary, and some may be useless, if not misleading, for a computer.

A recurring theme in our approach is the requirement to try many alternate hypotheses in varying contexts to see which produces the most confident answers given a broad range of loosely coupled scoring algorithms. Leveraging category information is another clear area requiring this approach.
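The many-hypotheses, many-scorers theme above can be sketched in miniature as follows. This is a deliberately simplified illustration, not the DeepQA pipeline: the two scorers, the candidate records, and the simple score-averaging are all invented for the example, whereas a real system would use many retrieval-, type-, and category-based scoring algorithms and a learned combination.

```python
def best_answer(candidates, scorers):
    """Score every candidate answer with every loosely coupled scorer
    and keep the candidate with the highest average score."""
    def avg_score(cand):
        scores = [scorer(cand) for scorer in scorers]
        return sum(scores) / len(scores)
    return max(candidates, key=avg_score)

# Placeholder scorers for a clue in a ballet-pun category like "tutu much".
scorers = [
    lambda c: 1.0 if c["category_fit"] else 0.0,  # does the answer fit the category?
    lambda c: c["retrieval_score"],               # passage-support evidence
]

candidates = [
    {"answer": "Swan Lake", "category_fit": True, "retrieval_score": 0.7},
    {"answer": "Tchaikovsky", "category_fit": False, "retrieval_score": 0.9},
]
print(best_answer(candidates, scorers)["answer"])
```

Note how the category scorer overrides the raw retrieval evidence here: a candidate that fits the ballet reading of the pun category wins even though another candidate has stronger passage support, which is exactly why category information rewards the try-many-hypotheses approach.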
The Questions
There are a wide variety of ways one can attempt to characterize the Jeopardy clues: for example, by topic, by difficulty, by grammatical construction, by answer type, and so on. A type of classification that turned out to be useful for us was based on the primary method deployed to solve the clue. The
Articles
60 AI MAGAZINE