Garry Kasparov on IBM's Watson
theatlantic.com

> Worse, by definition they do not understand what they do not understand and so cannot avoid them.
Kasparov didn't seem to see what I did. Watson seemed very consistent in knowing what it did not know. There were maybe two questions I recall where it actually got the question wrong with 50%+ certainty. I believe it answered "leg" when it should have been "missing a leg". The other, it answered the '20s when the answer was the '10s. And for neither of those, I think, was the percentage much beyond 50%.
Also, Kasparov seems to miss that Watson in medicine would be used with humans. I doubt a doctor will say, "Watson says to cut off his left leg -- I would have just given him aspirin for the headache. Oh well, hopefully cutting off this leg makes his head feel better."
What Watson hopefully will do is help with diagnosis, especially the tricky cases.
There's a great story in a book I read (I wish I could recall the name) that begins with a lady who has had some stomach issue for like 20 years. Everyone thinks it's in her head. She finally happens upon a doctor who happens to have seen something like it before, and she gets diagnosed and healed. But she had to live with it for like 20 years, seeing doctor after doctor. Watson would be able to greatly help in situations like this, I hope.
UPDATE: The book is "How Doctors Think". Here's an excerpt that talks about this case, http://harvardmedicine.hms.harvard.edu/bulletin/winter2007/7... -- just in case anyone cares. :-)
Totally agree. It's shocking how many doctors seem to lack basic knowledge about things like drug side effects or lesser-known symptoms of common diseases. You obviously still need both specialists and general practitioners, but I think it's time to revisit the idea of a medical expert system.
Here are the reasons I was disappointed in Watson's showing (despite its handily beating the human competitors). The most obvious is that Watson's auto-clicker was a big advantage over human thumbs, so Watson got 100% of the points for clues to which all three competitors knew the answer. (If you asked Watson and the two humans "what's five plus five", Watson would win, but that's not necessarily proof of any sort of computer superiority.)
The second reason is that IBM was representing Watson as something of a big push in knowledge representation (I just watched a video where they talk about Watson's "informed judgments" about complicated questions, for instance). It looks instead like Watson just has an improved ability, relative to previous systems, to disambiguate words and to do quick lookups that match those words with nearby key terms.
For example, on the clue "Rembrandt's biblical scene 'Storm on the Sea of' this was stolen from a Boston museum in 1990", Watson correctly answered "Galilee". But its next two answers were "Gardner Museum" and "Art theft"; no one who "understood" the question in any conventional sense would even consider these as answers, because they don't make any sense. Clearly, Watson looked for instances of "Rembrandt", "Storm on the Sea of", "stolen", and other phrases from the clue in its text corpus, and found that "Galilee", "Gardner Museum", and "art theft" all frequently occurred together (because the painting was stolen from the Gardner Museum in an instance of art theft) and relatively rarely apart. "Galilee" probably won out over the other two because Watson is tuned to Jeopardy clue styles (whenever a quoted phrase in a clue is followed by the word 'this', the clue is asking for the answer that completes the phrase).
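To make the mechanism I'm imagining concrete, here's a toy sketch in Python. The five-document "corpus", the clue-term list, and the +5 phrase bonus are all invented for illustration; Watson's real pipeline is obviously far more elaborate, but this is the flavor of keyword co-occurrence plus Jeopardy-specific tuning I mean:

    # Toy candidate ranker: count clue terms that co-occur with each
    # candidate, then apply a Jeopardy-specific tiebreaker. Corpus,
    # terms, and weights are all made up for illustration.
    corpus = [
        "rembrandt's storm on the sea of galilee was stolen from the "
        "gardner museum in 1990, a famous art theft",
        "the 1990 gardner museum art theft included rembrandt's "
        "storm on the sea of galilee",
        "galilee is a region in northern israel",
        "art theft is the stealing of art",
        "the gardner museum is in boston",
    ]
    clue_terms = ["rembrandt", "storm on the sea of", "stolen", "1990"]
    quoted_phrase = "storm on the sea of"  # the clue quotes this, then says 'this'
    candidates = ["galilee", "gardner museum", "art theft"]

    def cooccurrence(cand):
        # Count clue terms appearing in documents that also mention the candidate.
        return sum(term in doc
                   for doc in corpus if cand in doc
                   for term in clue_terms)

    def score(cand):
        s = cooccurrence(cand)
        # Clue-style heuristic: a quoted phrase followed by 'this' wants
        # the word that completes the phrase.
        if quoted_phrase + " " + cand in " ".join(corpus):
            s += 5
        return s

    for cand in sorted(candidates, key=score, reverse=True):
        print(cand, score(cand))

Tellingly, raw co-occurrence ties all three candidates at the same score, precisely because they all appear in the same documents; only the quoted-phrase heuristic pulls "galilee" ahead. Nothing in there resembles understanding.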
Similarly, Watson was far less confident on the clue "You just need a nap! You don't have this sleep disorder that can make sufferers nod off while standing up." It still got the right answer of "narcolepsy", but with a relatively low confidence of 64%. "Insomnia" had a confidence of 32% despite clearly being the opposite sort of sleep disorder, and "deprivation" appeared at 13% despite not being a sleep disorder at all. Here Watson gets confused because the only term of the clue that appears more frequently with "narcolepsy" than with "insomnia" is "standing up"; my guess is that if "standing up" had been replaced by some oddly phrased, uncommonly occurring synonym, Watson wouldn't have been able to come up with an answer, despite the clue conveying exactly the same information.
This kind of cleverness is certainly impressive, but it seems like an advance in tuning existing techniques to the format of Jeopardy, not an advance that will spark other successful projects down the line. IBM's goal of giving us "the computer from Star Trek" doesn't seem any closer; I don't see any evidence that Watson could have answered a question that required more thought or understanding than a simple text search. If there were a question like "how many kings ruled England between Henry the Fourth and Henry the Eighth" (six: Henry V, Henry VI, Edward IV, Edward V, Richard III, and Henry VII), then Ken and Brad would have been able to answer relatively easily, while my guess is that Watson would be stumped.
> For example, on the clue "Rembrandt's biblical scene 'Storm on the Sea of' this was stolen from a Boston museum in 1990", Watson correctly answered "Galilee". But its next two answers were "Gardner Museum" and "Art theft"; no one who "understood" the question in any conventional sense would even consider these as answers because they don't make any sense.
But I think your peek into Watson's inner mind may give you more insight into the human mind than you realize.
I'm reminded of a girl who told me she was good at "froggy" when it came to basketball. I was like, "What's froggy?" and she said, "when you get the ball after someone shoots it". I said, "I think it's called a rebound". And she said, "that's the word, rebound... but froggy and rebound, they remind me of each other".
And your narcolepsy v. insomnia example is a mistake I think a lot of humans make. Like, if you ask me which way to turn a light bulb to remove it, my brain will offer both clockwise and counter-clockwise as responses. Clockwise is probably at 80%, but counter-clockwise is probably at 20% -- I have been known to accidentally tighten a bolt rather than loosen it.
I can't count the number of times my father, when using my name, starts out with my brother's name and corrects himself midway through. I've done similar things with pairs of friends I met at the same time and know in only one domain.
Not quite clear on why people keep pointing to the 'Toronto' question as proof that Watson is fundamentally flawed in some irreconcilable way.
Everyone seems to forget that Watson would never have answered Toronto if it didn’t have to. It wasn’t at all confident, you can’t even really say that it made a mistake. It just didn’t know the answer and was – correctly so – very sure that it didn’t know the answer.
Excellent point. But the fact that "Toronto" was its highest-confidence guess makes you wonder whether it could confidently settle on a wrong answer via whatever errors led to Toronto. And if you asked it a medical question, would it say "I don't know" if it wasn't confident, or would it give an answer anyway (along with a confidence level that might be ignored or rationalized away)?
That's not at all what Kasparov said:
> My concern about its utility, and I read they would like it to answer medical questions, is that Watson's performance reminded me of chess computers. They play fantastically well in maybe 90% of positions, but there is a selection of positions they do not understand at all. Worse, by definition they do not understand what they do not understand and so cannot avoid them. A strong human Jeopardy! player, or a human doctor, may get the answer wrong, but he is unlikely to make a huge blunder or category error -- at least not without being aware of his own doubts. We are also good at judging our own level of certainty. A computer can simulate this by an artificial confidence measurement, but I would not like to be the patient who discovers the medical equivalent of answering "Toronto" in the "US Cities" category, as Watson did. I would not like to downplay the Watson team's achievement, because clearly they did something most did not yet believe possible. And IBM can be lauded for these experiments. I would only like to wait and see if there is anything for Watson beyond Jeopardy!.

If IBM wants to fix the "Toronto" problem, have at it. But those sorts of "embarrassing" errors could be quite costly in medical situations. During the show they showed Watson's progression from giving really stupid answers very frequently to giving them less frequently, which makes me personally believe their fundamental process is flawed (not necessarily irreconcilably) and their current algorithms are just a bunch of hacks thrown together on top of Google rather than something more sophisticated like Wolfram Alpha.

Watson was not confident in that answer, only 30% (http://asmarterplanet.com/blog/2011/02/watson-on-jeopardy-da...). Had that been a normal question, it wouldn't have buzzed in. It only answered because Final Jeopardy is the only time when not answering and answering incorrectly carry the same penalty.
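That decision rule is simple enough to spell out. Here's a minimal sketch, assuming a 50% buzz threshold; the actual figure is my guess, since IBM tuned its thresholds per game state and hasn't published them:

    # Sketch of the buzz/answer decision as I understand it from the
    # broadcast. BUZZ_THRESHOLD is an assumed value for illustration.
    BUZZ_THRESHOLD = 0.5

    def decide(confidence, final_jeopardy=False):
        if final_jeopardy:
            # The wager is already committed, and a blank response scores
            # the same as a wrong one, so always give the best guess.
            return "answer"
        # In a regular round staying silent costs nothing, so only buzz
        # in when confident enough.
        return "buzz" if confidence >= BUZZ_THRESHOLD else "pass"

    print(decide(0.30))                       # pass: a normal clue
    print(decide(0.30, final_jeopardy=True))  # answer: hence "Toronto"

Under that rule, answering "Toronto" at 30% was the rational move, not a malfunction.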
> but I would not like to be the patient who discovers the medical equivalent of answering "Toronto" in the "US Cities" category, as Watson did.
Surprise, that kind of mistake happens far too frequently in the medical field now.
Why is Kasparov's commentary on something so far outside his recognized area of expertise relevant anyway? I don't go to Knuth for advice on chess, nor Hawking for snarky banter on economics, etc. (Although if I had access to either of those two, I might try it.)
Would Watson make that mistake happen more or less often? (Bringing in Watson can lead to blindly trusting or blindly ignoring the "stupid computer" depending on the doctor; seems like a problem with doctors rather than a lack of tools?)
> Why is Kasparov commenting on something so far out of his recognized area of expertise relevant anyway?
Isn't the answer obvious? (I won't comment on the relevance; people do and read many irrelevant things every day.) People asked for his thoughts 'cause he got beaten by IBM's Deep Blue and he has had a lot of experience with computers in their relationship with chess (specifically, combining humans and computers to make really strong opponents). People also asked for Ken Jennings' thoughts, and AI isn't his expertise. And people recently asked Hawking for his thoughts on aliens...
Part of 'the "Toronto" problem' is simply the format of the challenge. In Jeopardy, Watson can give at most one response, and in that particular situation exactly one. In a medical diagnosis situation, Watson's responses wouldn't be so constrained. He could give a list of 20 possibilities with confidence margins for each one, and even a list of possible additional tests designed to favor one diagnosis over another. This sort of information, utilized by a competent doctor, has a far, far smaller potential for disaster than the "what is a lobotomy?" scenarios that people are scared of.
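To be concrete, a hypothetical "Dr. Watson" report might be shaped something like this. Every condition name and number below is invented purely to show the structure, not actual medicine:

    # Hypothetical output shape for a "Dr. Watson": a ranked
    # differential with confidences and suggested follow-up tests.
    # All entries are invented for illustration.
    differential = [
        # (diagnosis, confidence, test that would help confirm or rule it out)
        ("condition A", 0.41, "blood panel X"),
        ("condition B", 0.27, "imaging study Y"),
        ("condition C", 0.12, "biopsy Z"),
        # ...down to 20 entries, however unlikely
    ]

    for diagnosis, confidence, next_test in differential:
        print(f"{confidence:5.0%}  {diagnosis:12}  suggest: {next_test}")

The point is that nothing forces a single winner-take-all answer once you leave the quiz-show format.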
My point is that the "Toronto" error is constantly cited as proof that Watson is fundamentally flawed, when it's actually a fairly reasonable bug if you understand the process it goes through to reach the answer. It's only seen as a stupid answer because it misses a key filter that humans would apply.
In the medical case, it's actually better for the answer to be obviously, embarrassingly wrong than slightly wrong. Like the other commenter said, people aren't going to be getting amputations for headaches just because Watson says so. There's much more danger in something like prescribing medications with a fatal interaction, something that a hypothetical "Dr. Watson" would pick up.
> there is a selection of positions they do not understand at all. Worse, by definition they do not understand what they do not understand and so cannot avoid them.
This is almost certainly true for humans too, in terms of general problems rather than specifically chess. There are probably concepts we have so little understanding and comprehension of that we can't even see our own ignorance. Rumsfeld's unknown unknowns.
Yeah... for all those people saying it's not impressive, I'm forced to wonder why they didn't just build it themselves, then.
EDIT: Wow, this comment is way more controversial than I thought when writing it. Down to -1, up to 2, back to 0.
If anyone finds it so objectionable as to downvote, please explain why? Discussion > downvoting.
I didn't upvote or downvote you, but meta-edits about downvotes seem either to get downvoted to oblivion because of perceived whining or upvoted a lot out of the perceived injustice of the downvote. Also, I think your comment is rather condescending: "You think Windows sucks? How about you build something better?"
I only added that after I watched it go up and down twice. So it's not just the meta-whining; apparently several people found it highly offensive on content alone, and I was mystified as to why.
Thanks for the outside input regarding the Windows comparison; I don't think that's quite the same as this, though. The people talking down Watson aren't saying it sucks so much as they're saying it's trivial. Vista sucked, but nobody would have called it trivial or inconsequential.
The problem is that the whole event was orchestrated to showcase IBM. Jeopardy didn't offer an open call. There's been no series of open competitions in Jeopardy-style trivia, as there was with gradually-improving chess computers.
Instead, IBM wanted a forum to show off its multi-million-dollar QA technology and approached Jeopardy. (They may also have offered Jeopardy promotional payments, though I haven't seen definitive information either way.) IBM then spent 3+ years optimizing for the Jeopardy domain. (In the Reddit Q&A, the Watson team answered: "At this point, all Watson can do is play Jeopardy and provide responses in the Jeopardy format.")
And in the matches, Watson dominated on one dimension of Jeopardy play – quickly pressing a button after a light goes off – which is the least interesting technical challenge. (Yes, it's an important part of any champion's skills, but a machine could have won that button-pressing competition 50 years ago, so it obscures rather than highlights any other 'breakthroughs' Watson may represent.)
While impressive in several dimensions, and drawn from much deeper research by IBM, the only thing we can say for sure about Watson is that it was a "Horse for the Course" in Jeopardy. And unfortunately, no other computer horses were invited to play, and offered the same prizes (in money and fame).
I suspect, now that the pattern has been set, we'll see leaner teams showing they can do as well as or better than Watson with far less funding/hardware over the next few years. Still, in the popular imagination, those efforts will live in the shadow of Watson, when a fair competitive process might have given them a chance to upstage it.
Quickly pressing a button after a light goes off is pretty unimpressive. Figuring out the answer to the question and measuring your confidence in order to decide whether to press the button is impressive.
Agreed on your suspicion. Simply quartering the cost of memory and copying the approach from the paper, with some home-grown improvements, will get people ahead of IBM, and probably inside IBM's decision loop so they stay permanently ahead. But plowing ground the first time is often the hardest part. These weren't dumb people working on this thing for 3+ years.
I actually told my wife (who knew the correct answer, Chicago) half-jokingly that perhaps it was sportsmanship built into Watson to throw away an easy answer once in a while, because he was on a hot streak up to that point.
Kasparov is spot on in that Watson's DeepQA has yet to prove itself in a meaningful way. If it proves itself as an effective medical advisor, that will be far more impressive than the Jeopardy win (as impressive as that was in itself).
I think everyone was disappointed in the applicability of the Deep Blue accomplishment to other fields. Were any of the special-purpose ASICs used to defeat Kasparov used in any other application? As far as I know, a significant part of the Deep Blue development team left IBM relatively soon after the accomplishment.
As noted in Paul Hoffman's recent book "King's Gambit", after his matches with both Deep Blue and Deep Junior, Kasparov was exhausted:
"As with Deep Blue, he had once again let an encounter with a machine play games with his head. He had been obsessed with the idea that Deep Junior would never tire. 'The machine is never distracted by an argument with its mother," he told me, 'or a lack of sleep.'
And in the linked piece Kasparov alludes to the reported next approach IBM wants to take with Watson - support in medicine.
Kasparov's human reaction to his encounters with Watson's distant cousins brings up one obvious benefit in the use of technology like Watson for supporting medical decision-making - simply that such software will be less likely to miss something. Software is less likely to miss considering a diagnosis, ordering a crucial test, or following up on a finding - unlike the fallible 'I' who may have skipped a class in med school, or was up all night on call and just can't think straight, or am just occasionally more stupid than usual.
Diagnosis is the first thing people think of with technology like this, but in my opinion that's not the big problem Watson should tackle. Medical diagnosis in and of itself (dramatizations like the TV show 'House' notwithstanding) is not really that difficult 99% of the time. When you hear hoofbeats, you're very likely going to find horses and not zebras. A future Dr. Watson might occasionally be very helpful in pointing out obscure (and uncommon) diagnoses. However, in my opinion the most helpful thing a Dr. Watson could provide is collecting, evaluating, and comparing evidence and outcomes as they develop globally and locally (i.e., across broad swaths of medicine, but also within a single physician's own patient population), continuously educating the physician, and monitoring cases.
There is plenty of untapped medical data/evidence out there, but it's almost all hidden away in plain sight: text/natural language. I have to agree with Kasparov here, in that the primary advancement Watson represents was in moving farther down the path from syntax to semantics.