Large Scale Visual Recognition Challenge 2011 - Results

vision.stanford.edu

63 points by is74 14 years ago · 37 comments


pmelendez 14 years ago

I don't think this proves the superiority of any algorithm over another, just that the SuperVision team did a great job on task 1 and task 2. I would add two things: 1) There is a No Free Lunch Theorem (http://en.wikipedia.org/wiki/No_free_lunch_theorem) that has been applied to pattern recognition too, and that states there is no significant difference in performance between most pattern recognition algorithms.

2) Performance gains are far more likely to come from the choice of features being used, and that seems to be the case here.

  • is74OP 14 years ago

    Many comments expressed concern about the alleged inappropriateness of the title. Even the no-free-lunch theorem has been invoked, and words like SVM have been mentioned.

    However: The original title, "Neural Networks officially best at object recognition", is much more appropriate than the current title, because this is by far the hardest vision contest. It is nearly two orders of magnitude larger and harder than other contests, which is why the winner of this contest is best at object recognition. The original title is much more accurate and should be restored.

    Second, the gap between the first and the second entry is so obviously huge (25% error vs 15% error) that it cannot be bridged with simple "feature engineering". Neural networks win precisely because they look at the data and choose the best possible features. The best human feature engineers could not come close to a relentless, data-hungry algorithm.

    Third, there was mention of the no-free-lunch theorem and of how one cannot tell which methods are better. That theorem says that learning is impossible on data that has no structure, which is true but irrelevant. What's relevant is that on the "specific" problem of object recognition, as represented by this 1-million-image dataset, neural networks are the best method.

    Finally, if somebody makes SVMs deep, they will become more like neural networks and do better. Which is the point.

    This is the beginning of the neural networks revolution in computer vision.

  • pjin 14 years ago

    To nitpick at the math: "No free lunch" results are asymptotic in the sense that they necessarily hold over the _entire_ domain of whatever problem you're trying to solve. Obviously, algorithms will and do perform differently over the relatively few inputs (compared to infinity...) that they actually encounter. It's similar to undecidability: just because a problem is generally undecidable doesn't mean you can't compute it for certain subsets of input, and compute it reasonably well (for some definition of reasonable).

    • pmelendez 14 years ago

      Agreed... I was in a rush to catch the train this morning and didn't have a chance to elaborate; I shouldn't have done that.

      However, my point was that most of the algorithms used in that link (ANN, SVM, etc.) have similar expressive power (VC dimension) and have been shown to have similar performance in object recognition.

      People normally take advantage of their specific properties rather than paying too much attention to how well the algorithm would perform (since both SVM and ANN are expected to perform reasonably well). I still maintain my opinion that any difference in classification performance is more likely related to how the team handled the data than to the chosen algorithm.

      Deep convolutional learning is the difference here, and it does indeed seem to be an interesting architecture that the current state of the art only supports for ANNs. But that doesn't mean somebody won't come up with a strategy for deep learning on SVMs or another classification technique in the future.

      • Dn_Ab 14 years ago

        Although SVMs and layered neural nets have similar expressivity, the similarity is very much like Turing completeness, i.e. it can't tell the Haskells apart from the Unlambdas. SVMs express certain functions in a manner that grows exponentially with input size, whereas a deep learner tends to be more compact. The key to being a deep learner is using unsupervised learning to seed a hierarchy of learners that learn ever more abstract representations.

        Also, Multilayered Kernel learners already exist.
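The compactness point above can be illustrated with a classic toy case, n-bit parity. This is a hypothetical illustration, not code from any contest entry:

```python
# Toy illustration: n-bit parity is the classic case where a shallow
# (depth-2, DNF-style) representation needs exponentially many terms,
# while a "deep" circuit grows only linearly in n.
import itertools

n = 10

# Shallow view: one term per odd-weight input pattern -> 2**(n-1) terms.
odd_patterns = [p for p in itertools.product([0, 1], repeat=n)
                if sum(p) % 2 == 1]
print(len(odd_patterns))  # 512

# Deep view: a chain of n-1 XOR units computes the same function.
def parity_deep(bits):
    acc = bits[0]
    for b in bits[1:]:
        acc ^= b
    return acc
```

Doubling n doubles the deep circuit but squares the number of shallow terms, which is the sense in which depth buys compactness.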

        • pmelendez 14 years ago

          "The key to being a deep learner is in using unsupervised learning to seed ..."

          Exactly! That was my whole point, which doesn't make sense now that the title has changed.

          "Also, Multilayered Kernel learners already exist"

          I didn't know that; I'll check it out shortly. Thanks for the info.

      • cf 14 years ago

        That's why they include which features they used, which is educational.

  • robrenaud 14 years ago

    Isn't NFL utter crap?

    When you average a learning algorithm's performance over a whole bunch of domains that _NATURE WILL NEVER GENERATE_, all algorithms are equally bad.

    Paying attention to the theorem is mostly defeatist and counter-productive.

    Imagine some ads serving company improves their learning algorithms 10% and is making 100s of millions more dollars. Are you going to say, well, there are billions of other possible universes in which they'd be losing money, they just got lucky that we don't live in those universes?

  • is74OP 14 years ago

    Actually, it does, since the difference in performance between entry #1 and entry #2 is so huge (25% error vs 15% error!), and since this is by far the hardest computer vision challenge yet!

    • pmelendez 14 years ago

      Sorry to disagree, but it seems more related to the fact that they are using deep convolutional learning than to the neural network itself. If you use an ANN side by side with an SVM on the same set of features, you will see very similar results.

      I would be more inclined to agree with a title like "Deep Convolutional Learning Outperforms Traditional Techniques in Object Recognition".

      • jules 14 years ago

        Yeah, if you use the same raw RGB features for the SVM as the neural net then the neural net would blow the SVMs away even more utterly.

        • pmelendez 14 years ago

          No... but I'd bet that if you use the high-dimensional features resulting from the deep convolutional learning process as input to an SVM, the difference would not be that significant.

          • jules 14 years ago

            Well yeah, but then you're basically putting the meat of the NN algorithm into the SVM. I'd call the resulting algorithm a neural network with an SVM frosting. You might as well train naive Bayes directly on the final nth layer of the NN instead of an SVM on the (n-1)th layer; it would be an almost equally weak argument for the thesis that NNs are not superior to the other algorithms on this task, since basically all the power is coming from the NN.
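The "frosting" setup described here can be sketched in a few lines. Everything below (`deep_features`, the perceptron-style trainer) is a hypothetical stand-in, not the actual contest pipeline:

```python
# Sketch of "thin linear classifier on top of frozen deep features".
# The frozen network is faked by a fixed nonlinear map so the example
# runs end to end; a real pipeline would use the trained net's
# penultimate-layer activations instead.

def deep_features(pixels):
    # Stand-in for a trained net's penultimate layer.
    s = sum(pixels)
    return [s, s * s, max(pixels)]

def predict(w, b, pixels):
    f = deep_features(pixels)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else -1

def train_linear(examples, labels, epochs=20, lr=0.1):
    # Perceptron-style updates; a linear SVM would differ only in its loss.
    w = [0.0] * len(deep_features(examples[0]))
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            if predict(w, b, x) != y:
                f = deep_features(x)
                w = [wi + lr * y * fi for wi, fi in zip(w, f)]
                b += lr * y
    return w, b
```

Swapping the thin classifier on top (SVM, perceptron, naive Bayes) changes little, which is the point being made: the representational work happens inside `deep_features`.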

  • kylebrown 14 years ago

    Re #2: automatic, optimal feature selection is one of the touted advantages of neural networks (usually with the caveat that it doesn't work so well in practice).

  • levesque 14 years ago

    Have to agree that this doesn't prove anything. It is only one local contest. The title is very misleading.

iandanforth 14 years ago

Hinton's team (SuperVision) uses an interesting 'dropout' technique. He gave a Google Tech Talk on this back in June.

http://www.youtube.com/watch?v=DleXA5ADG78&feature=plcp

And an older talk that covers some of what a deep convolutional net is:

http://www.youtube.com/watch?v=VdIURAu1-aU
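For readers who want the gist of the technique, here is a minimal pure-Python sketch of dropout as described in the talks. The function name and the test-time scaling convention are illustrative, not SuperVision's actual code:

```python
import random

def dropout(activations, p, training, rng=random):
    """Zero each unit with probability p during training; at test time,
    scale by (1 - p) so expected activations match. Illustrative only."""
    if not training:
        return [a * (1 - p) for a in activations]
    return [0.0 if rng.random() < p else a for a in activations]

# Training pass: each hidden unit independently survives with prob 1 - p,
# so no unit can rely on specific co-adapted partners being present.
hidden = [0.7, 1.2, 0.0, 2.5]
print(dropout(hidden, 0.5, training=True))  # random: about half zeroed
```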

aroberge 14 years ago

Sensational title that misrepresents the results of a competition with limited (albeit high-quality) participation. There is limited information of general value in this link.

sumodds 14 years ago

I'm not sure you can apply winner-takes-all to such a marginal difference in error. Given a slightly different database, things go awry.

Check out "Unbiased Look at Dataset Bias", A. Torralba and A. Efros, CVPR 2011.

  • jules 14 years ago

    The difference in error between the first and the rest is ENORMOUS.

    Task 1:

        1st 0.15315 (convolutional neural net)
        2nd 0.26172
        3rd 0.26979
        4th 0.27058
        5th 0.29576
        [...]
    
    Differences:

        0.10857
        0.00807
        0.00079
        0.02518
    
    As you can see, the first is way ahead of the rest. The difference between 1st and 2nd is about 11 percentage points; between 2nd and 3rd, about 1.

    Task 2:

        1st 0.335463 (convolutional neural net)
        2nd 0.500342
        3rd 0.536474
    
    Same story.

    But the most exciting thing is that the results were obtained with a relatively general purpose learning algorithm. No extraction of SIFT features, no "hough circle transform to find eyes and noses".

    The points of the paper you cite are important concerns, but this result is still very exciting.
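The gaps quoted above can be checked directly from the published scores:

```python
# Flat error rates as listed on the results page (task 1 top 5, task 2 top 3).
task1 = [0.15315, 0.26172, 0.26979, 0.27058, 0.29576]
task2 = [0.335463, 0.500342, 0.536474]

gaps1 = [round(b - a, 5) for a, b in zip(task1, task1[1:])]
gaps2 = [round(b - a, 6) for a, b in zip(task2, task2[1:])]

print(gaps1)  # [0.10857, 0.00807, 0.00079, 0.02518]
print(gaps2)  # [0.164879, 0.036132]
```

On task 2 the winner's margin over the runner-up (~16.5 points) is even larger than on task 1.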

    • modeless 14 years ago

      "the results were obtained with a relatively general purpose learning algorithm. No extraction of SIFT features, no 'hough circle transform to find eyes and noses'."

      This deserves even more emphasis. All of the other teams were writing tons of domain-specific code to implement fancy feature detectors that are the result of years of in-depth research and the subject of many PhDs. The machine learning only comes into play after the manually coded feature detectors have preprocessed the data.

      Meanwhile, the SuperVision team fed raw RGB pixel data directly into their machine learning system and got a much better result.

    • sumodds 14 years ago

      Lol, my bad. I didn't pay attention; I thought the error was in percentages. (I was comparing with MNIST and somehow assumed this too was in percentages.) Come to think of it, that would be really dumb!

  • kylebrown 14 years ago

    Thanks for the reference. It goes well with "Machine Learning that Matters", a paper cited by Terran Lane in his recent blog post "On leaving Academia".

  • Evbn 14 years ago

    I worry you may have taken a biased look at "Unbiased Look at Dataset Bias".

freyr 14 years ago

Neural Networks officially best at object recognition in this particular competition of seven teams, on two of the three tasks.

Not to take away from the accomplishment of the SuperVision team, but the claim in the title seems somewhat sensationalist. Is this competition like the World Cup of object recognition or something?

gobengo 14 years ago

I found the title of this post really ironic.

"There is now clearly an objective answer to which inductive algorithm to use"

pmelendez 14 years ago

Just to add context for newcomers: the original title of the thread was "Neural Networks officially best at object recognition", and most of the posts here argued that the title was not appropriate for the link.

fchollet 14 years ago

Congrats to the awesome folks at ISI for scoring 1st at task 3 and 2nd at task 1! Keep rocking my world.

xenonite 14 years ago

Why isn't there any submission for task 3 from team SuperVision with their neural nets?

anjc 14 years ago

*this implementation of a neural network designed for object recognition for this particular challenge

utopkara 14 years ago

So, this is what HN posts have come to? The level of tabloid science news coverage.

Evbn 14 years ago

The title has changed at least twice, confusing the discussion. Can we have a title history on HN posts? Mutable state stinks.
