Vladimir Vapnik Joins Facebook Research
facebook.com
Am I the only one who thinks that so many excellent scientists leaving academia is a loss for science in general? Will they really publish research the same way they did before?
(Make no mistake, I fully understand them: professors are paid 80k per year, lack resources and fight bureaucrats, so it is a great thing that they are recognised and at last paid what they deserve for devoting their lives to science.)
I'm curious about your question too, but note that Vapnik was previously working in industry, at NEC Labs in New Jersey.
Good point. But I do wonder about Facebook's journal publishing policy.
- Highly doubt it's purely a money decision
- Vapnik is joining a number of people he previously worked with
- Getting huge computational resources and seeing your ideas applied to real data is rewarding
This is another great example of the unreasonable effectiveness of data. LeCun, Hinton, Ng and Vapnik were all recruited because of the basic fact that there is simply no way to do cutting-edge research today without access to the data and computing resources of Google/Facebook/Yahoo/Baidu.
Edit: "No way" is inaccurate. I should have said it is much easier to do at these companies. Also it is inaccurate to imply this is the only reason these great minds have joined these companies.
> LeCun, Hinton, Ng, Vapnik were all recruited on the basic fact that there is simply no way to do cutting edge research today without access to the data and computing resources of Google/Facebook/Yahoo/Baidu.
I don't see many details here; are you sure that's the case?
There are other reasons a giant of the field might decide to work at Facebook. They might give him more freedom than his previous employer. Perhaps friends of his already work at Facebook. The location and compensation may also play into it.
I don't want to be skeptical for no reason, but you're championing a popular narrative which I don't see direct support for in this instance.
There is great, big-data-driven research coming out of Stanford using Common Crawl. For example, see http://www-nlp.stanford.edu/projects/glove/ . They successfully train on an 840 billion token corpus.
Vapnik is a big theory guy. Though I am not sure he has done anything of great practical importance recently, his immense contribution to ML (the SVM) was made at a time when machines were many orders of magnitude weaker than they are now.
"In writing this book I had one more goal in mind: I wanted to stress the practical power of abstract reasoning. The point is that during the last few years at different computer science conferences, I heard reiteration of the following claim:
Complex theories do not work, simple algorithms do.
One of the goals of this book is to show that, at least in the problems of statistical inference, this is not true. I would like to demonstrate that in this area of science a good old principle is valid:
Nothing is more practical than a good theory."
-- From Vapnik's preface to The Nature of Statistical Learning Theory
Vapnik is not well-described as a "theory guy". That implies that he's not interested in connections between theory and practice, and this is most profoundly not the case. He has arguably been the most successful ML researcher ever as far as connecting abstract theory to real-world outcomes.
Besides the SVM: the VC dimension started out as a lemma regarding set counting, and he pushed it to the surprising (even shocking) conclusion of universal consistency for very general classes of estimators.
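The "lemma regarding set counting" is usually identified with what is now called the Sauer-Shelah lemma (Vapnik and Chervonenkis proved a version of it independently): a class of VC dimension d can label n points in at most sum_{i=0}^{d} C(n, i) distinct ways, which grows polynomially in n instead of as 2^n. A minimal sketch comparing the two counts follows; the function name and the choice d = 3 are illustrative assumptions, not anything from the thread.

```python
from math import comb


def sauer_shelah_bound(n: int, d: int) -> int:
    """Upper bound on the number of distinct labelings (the growth function)
    that a hypothesis class of VC dimension d can produce on n points."""
    # math.comb(n, i) returns 0 when i > n, so for n <= d this sums to 2**n.
    return sum(comb(n, i) for i in range(d + 1))


if __name__ == "__main__":
    d = 3  # illustrative VC dimension (hypothetical choice)
    for n in (2, 5, 10, 20, 40):
        # Compare all possible labelings (2**n) with the polynomial bound.
        print(n, 2 ** n, sauer_shelah_bound(n, d))
```

For n <= d the bound equals 2^n (the points can still be shattered); past d it falls far behind 2^n, and that polynomial growth is what the uniform convergence results build on.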
I guess it depends on what kind of semantics you apply to "theory guy". In my mind it's not at all dismissive.
I mean it in a foundational sense, rather than an applications sense. He has done great work with a whiteboard and pure thought, without the need for terabytes of data and thousands of machines.
Remember, though, that the AT&T group Vladimir came from, and that informed his work, was very much in the mold of linking theory and practice, where "practice" (at that time) meant working on the handwritten digit problem -- the now-cliché MNIST dataset.
> There is great, big data driven research coming out of Stanford using Common Crawl. For example, see http://www-nlp.stanford.edu/projects/glove/ . They successfully train on an 840 billion token corpus.
I haven't seen this paper before (thanks!!). How different is it from Word2Vec?
Clearly the pre-trained vectors at that scale (and much bigger than the ones released with Word2Vec) are new and very exciting.
The paper compares in detail against word2vec, but (spoiler alert) GloVe using 42 billion tokens from Common Crawl beats word2vec using 100 billion tokens from the Google News corpus!
They don't actually use the 840 billion token model in the paper as it was made with some parameters that didn't allow for direct comparison, but the code and the models are all released for anyone to use from their site.
This is one of many great examples of open datasets like Common Crawl allowing talented people from academia and start-ups to compete with the large proprietary datasets of Google or Bing.
(disclaimer: data scientist at Common Crawl who does the crawling)
The GloVe vs. word2vec comparison is not as clear-cut as that - see https://plus.google.com/114479713299850783539/posts/BYvhAbgG..., https://docs.google.com/document/d/1ydIujJ7ETSZ688RGfU5IMJJs... for more discussion.
Good link, thanks for pointing it out. Re: not clear-cut, that's always the case to varying degrees :) To quote the author's in-document response to the Google Doc you just linked to:
"Update by Richard Socher (Nov 2014): This document is outdated and its concerns have been addressed in the final version of the GloVe paper. Glove gets better performance on the same training data when actually run to convergence. See last section of Glove paper for details."
This is a good example of peer review in academia beyond just the paper review committee -- other researchers point out concerns or issues with methodology and they're addressed by the authors or other contributors. It's also great that the initial concerns could be properly tested thanks to the open source nature of both projects.
I will admit I didn't discuss the intricacies of the evaluation in my few paragraphs above; I was primarily speaking to the broader point that open data is helping academia compete with the goliaths of industrial research! =]
Interesting.
As I said in my other comment, one of the strengths of Word2Vec is how robust it is against various metrics.
While it looks like GloVe's advantages over Word2Vec may not be as large as initially claimed, it is mostly as robust (which is good). However, the jump from Word vectors alone to Word+Context vectors when evaluated on semantic relations is interesting.
(To be clear: I'm very interested in being able to use the same system over diverse datasets, without having to tune it differently for each dataset - hence my interest in the robustness of the methodologies)
Edit: Were you and Smerity at Sydney Uni at the same time?
> The paper compares in detail against word2vec, but (spoiler alert) GloVe using 42 billion tokens from Common Crawl beats word2vec using 100 billion tokens from the Google News corpus!
Damn!!
Background for those who don't follow this field: Word2Vec is an apparently miraculous demonstration and poster-child of the unreasonable effectiveness of big data. Beating it at all is impressive, assuming the performance is as robust as Word2Vec is against different metrics.
Beating it with only 42% of the tokens is wondrous.
Vapnik is philosophically not big data. SVMs are data efficient, at the cost of O(n^3) partitioning algorithms. His work has been more about maximizing the utility of the data you have.
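A hedged sketch of where the superlinear training cost comes from, assuming a kernel SVM trained in its dual form: the optimization is posed over an n x n kernel (Gram) matrix, so storing it alone takes n^2 entries, and a generic dense QP solver that factorizes it costs on the order of n^3. The RBF kernel, the `gamma` value and the helper names below are illustrative assumptions; decomposition solvers avoid materializing the full matrix, but their cost is still typically superlinear in n.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=0.5):
    """Dense n x n matrix of pairwise kernel values (hypothetical helper)."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


if __name__ == "__main__":
    for n in (10, 50, 200):
        # Deterministic toy 2-D points; any data set would do.
        pts = [(i / n, (i * 7 % n) / n) for i in range(n)]
        K = gram_matrix(pts)
        # Stored entries grow as n^2; a dense factorization of K grows as n^3.
        print(n, len(K) * len(K[0]), n ** 3)
```

The data efficiency mentioned above shows up after training: only the support vectors (points with nonzero dual coefficients) are needed at prediction time, even though training touched every pair of points.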
I think the real reason is that Facebook plans to recreate the Bell Labs style of industry research, where researchers have license to do whatever they want.
That is not accurate at all. These people can and did do lots of research in academia and had plenty of data. They obviously get paid a lot more in industry, however.
Sources? I've seen firsthand the limits of the 'data' we had at school. Papers constantly cite in their experiments: "the largest real world graphs we are aware of are on the order of 1B vertices" (a Twitter graph, from something like 2011. The other highly cited one is the LiveJournal graph).
There's a massive dearth of data in academia. This is also why you see people like Kleinberg working directly with Facebook on network research.
I'm probably not the only VR nut who confused the person in the title with Vladimir Vukićević, the Director of Engineering at Mozilla who has worked on some Oculus-centric WebVR stuff for Mozilla.
http://blog.bitops.com/blog/2014/06/26/first-steps-for-vr-on...
A few weeks ago there was an article on Nautil.us about innovations in machine learning. Vladimir Vapnik was mentioned, specifically how he used poetry to teach a machine handwriting. Very fascinating article in general:
Never heard of the guy. Who?
"The original SVM algorithm was invented by Vladimir N. Vapnik" http://en.wikipedia.org/wiki/Support_vector_machine