How New Are Yann LeCun's “New” Ideas?
garymarcus.substack.com
None of Gary's comments were original either. I don't know what I'd call this, but I've seen similar behavior elsewhere: this weird “flag planting” to try to get credit without doing any actual work, while disregarding all prior work. Normally the “predictions” are vague or could be applied to anything. It seems borderline like some sort of mental illness, but I'm not a mental health professional.
I think Marcus's problem here is less about not being given credit than about how LeCun has suddenly shifted to similar opinions without any attempt to reconcile that with how, until very recently, he openly denigrated Marcus and his ideas.
Marcus has a strong case there, but he could do a better job focusing on that issue…
So what now (or, has anything changed since the last time this came up)?
Marcus seems to be in the right (though his seething saltiness seems to me to dilute his message), and LeCun has done nothing so far but double down on dickishness.
LeCun probably should issue a retraction and an apology, before this really bites him in the arse.
There is a guy at work who sometimes paints presumptuous "# TODO: ..." comments all over the code base without ever actually doing anything about the issues, or discussing them with anyone. Similar phenomenon. Haha.
The problem is that ideas are the currency of academia, so you have to plant your flags and defend them like someone might defend a trademark.
Luckily in the startup world, ideas are worthless and no one cares if you thought of it first but can't execute.
More on academia vs startups: https://twitter.com/mizzao/status/1505529295157948421
A lot of academic debates come down to flag-planting. But the muddiness of them involves an ambiguity: is X flag-planting, or is X rightfully complaining about Y's flag-planting?
And your comment would be reasonable if you hadn't jumped to the "mental illness" remark; that's a bad way to have a discussion.
Weighing in on Chomsky-related topics can be a minefield as well. (ref. drfun) https://pbs.twimg.com/media/BZ2wMx2CYAAVhOu?format=jpg&name=...
Wow, Gary Marcus just Schmidhubered Yann LeCun.
The ironic thing of course is that Yann has not been at the forefront of AI for many, many years (and Gary, of course, never has). Facebook's research has failed to rival Google Brain, DeepMind, OpenAI, and groups at top universities.
So to the extent that Yann is copying Gary's opinions, it's because they both converge at a point far behind the leaders in the field. Yann should be much more concerned than Gary about that.
> So to the extent that Yann is copying Gary's opinions, it's because they both converge at a point far behind the leaders in the field.
Behind? Why do you say so? If anything, they may both (now) be a bit ahead of the curve. AFAICT, while the idea of neuro-symbolic integration is pretty old (Ron Sun, among others, was talking about it ages ago), the idea is still far from widely pursued by the mainstream thread of AI research.
In either case, it's interesting to finally start to see more weight accumulating behind this particular arrow. But I've long been on record as advocating that neuro-symbolic integration is a critical area of research for AI, so I'm a bit biased.
Also, having "an idea" expressed in one or two sentences is something different from implementing it, trying it out, and writing a paper about it.
Schmidhuber Schmidhubered LeCun himself :) https://openreview.net/forum?id=BZ5a1r-kVsf&noteId=GsxarV_Jy...
Sorry, could you explain Schmidhubering as a verb? I know who Schmidhuber is, but not familiar enough to understand this. Is it that Schmidhuber makes claims that LeCun's and others' ideas are derivative of his own?
Schmidhuber is a prolific flag-planter who is notorious for publicly raising a stink when he deems he should've been cited, but wasn't. It's happened enough that it's now a meme in the ML community.
It's important to mention that Schmidhuber is usually correct, in that his lab has been decades ahead in both theory and proofs-of-concept. The reason his lab is so under-cited is that it made these advances before the hardware to run them practically was available. Now that the techniques can be run, it's the people running them who tend to get all the credit for being "first".
> mention that Schmidhuber is usually correct
You present this statement as fact when it is still highly debated. A lot of researchers will claim a popular new approach is a reformulation of experiments they did X years ago. It's usually best to see it as a spectrum where ideas sit on the same axis, with "totally different" at one end and "renamed approach" at the other.
Not taking any sides one way or the other regarding whatever debate exists between Yann and Gary. But for what it's worth, I'd just like to point out that this overall notion of "neural symbolic" integration is fairly old at this point. It's gone a little bit in and out of vogue over the years (sort of like neural networks in general, but not to the same degree). Outside of Gary, the other "big name" I'd cite who has spoken about this topic is Ron Sun. See:
* https://books.google.com/books?id=n7_DgtoQYlAC&dq=Connection...
* https://link.springer.com/book/10.1007/10719871
* https://www.amazon.com/Integrating-Connectionism-Robust-Comm...
Gary Marcus' contribution to the field is to post the same rant about how it's not real intelligence every 6 months. Why does he keep getting upvoted?
Because it's not real intelligence and because lots of money and expertise are thrown at something that's not going to get us closer to AGI.
As a neo-luddite myself I'm personally fine with that (as I'm personally fine with us throwing money away at CERN), but there are people who still think that AGI is possible and who also think that reaching AGI is a worthy goal, so those people might not be ok with tilting at windmills.
No, actually, he links to old research which is becoming relevant now. I mean, the main beef in this article is that LeCun used to dismiss certain ideas and now he's presenting them as his own. So there is value in revisiting old theoretical research and turning it into practical applications now that we finally have enough GPU power to do so.
He frequents HN so it's not totally out of the realm of possibility that he boosts his own posts.
Old enough to remember when Marcus was picking out-of-scope fights with parallel distributed processing models and scholars. On the one hand, he's right: symbol manipulation is different in kind, not degree. On the other, we've known that since the dawn of neural networks. To claim credit for theoretical gaps that others try to fill in practice seems petty and myopic.
I expected this to be a smear / petty argument article. In fact, it's a concise, highly specific, quote by quote critique.
I don't have enough context to take a side, but this is not just a rant.
Beyond their interpersonal disagreements, I do wonder if LeCun is seeing diminishing marginal returns to deep learning at FB...
The points are indeed very specific, but they are about opinions, mostly not-even-wrong statements, just reasonable unquantifiables. The elephant in the room is the use of the word "deep" in the field IMHO: it means something other than "many-layered neural network" in common parlance...
What does it mean? Techniques that avoid the vanishing gradient problem?
Diminishing returns? Have you read the Gato, Palm, Stable Diffusion, etc. papers? Progress is racing ahead. Nothing is stalling... the only thing stopping progress from accelerating even faster is data.
He is talking about deep learning at FB.
Many of these scaling patterns are logarithmic with respect to data size. You can only double the dataset size so many times, so it's really not clear the scaling will continue.
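A back-of-the-envelope sketch of that constraint (the token counts below are made-up round numbers, purely for illustration):

```python
import math

# If each doubling of the dataset buys a roughly constant quality gain
# (log-linear scaling), what matters is how many doublings are left.
current_tokens = 1e12  # hypothetical: ~1T tokens in today's training sets
ceiling_tokens = 1e14  # hypothetical: ~100T tokens of usable text, total

doublings_left = math.log2(ceiling_tokens / current_tokens)
print(f"doublings left: {doublings_left:.1f}")  # ~6.6, then the data runs out
```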
Low-data regimes are also progressing quite fast. There's Dreamer, and more recent papers based on RL in learned world models.
This is fully pathetic. I expect poor quality from Marcus but this really takes the cake.
> LeCun, 2022: Reinforcement learning will also never be enough for intelligence; Marcus, 2018: “it is misleading to credit deep reinforcement learning with inducing concept[s]”
> “I think AI systems need to be able to reason”; Marcus, 2018: “Problems that have less to do with categorization and more to do with commonsense reasoning essentially lie outside the scope of what deep learning is appropriate for, and so far as I can tell, deep learning has little to offer such problems.”
> LeCun, 2022: Today's AI approaches will never lead to true intelligence (reported in the headline, not a verbatim quote); Marcus, 2018: “deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.”
These are LeCun's supposed great transgressions? Vague statements that happen to be vaguely similar to Marcus' vague statements?
Marcus also trots out random tweets to show how supported his position is, and one mentions a Marcus paper with 800 citations as being "engaged in the literature". But a paper like "Attention Is All You Need" currently has over 40,000 citations. THAT is a paper the community is engaged with, not something with less than 1/50th the citations.
This is a joke...
Marcus' moaning gets old, especially when his criticism is so self-referential; he's hardly the only voice against AI hype, though no doubt he's one of the loudest.
However he does seem to have legitimate complaints about the echo chamber the big names seem to be operating in.
Is Marcus trying to create the impression that somehow he is a more impactful AI contributor than LeCun? It's going to be a tough sell because I know LeCun's name from his technical work whereas I know Marcus' name from him constantly moaning about LeCun on social media. In what _tangible_ ways did Marcus contribute?
The article answers this question in detail.
Gotta love when the question proves they didn't read what they're asking about.
What is the Gary Marcus equivalent of a convolutional neural network?
These guys know better than to rev the tachometer up in the lay press talking about AGI and “achieving human-level intelligence” and stuff. This fluff, unfortunately, sells, and so when you've got an ego big enough to be talking this way in the first place, I suppose you feel like you have to do it?
Machine learning researchers optimize “performance” on “tasks”, and while those terms are still tricky to quantify or even define in many cases, they’re a damned sight closer to rigorous, which is why people like Hassabis who get shit done actually talk about them in the lay press, when they deal with the press at all.
We can’t agree when an embryo becomes a fetus becomes a human with anything approaching consensus. We can’t agree which animals “feel pain” or are “self aware”. We can sort of agree how many sign language tokens silverbacks can remember and that dolphins exhibit social behavior.
Let’s keep it to “beats professionals at Go” or “scores such on a Q&A benchmark”, or “draws pictures that people care to publish”, something somehow tethered to reality.
I’ve said it before and I’ll say it again: lots of luck with either of the words “artificial” or “intelligent”, give me a break on both in the same clause.
My personal thought about AGI is that we'll never “achieve” it, for pretty much the same reasons you, and philosophers for hundreds if not thousands of years, have given. We can't even be sure that anyone else is conscious and intelligent, or just a clever facsimile.
As AI (in the broadest sense) has developed, we always end up moving the goal posts. Sometimes this is because we genuinely don’t know what is difficult and what is easy due to several billion years of evolution. But some of this is because we know how the system works, and so it can’t be “intelligence”.
I think of it like a magic trick. When you watch someone do an illusion well, it's amazing. They made the coin disappear. It's real magic! But then you find out all they did was stick it in their pocket, or use a piece of elastic, and then the “magic” is gone.
Essentially this is partially what the Chinese Room is about. You think the Chinese speaker is real, but then you find out it's just some schlub executing a finite state machine.
So the idea is that statistical language modelling is not enough; you need a model based on logic too for "real" artificial intelligence. I wonder what the evidence for this claim is? Because the inferences and reasoning GPT-3 is already capable of are incredible and beat most expert systems that I know of. And GPT-4 is around the corner; Stable Diffusion was published only a few months ago. I don't see why more compute, more training data, and better network architectures couldn't lead to leaps and bounds of model improvements, at least for a few more years.
> Because the inferences and reasoning GPT-3 is already capable of are incredible and beat most expert systems that I know of.
This is patently FALSE. You can, however, re-run a given prompt 10+ times, tweaking and nudging it in the direction you know you want, until it produces a seemingly miraculously deep result (by pure chance).
Rinse and repeat a dozen times and you have enough material for a twitter thread or medium post fawning over gpt-3.
I don't necessarily doubt you, but can you give me an example of an expert system that is more capable?
GPT3 can’t perform algebra over all 32 bit numbers. A trivial Python script can.
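For what it's worth, a minimal sketch of the kind of "trivial script" meant here (the function name and wraparound convention are my own choices):

```python
def add_int32(x: int, y: int) -> int:
    """Exact addition with signed 32-bit wraparound semantics."""
    r = (x + y) & 0xFFFFFFFF                        # keep the low 32 bits
    return r - 0x100000000 if r >= 0x80000000 else r

# Correct on every input, every time -- no prompting, no sampling.
assert add_int32(2**31 - 1, 1) == -(2**31)          # wraps exactly at the edge
assert add_int32(-5, 3) == -2
```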
It behaves more like your nephew than a computer in that case. Interesting that this is often the example given for why computers are bad at certain tasks, and humans are good at others.
It is quite incredible that nothing changed about the architecture from GPT-2 to GPT-3 (just way more connections), yet it acquired fundamentally new behavior, that of performing arithmetic calculations, despite not having large amounts of training data on the subject. I think this is the type of phenomenon that shows we are quite poor at estimating what these systems will be capable of when scaled up. So acting as if we're sure it won't lead to improvements in AI is as idiotic as claiming that it will. There are far too many people on Hacker News who follow this fad of being dismissive of AI, because they make the common mistake of equating cynicism with intelligence.
It's smoke and mirrors trying to fool you into thinking it's generating intelligent text. In some applications, e.g. a chatbot, that's appropriate. But it's really no comparison to an expert system for most applications, where you know exactly the right and wrong solutions. Not adding numbers correctly with the huge budget GPT-3 has for training and inference is a poignant case of that fact. A linear layer taking in x and y will learn x+y just by setting the weights to 1.0, so it's not even a hard problem for neural nets; it's only hard in the particular tokenization and architecture used for GPT models.
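To make that last point concrete, a toy sketch of the weights-set-to-1.0 layer (my own illustration):

```python
import numpy as np

# A one-neuron linear layer: output = w1*x + w2*y, with both weights 1.0.
W = np.array([1.0, 1.0])

def linear_add(x: float, y: float) -> float:
    return float(W @ np.array([x, y]))  # 1.0*x + 1.0*y

print(linear_add(3.0, 4.0))  # 7.0 -- addition is trivial in this representation
```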
> Stable Diffusion was published like only a few months ago
Honest question: what's "intelligence"-like about Stable Diffusion?
Because being able to draw "a painting of joe biden as king kong on top of a skyscraper in the style of monet" was something that, until very recently, was thought to require intelligence. Of course, now it is not so impressive anymore because it is all mathematics and digital logic. But that is the problem with defining artificial intelligence: any time a task is implemented on a computer, you can point to that implementation as evidence that the task didn't require intelligence after all. Many decades ago researchers thought that playing chess at a high level required intelligence, then go, then poker, then composing music, then driving a car, etc. Nowadays researchers are more cautious and don't state that "solving task X implies intelligence". Thus it becomes a moving target, and a computer can never prove itself intelligent.
Turing test.
The Turing test in its original formulation has already been soundly defeated. People now hedge their bets and require that an AI must fool leading AI researchers to pass the test. But the original test supposed that the interrogator was "average", and fooling 99.99% of the world's population must be good enough. Either way, as LaMDA demonstrates, it is only a matter of time before even the strongest imaginable version of the Turing test is also defeated.
Was a publicly run Turing test ever defeated? I could not find any record of such.
Yes, it was. Your weak google skills notwithstanding.
The bear in the movie Annihilation passed the Turing test, but it didn't seem to have much intelligence
By some metrics, intelligence is self-evident. Especially in contexts where "intelligence" and "consciousness" mean approximately the same thing.
There is some intangible property I observe when I look at a human and determine they are conscious. There is some intangible property I observe when I look at a dog and determine it is conscious. There is some intangible property I observe when I look at Stable Diffusion and determine it is conscious.
Some attempts to explain this intangible property have been made. Almost all of the time, disagreements in these explanations boil down to semantics. Yes, I consider the ability to solve problems a demonstration of intelligence. Yes, I consider Stable Diffusion to be solving problems in this way. Also yes, I consider a hard-coded process to be behaving in a similar way.
At the end of the day we seem to define consciousness as something that makes us sufficiently sad when we hurt it.
Yann LeCun’s Facebook post from a few days ago now makes more sense to me:
https://www.facebook.com/722677142/posts/pfbid035FWSEPuz8Yqe...
From the comments on that post, written by LeCun:
"'[...] Yann LeCun, [...] is on a mission to reposition himself, not just as a deep learning pioneer, but as that guy with new ideas about how to move past deep learning'
First, I'm not 'repositioning myself'. My position paper is in the direct line of things I (and others) have thought about, talked about, and written about for years, if not decades. Gary has merely crashed the party.
My position paper is not at all about 'moving past deep learning'. It's the opposite: using deep learning in new ways, with new DL architectures (JEPAs, latent variable models), and new learning paradigms (energy-based self-supervised learning).
It's not at all about sticking symbol manipulation on top of DL as he suggests in vague terms. It's about seeing reasoning as latent-variable inference based on (hopefully gradient-based) optimization.
Gary claims that my critiques of supervised learning, reinforcement learning, and LLMs (my 'ladders') are critiques of deep learning (his 'ladder'). But they are not. What's missing from SL, RL and LLM are SSL, predictive world models, joint-embedding (non generative) architectures, and latent-variable inference (my rockets). But deep learning is very much the foundation on which everything is built.
In my piece, reasoning is the minimization of an objective with respect to latent variables. If Gary wants to call this 'symbol manipulation' and declare victory, fine. But it's merely a question of vocabulary. It certainly is very much unlike any proposal he has ever made, despite the extreme vagueness of those proposals."
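For readers wondering what "reasoning as the minimization of an objective with respect to latent variables" might look like mechanically, here is a toy sketch (entirely my own construction; the energy function is arbitrary, and nothing here is code from LeCun's paper): pick the latent z that minimizes an energy E(x, z) by gradient descent.

```python
import numpy as np

def energy(x, z):
    # Arbitrary smooth energy tying an observation x to a latent z;
    # purely illustrative, not a JEPA or any architecture from the paper.
    return np.sum((x - z**2) ** 2)

def grad_z(x, z, eps=1e-5):
    # Numerical gradient of the energy with respect to z.
    g = np.zeros_like(z)
    for i in range(len(z)):
        d = np.zeros_like(z)
        d[i] = eps
        g[i] = (energy(x, z + d) - energy(x, z - d)) / (2 * eps)
    return g

x = np.array([4.0, 9.0])   # the "observation"
z = np.ones(2)             # initial latent guess
for _ in range(500):       # "inference" = minimizing E over z
    z -= 0.01 * grad_z(x, z)
print(z)  # ~[2, 3]: the latent configuration that best explains x
```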
How many days, cumulatively, has Gary Marcus held the SOTA record for any well known machine learning task?
Any specific reason this is relevant to the arguments in his post?
What is he even arguing here? That he has been cheated out of some kind of credit? Credit for what? Afaict he has never actually shown that something novel based on his ideas works in a way that matters.
> What is he even arguing here? That he has been cheated out of some kind of credit?
He's (at least to some extent) arguing that if you're going to say someone's paper is "mostly wrong", saying the same things 4 years later should probably warrant an "ok, you were right" at least.
His argument may be silly or pathetic or false or even true, but it has nothing to do with how long he held the SOTA record of any known machine learning task.
He is arguing that Yann LeCun is taking ideas from other researchers without citation or credit, and that this is a sign of insecurity and ego.
"Ideas" here being few commonly used words put together barely forming a sentence, not some algorithm, research or deep paper.
I.e., the whole "idea" is a "hey, for AGI we need something different than this GPT-3" tweet, not an "idea" as in "hey, I invented this new thing I call LSTM, check it out [link to paper, results, what not]".
> hey I invented this new thing I call LSTM
Well, the LSTM fella does pop up later calling out LeCun for "rehashes but doesn't cite essential work of 1999-2015". Which I guess does mean people with real "ideas" are also fed up with him?
"Deep learning pioneer Jürgen Schmidhuber, author of the commercially ubiquitous LSTM neural network, arguably has even more right to be pissed [...]"
They should make an AI to do auto-citation for them.
Because Marcus isn't a practitioner and never has been. He's a public intellectual from a different field acting like he's an AI expert. You would never listen to criticisms of a Physics theory from a Biologist and you shouldn't listen to criticisms of Neural Networks from a Psychologist.
He's proven time and time again that he doesn't understand the methods at work and doesn't even seem interested in trying to do so.
> You would never listen to criticisms of a Physics theory from a Biologist and you shouldn't listen to criticisms of Neural Networks from a Psychologist.
Why? If their arguments are sound, why shouldn't we listen to them?
Where is this silly credentialism coming from?
It's not credentialism at all, I said nothing about having a PhD in Physics for example. These theories are highly technical, and if you aren't actively engaged in reading/replicating the most important papers and even lack the technical training to do so, how can you really make coherent criticisms of them? It really beggars belief that a Biologist with nothing more than basic stats training could make sense of or poke holes in String Theory or Quantum Chromodynamics, for example. Hence we get Marcus' pseudointellectual mush that passes for valid criticism of DL in the media or among other non-experts.
>Why? If their arguments are sound, why shouldn't we listen to them?
Generally arguments from non-experts like this fall into the "not even wrong" category and don't merit much attention.
Experts, always and everywhere, tend to massively exaggerate the scope of their expertise.
How exactly does holding the SOTA record on any, or even every, machine learning task give you any authority on true intelligence?
What gives LeCun any authority on true intelligence?
Even LeCun points out that his paper is not technical.
The only thing that matters is the quality of the argument.
> LeCun, 2022: Today's AI approaches will never lead to true intelligence (reported in the headline, not a verbatim quote); Marcus, 2018: “deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.”
I swear the same thing was being said 10+ years ago.
Just throw more GPUs at it bro!
Yes, it's been a common view.
Ideas are cheap. There are a thousand nameless people who have already had these ideas. Doing the work is the thing.
Gary talking about himself. Nothing really to see here. This is literally someone on the internet arguing about pointless crap.
Isn't this just a case of over-fitting? Recent LeCun has perhaps been over-fitted to Marcus's past writing. Maybe some augmentation (with new ideas) will resolve the issue?
I mean, there's that old saying about "standing on the shoulders of giants" and similar refrains for a reason. All science is cumulative and builds on things that came before. And a lot of times it seems that old ideas "go dormant" for a time, and then come roaring back due to some small tweak or change in available technology, etc. See the entire history of neural networks for example.
So I guess I'd say that if Gary has a legitimate beef, it would just be in regards to acknowledgement / citation / whatever. If Yann really was familiar with Gary's older work, then came around to the same ideas, but refused to acknowledge Gary, that could be seen as somewhat petty and vindictive. That said, I have no idea to what extent that is actually the case. Not trying to take sides here. I respect both guys to a tremendous degree.
Time will tell if we need symbolic representations or if continuous ones are sufficient. In the meantime, it would be more productive to present alternative methods, or at least benchmarks where deep learning models are outperformed, instead of arguing about who said what first and criticising without offering quantitative evidence or alternatives.
This is frustrating. Consider this:
LeCun, 2022: Today's AI approaches will never lead to true intelligence (reported in the headline, not a verbatim quote); Marcus, 2018: “deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.”
How can that be something that LeCun did not give Marcus credit for? It is borderline self-evident, and people have been saying similar things since neural networks were invented. This would only be news if LeCun had said that "neural nets are all you need" (literally, not as a reference to the title of the transformers paper).
And furthermore, if LeCun had said that, there are literally dozens of people who have also said that you need to combine the approaches.
He cites a single line: 'LeCun spent part of his career bashing symbols; his collaborator Geoff Hinton even more so. Their jointly written 2015 review of deep learning ends by saying that “new paradigms are needed to replace rule-based manipulation of symbolic expressions.”'
Well, sure because symbol processing alone is not the answer either. We need to replace it with some hybrid. How is this a contradiction?
To summarize: people have been looking for a productive way to combine symbolic and statistical systems -- there are in fact many such systems proposed with varying degrees of success. LeCun agrees with this approach (no one has anything to lose by endorsing adding things to any model), but Marcus insists he came up with it and he should be cited.
Ugh.
Gary Marcus is the definition of petty. He brands himself as an ai skeptic but in reality he's just a clout chaser more obsessed with being right and his own image than anything else.
In his mind he is always right. Every single tweet he has made, every single sentence he has said, is never wrong. He is 100% right; everyone else is 100% wrong.
So what? Is he actually right, or is he wrong? A good argument delivered badly is still a good argument.
He's a fool who hurls criticisms, gets repeatedly disproven, and doesn't actually execute on anything. It's obvious why LeCun's words carry more weight; he and his labs get shit done; he speaks from experience, not sophistry.
In other words, Gary Marcus has managed to match some linguistic sub-patterns between two articles, but has not proved he is intelligent.
He’s matched some “linguistic patterns” that seem to indicate that LeCun has adopted ideas for which people like you call Marcus a fool. I’m going to cut him some slack.
Just giving back exactly the level of consideration he gives to ML as a field... Pattern matching isn't intelligence, after all.
> LeCun, 2022: Today's AI approaches will never lead to true intelligence (reported in the headline, not a verbatim quote); Marcus, 2018: “deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.”
If you think that is substantive evidence for a stolen idea then it's surely not possible for anyone to ever have an original thought.
> If you think that is substantive evidence for a stolen idea
He says in the article that he doesn't think it's a stolen idea:
"I won’t accuse LeCun of plagiarism, because I think he probably reached these conclusions honestly, after recognizing the failures of current architectures."
Name one academic you look up to who never admits he is wrong.
I'll be glad to do so after you explain how that's relevant to any argument they may make.
As far as I know our brains are mostly unchanged for thousands of years. So any novel ideas anyone has are a result of standing on the shoulders of giants, idea-wise and technology-wise, so it seems rather silly to give any individual the lion's share of the credit for any new idea of any kind, anywhere.
I think I prefer the Emily Bender approach of asserting that no one should be allowed to train deep learning models at all. If you're going to claim some sort of authority over a technology you don't actually develop then you might as well go hard.
This guy and his weird AI feud nobody cares about.
Why do people keep upvoting his stuff?
Does anyone actually care about stuff like this?