Chomsky and the Two Cultures of Statistical Learning (2011)

norvig.com

103 points by atomicnature 7 days ago · 126 comments

adamddev1 2 days ago

> Chomsky has focused on the generative side of language

The answers to "why" that Chomsky pushes so hard for are very valuable to adult language learners. There are basic syntactic rules for generating broadly correct language. Having these rules discovered and explained in the simplest possible form is irreplaceable by statistical models. Neural networks, much like native speakers, can say "well, this just sounds right," but adult learners need a mathematical theory of how and why they can generate sentences. Yes, this changes with time and circumstances, but the simple rules and theories are there if we put the effort in to look for them.

There are many languages with a very small corpus of training data. LLMs fail miserably at communicating in them or explaining things about their grammar, but if we look hard for the underlying theories Chomsky was looking for, we can make huge leaps and bounds in understanding how to use them.

intalentive 2 days ago

This essay is missing the words “cause” and “causal”. There is a difference between discovering causes and fitting curves. The search for causes guides the design of experiments, and with luck, the derivation of formulae that describe the causes. Norvig seems to be confusing the map (data, models) for the territory (causal reality).

  • gsf_emergency_6 2 days ago

    A related* essay (2010) by a statistician on the goals of statistical modelling that I've been procrastinating on:

    https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf

    To Explain Or To Predict?

    Nice quote

    We note that the practice in applied research of concluding that a model with a higher predictive validity is “truer,” is not a valid inference. This paper shows that a parsimonious but less true model can have a higher predictive validity than a truer but less parsimonious model.

    Hagerty+Srinivasan (1991)

    *like TFA it's a sorta review of Breiman

    • 0928374082 a day ago

      is it more than a commentary on overfitting to the tune of "with enough epicycles you can make the elephant wiggle its trunk"?

      • gsf_emergency_6 a day ago

        If you are referring to Hagerty+Srinivasan:

        They certainly didn't think that a better fit => "truer".

        They used the term "truer" to describe a model that more accurately captures the underlying causal structure or "true" relationship between variables in a population.

        As for the paper I linked, I still haven't read it closely enough to confirm that D-Machine's comment below is a good dismissal.

        I'm inclined to think it's more like "interpolating vs extrapolating"

  • tripletao 2 days ago

    This essay frequently uses the word "insight", and its primary topic is whether an empirically fitted statistical model can provide that (with Norvig arguing for yes, in my opinion convincingly). How does that differ from your concept of a "cause"?

    • musicale 2 days ago

      > I agree that it can be difficult to make sense of a model containing billions of parameters. Certainly a human can't understand such a model by inspecting the values of each parameter individually. But one can gain insight by examing (sic) the properties of the model—where it succeeds and fails, how well it learns as a function of data, etc.

      Unfortunately, studying the behavior of a system doesn't necessarily provide insight into why it behaves that way; it may not even provide a good predictive model.

      • tripletao 2 days ago

        Norvig's textbook surely appears on the bookshelves of researchers, including those building current top LLMs. So it's odd to say that such an approach "may not even provide a good predictive model". As of today, it is unquestionably the best known predictive model for natural language, by a huge margin. I don't think that's for lack of trying, with billions of dollars or more at stake.

        Whether that model provides "insight" (or a "cause"; I still don't know if that's supposed to mean something different) is a deeper question, and e.g. the topic of countless papers trying to make sense of LLM activations. I don't think the answer is obvious, but I found Norvig's discussion to be thoughtful. I'm surprised to see it viewed so negatively here, dismissed with no engagement with his specific arguments and examples.

        • atomicnatureOP 2 days ago

          You can look into Judea Pearl's definitions of causality for more information.

          Pearl defines a ladder of causation:

          1. Seeing (association) 2. Doing (intervention) 3. Imagining (counterfactuals)

          In his view - most ML algos are at level 1 - they look at data and draw associations, and "agents" have started some steps in level 2 - doing.

          The smartest of humans operate mostly in level (3) of abstractions - where they see things, gain experience, and later build up a "strong causal model" of the world and become capable of answering "what if" questions.

        • musicale a day ago

          Thanks for the response, but (per the omitted portion of my sentence before the semicolon) I was not talking about the M in LLM. I was talking about a conceptual or analytic model that a human might develop to try to predict the behavior of an LLM, per Norvig's claim of insight derived from behavioral observation.

          But now that I think a bit about it, the observation that an LLM frequently produces obviously and/or subtly incorrect output, is not robust to prompt rewording, etc., is perhaps a useful Norvig-style insight.

        • foldr 2 days ago

          Chomsky's talking about predictive models in the context of cognitive science. LLMs aren't really a predictive model of any aspect of human cognitive function.

          • tripletao a day ago

            The generation of natural language is an aspect of human cognition, and I'm not aware of any better model for that than current statistical LLMs. The papers mapping between EEG/fMRI/etc. and LLM activations have been generally oversold so far, but it's an active area of research for good reason.

            I'm not saying LLMs are a particularly good model, just that everything else is currently worse. This includes Chomsky's formal grammars, which fail to capture the ways humans actually use language per Norvig's many examples. Do you disagree? If so, what model is better and why?

            • D-Machine a day ago

              Also, in case you missed the recent big thread, fMRI has taught us almost nothing due to its serious limitations and various measurement and design issues in the field. IMO it is way too slow and clunky to ever yield insights into something as fast as linguistic thought.

              https://news.ycombinator.com/item?id=46288415

            • foldr a day ago

              I’m not really sure what you’re getting at. Could you point to some papers exemplifying the kind of work that you’re thinking of? Of course there are lots of people training LLMs and other statistical models on EEG data, but that does not show that, say, GPT-5, is a good model of any aspect of human cognition.

              Chomsky, of course, never attempted to model the generation of natural language and was interested in a different set of problems, so LLMs are not really a competitor in that sense anyway (even if you take the dubious step of accepting them as scientific models).

              I certainly don’t agree with Norvig, but he doesn’t really understand the basics of what Chomsky is trying to do, so there is not much to respond to. To give three specific examples, he (i) is confused in thinking that Gold’s theorem has anything to do with Chomsky’s arguments, (ii) appears to think that Chomsky studied the “generation of language” (because he’s read so little of Chomsky’s work that he doesn’t know what a “generative grammar” is), and (iii) believes that Chomsky thinks that natural languages are formal languages in which every possible sentence is either in the language or not (again because he’s barely read anything that Chomsky wrote since the 1950s). Then, just to make absolutely sure not to be taken seriously, he compares Chomsky to Bill O’Reilly!

              On point (iii), see http://www.linguistics.berkeley.edu/~syntax-circle/syntax-gr..., and the last complete paragraph of p. 145.

              • D-Machine a day ago

                This comment and GP comment are why the word "causal model" is needed. LLMs are predictive* models of human language, but they are not causal models of language.

                If you believe that some of human cognition is linguistic (even if e.g. inner monologue and spoken language are just the surface of deeper more unconscious processes), then, yes, we might say LLMs can predictively model some aspects of human cognition, but, again, they are certainly not causal models, and they are not predictive models of human cognition generally (as cognition is clearly far, far more than linguistic).

                * I avoid calling LLMs "statistical" because they really aren't even that. They are not calibrated, and including a softmax and log-loss in things doesn't magically make your model statistical (especially since ad-hoc regularization methods, other loss functions and simplex mappings, e.g. sparsemax, often work better and then violate the assumptions that are needed to prove these things are behaving statistically). LLMs really are more accurately just doing (very, very fancy and impressive) curve/manifold-fitting.
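The softmax-vs-sparsemax contrast in that footnote can be made concrete with a small sketch (pure NumPy; the logits are invented for illustration). Softmax gives every outcome strictly positive probability, while sparsemax, the Euclidean projection of the logits onto the probability simplex, zeroes out low-scoring outcomes entirely, which is part of why the usual probabilistic-calibration story no longer applies to it:

```python
import numpy as np

def softmax(z):
    """Standard softmax: every outcome gets strictly positive probability."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): project z onto the
    probability simplex; low-scoring entries come out exactly zero."""
    z_sorted = np.sort(z)[::-1]           # logits in descending order
    cssv = np.cumsum(z_sorted)            # cumulative sums of sorted logits
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv     # which entries stay in the support
    k_z = k[support][-1]
    tau = (cssv[k_z - 1] - 1) / k_z       # threshold subtracted from all logits
    return np.maximum(z - tau, 0.0)

z = np.array([3.0, 1.0, 0.1])
p_soft = softmax(z)      # all three entries are positive
p_sparse = sparsemax(z)  # the two low-scoring entries are exactly 0
```

Both outputs sum to one, but only sparsemax produces a distribution with hard zeros, i.e. it is not the exponential-family object the usual statistical guarantees are proved for.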

                • foldr a day ago

                  They are not predictive models in the domains Chomsky investigated. LLMs make no predictions about, say, when non-surface quantifier scope should or should not be possible, or what should or shouldn’t be a wh-island. They are predictive in a sense that’s largely irrelevant to cognitive science. (Trying to guess what words might come after some other words isn’t a problem in cognitive science.)

                  • tripletao 20 hours ago

                    "What should or shouldn’t be a wh-island" is literally a statement of "what words might come after some other words"! An LLM encodes billions of such statements, just unfortunately in a quantity and form that makes them incomprehensible to an unaided human. That part is strictly worse; but the LLM's statements model language well enough to generate it, and that part is strictly better.

                    As I read Norvig's essay, it's about that tradeoff, of whether a simple and comprehensible but inaccurate model shows more promise than a model that's incomprehensible except in statistical terms with the aid of a computer, but far more accurate. I understand there's a large group of people who think Norvig is wrong or incoherent; but when those people have no accomplishments except within the framework they themselves have constructed, what am I supposed to think?

                    Beyond that, if I have a model that tells me whether a sentence is valid, then I can always try different words until I find one that makes it valid. Any sufficiently good model is thus capable of generation. Chomsky never proposed anything capable of that; but that just means his models were bad, not that he was working on a different task.
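The "try different words until one makes it valid" point can be sketched mechanically. Here the validity oracle is a hard-coded toy grammar standing in for any model that classifies sentences as valid; the vocabulary and sentences are invented for illustration:

```python
import itertools

# Hypothetical validity oracle: stands in for "a model that tells me
# whether a sentence is valid" (here just a hard-coded toy grammar).
VALID = {
    ("the", "cat", "sat"),
    ("the", "dog", "ran"),
}

def is_valid(tokens):
    return tuple(tokens) in VALID

vocab = ["the", "cat", "dog", "sat", "ran"]

# Generation by exhaustive search over word sequences: any sufficiently
# good validity model yields a (very inefficient) generator.
generated = [list(c) for c in itertools.product(vocab, repeat=3) if is_valid(c)]
```

The search is hopelessly inefficient at realistic scale, but that's an engineering objection, not a conceptual one: a classifier over sentences induces a generator.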

                    As to the relationship between signals from biological neurons and ANN activations, I mean something like the paper linked below, whose authors write:

                    > Thus, even though the goal of contemporary AI is to improve model performance and not necessarily to build models of brain processing, this endeavor appears to be rapidly converging on architectures that might capture key aspects of language processing in the human mind and brain.

                    https://www.biorxiv.org/content/10.1101/2020.06.26.174482v3....

                    I emphasize again that I believe these results have been oversold in the popular press, but the idea that an ANN trained on brain output (including written language) might provide insight into the physical, causal structure of the brain is pretty mainstream now.

                    • foldr 18 hours ago

                      > What should or shouldn’t be a wh-island" is literally a statement of "what words might come after some other words"!

                      This gets at the nub of the misunderstanding. Chomsky is interested in modeling the range of grammatical structures and associated interpretations possible in natural languages. The wh-island condition is a universal structural constraint that only indirectly (and only sometimes) has implications for which sequences of words are ‘valid’ in a particular language.

                      LLMs make no prediction at all as to whether or not natural languages should have wh-islands: they’ll happily learn languages with or without such constraints.

                      If you want a more concrete example of why wh-islands can’t be understood in terms of permissible or impermissible sequences of words, consider cases like

                      How often did you ask why John took out the trash?

                      The wh-island created by ‘why’ removes one of the in-principle possible interpretations (the embedded question reading where ‘how often’ associates with ‘took’), but the sequence of words is fine.

                      > Chomsky never proposed anything capable of that; but that just means his models were bad, not that he was working on a different task.

                      No, Chomsky really was working on a different task: a solution to the logical problem of language acquisition and a theory of the range of possible grammatical variation across human languages. There is no reason to think that a perfect theory in this domain would be of any particular help in generating plausible-looking text. From a cognitive point of view, text generation rather obviously involves the contribution of many non-linguistic cognitive systems which are not modeled (nor intended to be modeled) by a generative grammar.

                      >the paper linked below

                      This paper doesn’t make any claims that are obviously incompatible with anything that Chomsky has said. The fundamental finding is unsurprising: brains are sensitive to surprisal. The better your language model is at modeling whether or not a sequence of words is likely, the better you can predict the brain’s surprisal reactions. There are no implications for cognitive architecture. This ought to be clear from that fact that a number of different neural net architectures are able to achieve a good degree of success, according to the paper’s own lights.

                      • tripletao 6 hours ago

                        > LLMs make no prediction at all as to whether or not natural languages should have wh-islands: they’ll happily learn languages with or without such constraints.

                        The human-designed architecture of an LLM makes no such prediction; but after training, the overall system including the learned weights absolutely does, or else it couldn't generate valid language. If you'd prefer to run in the opposite direction, then you can feed in sentences with correct and incorrect wh-movement, and you'll find the incorrect ones are much less probable.
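That scoring claim is easy to demonstrate in miniature. A sketch using an add-one-smoothed bigram model rather than an actual LLM (the corpus and test sentences are invented; a real LLM conditions on far richer context, but the comparison works the same way):

```python
import math
from collections import Counter

# Tiny stand-in for training data.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the dog ran",
]

bigrams = Counter()
contexts = Counter()
vocab = set()
for sent in corpus:
    tokens = ["<s>"] + sent.split() + ["</s>"]
    vocab.update(tokens)
    for a, b in zip(tokens, tokens[1:]):
        bigrams[a, b] += 1   # count of word b following word a
        contexts[a] += 1     # count of word a as a context

V = len(vocab)

def log_prob(sentence):
    """Add-one-smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[a, b] + 1) / (contexts[a] + V))
        for a, b in zip(tokens, tokens[1:])
    )

good = log_prob("the cat sat on the mat")  # grammatical order
bad = log_prob("cat the sat on mat the")   # scrambled order
# The grammatical order comes out far more probable than the scrambled one.
```

Even this trivial fitted model implicitly encodes word-order constraints it was never explicitly told about; the claim above is that an LLM does the same for constraints like wh-movement, just at enormously larger scale.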

                        That prediction is commingled with billions of other predictions, which collectively model natural language better than any machine ever constructed before. It seems like you're discounting it because it wasn't made by and can't be understood by an unaided human; but it's not like the physicists at the LHC are analyzing with paper and pencil, right?

                        > There is no reason to think that a perfect theory in this domain would be of any particular help in generating plausible-looking text.

                        Imagine that claim in human form--I'm an expert in the structure of the Japanese language, but I'm unable to hold a basic conversation. Would you not feel some doubt? So why aren't you doubting the model here? Of course it would have been outlandish to expect that of a model five years ago, but it isn't today.

                        I see your statement that Chomsky isn't attempting to model the "many non-linguistic cognitive systems", but those don't seem to cause the LLM any trouble. The statistical modelers have solved problem after problem that was previously considered impossible, and the practical applications of that are (for better or mostly worse) reshaping major aspects of society. Meanwhile, every conversation I've had with a Chomsky supporter seems to reduce to "he is deliberately choosing not to produce any result evaluable by a person who hasn't spent years studying his theories". I guess that's true, but that mostly just makes me regret what time I've already spent.

                        • foldr 5 hours ago

                          > The human-designed architecture of an LLM makes no such prediction; but after training, the overall system including the learned weights absolutely does, or else it couldn't generate valid language.

                          It makes a prediction about whatever language(s) are in the training data, but it doesn’t make any (substantial) predictions about general constraints on human languages. It really seems that you’re missing the absolutely fundamental goal of Chomsky’s research program here. Remember that whole “universal grammar” thingy?

                          > -I'm an expert in the structure of the Japanese language, but I'm unable to hold a basic conversation. Would you not feel some doubt?

                          I expect anyone learning Japanese as a second language will get a chuckle out of this one. It’s in fact a common scenario. You can learn a lot about the grammar of a language, but conversation requires the ability to use that knowledge immediately and fluidly in a wide variety of situations. It is like the difference between “knowing how to solve a differential equation” and being able to answer 50 questions within an hour in a physics exam.

                          > I see your statement that Chomsky isn't attempting to model the "many non-linguistic cognitive systems", but those don't seem to cause the LLM any trouble.

                          Of course they don’t, because researchers creating LLMs are (in the vast majority of cases) not attempting to model any particular cognitive system; they have engineering goals, not scientific ones. You seem to be stuck in the view that Chomsky is somehow trying and completely failing to do the thing that LLMs do successfully. This certainly makes for a good straw man (if Chomsky had the same goals, then yeah, he never got anywhere), but it’s a misunderstanding of his research program.

                          > "he is deliberately choosing not to produce any result evaluable by a person who hasn't spent years studying his theories"

                          You could say this of many perfectly respectable fields. Andrew Wiles has not produced any result evaluable by me or by almost anyone else. It would certainly take me a lot more than “a few years” of study to evaluate his work.

                          I’m afraid there are no intellectual shortcuts. If you want to evaluate Chomsky’s work, you will have to at least read it, and maybe even think about it a bit too! It seems a bit churlish to whine about that. All you are being deprived of by opting out of this time investment is the opportunity to make informed criticisms of his work on the internet.

                          (The good news is that generative linguistics is actually pretty accessible, and one year of part time study would probably be enough to get the lay of the land.)

                  • D-Machine a day ago

                    Correct, LLMs are predictive also only in a narrow sense!

        • D-Machine 2 days ago

          > I'm surprised to see it viewed so negatively here, dismissed with no engagement with his specific arguments and examples.

          I struggle to motivate engaging with it because it is unfortunately quite out of touch with (or just ignores) some core issues and the major advances in causal modeling and causal modeling theory, i.e. Judea Pearl and do-calculus, structural equation modeling, counterfactuals, etc [1].

          It also, IMO, makes a (highly idiosyncratic) distinction between "statistical" (meaning, trained / fitted to data) and "probabilistic" models, a distinction that doesn't really hold up too well.

          I.e. probabilistic models in quantum physics are "fit" too, in that the values of fundamental constants are determined by experimental data, but these "statistical" models are clearly causal models regardless. Even most quantum physical models can be argued to be causal, just the causality is probabilistic rather than absolute (i.e. A ==> B is fuzzy implication rather than absolute implication). It's only if you ask deliberately broad ontological questions (e.g. "Does the wave function cause X") that you actually run into the problem of quantum models being causal or not, but for most quantum physical experiments and phenomena generally, the models are still definitely causal at the level of the particles / waves / fields involved.

          IMO I don't want to engage much with the arguments because it starts on the wrong foot and begins by making, in my opinion, an incoherent / unsound distinction, while also ignoring or just being out of date with the actual scientific and philosophical progress and issues already made here.

          I would also say there is a whole literature on tradeoffs between explanation (descriptive models in the worst case, causal models in the best case) and prediction (models that accurately reproduce some phenomenon, regardless of whether they are based on a true description or causal model). There are also loads of examples of things that are perfectly deterministic and modeled by perfect "causal" models but which of course still defy human comprehension / intuition, in that the equations need to be run on computers for us to make sense of them (differential equation models, chaotic systems, etc). Or, more practically: we can learn all sorts of physical and mental skills, even though we understand barely anything about the brain, how it works, and how it coordinates with the body. But obviously such an understanding is mostly irrelevant for learning how to operate effectively in the world.

          I.e. in practice, if the phenomenon is sufficiently complex, a causal model that also accurately models the system is likely to be too complex for us to "understand" anyway (or you have identifiability issues so you can't decide between multiple different models; or you don't have the time / resources / measurement capacity to do all the experiments needed to solve the identifiability problem anyway), so there is almost always a tradeoff between accuracy and understanding. Understanding is a nice luxury, but in many cases not important, and in complex cases, probably not achievable at all. If you are coming from this perspective, the whole "quandary" of the essay seems just odd.

          [1] https://plato.stanford.edu/entries/causal-models/

          • tripletao 2 days ago

            Unless and until neurologists find evidence of a universal grammar unit (or a biological Transformer, or whatever else) in the human connectome, I don't see how any of these models can be argued to be "causal" in the sense that they map closely to what's physically happening in the brain. That question seems so far beyond current human knowledge that any attempt at it now has about as much value as the ancient Greek philosophers' ideas on the subatomic structure of matter.

            So in the meantime, Norvig et al. have built statistical models that can do stuff like predicting whether a given sequence of words is a valid English sentence. I can invent hundreds of novel sentences and run their model, checking each time whether their prediction agrees with my human judgement. If it doesn't, then their prediction has been falsified; but these models turned out to be quite accurate. That seems to me like clear evidence of some kind of progress.

            You seem unimpressed with that work. So what do you think is better, and what falsifiable predictions has it made? If it doesn't make falsifiable predictions, then what makes you think it has value?

            I feel like there's a significant contingent of quasi-scientists that have somehow managed to excuse their work from any objective metric by which to evaluate it. I believe that both Chomsky and Judea Pearl are among them. I don't think every human endeavor needs to make falsifiable predictions; but without that feedback, it's much easier to become untethered from any useful concept of reality.

            • D-Machine a day ago

              I would think it was quite clear from my last two paragraphs that I agree causal models are generally not as important as people like Chomsky think, and that in general they are achievable only in incredibly narrow cases. Besides, all models are wrong, but some are useful.

              > You seem unimpressed with that work

              I didn't say anything about Norvig's work, I was saying the linked essay is bad. It is correct that Chomsky is wrong, but is a bad essay because it tries to argue against Chomsky with a poorly-developed distinction while ignoring much stronger arguments and concepts that more clearly get at the issues. IMO the essay is also weirdly focused on language and language models, when this is a general issue about causal modeling and scientific and technological progress, and so the narrow focus here also just weakens the whole argument.

              Also, Judea Pearl is a philosopher, and do-calculus is just one way to think about and work with causality. Talking about falsifiability here is odd, and sounds almost to me like saying "logic is unfalsifiable" or "modeling the world mathematically is unfalsifiable". If you meant something like "the very concept of causality is incoherent", that would be the more appropriate criticism here, and more arguable.

              • tripletao 21 hours ago

                I could iterate with an LLM and Lean, and generate an unlimited amount of logic (or any other kind of math). This math would be correct, but it would almost surely be useless. For this reason, neither computer programs nor grad students are rewarded simply for generating logically correct math. They're instead expected to prove a theorem that other people have tried and failed to prove, or perhaps to make a conjecture with a form not obvious to others. The former is clearly an achievement, and the latter is a falsifiable prediction.

                I feel like Norvig is coming from that standpoint of solving problems well-known to be difficult. This has the benefit that it's relatively easy to reach consensus on what's difficult--you can't claim something's easy if you can't do it, and you can't claim it's hard if someone else can. This makes it harder to waste your life on an internally consistent but useless sidetrack, as you might even agree (?) Chomsky has.

                You, Chomsky, and Pearl seem to reject that worldview, instead believing the path to an important truth lies entirely within your and your collaborators' own minds. I believe that's consistent with the ancient philosophers. Such beliefs seem to me halfway to religious faith, accepting external feedback on logical consistency, but rejecting external evidence on the utility of the path. That doesn't make them necessarily bad--lots of people have done things I consider good in service of religions I don't believe in--but it makes them pretty hard to argue with.

                • D-Machine 20 hours ago

                  I'm not sure how you can square anything you said in your last paragraph with anything I said about all models being wrong, and causal modeling being extremely limited.

  • D-Machine 2 days ago

    I had this exact reaction: the absence of any discussion of "causal modeling" makes the whole thing seem horribly out of touch with the real issues here. You can have explanatory and predictive models that are causal, or explanatory and predictive models that are non-causal, and that is the actual issue, not "explanation" vs. "prediction", which is not a tight enough distinction.

MoravecsParadox 2 days ago

> derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don't try to understand the meaning of that behavior.

It's crazy how wrong Chomsky was about machine learning. Maybe the real truth is that humans are stochastic parrots who have an underlying probability distribution, and because gradient descent is so good at reproducing probability distributions, LLMs are incredibly good at reproducing language.

  • AuthAuth 2 days ago

    Is it crazy? Chomsky is wrong on so many of the topics he speaks about.

barrenko 2 days ago

Is this bayesian vs. frequentist?

  • tgv 2 days ago

    In one word: no.

    In more detail: Chomsky is/was not concerned with the models themselves, but rather with the distinction between statistical modelling in general and "clean slate" models in particular on the one hand, and structural models discovered through human insight on the other.

    With "clean slate" I mean models that start with as little linguistically informed structure as possible. E.g., Norvig mentions hybrid models: these can start out as classical rule based models, whose probabilities are then learnt. A random neural network would be as clean as possible.

pmkary 2 days ago

I have many books by Chomsky, and I want to throw them away because it disgusts me to have them. Then I think, why should I throw away things I spent so much on? That makes me even angrier. So I have piled them up somewhere until I figure out what to do with them, and each time I walk past the pile I feel sad that I ever engaged with his work.

  • eucyclos 2 days ago

    There's an interview with Dan Schmachtenberger where he talks about the worst book ever written (his opinion is that it's 'The 48 Laws of Power'). He made the point that being consistently wrong is actually pretty impressive, and that there are worthwhile lessons in watching someone get taken seriously despite being wrong. Maybe you could revisit them with that approach.

    • aleph_minus_one 2 days ago

      > There's an interview with Dan schmachtenberger where he talks about the worst book ever written (his opinion is that it's 'the 48 laws of power').

      Could it be this?

      > https://www.youtube.com/watch?v=eIzRV4TxHo8

    • malvim 2 days ago

      I don’t think they’re disgusted by Chomsky’s work because it’s wrong. They’re disgusted because of the recently surfaced ties with Epstein.

      Not sure the approach holds.

      • pmkary 16 hours ago

        Actually, it's both. I wanted to study media theory, and it was interesting that his work appeared both in compilers and in philosophy, so I thought, “Let’s buy some books and dig into them.” The content was stupid, but that alone wouldn't have made me throw the books away. After writing that comment here, I actually went and sent all of them to paper recycling...

  • rixed 2 days ago

    Are you reacting with as much intensity when you walk past any scientific work older than 20 years?

    • pmkary 16 hours ago

      It's not about the science. I keep all the deprecated or rendered-wrong/irrelevant books because they shaped me at some point, and I'm proud of that. But finding out that an author sitting on your bookshelf was possibly a child abuser, and definitely had ties with Epstein, disgusts me, and I no longer keep anything from them.

      • rixed 10 hours ago

        So that's what it is, isn't it? A FUD campaign against the old political rival who is now dying and unable to defend himself?

I guess there is no point in my asking whether you even cared to look at the "evidence".

  • IndySun 2 days ago

    Make sure to vet your entire circle - friends, relatives, books, movies, everything... it's going to take a while. In the meantime you'll stop learning/growing too.

    My suggestion is as ludicrous as damning by association.

  • f1shy 2 days ago

    I assume this comes from his views on politics and/or association with things like Epstein. I would say that, independent of that, some of his works can be very valuable. A person's private life and their work are better kept in totally separate contexts, not mixed.

    • darubedarob 2 days ago

      Is that a Wernher von Braun quote?

    • spwa4 2 days ago

      The thing is, nothing that usually excuses such figures applies to Chomsky. What he did was most certainly not a normal thing to do in his time. One might say of George Washington, or even further back of Clovis, that by today's standards they were morally wrong, but not by the standards of their time, and that they advanced morals. They made things better.

      Chomsky is wrong by the standards of his own time and is making things worse rather than better.

      It was very much the opposite of Chomsky's ideology as well. So it additionally means he's fake, BOTH on his morals and on his politics/activism, from both sides (i.e. both helping a paedophile and helping/entertaining a billionaire).

      So it's (yet another) case of an important figure who supposedly stands for something not just demonstrating that he stands for nothing at all, but being a disgusting human being as well.

      • mikojan 2 days ago

        > It was very much the opposite of Chomsky's ideology as well.

        On the contrary. Chomsky was open about his civil-libertarian principles: If you are convicted, and you complete your court-ordered obligations, you have a clean slate.

        • spwa4 2 days ago

          Tell me, did that attitude extend to helping billionaires who are having sex with minors? Because that's what he did. Is that what this ideology stands for?

          • mikojan 2 days ago

            Yes, of course. It is the whole point. Nobody cares about your 20-year-old parking tickets.

  • andyjohnson0 2 days ago

    I don't understand. What is it about Chomsky's work that disgusts you? Or is this a reference to his political opinions?

    • cubefox 2 days ago

      Read the article above. There is a link at the top of this submission to an essay by Peter Norvig, arguing (correctly, in retrospect) that Chomsky's approach to language modelling is mistaken.

      • andyjohnson0 2 days ago

        Obviously I did read the article. And I know how the HN site works.

        I have a passing familiarity with the debate over Chomsky's theories of universal grammar etc. I didn't notice anything in the article that would cause disgust, and so I wondered what I was failing to understand.

        • cubefox 2 days ago

          If you have read many books by Chomsky, it might make you angry that you have wasted so much time on what turned out to be a fundamentally mistaken theory.

          • cubefox a day ago

            The people who downvoted this apparently didn't read the article.

            • andyjohnson0 18 hours ago

              Isn't "theory turned out to be wrong" just the price of doing science? And it's a good thing: something has been learned.

    • pmkary 16 hours ago

      The fact that he wrote volumes about manufacturing consent, the death of the American dream, and Israel's invasion of Palestine, while travelling in luxury jets with Epstein, who embodied everything he pretended to fight against.

    • darubedarob 2 days ago

      His support for Russian imperialism, and his broad rejection of the Eastern European civilian uprisings against the communist project. Like many idealists, he took a utopian, idealizing view and ran with it, reality and the real suffering it caused be damned. Like many idealists, he basically offered an API for sociopaths: to be hijacked and used as a useful idiot against humanity. That predictably leads to ruin and ashes as a legacy, and it did so for him. The Epstein connection is just the cherry on top.

      • wanderlust123 2 days ago

        Sounds like a bit of an overreaction, if I am being honest.

        Some of his books are deeply insightful even if you decide to draw the opposite conclusion. I wouldn’t say anything would create disgust unless you had a conclusion you wanted supported before reading the book.

        Regarding the Epstein thing, bizarre to bring that up when discussing his works, seems like you hate him on a personal level.

        • kroaton 2 days ago

          I think it is fair to hate pedophiles.

          • wanderlust123 2 days ago

            Pretty massive stretch making that inference based on the data, don't you think? Or is this an underhanded way to get back at someone you disagree with politically?

            • darubedarob a day ago

              No, but he should be in a prison cell with Trump, Clinton and the other creeps

codeulike 2 days ago

(this is from 2017)

tripletao 2 days ago

Here's Chomsky quoted in the article, from 1969:

> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.

He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.

I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.

  • thomassmith65 2 days ago

    I agree that Chomsky's influence, especially in this century, has done more harm than good.

    There's no point minimizing his intelligence and achievements, though.

    His linguistics work (eg: grammars) is still relevant in computer science, and his cynical view of the West has merit in moderation.

    • tripletao 2 days ago

      If Chomsky were known only as a mathematician and computer scientist, then my view of him would be favorable for the reasons you note. His formal grammars are good models for languages that machines can easily use, and that many humans can use with modest effort (i.e., computer programming languages).

      The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP.

      As to politics, I don't think it's hard to find critics of the West's atrocities with less history of denying or excusing the West's enemies' atrocities. He's certainly not always wrong, but he's a net unfortunate choice of figurehead.

      • thomassmith65 2 days ago

        I have the feeling we're focusing on different time periods.

        Chomsky was already very active and well known by 1960.

        He pioneered areas of computer science that we still use today, before computer science was a formal field.

        His political views haven't changed much, but they were beneficial back when America was more naive. They are harmful now only because we suffer from an absurd excess of cynicism.*

        How would you feel about Chomsky and his influence if we ignored everything past 1990 (two years after Manufacturing Consent)?

        ---

        * Just imagine if Nixon had been president in today's environment... the public would say "the tapes are a forgery!" or "why would I believe establishment shills like Woodward and Bernstein?" Too much skepticism is as bad as too little.

        • thomassmith65 2 days ago

          I wrote "when America was more naive" but that isn't entirely correct. Americans are more naive today in certain areas. If my comment weren't locked, I would change that sentence to something like "when Americans believed most of what they read in the newspaper"

        • tripletao 2 days ago

          I agree that his contributions to proto-computer-science were real and significant, though I think they're also overstated. Note the link to the Wikipedia page for BNF elsewhere in these comments. There's no evidence that Backus or Naur were aware of Chomsky's ideas vs. simply reinventing them, and Knuth argues that an ancient Indian Sanskrit grammarian deserves priority anyway.

          I think Chomsky's political views were pretty terrible, especially before 1990. He spoke favorably of the Khmer Rouge. He dismissed "Murder of a Gentle Land", one of the first Western reports of their mass killing, as a "third rate propaganda tract". As the killing became impossible to completely deny, he downplayed its scale. Concern for human rights in distant lands tends to be a left-leaning concept in the West, but Chomsky's influence neutralized that here. This contributed significantly to the West's indifference, and the killing continued. (The Vietnamese communists ultimately stopped it.)

          Anyone who thinks Chomsky had good political ideas should read the opinions of Westerners in Cambodia during that time. I'm not saying he didn't have other good ideas; but how many good ideas does it take to offset 1.5-2M deaths?

          • thomassmith65 2 days ago

            Judging by that comment, you probably know more about him than I do. I won't try to rebut it, but I enjoyed reading it.

        • jeremyjh 2 days ago

          > Just imagine if Nixon had been president in today's environment... the public would say "the tapes are a forgery!" or "why would I believe establishment shills like Woodward and Bernstein?" Too much skepticism is as bad as too little.

          Today it would not matter in the least if the president were understood to have covered up a conspiracy to break into the DNC headquarters. Much worse things have been dismissed or excused. Most of his party would approve of it and the rest would support him anyway so as not to damage "their side".

  • dleeftink 2 days ago

    > novel sentence

    The question then becomes one of actual novelty versus the learned joint probabilities of internalised sentences/phrases/etc.

    Generation or regurgitation? Is there a difference to begin with..?

    • tripletao 2 days ago

      I'm not sure what you mean? As the length of a sequence increases (from word to n-gram to sentence to paragraph to ...), the probability that it actually ever appeared (in any corpus, whether that's a training set on disk, or every word ever spoken by any human even if not recorded, or anything else) quickly goes to exactly zero. That makes it computationally useless.

      If we define perplexity in the usual way in NLP, then that probability approaches zero as the length of the sequence increases, but it does so smoothly and never reaches exactly zero. This makes it useful for sequences of arbitrary length. This latter metric seems so obviously better that it seems ridiculous to me to reject all statistical approaches based on the former. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries did judge correctly that I get that benefit, that LLMs exist, etc.
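
      As a toy illustration of the difference (a sketch with a made-up corpus and my own function names, not anything from the essay): counting verbatim occurrences of a whole sequence gives exactly zero for anything novel, while a smoothed per-token model assigns small but nonzero probability.

```python
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Raw empirical probability: count exact occurrences of the whole sequence.
def empirical_prob(seq, corpus):
    n = len(seq)
    windows = [tuple(corpus[i:i + n]) for i in range(len(corpus) - n + 1)]
    return windows.count(tuple(seq)) / len(windows)

# Bigram model with add-one (Laplace) smoothing, so unseen bigrams still
# get a small but nonzero probability.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def bigram_prob(seq):
    p = unigrams[seq[0]] / len(corpus)
    for prev, word in zip(seq, seq[1:]):
        p *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return p

novel = "the dog sat on the mat".split()  # never appears verbatim above
print(empirical_prob(novel, corpus))  # 0.0
print(bigram_prob(novel) > 0)         # True
```

      The smoothed per-token probability is the quantity perplexity is defined over: it decays smoothly with sequence length instead of collapsing to zero.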

      • dleeftink 2 days ago

        My point is that, even in the new paradigm where probabilistic sequences do offer a sensible approximation of language, would novelty become an emergent feature of such a system, or would it remain bound to the learned joint probabilities, generating sequences that appear novel but are in fact (complex) recombinations of existing system states?

        And again the question being, whether there is a difference at all between the two? Novelty in the human sense is also often a process of chaining and combining existing tools and thought.

  • techsystems 2 days ago

    He did say 'any known' back in 1969, though, so judging the idea against today's known interpretations is not a fair measure of how it has aged.

    • tripletao 2 days ago

      Shannon first proposed Markov processes to generate natural language in 1948. That's inadequate for the reasons discussed extensively in this essay, but it seems like a pretty significant hint that methods beyond simply counting n-grams in the corpus could output useful probabilities.

      In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative.
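
      Shannon's Markov idea can be sketched in a few lines (a toy example; the corpus and names here are invented for illustration): estimate transition probabilities from a corpus, then sample each word conditioned on the previous one.

```python
import random
from collections import defaultdict

# Toy Shannon-style Markov generator: each word is sampled conditioned
# on the previous word, with transitions estimated from a tiny corpus.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

transitions = defaultdict(list)
for prev, word in zip(corpus, corpus[1:]):
    transitions[prev].append(word)  # duplicates encode frequencies

def generate(start, length, rng):
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1])
        if not choices:  # dead end: no observed successor
            break
        out.append(rng.choice(choices))
    return " ".join(out)

print(generate("the", 8, random.Random(0)))
```

      Even this first-order chain outputs locally plausible word sequences it never saw verbatim, which is the hint that counting whole sentences in a corpus is not the only way to assign them probabilities.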

  • agumonkey 2 days ago

    Wasn't his grammar classification revolutionary at the time? It seems it influenced parsing theory later on.

    • eru 2 days ago

      His grammar classification is really useful for formal grammars of formal languages. Like what computers and programming languages do.

      It's of rather limited use for natural languages.

      • koolala 2 days ago

        "BNF itself emerged when John Backus, a programming language designer at IBM, proposed a metalanguage of metalinguistic formulas ... Whether Backus was directly influenced by Chomsky's work is uncertain."

        https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form

        I'm not sure it required Chomsky's work.

        • eru 2 days ago

          Oh, lots of stuff gets invented multiple times, when it's "in the air". Nothing special about Chomsky here. And I wouldn't see that as detracting from this particular achievement.

      • ogogmad 2 days ago

        Don't you think people would have figured it out by themselves the moment programmers started writing parsers? I'm not sure his contribution was particularly needed.

        • eru 2 days ago

          Lots of things get invented / discovered multiple times when they're in the air. But just because Newton (or Leibniz) existed doesn't mean Leibniz (or Newton) was any less visionary.

          For your very specific question: have a look at the sorry state of what's called 'regular expressions' in many programming languages and libraries to see what programmers left loose can do. (Most of these 'regular expressions' add things like back-references that make matching their franken-expressions take exponential time in the worst case, but they neglect to put in stuff like intersection or complement of expressions, which are matchable in linear time.)

          • ogogmad 20 hours ago

            Just checked after reading your comment. Surprisingly to me, AFAs (Alternating Finite Automata) do let you introduce the complement operation into regexes while preserving the O(mn) complexity of running NFAs.

            That's really subtle, because deciding regex universality (i.e. whether a regex accepts every input) is PSPACE-complete. And since NFAs make it efficient to decide whether a regex matches NO inputs, any attempt to combine NFAs with regex complement would trip on a massive landmine.

            • ogogmad 15 hours ago

              Actually, I've now checked more thoroughly, and RE+complement matching is O(n^2 m), which is not that good.

              • eru 12 hours ago

                See https://en.wikipedia.org/wiki/Regular_language#Closure_prope...

                The complement of a regular language is a regular language, and for any given regular language we can check whether a string is a member of that language in O(length of the string) time.

                Yes, depending on how you represent your regular language, the complement operator might not play nicely with that representation. But e.g. it's fairly trivial for finite state machines, or when matching via Brzozowski derivatives. See https://en.wikipedia.org/wiki/Brzozowski_derivative

                See also https://github.com/google/redgrep

                Regular languages have a bunch of closure properties. In practice, intersection and complement are really useful. So you can say things like:

                [regular_expression_for_matching_email_addresses] & ![my_list_of_naughty_words]

                Exercise for the reader: expand the expression above to allow the town of 'Scunthorpe'.
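
                A minimal sketch of the Brzozowski-derivative approach mentioned above, supporting intersection and complement (the tuple encoding and example languages are my own, not taken from redgrep):

```python
# Brzozowski-derivative regex matcher with intersection and complement.
# Regexes are tuples: ('empty',), ('eps',), ('chr', c),
# ('cat', r, s), ('alt', r, s), ('star', r), ('and', r, s), ('not', r).

def nullable(r):
    """Does r accept the empty string?"""
    tag = r[0]
    if tag == 'empty': return False
    if tag == 'eps':   return True
    if tag == 'chr':   return False
    if tag == 'cat':   return nullable(r[1]) and nullable(r[2])
    if tag == 'alt':   return nullable(r[1]) or nullable(r[2])
    if tag == 'star':  return True
    if tag == 'and':   return nullable(r[1]) and nullable(r[2])
    if tag == 'not':   return not nullable(r[1])

def deriv(r, c):
    """Derivative of r with respect to character c."""
    tag = r[0]
    if tag in ('empty', 'eps'):
        return ('empty',)
    if tag == 'chr':
        return ('eps',) if r[1] == c else ('empty',)
    if tag == 'cat':
        d = ('cat', deriv(r[1], c), r[2])
        return ('alt', d, deriv(r[2], c)) if nullable(r[1]) else d
    if tag == 'alt':
        return ('alt', deriv(r[1], c), deriv(r[2], c))
    if tag == 'star':
        return ('cat', deriv(r[1], c), r)
    if tag == 'and':
        return ('and', deriv(r[1], c), deriv(r[2], c))
    if tag == 'not':
        return ('not', deriv(r[1], c))

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# (a|b)* & ~(Sigma* bb Sigma*): all a/b strings NOT containing "bb".
sigma = ('alt', ('chr', 'a'), ('chr', 'b'))
ab_star = ('star', sigma)
contains_bb = ('cat', ab_star,
               ('cat', ('chr', 'b'), ('cat', ('chr', 'b'), ab_star)))
no_bb = ('and', ab_star, ('not', contains_bb))

print(matches(no_bb, "abab"))  # True
print(matches(no_bb, "abba"))  # False
```

                The same `('and', ..., ('not', ...))` shape expresses the email-addresses-minus-naughty-words idea directly, with matching still a single left-to-right pass over the input.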

      • adamddev1 2 days ago

        It's incredibly useful for natural languages.

        • foldr 2 days ago

          I'm a big Chomsky nerd, Chomsky fan, and card-carrying ex-Chomskyan linguist. I hate to break it to you, but not even Chomsky thought that the Chomsky hierarchy had any very interesting application to natural languages. Amongst linguists who (unlike Chomsky) are still interested in formal language classes, the general consensus these days is that the relevant class is one of the so-called 'mildly context-sensitive' ones (see e.g. https://www.kornai.com/MatLing/mcsfin.pdf for an overview).

          (I suppose I have to state for the record that Chomsky's ties to Epstein are indefensible and that I'm not a fan of his on a personal level.)

bo1024 2 days ago

Is this essay from 2011?

cubefox 2 days ago

(2011)
