Why are LLMs general learners?

intuitiveai.substack.com

64 points by pgspaintbrush 3 years ago · 63 comments

Paul-Craft 3 years ago

This seems like just another way of saying that when you train an LLM on a text, its weights incorporate the tokens in that text, which is nothing really profound.

I think the real magic here comes from the fact that LLMs are a specialized sort of neural network, and that neural networks are universal approximators [0]. In other words, LLMs are general learners because they are neural networks.

This is also not particularly profound, except that there are mathematical proofs of the universal approximation theorem that give us insight into why it must be so.
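
For intuition, here's a minimal sketch (mine, not the article's) of the universal approximation idea in practice: a single hidden layer of tanh units fitting sin(x). To keep it short, the hidden weights are random and only the output layer is fit, by least squares, rather than training the whole thing end to end.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
    y = np.sin(x)

    # One hidden layer of random tanh features, then solve for the output
    # weights by least squares.
    H = 100
    W1 = rng.normal(0, 2.0, (1, H))
    b1 = rng.uniform(-np.pi, np.pi, H)
    hidden = np.tanh(x @ W1 + b1)

    w2, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    approx = hidden @ w2
    print("max abs error:", float(np.abs(approx - y).max()))  # small; worse if you shrink H

Of course this says nothing about learnability or generalization, which the replies below get into.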

---

[0]: https://en.wikipedia.org/wiki/Universal_approximation_theore...

  • tarvaina 3 years ago

    The ingredients you need for training a useful machine learning model are expressivity, learnability, and generalization. Many methods are universal approximators but that only takes care of the first ingredient. Arguably the reason neural networks are so successful is that they can offer a good balance between the three.

    Before transformers we built different neural network architectures for each domain. These architectures offered better inductive biases for their respective domains and thus traded off some of the expressivity for better learnability and generalization.

    Nowadays the best architectures seem to be merging towards transformers. They appear to offer more generally useful inductive biases and thus a better trade-off between the three ingredients than the earlier architectures.

  • im3w1l 3 years ago

    A lot of universal approximators are piss poor at general learning. It's taken a lot of hard work and clever people to get LLMs to where they are. It's not as simple as neural network and done.

    • Retric 3 years ago

      Current LLMs are also piss poor general learners; they are, however, really good at learning specific things which people value highly.

      • im3w1l 3 years ago

        Some 15 years ago, textbooks taught that multilayer perceptrons (fully connected feed-forward networks) with one hidden layer were sufficient because they were universal approximators. That thought kinda held back the field for a long time. Going against that dogma was so revolutionary that the new paradigm was given its own name: deep learning.

        Just because you can find some gotcha counterexample LLMs struggle with doesn't invalidate that we've come a very long way.

        • Retric 3 years ago

          I think that was largely a misunderstanding. 20+ years ago I took an AI class that mentioned using multiple layers was useful for training neural networks. It also mentioned a 2-layer network was only a universal approximator given an arbitrarily large number of nodes, which again seems to be forgotten about.

          Though the teacher worked in industry for a while which may have been relevant as we didn’t focus that much on theory.

          PS: Deep learning was also more about improving computational power than some major theoretical advancement.

        • mehh 3 years ago

          Nah, I keep hearing this; I was doing multilayer in the 90s. The problem was my machine didn't even have a floating-point unit, so I had to hand-roll my own fixed-point math, and the CPU was about 100 MHz.

        • ChatGTP 3 years ago

          What's the destination? We've come a long way, but where do you think we're going?

    • Paul-Craft 3 years ago

      Correction: hard work, clever people, and massive increases in computational power. I'm sure all three matter quite a lot here.

      I'm not saying that if your goal is to come up with a usable general learning algorithm that it is just "as simple as neural network and done." What I'm saying is the converse: that the general learning capabilities of LLMs are most likely explained by the fact that, well, they are general learners, via the universal approximation theorem.

      Your other comment, I think, suggests why we're just now starting to see more general learning capabilities out of neural networks, when the theory says that a single hidden layer is enough: with a single hidden layer, you really need to get all the weights pretty close to "right" to see general learning/universal approximator behavior. When you have more than one hidden layer, then some of your weights can be wrong, as long as the errors are corrected in later layers.

      Now, I'm not an AI researcher or even anyone who works anywhere near this area, but I did take a course or two in grad school, and this seems at least intuitively plausible to me. If there are researchers in the field reading this, I'd definitely like to hear their takes, because I'm totally open to being completely wrong here. I'd rather be one of the lucky 10,000 than just have this half-baked idea that seems right. :-)

      • butyEah 3 years ago

        Hardware matters most. No matter how clever you are, there's no storing such large parameter sets on an Intel 286 with 4 MB of RAM.

        No matter how clever the programmer, there's no encoding GPT-4 with that. It was the hardware constraints that required programmers to be clever to begin with. These days it's much more "copy-paste the math directly, because our data set is so robust and our hardware and networks so performant that clever low-level hacks don't matter."

        Especially at big tech, where they've used their own AI to guide them; the ability to just ask an ML system to simplify the math has existed for a few years now, and we've all seen how clever outputs were set aside for safe, linear hacking.

        Truly clever work is occurring in more traditional sciences like chemistry and biology these days.

dgreensp 3 years ago

LLMs are not particularly good at arithmetic, counting syllables, or recognizing haikus, though, because (contrary to the thesis of the article) they don’t magically acquire whatever ability would “simplify” predicting the next token.

I don’t feel like the points made here align with any insight about the workings of LLMs. The fact that, as a human, I “wouldn’t know where to start” when asked to add two numbers without doing any addition doesn’t apply to computers (running predictive models). They would start with statistics over lots of similar examples in the training data. It’s still remarkable LLMs do so well on these problems, while at the same time doing somewhat poorly because they can’t do arithmetic!

  • pgspaintbrushOP 3 years ago

    Author here. First off, thank you for reading and for your thoughts. I provided examples that I thought would be intuitive for humans to help folks understand that an understanding of the underlying phenomena is useful for next token prediction (I've added this as a note). Could you share what part of the article came across as suggesting that LLMs "magically" acquire whatever ability helps them to predict? I'd like to make that section clearer, so that doesn't come across.

    Re: "LLMs are not particularly good at arithmetic". There are published results that show that LLMs using certain techniques reach close to 100% accuracy on 8-digit addition: https://arxiv.org/pdf/2206.07682.pdf. There are also recent results from OpenAI where their model obtained solid results on high school math competition problems, which are harder than arithmetic: https://openai.com/research/improving-mathematical-reasoning... I haven't looked into counting syllables or recognizing haikus but I bet that this is a result of tokenization and not an inability of the model to create a representation of the underlying phenomena.

    • dgreensp 3 years ago

      Thanks for responding to my comment.

      I'm not an expert in the field, but, there are lots of previous algorithms for predicting the next token in a series (Markov chains, autocomplete). None of them felt so much pressure to make an accurate prediction that they had no alternative but to teach themselves arithmetic! It seems what is different about LLMs (as far as the post goes) is that we can anthropomorphize them.

      More seriously, I guess I just feel like a meaningful sketch of an explanation for why algorithm X (where X is LLMs in this case) for continuing a piece of text is good at problem A should involve something about X and A. Because it is clearly highly dependent on the exact values of X and A, not just whether A can be posed as a text completion problem and humans would prefer the computer learn to solve the underlying problem to produce better text. For example, it could help to imagine a mechanism by which algorithm X could solve problem A. The closest thing to a mechanism (something algorithm X, i.e. LLMs, might be doing that's special) in the post is the talk of necessity being the mother of invention and "a deeper understanding of reality simplifies next-token prediction tasks," and the suggestion that if you were an LLM you might want to use "the rules of addition."

      It's true that modeling arithmetic in some way could help a LLM account for known arithmetic problems in the training data, which could help it on unseen arithmetic problems, but what problems an LLM can solve is a function of what it can model. Anything an LLM can't model or can't do, it just doesn't. LLMs are really bad at chess, for example. The patterns of digits in addition may be similar enough to the hierarchical patterns in language the LLM is modeling. But it's not clear if the LLM is using the "rules of addition" or not. As far as I know, we don't actually understand why LLMs are able to store so much factual information, produce such coherent stories, and do the specific things they can do.

      • galaxyLogic 3 years ago

        > what problems an LLM can solve is a function of what it can model.

        Well said. The model that an LLM has is very simple: if text X precedes the current conversation, then the most likely continuation of the discussion is, according to the model held by the LLM, Y. Right?

        So the point is that an LLM does not create models. It has only a single model based on probabilities of text sequences, created by its programmers. So it can (mostly?) only solve the problem of what would be a good textual response to an earlier text. It can do that well, but most difficult problems don't fall into the category of "having a great chat".

        • ludwik 3 years ago

          A lot of things that LLMs can already do reliably don't fall into the category of "having a great chat" either. Examples include retrieving data from external sources using commands (known as "plugins" in ChatGPT / Langchain) or writing working code to calculate information needed for answers or to create artifacts, such as charts.

          Yes, all of this stems from the task of continuing text. However, more and more, this is veering into the category of behavior. I don't mean "conscious behavior," but "behavior" nevertheless. It's surprising, but it is also the reality in which we currently live.

        • flangola7 3 years ago

          What would be an example of a difficult problem?

      • pgspaintbrushOP 3 years ago

        Hmm, yea, I agree with you on several points. For one, we don't fully understand the internal mechanisms of LLMs. I'm also with you on Markov chains and autocomplete tools not having an understanding of the underlying concepts. They merely use statistical patterns in the data.

        Based on what you've said, it sounds like your take is that unless we can specify the exact mechanism by which LLMs understand, we have no business saying that they understand. In a lot of cases, this is a reasonable approach. In many areas, if someone tells you X, and you ask for a mechanism of action, and they can't produce one, you have solid grounds for thinking they're bullshitting.

        But this case isn't quite the same. We know that LLMs learn to represent their inputs in a high-dimensional vector space (embeddings) and learn the relationships between those vectors. We also see them effectively solve problems in a variety of domains using this representation. I think these two ingredients: having a semantic representation and being able to effectively solve problems amount to something like "understanding." The lack of both properties is why I'd say Markov chains and autocomplete tools don't "understand" -- they haven't learned an effective representation of the underlying phenomena. (I'd also argue this is similar to us as humans. We don't have a good understanding of the human brain or precise mechanisms of action underlying thought. All we know is we as humans have semantic representations and can effectively solve problems.)

        Small note on your chess point: it now looks like GPT-3.5 can achieve draws against Stockfish 8: https://marginalrevolution.com/marginalrevolution/2023/06/th...

        Bigger note on your chess point: this example illustrates that LLMs are "semi-decidable." We thought they were bad at chess, but we just hadn't discovered the right way to prompt. More generally, we can confirm that an LLM is good at X when we feed it a prompt that produces performance at X, but given the size of the input space we're dealing with here, we can't confirm that LLMs are bad at X just because we haven't seen them do well at it. Maybe we just haven't discovered the right prompt. (These input spaces are massive, by the way. ChatGPT-3.5, for example, has a context window of 4,096 tokens, so if we were considering only the English alphabet, we're looking at more than 26^4096 possible inputs.)

        • lsy 3 years ago

          Two points in response to this:

          I think it's a category error to call word embedding in a vector space "semantic" representation when discussing concepts like understanding. Semantics deals with the referents of words, but in this case there are no referents, merely a list of representational tokens which are defined as being "close in meaning" to the original due to proximity in text or some other structural characteristic. We call the embedding "semantic" because it is useful for human semantic purposes as we can mechanize some translations from one vector to another and receive a useful response that we then assign meaning to, but that usefulness doesn't indicate that the machine itself has any access to the referents of the tokens it's processing or semantic understanding. Put more simply, "semantics" does not merely mean the relationship between several ungrounded tokens, but that is all a vector embedding can accomplish.

          Secondly, I think in the chess thread, the prompt being "engineered" in the example is extremely complex and constrains the output space sufficiently to produce high-quality results, but you start to wonder at what point the LLM is not doing most of the work. Meanwhile deeper in the thread we learn that even this prompting is not reliable and occasionally requires giving feedback that the move was bad(!) and repetition to achieve good results "the majority of the time in less than 3 tries". You can see where the practical problem arises, if we want to rely on LLMs for answers we don't already know. Claiming that we have a "general" function that "just" requires arbitrarily varying the input over an uncountably large space until you achieve the desired result is akin to saying f(x) = rand() * x is a universal computer as long as you find the right x. The ad absurdum version of the chess example is running Stockfish, sending a prompt that contains the Stockfish move and a request to repeat it, and then claiming that the LLM draws against Stockfish. However as we have seen with tokens like "SolidGoldMagikarp", LLMs are not even yet capable of reliably implementing the identity function, so I am not sure we can even say this.

  • iliane5 3 years ago

    > LLMs are not particularly good at arithmetic, counting syllables, or recognizing haikus

    I suspect most of this is due to tokenization making it difficult to generalize these concepts.

    There are some weird edge cases, though; for example, GPT-4 will almost always be able to add two 40-digit numbers, but it is also almost always wrong when adding a 40-digit and a 35-digit number.

    • rcme 3 years ago

      It doesn't have anything to do with tokenization. You can define binary addition using symbols, e.g. a and b, and provide properly tokenized strings to GPT-4. GPT-4 appears to solve the arithmetic puzzles for a few bits, but quickly falls apart on larger examples.
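
      A rough illustration of the kind of symbolic setup I mean (the symbols and wording here are made up for the example, not anyone's actual prompt):

          import random

          def to_symbols(n, width):
              # Write n in binary using "a" for 0 and "b" for 1, space-separated
              # so each symbol is likely to become its own token.
              bits = format(n, f"0{width}b")
              return " ".join("a" if bit == "0" else "b" for bit in bits)

          def addition_example(width=4):
              x, y = random.randrange(2 ** width), random.randrange(2 ** width)
              return (f"{to_symbols(x, width)} plus {to_symbols(y, width)} equals "
                      f"{to_symbols(x + y, width + 1)}")

          # Few-shot prompt: several solved examples, then one left for the model.
          print("\n".join(addition_example() for _ in range(5)))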

      • iliane5 3 years ago

        What I was saying is that because you need to go out of your way to make sure it's tokenized properly, I wouldn't be surprised if there are enough improperly tokenized examples in the dataset.

        If that were the case, it would make it difficult to generalize these concepts.

    • doctor_eval 3 years ago

      Could it also be that syllables are intrinsically mechanical? They are strongly related to how our mouths work. While it may be possible to extract syllables from written text - following the consonants and vowels - I'm not sure that many humans could easily count syllables without using their mouths.

      • aidenn0 3 years ago

        Many humans are also often really bad at doing speech related things when writing.

        I've known many native English speakers who write things like "an healthy" (because they learned to write "an" before words starting with "h") and write poems that don't rhyme because the words end with the same letters (e.g. "most" and "cost").

        • doctor_eval 3 years ago

          Yeah, I find it weird how LLMs make a lot of the kind of mistakes that people do, but somehow this is held up as being a reason why LLMs don’t work similarly to brains.

          Since discovering LLMs I’ve become convinced that my brain works like them. I really don’t know the next word I’m going to say until it’s nearly out. And since learning about how LLMs work, I really can’t argue it away.

          It’s a reasonably disturbing feeling.

  • Terr_ 3 years ago

    > LLMs are not particularly good at arithmetic

    I'm reminded of "Benny's Rules", where someone sat down with a "self-directed" 6th grader of high IQ who had been doing okay in math classes... but their success so far was actually based on painstakingly constructing somewhat-lexical rules about "math", mumbo-jumbo that had been just good enough to carry them through a lot of graded tests.

    > Benny believed that the fraction 5/10 = 1.5 and 400/400 = 8.00, because he believed the rule was to add the numerator and denominator and then divide by the number represented by the highest place value. Benny was consistent and confident with this rule and it led him to believe things like 4/11 = 11/4 = 1.5.

    > Benny converted decimals to fractions with the inverse of his fraction-to-decimal rule. If he needed to write 0.5 as a fraction, "it will be like this ... 3/2 or 2/3 or anything as long as it comes out with the answer 5, because you're adding them" (Erlwanger, 1973, p. 50).

    [0] https://blog.mathed.net/2011/07/rysk-erlwangers-bennys-conce...

  • 8note 3 years ago

    I'm surprised they're bad at predicting haikus.

    I assume it's because there's little documentation on the internet about how many syllables each word has?

  • ipnon 3 years ago

    Transformers don’t predict next tokens, right? They predict sequences based on their self-attention to some preceding token sequence?

    • ludwik 3 years ago

      No, what they do is predict a single token that follows the preceding token sequence (which was indeed analyzed using self-attention). Longer output sequences are created by repeating this simple task multiple times, where the previously output tokens become part of the preceding token sequence.
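
      In pseudocode (a sketch; `model`, `tokenize` and `detokenize` are stand-ins, not any real API):

          def generate(model, tokenize, detokenize, prompt, max_new_tokens=50, eos_id=0):
              tokens = tokenize(prompt)
              for _ in range(max_new_tokens):
                  next_id = model.predict_next(tokens)   # one single token per step
                  if next_id == eos_id:
                      break
                  tokens.append(next_id)                 # the output becomes part of the context
              return detokenize(tokens)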

  • Ireallyapart 3 years ago

    > LLMs are not particularly good at arithmetic, counting syllables, or recognizing haikus, though, because (contrary to the thesis of the article) they don’t magically acquire whatever ability would “simplify” predicting the next token.

    LLMs understand it to a certain extent. It's more than "predicting" the next token. When people ascribe it all to "predicting the next token", it's a naive and unintelligent description that covers up what they don't understand.

    I mean you can describe a human brain as simply wetware, a jumble of signals and chemical reactions that twitch muscles and react to pressure waves in the air and light. But obviously there is a higher level description of the human brain that is missing from that description.

    The same thing could be said about LLMs. I can tell you this: researchers completely understand token prediction; that much can be said. What we don't currently understand is the high-level description. Perhaps it's not something we can understand, as we've never been able to understand human consciousness at a high level either.

    That's the thing with people. Nobody actually understands the high-level description of a fully trained LLM. People are lambasting others because they "think" they understand, when they only actually understand the low-level primitives. We understand assembly, but that doesn't mean we understand the operating system written in assembly.

    Take this for example:

         Me: 4320598340958340958340953095809348509348503480958340958304985038530495830 + 1
         chatGPT: 4320598340958340958340953095809348509348503480958340958304985038530495830 + 1 equals 4320598340958340958340953095809348509348503480958340958304985038530495831.
    
    The chances of chatGPT memorizing this, or even happening to predict the right next tokens here, are too low to even consider. There are so many possible numbers here, including numbers that aren't the true answer but have a "higher probability" of being close to it from a token/edit-distance standpoint. It's safe to say, from a scientific standpoint, that chatGPT in this scenario understands what it means to add 1.

    Realize that this calculation would overflow any native integer type. chatGPT needs symbolic understanding to perform the feat it did above.

    But there are, of course, things it gets wrong. But again, we don't truly understand what's going on here. Is it lying to us? Perhaps it can't differentiate between a merely generated statistical token and an actual math equation. It's hard to say. But from the example above, by probability, we know that an aspect of true understanding and ability exists.

    • BobbyJo 3 years ago

      I do think that LLMs have emergent properties that do some interesting things; however, I would like to point out that simple next-token prediction would work on your example quite well.

      <numbers>0 + 1 -> <numbers>1

      Even simple attention mechanisms would handle that quite well with enough examples of <numbers>.

      • doctor_eval 3 years ago

        I agree with you, but it also works for

            4320598340958340958340953095809348509348503480958340958304985038530999999 + 1 ?
        
            The sum of 4320598340958340958340953095809348509348503480958340958304985038530999999 and 1 is 4320598340958340958340953095809348509348503480958340958304985038531000000.
        
        which is more complex.

        I'm too lazy to get it to add two large numbers together.

        Also, I've never been convinced that "ability to do arithmetic" has any relationship to intelligence. We don't expect regular humans to be able to add two large numbers together reliably, either.

    • oneearedrabbit 3 years ago

      This number is tokenized as a list: 43 20 59 83 409 58 340 9 58 340 95 30 95 809 34 850 9 34 850 34 809 58 340 9 58 30 49 850 385 30 49 58 30. If GPT recognizes the context "+ 1 equals" through the attention mechanism, it can predict that the next number in the sequence should be 31: ... 58 30 -> ... 58 31
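
      (If you want to check a split like this yourself, the open-source tiktoken library makes it easy; note the cl100k_base encoding below may not match whatever tokenizer produced the split above.)

          import tiktoken

          enc = tiktoken.get_encoding("cl100k_base")
          n = "4320598340958340958340953095809348509348503480958340958304985038530495830"
          print([enc.decode([t]) for t in enc.encode(n)])  # the number comes back in short digit chunks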

HALtheWise 3 years ago

I don't see enough discussion of the fact that LLMs are actually trained with two losses: text prediction and a regularization loss of some sort that effectively encourages the network to use "simple" internal structure. That means the training process isn't only trying to predict the next token, it's specifically trying to find the simplest explanation that predicts the next token.

Given that the history of science is mostly driven by trying to find the simplest explanation for observed phenomena, thinking about regularization makes it much less surprising that LLMs end up learning how the world "actually works".
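
A minimal sketch of the kind of combined objective I mean, assuming L2 weight decay as the regularizer (in practice it's usually folded into the optimizer, e.g. AdamW, rather than added to the loss by hand, and `model` here is just a stand-in):

    import torch
    import torch.nn.functional as F

    def training_loss(model, input_ids, target_ids, weight_decay=0.01):
        logits = model(input_ids)                       # (batch, seq, vocab)
        prediction_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1))
        # Penalize large weights, nudging training toward "simpler" solutions.
        l2 = sum((p ** 2).sum() for p in model.parameters())
        return prediction_loss + weight_decay * l2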

mxkopy 3 years ago

> Yet, they demonstrate a crucial point: a deeper understanding of reality simplifies next-token prediction tasks.

I'm not sure LLMs are trained to simplify anything. They have billions of parameters after all.

  • dTal 3 years ago

    They "simplify" the training data, which they are vastly smaller than. LLMs are like compression algorithms. You could imagine feeding the training data back in, letting it guess the next token, and entropy coding the residual - this would result in an excellent compression ratio. This compression performance is a direct consequence of abstract features of the dataset that it has managed to encode - knowing that the capital of France is Paris allows you to make predictions about many sentences, not just "The capital of France is...".

    • mxkopy 3 years ago

      True, but I still think there's some fallacy here. Are we sure that models of the world (i.e. understanding) are the only way to achieve compression?

braindead_in 3 years ago

It's so mind-boggling to think that our everyday reality can be encoded as weights and biases in a giant matrix. Maybe we are just weights and biases.

freecodyx 3 years ago

The main thing about LLMs, in my opinion, is the tokenization part: words are already clustered and converted into numbers (vectors), which is already a big deal. We are using learned weights; the attention part feels like a brute-force approach to learning how those vectors are likely used together (if you add positional encoding as additional information).

Statistics on large amounts of data just seems to work after all.
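
A bare-bones sketch of what the attention part computes (my own toy with untrained random matrices and a simplified positional encoding, just to show the moving pieces):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 8                               # 5 "token" vectors of dimension 8
    tokens = rng.normal(size=(n, d))

    # Simplified sinusoidal positional information, added to the token vectors.
    pos = np.arange(n)[:, None]
    freqs = 1.0 / (10000 ** (np.arange(d // 2) / (d // 2)))
    pe = np.concatenate([np.sin(pos * freqs), np.cos(pos * freqs)], axis=1)
    x = tokens + pe

    # One attention head: compare every position with every other position.
    Wq, Wk, Wv = rng.normal(size=(3, d, d))
    scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    output = weights @ (x @ Wv)               # each row is a weighted mix of the value vectors
    print(weights.round(2))                   # how strongly each vector "uses" the others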

  • sanxiyn 3 years ago

    This is wrong; byte-level models work fine, even if not as well as word-level models. From comparisons of byte-level and word-level models, we know the tokenization part is responsible for only a minuscule part of the performance.

courseofaction 3 years ago

Intuitively, I think this also hints at why LLMs get more prone to confusion when trained to be "safe" - the underlying representations for applying human morality in context are much more complex to learn than simpler but potentially psychopathic logic.

  • clarge1120 3 years ago

    This sounds correct. Humans are highly fickle and contradictory when it comes to morality. Even the Golden Rule is hotly contested. LLMs lose touch with reality as they try to navigate humanity’s moral landscape. Our current solution is to align an LLM to a worldview.

    The good news is that this will pit one LLM against others, and virtually eliminate any potential for a single powerful AI to emerge and do something harmful.

sandsnuggler 3 years ago

Why do people keep saying it's good at math when we have no clue about the training data, and all they do is insert some examples, in an unscientific way, into a program where we have no clue what's behind it or whether it's one system or even multiple?

golemotron 3 years ago

In what world does "Quietly, quietly," have five syllables?

IIAOPSW 3 years ago

Because the language we are teaching it is sophisticated enough to embed a Turing Machine?

kypro 3 years ago

I'm not sure I'm personally convinced LLMs are bad at arithmetic, I think they might just approach it differently to us.

Something you'll find if you ever train a neural network to learn a mathematical function is that it will only ever approximate that function. It won't try to guess exactly what the function is, like a human might.

For example, consider f(1) = 2, f(2) = 4, f(3) = 6, f(4) = 8, f(5) = 10.

As a human you know how important precision is in maths, and you know humans generally like round numbers, so you naturally assume that f(x) = 2x.

Neural networks don't have these biases by default. They'll look for a function that gets close enough, maybe something like f(x) = 1.993929910302942223x.

From a neural network's perspective, the loss between this answer and the actual answer is so trivial that it's basically irrelevant.

Then a human who likes round numbers comes along and asks the network: what's f(1,000)? To which the neural network replies, 1,993.9.

Then the human goes away convinced the AI doesn't know maths, when in reality the AI basically does know maths; it just doesn't care as much about arithmetic precision as the human does. Because again, to the AI, 1,993.9 is a perfectly acceptable answer.
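
A toy version of what I mean (a single-weight sketch, not a real network, so the exact numbers differ, but the flavour is the same):

    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 6, 8, 10]

    w, lr = 0.0, 0.01
    for _ in range(50):
        # Gradient of mean squared error for the model f(x) = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad

    print(w)          # roughly 1.99999, not a clean 2
    print(w * 1000)   # so f(1,000) comes back close to, but not exactly, 2000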

So now for fun let me ask ChatGPT some arithmetic questions...

> ME

> what's 2343423 + 9988733?

> ChatGPT

> The sum of 2343423 and 9988733 is 12392156.

WRONG! It's actually 12332156. That's an entire digit out and almost 0.5% larger than the actual answer!

> ME

> what is 8379270 + 387299177?

> ChatGPT

> The sum of 8379270 and 387299177 is 395678447.

Er, okay, that was right. Bad example, let me try again.

> ME

> what is 2233322223333 + 387299177?

> ChatGPT

> The sum of 2233322223333 and 387299177 is 2233322610510.

WRONG! It's actually 2233709522510. That's 6 digits out and almost 0.02% smaller than the actual answer!

If you take a more open-minded view, I think it's fair to say ChatGPT basically does know arithmetic, but its reward function probably didn't prioritise arithmetic precision in the same way a decade of schooling does for us humans. For ChatGPT, having a few digits wrong in an arithmetic problem is probably less important than its reply containing that sum being slightly improperly worded.

I guess what I'm saying is that I'm not sure I quite agree with the author that LLMs don't do arithmetic at all. It's not that they're trying to guess the next word without arithmetic, but more that they're not doing arithmetic the same way we humans do it. Which may have been the point the author was making... I'm not really sure.

  • SkyPuncher 3 years ago

    LLMs are bad at math because they don't actually understand the rules of math.

    They can write code to do math, but without code they can only estimate how likely a series of numbers are to be seen together.

    They're very likely to get things like 2+2=4 correct because that's probably unique and common in their training data. They're unlikely to get two random numbers correct because they don't actually know what those numbers mean.

    • pgspaintbrushOP 3 years ago

      What would an LLM have to do to convince you it was good at math? Check out this recent post by OpenAI where one of their models is solving 60%+ of problems from a high school math competition dataset: https://openai.com/research/improving-mathematical-reasoning...

      • wrs 3 years ago

        It’s actually better at math than it is at arithmetic, and I think this discussion has been about arithmetic. I could make up something about how math is more like language than arithmetic is. I suspect the hypothesis that math tests tend to have a lot of stereotypical problem structures from a shared curriculum is also relevant. But who knows at this point?

        Anyway, to convince me it's good at arithmetic is not complicated…just be good at arithmetic! That is, do it correctly, every time, for any size number.

        • famouswaffles 3 years ago

          >That is do it correctly, every time, for any size number.

          Then no human is good at arithmetic.

          • SkyPuncher 3 years ago

            I suspect most people on this forum can do arithmetic with any "reasonably" sized number. It might take weeks to complete, but most people here can work through large numbers by hand.

            • famouswaffles 3 years ago

              Goalpost moving. "Reasonable" is just an arbitrary line, especially since most if not all people would make some mistake somewhere along the way.

              You can greatly increase GPT's arithmetic capabilities by tackling it like a problem to solve "on paper" in context. And this was done on 3.5, not 4. https://arxiv.org/abs/2211.09066
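
              The rough idea is to make the model spell the algorithm out step by step instead of answering in one shot. A toy sketch of what such a scratchpad might look like (my own format, not the paper's actual prompts):

                  from itertools import zip_longest

                  def scratchpad(a, b):
                      lines = [f"{a} + {b}"]
                      carry, digits = 0, []
                      for da, db in zip_longest(map(int, reversed(str(a))),
                                                map(int, reversed(str(b))), fillvalue=0):
                          total = da + db + carry
                          lines.append(f"{da} + {db} + carry {carry} = {total}"
                                       f" -> write {total % 10}, carry {total // 10}")
                          digits.append(total % 10)
                          carry = total // 10
                      if carry:
                          digits.append(carry)
                      lines.append("answer: " + "".join(map(str, reversed(digits))))
                      return "\n".join(lines)

                  print(scratchpad(2343423, 9988733))   # ends with "answer: 12332156"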

            • 8note 3 years ago

              If it's going to take weeks, most people will get it wrong. That's a lot of calculations to never get wrong, and a lot of chances to misinterpret some prior note you left.

          • Tainnor 3 years ago

            Okay, but we have since invented machines that can do arithmetic correctly, every time. When we try to do maths via an LLM, we're just throwing all of that away.

            • famouswaffles 3 years ago

              So? I didn't tell you to use GPT-4 for arithmetic over a calculator. I simply pointed out that the only standard by which GPT-4 is not good at arithmetic is one that humans wouldn't meet either. Especially since zero-shot "mental" arithmetic is not even close to GPT-4 at its most accurate.

              • Tainnor 3 years ago

                The discussion started "what would it take to convince people that [insert favourite LLM] is good at maths", and the response to that IMHO is that we have much better tools to do arithmetic (I don't even want to say maths), even if humans themselves are also poor at arithmetic.

                What's the point of building a system to be equally bad as humans at something that we know humans are bad at? LLMs have their uses but (at least at the current stage) performing arithmetic calculations is not one of them (to say nothing of more advanced mathematics).

          • wrs 3 years ago

            Fair enough, I’ll allow a 1% error rate per 10 addend digits.

        • Tainnor 3 years ago

          ChatGPT, and probably GPT-4 too, is also hilariously bad at "more advanced" mathematics, including trying to come up with even slightly original proofs.

    • kypro 3 years ago

      I think the statement that LLMs don't understand the rules of maths is far too strong. And this notion that LLMs are not able to answer a random arithmetic question "correctly" only holds if you assume "correctness" is a binary and not a scalar.

      I'd propose that your claim that LLMs don't understand maths is very similar to the claim that Newton didn't understand the laws of motion.

      Yes, Newton's laws are wrong, but they're also practically correct for 99.999% of applications. If correctness is viewed as a binary, Newton is 100% wrong, but as a scalar Newton is basically right.

      Neural networks are inherently bad at finding exact rules, but they're excellent at approximating them to an accuracy that is acceptably good; this is the bit that people miss when they say LLMs can't do maths.

      When you claim they don't understand the rules of maths, I agree that they don't understand the explicit rules, but with the caveat that they probably understand something that allows them to approximate those rules "well enough".

      This is why if you ask ChatGPT a question like 23435234 + 3243423 it's not going to say -33.1. It might not give the right answer, but it will almost always give you something that's close and very plausible. So while it might not understand the exact rules, it basically understands what happens when you add two numbers and 99% of the time will give you an answer that is basically correct.

      The larger point I was trying to make here is that I think we humans are kinda biased when it comes to maths because we care about character-level precision, which is the bias I think you're basing your reasoning on here. We humans believe precision is extremely important in the context of maths, unlike with other textual content. But an LLM isn't operating with that bias. It's just trying to approximate maths in a way that is correct enough, in a similar way that it's trying to approximate the likely next character (or more correctly, token) of other text content.

      I don't think approximations are 100% wrong, and perhaps us humans being bothered about LLMs giving answers to maths questions that are 0.1% wrong actually says more about our values and how we view maths than it says about an LLM's mathematical abilities.

      • Tainnor 3 years ago

        You're just trying to redefine "mathematics" in order to be able to say that ChatGPT is good at it. But mathematics is about precision.

  • Tainnor 3 years ago

    The exactness matters, though. Unless you'd like things like encryption to stop working.
