Simple Explanation of LLMs

blog.oedemis.io

94 points by oedemis a year ago · 32 comments

A_D_E_P_T a year ago

It's all prediction. Wolfram has been saying this from the beginning, I think. It hasn't changed and it won't change.

But it could be argued that the human mind is fundamentally similar. That consciousness is the combination of a spatial-temporal sense with a future-oriented simulating function. Generally, instead of simulating words or tokens, the biological mind simulates physical concepts. (Needless to say, if you imagine and visualize a ball thrown through the air, you have simulated a physical and mathematical concept.) One's ability to internally form a representation of the world and one's place in it, coupled with a subjective and bounded idea of self in objective space and time, results in what is effectively a general predictive function which is capable of broad abstraction.

A large facet of what's called "intelligence" -- perhaps the largest facet -- is the strength and extensibility of the predictive function.

I really need to finish my book on this...

  • mdp2021 a year ago

    With the critical difference that predicting facts and predicting verisimilitude are massively different operations.

    • A_D_E_P_T a year ago

      I don't think that anybody predicts "facts" -- there are no oracles, and if you predict a physical concept, it's very easy to get things wrong. Outcomes are, in some cases, almost statistical.

      (A physical concept could be something as simple as how to catch a frisbee, or, alternatively, imagine a cat trying to predict how best to swipe at a fleeing mouse. If the mouse zigs when it could have zagged, the cat, for all its well-honed instincts, may miss. It may have predicted wrongly.)

      Predicting tokens is really quite similar. I really think that it's the same type of thing.

      Getting facts right is a matter of error correction and knowledgebase utilization, which is why "reasoning models" with error correction layers and RAG are so good.

      • mdp2021 a year ago

        > there are no oracles

        If you mean "guessing without grounds", that is exactly the phenomenon which is expressed by bad thinkers in both the carbon and the silicon realms, and that is what we are countering.

        > predict[ing] "facts"

        It's called "Science". In a broader way, it's called "intelligence" ("Intelligence is being able to predict the outcomes of an experience you never had" ~ Prof. Patrick Winston).

        > Getting facts right is a matter of

        It is a matter of procedurally adhering to an attitude of iterative quality refinement of ideas, and LLMs seem to be dramatically bad at "procedures".

        • A_D_E_P_T a year ago

          When you say "predicting facts" you imply "predicting true future events." Delphi is no longer operational, so it simply can't be done. (At least, not past a certain -- very, very low -- complexity threshold in the macroscopic non-quantum world.)

          "Science" is coming up with, and testing, theories -- they may be true, they may be false, and you can't know, and shouldn't hold a very strong position, until you test them. It's true that a more intelligent person will come up with better hypotheses and more inventive ways to put them to the test, but that's not what you seemed to be talking about, nor are we in any disagreement on that point.

          A more intelligent cat will also catch mice more effectively -- it'll have a more accurate mental model of the mouse and of its own physical capabilities in time and space. Still, the outcome of the hunt is never perfectly predictable. Some outcomes are statistical -- and, intriguingly, LLMs mirror this in how they predict tokens.

          > LLMs seem to be dramatically bad at "procedures".

          How do you figure, and how did you reach this conclusion?

          • mdp2021 a year ago

            > When you say "predicting facts" you imply "predicting true future events"

            And Michelson and Morley did through Einstein's theory. And Jack did when he said "if my theory is correct, that falling brick will break my skull more probably than not". And it's a matter in which LLMs tend to fail, when they go "surely your operating system will have a `scratchmyback` command to allow you to work more hours sitting in front of it, it just makes sense".

            > How do you figure [that «LLMs seem to be dramatically bad at "procedures"»], and how did you reach this conclusion?

            I just tried with a main widespread engine, and it failed. And it showed that it still seemed to be guessing an output instead of actually checking to build the output (as if remembering that very often "2+2=4" instead of checking "1 and 1, and 1 and 1: 1, 2, 3, 4").

            • A_D_E_P_T a year ago

              Here's the issue: Prediction isn't only about performing experiments in science, or engineering tasks. It's an ongoing process and something that may very well be tied to our very existence as conscious observers, in that it extends our spatiotemporal sense.

              Forget Einstein for a minute. When you drive a car, you hold a mental model of your position and velocity in time and space, of the expected behaviors of other drivers, of the conditions of the road, and you continually adjust your behavior in accordance with that model. Almost anything that requires attention is something that requires us to build a mental model of the future -- and predict that future.

              So, yeah, you can hew closely to validated scientific theories and "predict" how things will happen in that sense. But, as you walk home from your meeting at the astronomical society, you stop at a crosswalk, look both ways, and you're back to making essentially probabilistic predictions about how crossing the road is going to go.

              I get the sense that you dislike them, but really LLMs are not so different. How they handle probability and prediction is different in degree, but I don't think that it's entirely different in kind.

              > And it showed that it still seemed to be guessing an output instead of actually checking to build the output (as if remembering that very often "2+2=4" instead of checking "1 and 1, and 1 and 1: 1, 2, 3, 4").

              You've never memorized your multiplication tables?

              Boss Terry Tao has a reasonably high opinion of the abilities of LLMs as mathematicians, which is remarkable -- really astounding -- considering how they're built and trained, as essentially language prediction and manipulation machines.

              • mdp2021 a year ago

                > "predict"

                I must stress that the idea of "Science predicting facts" is a well-established formulation in the Philosophy of Science.

                And there has never been a doubt that prediction is probabilistic. But see the example in the parallel post about "dreaming and waking": the predicting activities of a junkie under psychedelics and those of a lucid thinker are substantially different.

                > You've never memorized

                You have the framework very, very wrong: the point is not that we memorize, the point is that those LLMs don't check. When you state an idea, you are supposed to have checked it on other occasions before memorizing it.

                Procedural operations, of which counting is just one example, can fail in those LLMs. That means they are simulating the procedure instead of performing it, which suggests that they «seem to be guessing an output instead of actually checking to build the output», and that makes them structurally untrustworthy, unreliable - broken by design.

                Being black boxes (bad), they must be stress tested to see whether proper functioning is present or just simulated: the chief problem is not that they can't count, it is that they must be missing the roots of counting: procedural lucid thinking.

                Check the parallel submission about the detective game ("Temporal Clue")*: an algorithm that cannot fully reason over a lucid world model to solve logic puzzles is unreliable. The probabilistic nature of the architecture here falls short of intelligence, as opposed to the sophistication of considering less probable, unexpected branches of possibility.

                * https://news.ycombinator.com/item?id=43284420

                • A_D_E_P_T a year ago

                  > I must stress that the idea of "Science predicting facts" is a well-established formulation in the Philosophy of Science.

                  Respectfully, I'd suggest that you are misinterpreting it or using the wrong terminology. Science is not a thing; it is a process: a hypothesis is a prediction about the world, which is validated or disproven via experiment. A validated hypothesis -- like Newton's physics -- is a model for how the world works, which may later be superseded by more accurate models. Newton's physics, though a great stride in our understanding of the world, is not a fact; it is an approximation of reality.

                  > the predicting activities of a junkie under psychedelics and those of a lucid thinker are substantially different

                  There's also a substantial difference between the predicting activities of a cat and those of a man.

                  Scratch the surface, though, and the same type of thing is happening.

                  Of course LLMs don't predict things exactly as you do. But at what they were trained to do -- in much the same way a cat was "trained" by long eons to hunt mice -- they're extremely capable, and they're extensible and capable of abstraction much as humans are, and much unlike cats. It's not even clear that, in the general case, how they work is any worse than how we work. It's still early.

                  Your point, that they're structurally flawed, is noted -- but look at the average human and try to tell me that human reasoning is flawless. Human reasoning is perhaps even more unreliable. As for your detective game, how many humans, picked at random, could solve it?

                  > You have the framework very very wrong: the point is not that we memorize, the point is that those LLMs don't check.

                  Use DeepSeek R1 and try and tell me that it doesn't check. Not only does it check, it'll openly agonize over the answer it gives you. And at solving math problems for engineering purposes, it's in the 99.9th percentile of humans, if not far beyond, despite being ~1 year old. In edge cases, it's postgrad level. In the very near future, the successors of today's LLMs will be solving new theorems.

                  Reasoning models, in general, disprove what you're trying to state here. It's more costly, but they're capable of procedural thinking.

                  • mdp2021 a year ago

                    > Respectfully

                    I have credentials in the discipline; I know, and am supposed to know, what you wrote there well. What I was telling you is that the use of 'predict' for the nature of Science is well established; of course it is a rhetorical simplification - but so is all language in use. Please see (I had to return to it a few weeks ago for another discussion) the article about Imre Lakatos in the Stanford Encyclopedia of Philosophy - https://plato.stanford.edu/entries/lakatos/ .

                    > look at the average human

                    We do not look at the average human to determine a specific ability: we look at specimens that show and express that particular ability. There is a difference between John, who has a keen ethical sense, Ron, who does not exercise it, and Don, who is a clinical psychopath whose missing cerebral modules leave him completely values-blind.

                    > As for your detective game, how many humans, picked at random, could solve it?

                    And I would certainly not ask them for advice. LLMs, on the contrary, are there to give outputs... So,

                    > But at what they were trained to do -- in much the same way a cat was "trained" by long eons to hunt mice -- they're extremely capable

                    There may be a very great misunderstanding about what they are trained to do («predicting verisimilitude», telling a convincing story) and what we should expect them to do (producing outputs like those who «predict[] facts», i.e. reason subtly over a world model). In fact,

                    > structurally flawed

                    Until we know they are "sober", I'd exercise all care. "Sobriety" must be implemented.

                    > Use DeepSeek R1 and try and tell me that it doesn't check

                    I have used it. (I have used it before many of you: I am one of those who raised the alarm to this community, and was ignored, well before the stock market crash.) The authors of the "detective" game used it, and it failed notably. This tells you that with LLMs we are still in the realm of the "oracular" that you supposed "dead and gone with Delphi".

                    Edit: and especially,

                    > Scratch the surface, though, and the same type of thing is happening

                    You may have "intuitions" and say, according to "your guts and best subconscious guesses", that "A is B". But if you then "believe" that intuition and consider it "final" instead of checking it - vetting it through conscious processes to determine whether it was correct and make it solid - then you are doing it wrong.

                    • A_D_E_P_T a year ago

                      Sure, and I apologize if I came across as condescending or rude.

                      You raise interesting points.

                      I'd propose a wager, but I'm not sure what the terms ought to be.

                      In general, I think that procedural thinking is a problem that is basically already cracked, and that all (or nearly all) hard problems that the 99.5th-percentile human can solve, in any given domain, will be solvable by artificial intelligences in the near enough future. Five years, I think, would be a wild overestimate. Maybe two?

                      I also think that, as a general rule, "prediction = intelligence" and that the breadth, accuracy, and extensibility of one's predictive capabilities is essentially correlated with just how intelligent one is. It doesn't matter how it happens; it can be a black box. Humans, to be sure, are black boxes. I think that scientists have been trying to simulate the brain of the nematode C. elegans for about two decades, and as far as I know they still haven't succeeded, despite it having only ~300 neurons.

          • mdp2021 a year ago

            I'll give you another example:

            current neural network architectures seem to perform in a dreamlike state, as in "oh, in that area there should be a piece of finger oriented this way";

            humans also have a wake-state module in which they actually count their fingers.

            These NNs seem to dream; we can be awake.

  • nurettin a year ago

    > Wolfram has been saying this from the beginning, I think.

    Wolfram has been distinguishing between probabilistic output and deterministic output from a neural network since the beginning? Trying to monopolize such basic concepts doesn't make much sense. It's like saying he has been thinking of sporks since the beginning.

  • Delomomonl a year ago

    Besides the fact that I don't think the prediction thing is a bad thing, there is an argument that, depending on the architecture, there can be a self-discovery of rules through compression.

    The compression leads to rules which could feel like understanding.

    People say 'ah, it's just a parrot repeating the statistically most common words' as if this alone makes it unimpressive, which it doesn't. Not when an LLM responds to you the way it does.

    If that basic thing talks like a human, why would a human be something different?

    Isn't intelligence also correlated with speed of connections? At least when you do an IQ test, speed is factored in.

    • mdp2021 a year ago

      > If that basic thing talks like a human, why would a human be something different?

      Because properly intelligent humans actually think instead of being thinking simulators, as is apparent from the quality of the LLM outputs.

      > parrot ... as if this alone makes it unimpressive

      "What could possibly go wrong".

      • Delomomonl a year ago

        And do you have any argument at all?

        After all, the output of these LLMs is often significantly better than what a lot of humans are capable of.

        • mdp2021 a year ago

          > And do you have any argument at all?

          To state what exactly?

          > than what a lot of humans are capable of

          And what is that supposed to imply?

          I suggest you read the parallel exchange with member A_D_E_P_T; there are reasons to think it contains the requested replies.

oedemisOP a year ago

Hello, I tried to explain Large Language Models with some visualizations, especially the attention mechanism.
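
A minimal sketch of the attention step the article visualizes, assuming nothing beyond standard scaled dot-product attention; the variable names and shapes below are illustrative, not taken from the post:

```python
# Single-head scaled dot-product self-attention, sketched with NumPy.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K, V: (seq_len, d) matrices of query, key, and value vectors.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # how strongly each token attends to each other token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of the value vectors

# Toy usage: 4 tokens with 8-dimensional vectors, attending to themselves.
x = np.random.default_rng(0).normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8)
```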

  • itronitron a year ago

    You should probably mention that embeddings are essentially a renaming of text vectors, a.k.a. the vector space model, which has probably been in use since before neural networks.
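
    A small sketch of that point, with a made-up three-word vocabulary: an embedding table is just a matrix of vectors indexed by token id, conceptually the same "words as vectors" idea as the classic vector space model, only dense and learned rather than sparse counts.

    ```python
    # Hypothetical tiny vocabulary; dense embedding lookup vs. sparse count vector.
    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2}
    embedding_table = np.random.default_rng(0).normal(size=(len(vocab), 4))  # one 4-d vector per token

    def embed(tokens):
        # learned, dense vectors: one row of the table per token
        return np.stack([embedding_table[vocab[t]] for t in tokens])

    def bag_of_words(tokens):
        # classic vector space model: counts over the whole vocabulary
        v = np.zeros(len(vocab))
        for t in tokens:
            v[vocab[t]] += 1
        return v

    print(embed(["the", "cat"]).shape)   # (2, 4)
    print(bag_of_words(["the", "cat"]))  # [1. 1. 0.]
    ```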

antonkar a year ago

Here’s an interpretability idea you may find interesting:

Let's Turn an AI Model Into a Place: a project to make AI interpretability research fun and widespread by converting a multimodal language model into a place, or a game like The Sims or GTA.

Imagine that you have a giant trash pile: how do you make a language model out of it? First you remove duplicates of every item; you don't need a million banana peels, just one will suffice. Now you have a grid with one item of trash in each square, like a banana peel in one and a broken chair in another. Then you put related things close together and draw arrows between related items.

When a person "prompts" this place AI, the player themself runs from one item to another to compute the answer to the prompt.

For example, you stand near the monkey; it's your short prompt. You see around you a lot of items and arrows towards those items. The closest item is a pair of chewing lips, so you step towards it, and now your prompt is "monkey chews". The next closest item is a banana, but there are plenty of other possibilities around, like an apple a bit farther away and an old tire far out on the horizon (monkeys rarely chew tires, so the tire is far away).

You are the time-like chooser and the language model is the space-like library, the game, the place. It’s static and safe, while you’re dynamic and dangerous.
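
Read literally, the walk described above is a greedy nearest-neighbour step: stand somewhere, look at the related items, move to the closest one. A tiny sketch of that reading; the items, 2-D positions, and relations are invented for illustration.

```python
# Greedy "walk through the place": the next word is the closest related item.
import numpy as np

items = {
    "monkey": np.array([0.0, 0.0]),
    "chews":  np.array([1.0, 0.2]),
    "banana": np.array([2.0, 0.3]),
    "apple":  np.array([2.5, 1.5]),
    "tire":   np.array([9.0, 9.0]),  # "far away on the horizon"
}
related = {"monkey": ["chews"], "chews": ["banana", "apple", "tire"]}

def walk(start, steps):
    sentence, here = [start], start
    for _ in range(steps):
        options = related.get(here, [])
        if not options:
            break
        # step to the closest related item
        here = min(options, key=lambda w: np.linalg.norm(items[w] - items[here]))
        sentence.append(here)
    return " ".join(sentence)

print(walk("monkey", 2))  # monkey chews banana
```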

DebtDeflation a year ago

Would love to see a similar explanation of how "reasoning" versions of LLMs are trained. I understand that OpenAI was mum about how they specifically trained o1/o3, and that people are having to reverse-engineer from the DeepSeek paper, which may or may not describe a different approach. But I would like to see a coherent explanation that is not just a regurgitation of Chain of Thought or a handwavy "special reasoning tokens give the model more time to think".

  • Philpax a year ago

    This may be useful: https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1

    but the tl;dr of the idea is that we can use reinforcement learning on a strong base model (i.e. one that hasn't been fine-tuned) to elicit the generation of tokens that help the model reach a result that can be verified to be correct. That is, if we have a way of verifying that a specific output is correct, the model can be trained to consistently produce tokens that will lead to that result for a given input, and this facility generalises the more problems you train it on.

    There are some more nuances (the Interconnects article goes into that), but that's the fundamental idea of Reinforcement Learning from Verifiable Rewards.
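
    For a concrete feel of that loop, here is a toy sketch (not DeepSeek's or OpenAI's actual recipe): sample several completions per problem, score each with a verifier, and reinforce the ones that check out. `model.sample` and `model.update` are hypothetical placeholders standing in for a real policy and a real RL update (PPO/GRPO-style in practice).

    ```python
    # Toy reinforcement-learning-from-verifiable-rewards loop (illustrative only).

    def verify(problem, completion):
        # e.g. run unit tests, or compare the final answer against an answer key
        return completion.strip() == problem["answer"]

    def rlvr_step(model, problems, samples_per_problem=8):
        batch = []
        for problem in problems:
            completions = [model.sample(problem["prompt"]) for _ in range(samples_per_problem)]
            rewards = [1.0 if verify(problem, c) else 0.0 for c in completions]
            batch.append((problem["prompt"], completions, rewards))
        # Reinforce whichever token sequences led to verified-correct answers.
        model.update(batch)
    ```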

rco8786 a year ago

I'm not sure I would call this "simple", but I appreciated the walkthrough. I understood a lot of it at a high level before reading, and this helped solidify my understanding a bit more. Though it also serves to highlight just how complex LLMs actually are.

noodletheworld a year ago

While I appreciate the pictures, at the end of the day all you really have is a glossary and slightly more detailed, arbitrary hand-waving.

What specific architecture is used to build a basic model?

Why is that specific combination of basic building blocks used?

Why does it work when other similar ones don’t?

I generally approve of simplifications, but these LLM simplifications are too vague and broad to be useful or meaningful.

Here's my challenge: take that article and write an LLM.

No?

How about an article on raytracing?

Anyone can do a raytracer in a weekend.

Why is building an LLM miles of conceptual explanation with nothing concrete you can actually build?

Where’s my “LLM in a weekend” that covers the theory and how to actually implement one?

The distinction between this and something like https://github.com/rasbt/LLMs-from-scratch is stark.

My hot take is: if you haven't built one, you don't actually understand how they work; you just have a vague, kind-of-heard-of-it understanding, which is not the same thing.

…maybe that’s harsh, and unfair. I’ll take it, maybe it is; but I’ve seen a lot of LLM explanations that conveniently stop before they get to the hard part of “and how do you actually do it?”, and another one? Eh.

hegx a year ago

Warning: these "fundamentals" will become obsolete faster than you can wrap your head around them.

betto a year ago

Why don't you come on my podcast to explain LLMs? I would love it.

https://www.youtube.com/@CouchX-SoftwareTechexplain-k9v
