Billion-Parameter Theories

109 points by seanlinehan 2 months ago · 91 comments

Reader

Two handwavey ideas upon reading this:

- Even for billion-parameter theories, a small amount of vectors might dominate the behaviour. A coordinate shift approach (PCA) might surface new concepts that enable us to model that phenomenon. "A change in perspective is worth 80 IQ points", said Alan Kay.

- There is an analogue of how we come up with cognitive metaphors of the mind ("our models of the mind resemble our latest technology (abacus, mechanisms, computer, neural network)"), to be applied to other complicated areas of reality.

pash 2 months ago

> Even for billion-parameter theories, a small amount of vectors might dominate the behaviour.
We kinda-sorta already know this is true. The lottery-ticket hypothesis [0] says that every large network contains a randomly initialized small network that performs as well as the overall network, and over the past eight years or so researchers have indeed managed to find small networks inside large networks of many different architectures that demonstrate this phenomenon.
Nobody talks much about the lottery-ticket hypothesis these days because it isn’t practically useful at the moment. (With the pruning algorithms and hardware we have, pruning is more costly than just training a big network.) But the basic idea does suggest that there may be hope for interpretability, at least in the odd application here or there.
That is, the (strong) lottery-ticket hypothesis suggests that the training process is a search through a large parameter space for a small network that already (by random initialization) exhibits the overall desired network behavior; updating parameters during the training process is mostly about turning off the irrelevant parts of the network.
For some applications, one would think that the small sub-network hiding in there somewhere might be small enough to be interpretable. I won’t be surprised if some day not too far into the future scientists investigating neural networks start to identify good interpretable models of phenomena of intermediate complexity (those phenomena that are too complex to be amenable to classic scientific techniques, but simple enough that neural networks trained to exhibit the phenomena yield unusually small active sub-networks).
0. https://en.wikipedia.org/wiki/Lottery_ticket_hypothesis
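A minimal sketch of the pruning idea discussed above, assuming one-shot magnitude pruning (keep the largest-magnitude weights, zero the rest). The weight values, the function names, and the 20% keep ratio are invented for illustration; as far as I understand the lottery-ticket experiments, they also train the network, prune iteratively, and rewind the surviving weights to their initial values, none of which is shown here.

```python
# Sketch of one-shot magnitude pruning. All numbers are made up for illustration.

def magnitude_mask(weights, keep_fraction):
    """Return a 0/1 mask that keeps only the largest-magnitude weights."""
    n_keep = max(1, round(len(weights) * keep_fraction))
    # Indices sorted by absolute value, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0."""
    return [w * m for w, m in zip(weights, mask)]


# Hypothetical weights of one small layer: two large entries, eight near zero.
weights = [0.91, -0.02, 0.05, -1.30, 0.01, 0.40, -0.03, 0.07, 0.00, -0.06]
mask = magnitude_mask(weights, keep_fraction=0.2)
pruned = apply_mask(weights, mask)

print(mask)
print(pruned)
```

The mask is the "ticket": the claim in the comment above is about whether a sub-network selected this way can match the full network, which this toy does not test.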
- seanlinehanOP 2 months ago
  
  Super interesting, I've never heard of this before. Thanks for sharing!
_hark 2 months ago

You literally can do a kind of model PCA, using the Hessian (matrix of second derivatives of the loss function w/r/t the parameters, aka the local curvature of the loss landscape), and diagonalizing. These eigenvectors and eigenvalues (the spectrum of the Hessian) tend to be power-law distributed in just about every deep NN you can think of [1].
That is, there are a few "really important" (highly curved) dimensions in parameter space (the top eigenvectors) which control the model's performance (the loss function). Conversely, there are very many "unimportant"/low curvature dimensions in the model. There was a recent interesting paper that showed that "deleting" these low-curvature dimensions appeared to correspond to removing "memorized" information in LLMs, such that their reasoning performance was left unchanged while their ability to answer questions which require some memorized knowledge was reduced [2].
It appears that sometimes models undergo dramatic transitions from memorization to perfect generalization, which corresponds to the models becoming much more compressible [3].
I'm hopeful that we'll find a way to distill the models down to the most useful core cognitive/reasoning capabilities, and that that core will be far simpler than the current scale of LLMs. But they might need to look stuff up like we do without all that memorized world knowledge!
[1]: https://openreview.net/pdf?id=o62ZzfCEwZ
[2]: https://www.goodfire.ai/research/understanding-memorization-...
[3]: https://arxiv.org/abs/2412.09810
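A toy version of the "model PCA" described above, as a sketch only: it estimates the Hessian of a three-parameter quadratic loss by finite differences and finds the most curved direction by power iteration. The matrix `A`, the evaluation point, and the function names are assumptions made up for this example; real networks need Hessian-vector products rather than an explicit matrix.

```python
import math

# Invented quadratic loss 0.5 * w^T A w with two stiff directions and one flat one.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 0.01]]


def loss(w):
    return 0.5 * sum(w[i] * A[i][j] * w[j] for i in range(3) for j in range(3))


def hessian(f, w, h=1e-3):
    """Central finite-difference estimate of the matrix of second derivatives."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                v = list(w)
                v[i] += si * h
                v[j] += sj * h
                return f(v)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


def top_eigenpair(H, iters=200):
    """Power iteration: repeatedly apply H and renormalize."""
    n = len(H)
    v = [1.0] * n
    for _ in range(iters):
        Hv = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in Hv))
        v = [x / norm for x in Hv]
    Hv = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * Hv[i] for i in range(n)), v


H = hessian(loss, [1.0, -2.0, 0.5])
top, direction = top_eigenpair(H)
trace = sum(H[i][i] for i in range(3))  # equals the sum of all eigenvalues

print("top curvature:", top)
print("share of total curvature:", top / trace)
```

In this toy, one direction carries roughly two thirds of the total curvature while the flat direction contributes almost nothing, which is the qualitative picture the comment describes.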
aldousd666 2 months ago

I don't disagree, but neither does the article. It's just talking about the fact that we previously considered anything that can't be easily and tersely written down as nearly or entirely intractable. But, as we have seen, the three body problem is not really a hum-dinger as far as the universe goes, it's not even table stakes. We need to be able to do the same kind of energy arbitrage on n-body problems that we do on 2. And now we have the beginnings of a place to toy with more complicated ideas -- since these won't fit on a blackboard.
- pixl97 2 months ago
  
  Problems with opaque stability boundaries that observe non-linear effects are always great. Chaos theory makes it even more fun as your observation can change the outcome.
simianwords 2 months ago

Maybe we can come up with smaller models that perform almost as well as the bigger ones. Could that just be PCA of some kind?
GPT nano vs GPT 5, for example.

wavemode 2 months ago

Not to sound condescending, but this reads like someone familiar with LLMs but very unfamiliar with statistics in general.

If we could understand economics, or poverty, or any number of other social structures, simply by cramming data into a statistical model with billions of parameters, we would've done that decades ago and these problems would already be understood.

In the real world, though, there is a phenomenon called overfitting. In other words you can perfectly model the training data but be unable to make useful predictions about new data (i.e. the future).
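A small illustration of the overfitting point, assuming invented data: six points on the trend y = 2x with alternating noise of 0.5. A degree-5 polynomial (one coefficient per point) fits the training data exactly, while a two-parameter line does not, and the two are then compared one step outside the training range. The data and names are hypothetical.

```python
# Invented data: true trend y = 2x plus alternating measurement noise of +/-0.5.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + (0.5 if x % 2 == 0 else -0.5) for x in xs]


def interpolant(x):
    """Degree-5 Lagrange polynomial through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line():
    """Ordinary least-squares line: just a slope and an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


slope, intercept = fit_line()


def line(x):
    return slope * x + intercept


train_err_interp = max(abs(interpolant(x) - y) for x, y in zip(xs, ys))
train_err_line = max(abs(line(x) - y) for x, y in zip(xs, ys))

x_new, y_true = 6, 12.0  # "the future": one step past the training range
print("train error:", train_err_interp, train_err_line)
print("new-point error:", abs(interpolant(x_new) - y_true), abs(line(x_new) - y_true))
```

The flexible model wins on the training points and loses badly on the new one. Whether very large networks escape this trade-off is exactly what the reply below disputes.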

snarkconjecture 2 months ago

Deep neural networks can generalize well even when they're far into the overparametrized regime where classical statistical learning theory predicts overfitting. This is usually called "double descent" and there are many papers on it.
curao_d_espanto 2 months ago

> The emerging field of mechanistic interpretability suggests otherwise. Researchers are developing tools to understand how neural networks do what they do, from network ablation and selective activation to feature visualization and circuit tracing. These techniques let you study a trained model the way a biologist studies an organism, through careful experimentation and observation.
honestly, when I read that part of the article I imagined that author never studied how computers were made and where the engineering ideas came from, all technology just "popped" and here we are talking about complexity and stuff like the LLM is truly alive
- stevenhuang 2 months ago
  
  The author is not wrong. You seem unaware of how nascent the field of LLM interpretability research is.
  See this thread and article from earlier today showing what we're still able to learn from these interpretability experiments.
  https://news.ycombinator.com/item?id=47322887
phyzix5761 2 months ago

I could be wrong but I think we crossed the 1 billion parameter threshold in 2019. I'm not sure we had this ability for decades.
- jayd16 2 months ago
  
  They mean with traditional hard computation, not LLM magic.
clickety_clack 2 months ago

Really good data only goes back a couple or more decades, so any data you put in your model has only been influenced by the kinds of things we’ve seen in that time. Impact of a hot war between major powers? The gold standard? Stagflation? Invention of the car or train? Transition of major world powers to democracy or communism? All these events left almost no data compared to today, to say nothing of run-of-the-mill changes in styles of monetary policy, economic drivers or shifts in style of government.
alex-moon 2 months ago

I think this is a really important distinction to make. The OP seems to be making a fallacious equivocation on the word "parameter" - specifically, any individual "parameter" in a large ML model has no unit of measurement because it doesn't mean anything on its own. I watched a great documentary about the "Soft Hair on Black Holes" paper where they talk about having to move from the blackboard to the computer because the equation explodes into thousands of parameters - the key thing to understand being that each of those parameters represents some "real" thing, a momentum, a charge, a curvature, etc.

roughly 2 months ago

There's a lot of ink in this spent on how Poverty, Climate Change, Urban Decay, and Financial Markets are Complex Hard Complicated problems.

The problem with these is they're also problems where there are actors profiting from the failure to fix the system - the issue isn't that we don't understand the complex nature of the domain, it's that the components of the system actively and agentically resist changes to the system. George Soros called this Reflexivity - the fact that the system responds to your manipulations means you can't treat yourself and the system as separate agents, and you can't treat the system as a purely mechanistic/passive recipient of your changes. It's maybe the biggest blind spot for people who want to apply the rules and methods of physics to social issues - the universe may be indifferent, but your neighbors are not.

adamzwasserman 2 months ago

This is the strongest point in the thread. The article treats poverty, climate, and markets as though the obstacle is insufficient model capacity. But these systems contain agents with values and motivations who actively resist interventions. A billion-parameter model of a system whose components are trying to game the model will never be a theory of that system. The agents will simply route around it.
More broadly, the article assumes that scaling model capacity will eventually bridge the gap between prediction and understanding. I have pre-registered experiments on OSF.io that falsify the strong scaling hypothesis for LLMs: past a certain point, additional parameters buy you better interpolation within the training distribution without improving generalization to novel structure. This shouldn't surprise anyone. If the entire body of science has taught us anything at all, it is that regularity is only ever achieved at the price of generality. A model that fits everything predicts nothing.
The author gestures at mechanistic interpretability as the path from oracle to science. But interpretability research keeps finding that what these models learn are statistical regularities in training data, not causal structure. Exactly what you'd expect from a compression algorithm. The conflation of compression with explanation is doing a lot of quiet work in this essay.
newyankee 2 months ago

Literally, countries with so much surplus land (Canada, Australia, etc.) have housing crises where most of the top 10-20% of the population have become housing speculators and openly NIMBY, with no interest in supply-side solutions unless forced.
- strken 2 months ago
  
  The problem we've got is that 10-20% of the population are speculating while another 50% of the population have almost their entire net worth stuffed into their family home. We're finding it difficult to rein in the top without ruining the middle too.
  - red-iron-pine 2 months ago
    
    there are plenty of examples of the top getting reined in, and if people have their wealth in their house then don't take their houses.
    this isn't a complicated problem, and it's not difficult to build guillotines
- munchler 2 months ago
  
  If there’s surplus land, why build something unwanted in someone’s backyard? I’m a suburban NIMBY homeowner and I feel like you’re actually making my argument without realizing it. I’m all for building new houses on unused land. Can you please just do it without ruining my neighborhood? Build nice new neighborhoods and make them as dense as you’d like, but don’t try to force density on older, established neighborhoods that can’t support it.
  - xvedejas 2 months ago
    
    The empty land is not very valuable. Suburban homeowners are sitting on relatively valuable land, and it's valuable because of access to jobs and services.
    In my personal experience, adding density to established neighborhoods improves those neighborhoods' character. Sometimes it gets those afraid of change to move out, improving it even more.
  - roughly 2 months ago
    
    I'm actually curious - have you spent time in cities like Bern or Bilbao? I think urbanism's been a hard sell in the US because we don't really have a lot of great examples of it - New York's maybe the closest we've got to a European style city, but that's only in certain places and it's still a bit much. I was in Europe last year and I was surprised how calm some of the cities were - green, walkable, a lot of nice cafes and parks, good public transit, and it never really felt overwhelming the way that, say, Chicago or LA does. I grew up in the suburbs, and I felt like some of the smaller European cities delivered the suburban sales pitch better than a lot of places I've been in the US.
    (Don't take this as an attack or critique - genuine curiosity.)
- red-iron-pine 2 months ago
  
  most of the surplus land is marginally habitable, and costs increase dramatically when you get rural.
  plus "land" doesn't mean anything if you're not near the people and things you want to do, places to work, etc.
  do you want to do a 1.5 hour commute and hustle to live around Toronto, or do you want to live in Outer Nowhere, Manitoba, population 400, and where it regularly gets to -40C?
- cyanydeez 2 months ago
  
  Rentseeking is the super-capitalism in the room.
seanlinehanOP 2 months ago

Reflexivity is nodded to in the definition of complex systems in the piece!
I think what you're saying is poverty is actually simple, and the solution is to stop the bad actors causing poverty? But at the same time, you are correctly recognizing that attempts to stop bad actors from causing poverty trigger reflexive responses and cascading repercussions. Which sounds mighty like a complex system?
- pyrale 2 months ago
  
  I think you need to distinguish between complex systems and byzantine systems. You can have complex systems where every piece shares a common goal, but feedback loops are hard. You can also have systems which, if a common goal were shared, wouldn't be that hard to understand, model and optimize, but where the actors of the system are not acting in good faith.
  And I agree with the above poster: often, a problem is described as "hard" as a way to make an excuse for the agents. Sure, the problem is hard. The reason why it's hard isn't some esoteric arcane complexity, it's that some of the agents aren't even trying.
- roughly 2 months ago
  
  No, I'm not saying the problem is simple, but I'm saying that in many of these cases a systematic understanding of the problem isn't what we're lacking in pursuit of fixes - the reason the problem seems so intractable is because parts of the system benefit from perpetuating the problem and take agency to ensure the problem does not get fixed.
  Poverty is one of these, but I think Climate Change is the most direct - the climate is complex, but climate change is simple: we're releasing too much carbon into the atmosphere, we have been for a century, and we've known that for at least half a century*. The issue isn't that we don't have the capacity to model or understand the problem, the issue is that powerful actors have used the leverage available to them within the system to prevent us from making changes to fix the problem.
  And, you're right, that makes the problem difficult, because the system includes those actors resisting changes to the system, but again, it's not difficult because we don't understand it, it's difficult because we're being actively resisted by people who do not want to solve the problem, and that should be acknowledged by people looking to make it an abstract mathematical modeling problem.
  * This isn't a conspiracy theory: https://en.wikipedia.org/wiki/ExxonMobil_climate_change_deni...

b450 2 months ago

Reminds me of the blog post about Waymo's "World Model". Training on real-world data results in a sufficiently rich model to start simulating novel scenarios that aren't in the training data (like the elephant wandering into the street), which in turn can feed back into training. One could imagine scientific inquiry working the same way.

It strikes me that many of these complex systems have indeterminate boundaries, and a fair amount of distortion might be baked into the choice of training data. Poverty (to take an example from this post) probably has causes at economic, psychological, ecological, physiological, historical, and political levels of description (commenters please note I didn't think too hard about this list). What data we feed into our models, and how those data are understood as operationalizations of the qualitative phenomena we care about, might matter.

delichon 2 months ago

> like the elephant wandering into the street
Or a dinosaur that looks like it might:
https://x.com/phatman_19/status/2030728278437491102
gwerbin 2 months ago

This "world model" concept has been a big deal in AI research, in LLMs.

niemandhier 2 months ago

He talks about the Santa Fe institute and how they failed to carry their findings into the real world.

They did not.

They showed that for certain problems one could not do more than figure out some invariants and scaling laws. Showing what is impossible is not failure.

For the rest: Modern gene networks and lots of biological modelling is based on their work as well as quite a few other things. That’s also not failure.

I agree that modern AI is alchemy.

MarkusQ 2 months ago

Clarke's first law:
When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
Also see Minsky's "Perceptrons"
The problem with almost all such proofs is that people (even those who know better) read them as "this can't be done" when in fact they tell you "it can't be done unless you break one of the following assumptions."
I agree that it's unfair to say they failed, but it's likewise unfair to say that their success was in telling us our limits rather than exploring what we need to do to get around the roadblocks.
- niemandhier 2 months ago
  
  A positive Lyapunov exponent means that your flow has locally diverging properties.
  No matter what you do you are bound by that. As soon as your uncertainties become of the order of the system's scale you cannot predict.
  You might push from 1 Lyapunov time to 3 or use ensemble methods to get probabilities, but the fundamental impossibility remains.
  Age of the scientist does not matter.
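A rough numerical illustration of the Lyapunov argument, assuming the logistic map at r = 4 as a stand-in chaotic system (its exponent is known to be ln 2). The sketch averages the log of the local stretching rate along one trajectory, then computes how long an initial error takes to reach the system's scale. The starting point, step count, and function names are arbitrary choices for this example.

```python
import math


def logistic(x, r=4.0):
    return r * x * (1.0 - x)


def lyapunov_estimate(x0=0.2, steps=10_000, r=4.0):
    """Average of log|f'(x)| = log|r (1 - 2x)| along one trajectory."""
    x, total = x0, 0.0
    for _ in range(steps):
        # The tiny floor only guards against log(0) if x lands exactly on 0.5.
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
        x = logistic(x, r)
    return total / steps


def horizon(initial_error, lam, system_scale=1.0):
    """Steps until an error growing like exp(lam * t) reaches the system's scale."""
    return math.log(system_scale / initial_error) / lam


lam = lyapunov_estimate()
print("estimated exponent:", lam, "vs ln 2 =", math.log(2))
print("horizon for 1e-6 error:", horizon(1e-6, lam))
print("horizon for 1e-12 error:", horizon(1e-12, lam))
```

Because the horizon grows only with the logarithm of the measurement precision, making the initial measurement a million times better merely doubles it here, which is the sense in which better tools push the limit without removing it.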
seanlinehanOP 2 months ago

True -- I didn't mean to communicate that Santa Fe was a failure writ large. Their contribution was very important!
Though I think it's fair to say that the torch was picked up and carried by others with a different set of strategies.

js8 2 months ago

I disagree with the article. I think it is always possible to come up with reasonably small theories that capture most of the given phenomena. So in a sense, you don't need complex theories in the form of large NNs (models? functions? programs?), other than for more precise prediction.

For example - global warming. It's nice to have AOGCMs that have everything and the carbon sink in them. But if you want to understand, a two layer model of atmosphere with CO2 and water vapor feedback will do a decent job, and gives similar first-order predictions.
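As a sketch of how small such a model can be, here is an even simpler relative of the two-layer model mentioned above: a zero-dimensional energy balance with a single grey atmospheric layer and no feedbacks. The albedo of 0.3 and the layer emissivity of 0.78 are round textbook-style assumptions, not values from the comment, and the function names are made up.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
SOLAR = 1361.0          # solar constant, W m^-2
ALBEDO = 0.3            # assumed planetary albedo


def effective_temperature():
    """Temperature at which the planet radiates away the sunlight it absorbs."""
    absorbed = SOLAR * (1.0 - ALBEDO) / 4.0
    return (absorbed / SIGMA) ** 0.25


def surface_temperature(emissivity):
    """One grey layer absorbing a fraction `emissivity` of the surface's infrared."""
    return effective_temperature() * (2.0 / (2.0 - emissivity)) ** 0.25


print("no atmosphere:", effective_temperature())
print("one layer, emissivity 0.78:", surface_temperature(0.78))
print("one layer, emissivity 0.80:", surface_temperature(0.80))
```

With these assumed inputs the bare planet comes out near 255 K and the one-layer surface near 288 K, and raising the emissivity (a crude proxy for more greenhouse gas) warms the surface. The water vapor feedback mentioned above is not modelled.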

I also don't think poverty is a complex problem, but that's a minor point.

pdonis 2 months ago

> I also don't think poverty is a complex problem, but that's a minor point.
I'm not sure it's a minor point. I don't think poverty is a "complex" problem either, as that term is used in the article, but that doesn't mean I think it fits into one of the other two categories in the article. I think it is in a fourth category that the article doesn't even consider.
For lack of a better term, I'll call that category "political". The key thing with this category of problems is that they are about fundamental conflicts of interest and values, and that's a different kind of problem from the kind the article talks about. We don't have poverty in the world because we lack accurate enough knowledge of how to create the wealth that brings people out of poverty. We have poverty in the world because there are people in positions of power all over the world who literally don't care about ending poverty, and who subvert attempts to do so--who make a living by stealing wealth instead of creating it, and don't care that that means making lots of other people poor.
- JackFr 2 months ago
  
  When all of humanity was hunting and gathering and living at subsistence levels, there was no poverty. It only shows up with wealth.
  Pretty simple.
  - DoctorOetker 2 months ago
    
    This.
    Every sedentary society has historically scared its members of the dangers of the nomadic lifestyle, heathens, ...
    The implied conclusion being that since our ancestors switched from nomadic to sedentary it must have been preferable, a kind of informal democratic collectively and individually approved choice.
    Surely sedentary must have been better, how else could such a transition have been sustained?
    Rather easy how else: it's perfectly possible for average or mean life quality under sedentary lifestyle to be a net setback compared to nomadic lifestyle, since slavery can't be effectively implemented in a nomadic lifestyle, whereas the sedentary lifestyle creates both the demand for labor (routine monotonous work in the fields) and the means to enable slavery (escaping nomadic tribes under Brownian motion is much easier than escaping from a randomly assigned position deep in a larger sedentary empire, even if you escape the sedentary village, the stable neighbouring village will happily return you to "your owner" so that he would hopefully return the favor if ever he catches one of "their slaves").
    It's easy to claim a net improvement in life quality ... by discounting the loss of life quality of the slaves!
    Nomadic lifestyle was simply outcompeted by sedentary-enabled slavery!
    
    pdonis 2 months ago
    
    > even if you escape the sedentary village, the stable neighbouring village will happily return you to "your owner" so that he would hopefully return the favor if ever he catches one of "their slaves")
    Tell that to all the people who ran the Underground Railroad in the pre-Civil War US, not to mention all the other ways that Fugitive Slave laws were persistently violated.
    I think you are vastly underestimating the benefits of a modern "sedentary" society. But as I pointed out in my other post, if you really don't think they're benefits, then you can simply forgo them. Go and live an off grid subsistence lifestyle. There are people who do that. But of course they don't post on the Internet.
    
    DoctorOetker 2 months ago
    
    What makes you think they don't post on the internet?
    
    pdonis 2 months ago
    
    How is someone living a hunter-gatherer subsistence life going to get Internet access? That requires a technological society, which requires a lot of wealth creation way above a subsistence level.
    If you're saying that someone might claim they're living a hunter-gatherer subsistence life except when they're not, well, that's just hypocrisy. If you're going to make use of things that require a modern technological society, then you're saying life in a modern technological society is preferable to a hunter-gatherer subsistence life, whether you like it or not. You can't have it both ways.
    
    pdonis 2 months ago
    
    If you think a subsistence nomadic lifestyle is preferable to a modern "sedentary" one, then how are you able to post here? Subsistence nomads don't have Internet access (to name just one of umpteen things we "sedentary" moderns have access to that they don't). There are ways to live off grid if you really think it's preferable.
    
    DoctorOetker 2 months ago
    
    I am a homeless bum, I literally live under a bridge.
    
    pdonis 2 months ago
    
    Fine. And whatever device you're using to post here just happened to emerge spontaneously from the dirt, instead of being built by the efforts of thousands of people spread all over the world as part of a modern technological society.
    Also: where do you get your food? Do you grow it? Or hunt for it in a natural wilderness, untouched by technology, using tools you made yourself, without the benefit of modern technology?
    Where do you get your clothes? Do you make them yourself? Out of natural materials that would be there if our modern, technological society did not exist?
    I'm going to make a wild guess that the answers to those questions are "no"--that you are relying on sources of food and clothes that also require a modern technological society. Not to mention transportation and whatever else you need to do the things that occupy your day.
    So no, you are not living a hunter-gatherer subsistence life. You are taking advantage of the fact that it is possible in a modern technological society to be a homeless bum living under a bridge, without having to do all the things that actual hunter-gatherers living a subsistence life have had to do all through human history to survive.
munificent 2 months ago

> I think it is always possible to come up with reasonably small theories that capture most of the given phenomena.
I can write a program (call it a simulation of some artificial phenomenon) whose internal logic is arbitrarily complex. The result is irreducible: the entire byzantine program with all of its convoluted logic is the smallest possible theory to describe the phenomenon, and yet the theory is not reasonably small for any reasonable definition.
- js8 2 months ago
  
  That's true but I can still approximate what the system does with a simpler model. For example, I can split states of the system into n distinct groups, and measure transition probabilities between them.
  Thermodynamics is a classic example of a phenomenological model like that.
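The coarse-graining recipe above can be sketched in a few lines, assuming a stand-in "complicated" system (the logistic map at r = 4), equal-width bins as the n groups, and simple transition counting. All names and parameters here are illustrative assumptions.

```python
def trajectory(x0=0.2, steps=5000):
    """Stand-in for a complicated system: the logistic map at r = 4."""
    xs, x = [], x0
    for _ in range(steps):
        xs.append(x)
        x = 4.0 * x * (1.0 - x)
    return xs


def coarse_grain(xs, n_groups):
    """Map each state in [0, 1] to one of n_groups equal-width bins."""
    return [min(int(x * n_groups), n_groups - 1) for x in xs]


def transition_matrix(labels, n_groups):
    """Estimate P[a][b] = probability of moving from group a to group b."""
    counts = [[0] * n_groups for _ in range(n_groups)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


labels = coarse_grain(trajectory(), n_groups=4)
P = transition_matrix(labels, 4)
for row in P:
    print([round(p, 2) for p in row])
```

The resulting 4x4 table is a small phenomenological model of the system: it forgets the exact state and keeps only group-to-group statistics, so it predicts frequencies rather than individual trajectories.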
  - munificent 2 months ago
    
    > That's true but I can still approximate what the system does with a simpler model.
    For any strategy you might apply to do that, I can craft a program that simulates a phenomenon that defies that strategy.

quinndupont 2 months ago

Summary: good scientific theories have “reach,” which is not defined in any precise way. Reach has complexity and this can be handled with large parameter neural networks. Assumptions: mechanistic and deterministic worldview; epistemological perfection is the goal (perfect knowledge of facts).

curuinor 2 months ago

Connectionist models have lots of theory by theoreticians explicitly pissed off about Chomsky's assertion that there is an inbuilt ability for language. Jay McClelland's office had a little corkboard thingy with Chomsky mockery on the side, for example. Putting forth even the implicature that the present direct descendants are intellectual descendants of Chomsky is like saying Protestants are intellectual descendants of Pope Leo X.

seanlinehanOP 2 months ago

Perhaps a failure of communication -- I was indeed attempting to say that Chomsky was wrong and his ideas were interesting, but more or less a dead end.
suddenlybananas 2 months ago

>Jay McClelland's office had a little corkboard thingy with Chomsky mockery on the side, for example.
I've never understood why the idea of linguistic nativism is so upsetting to people.
- cwmoore 2 months ago
  
  Indeed, operating human lips, teeth, tongue, and larynx is far beyond language models.
  - bbor 2 months ago
    
    Apologies if I'm stepping on a joke, but just in case: Nativism is about cognitive capacities, not sensorimotor ones. All apes could easily communicate just as well as Helen Keller, yet none of them have ever asked a question, much less written a book!
    
    cwmoore 2 months ago
    
    No joke. Same sensorimotor neurons in the human speech apparatus have cognitive analogues, developed together over vast expanses of history.
  - pixl97 2 months ago
    
    Give language models 500 million years and let's revisit this. One of the reasons parity is harder to reach in robotics than in higher intelligence: evolution has been cooking it for a long time.
- bbor 2 months ago
  
  Well that anecdote is referencing the Scruffies v. Neat war[1], within which the nativism debate was merely a somewhat-archaic undercurrent.
  IMHO, a lot of the more specifically anti-nativist sentiments of today are based in linguistics itself rather than philosophy, CS, or CogSci, where again it is part of a broader (and much dumber) debate: whether linguistics is the empirical study of languages or the theoretical study of language itself. People get really nasty when they're told that they work in an offshoot field for some reason, which is why I blame them for the ever-too-common misunderstandings of Chomsky -- the most common being "Universal Grammar has been disproven because babies don't speak English in the womb".
  If Chomsky weren't so obviously right, this would be a worrying development! Luckily I expect it to be little more than a footnote in history, so it's merely infuriating rather than depressing.
  [1] Minsky, 1991: https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...

lkm0 2 months ago

It's an optimistic point of view. Still, when people use large neural nets to model physics, they also have a lot of parameters but they replicate very simple laws. So there's something deeper about this. Something like a simulation of theory.

pixl97 2 months ago

The deeper thing may just be the uncertain nature of quantum physics. That is, any complex system must be built from redundant and repeatable actions, and/or have a self-correction mechanism to fix itself if a bit happens to flip out of the universe. This leads to the evolutionary weeding out of indivisible complex systems: as the system gains more components, it's improbable that a load-bearing structure in that system will not fail.
Hence every system we get to see in nature is built from smaller components that generate complexity via repetition.
Our computers don't escape from this either. As the components get smaller you end up with your charge probability field outside of your component traces.

dakiol 2 months ago

> You could capture the behavior of every falling object on Earth in three variables and describe the relationship between matter and energy in five characters.

What we can do is to approximate. Newton had a good approximation some time ago about gravitation (force equals a constant times two masses divided by distance squared. Super readable indeed). But nowadays there's a better one that doesn't look like Newton's theory (Einstein's field equations, which look compact but nothing like Newton's). So, what if in 1000 years we have yet a better approximation to gravity in the universe but it's encoded in millions of variables? (perhaps in the form of a neural network of some futuristic AI model?)

My point is: whatever we know about the universe now doesn't necessarily mean that it has "captured" the underlying essence of the universe. We approximate. Approximations are useful and handy and will move humanity forward, but let's not forget that "approximations != truth"

If we ever discover the underlying "truth" of the universe, we would look back and confidently say "Newton was wrong". But I don't think we will ever discover such a thing; therefore, sure, approximations are our "truth", but sometimes people forget.

bee_rider 2 months ago

Einstein’s equations look like Newton’s in the limit. It would be a little weird if we ended up having to add millions of additional parameters over the next thousand years. At the current rate we seem to get multiple years per parameter, rather than hundreds of parameters per year, right?
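For reference, the two theories being compared can be written side by side. This is the standard textbook form, with symbol names chosen here: $G$ is Newton's constant, $\Phi$ the gravitational potential, $\rho$ the mass density, $g_{\mu\nu}$ the metric, $G_{\mu\nu}$ the Einstein tensor, $\Lambda$ the cosmological constant, and $T_{\mu\nu}$ the stress-energy tensor.

```latex
% Newton: force between two point masses, and the equivalent field equation
F = \frac{G\, m_1 m_2}{r^2},
\qquad
\nabla^2 \Phi = 4 \pi G \rho

% Einstein: field equations
G_{\mu\nu} + \Lambda\, g_{\mu\nu} = \frac{8 \pi G}{c^4}\, T_{\mu\nu}

% Weak-field, slow-motion limit: with g_{00} \approx -\left(1 + 2\Phi / c^2\right)
% and \Lambda neglected, the time-time component reduces to Newton's field equation
\nabla^2 \Phi = 4 \pi G \rho
```

The same constant $G$ appears in both, and no new free parameter is needed to recover Newton from Einstein, which is the sense of "in the limit" above.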
b450 2 months ago

This kind of view tends to logically conclude in the idea of a noumenal, unknowable reality. I think it's more reasonable to say that truth itself is a gold star we award to descriptions that suit our purposes. After all, descriptions are necessarily approximations (or reductions, or "compressions"), since the only model of a thing with 100% fidelity is... the thing itself.
seanlinehanOP 2 months ago

Agreed!

ileonichwiesz 2 months ago

This might be an unkind reading, but to me this just sounds like an attempt to reinvent the very same kind of mysticism that it mentions in the first paragraph.

“No need to study the world around you and wonder about its rules, peasant - it’s far beyond your understanding! Only ~the gods~ computers can ever know the truth!”

I shudder to think about a future where people give up on working to understand complex systems because it’s hard and a machine can do it better, so why bother.

galaxyLogic 2 months ago

Mark Cuban had a good line. I don't know if he came up with it or who did, but he reportedly said:
"There are 2 types of people using AI: Those who use it so they can know everything, and those who use it so they don't have to know anything."
- empath75 2 months ago
  
  I think the sweet spot is probably using them so you can focus on knowing only the things you care about or need to know about.
  - galaxyLogic 2 months ago
    
    True, but I think the observation is spot on. There are people who want to know things, whether with some help from AI or by other means. Then there are people who prefer ignorance.
    Personally I take great comfort from the fact that I no longer, to a large degree, face the dilemma of "Who should I ask about this?"
seanlinehanOP 2 months ago

Not the intention at all. The part about mechanistic interpretability was meant to gesture at how building such systems can provide a new tool kit for building further intuition and understanding.
lobofta 2 months ago

Might we ever distinguish what is complex and complicated? Probably not, but I guess the author argues that this gives us a way forward because we can try to distill large models.

rbanffy 2 months ago

If we think of spacetime as some sort of cellular automaton, where each state of a given point is a function (with some randomness, because God likes to throw dice) of previous states of the surrounding points, then if the rules for generating a new state are extremely complex, there will be some significant overhead in dimensions we don't see, because the rules need to be somehow represented outside the observable reality. Another issue with this idea is that while the rules might be "outside", the parameters themselves have to be somehow encoded in the state of a cell, and can't propagate faster than light (one cell, an indivisible unit of space, per indivisible unit of time), which limits the number of parameters accessible to any given cell to the ones immediately surrounding it.

Disclaimer: I hope it's obvious, but I'm no physicist. This is just how I would build a universe.
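The locality constraint described above can be sketched with a toy one-dimensional automaton. This is only an illustration under assumptions that are not in the comment: the rule is the elementary Rule 30, it is deterministic (no dice), and the names `step`, `run`, and `changed_cells` are hypothetical. It flips one cell and checks which cells differ after a number of steps; the difference cannot spread faster than one cell per step.

```python
def step(cells, rule=30):
    """Advance a ring of 0/1 cells by one tick of an elementary CA rule."""
    n = len(cells)
    out = []
    for i in range(n):
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (mid << 1) | right  # neighbourhood as a 3-bit number
        out.append((rule >> idx) & 1)           # look up the new state in the rule
    return out


def run(cells, steps, rule=30):
    """Apply `step` repeatedly and return the final state."""
    for _ in range(steps):
        cells = step(cells, rule)
    return cells


def changed_cells(width=101, flip_at=50, steps=20, rule=30):
    """Indices that differ after `steps` ticks when one cell is flipped at t=0."""
    base = [0] * width
    base[width // 3] = 1          # arbitrary starting pattern
    perturbed = list(base)
    perturbed[flip_at] ^= 1       # the single local change
    a = run(base, steps, rule)
    b = run(perturbed, steps, rule)
    return [i for i in range(width) if a[i] != b[i]]


if __name__ == "__main__":
    diff = changed_cells()
    print("leftmost and rightmost affected cells:", min(diff), max(diff))
```

Whatever the rule, a cell at distance d from the flipped one cannot notice the change before tick d, which is the toy version of the "light cone" in the comment.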

mistivia 2 months ago

> The deepest truths fit on a napkin.

If you have really done physics or engineering, you would never believe this. Simple and elegant formulas usually can only solve the "spherical chicken in a perfect vacuum" kind of problems. The real world is incredibly messy. Beneath those clean and beautiful-looking partial differential equations lies a mathematical nightmare. And these equations often only hold at certain scales or rely on extremely strict boundary conditions.

zkmon 2 months ago

> It's remarkable how much of reality turned out to be modelable by theories that fit in a few symbols.

The admiration for "remarkable" things puts humanity on a dangerous path that is disconnected from the real goals of human progress as a species. You don't need any of this compression of knowledge or truths. Folklore tales about celestial bodies are fine and good enough. The vulgar pursuit of knowledge is paving the way for the extinction of humans as biological creatures.

pixl97 2 months ago

Right, dinosaurs were perfectly fine, their ignorance worked out well for them.
The universe is uncaring, simply not giving a shit if you have knowledge or not. Knowledge gives you the ability to survive minor conniption fits of cosmic magnitude, and at the same time gives you a gun to shoot your own foot off.
There ain't no such thing as a free lunch.
- zkmon 2 months ago
  
  So you think your tech can help you survive the event that made dinosaurs go bust.
  - pixl97 2 months ago
    
    Yep, given enough forewarning with current technology asteroid redirection isn't out of the question.
    Then again, we'll likely get ourselves with global warming first.

brunohaid 2 months ago

Very skeptical Adam Curtis hat on while reading this, but it is quite well written. Thanks & kudos!

us-merul 2 months ago

I think this also creates a vulnerability where, the more time and effort is spent to craft the “correct” solution, it becomes easier to dismiss topics out of hand. Even if our modeling tools have changed, emotions and the human mind have not.

jjk166 2 months ago

> And the epistemology shifts in ways that might be uncomfortable. Instead of "I understand the causal mechanism and can predict what happens if I change X," you get something more like "I have a sufficiently rich model that I can simulate what happens if I change X, with probabilistic confidence." The answers are distributions, not deterministic outputs. That's a different kind of knowing.

Being able to simulate something is not a kind of knowing. It is, in fact, the opposite of knowing. If you know how a system behaves, there is no need to simulate it. In particular, if the model you need to simulate it is way more complicated than the phenomenon itself, you really, really don't understand it.

I'm reminded of Feynman's observation that to simulate a quantum system, like an atom, with classical methods requires a tremendous number of atoms, and his intuition that there should be a much smaller way to perform such calculations. This is the conceptual underpinning of quantum computation.

A billion-parameter neural network may work as a functional tool, but the fact is these supposedly complex problems simply don't have billions of relevant free parameters. You're not going to understand a hurricane by feeding in terabytes of data to find the butterfly that flapped its wings in just the wrong way at just the wrong time. Sure, extremely small differences in starting conditions can lead to radically different outcomes, and a butterfly flapping its wings could have influenced a hurricane in some way. But if you understand how hurricanes work, you know that the butterfly's influence is just noise: the hurricane started and progresses as it does because of temperature gradients on the ocean. If you found the butterfly and stopped it from flapping its wings, the conditions for the hurricane would still exist and something else would set it in motion.

Billion-parameter theories work in practice because if you throw everything at the wall, the small amount of stuff that can stick will. Likewise, if you throw enough data at a problem, whatever data is actually relevant will be analyzed. This can be useful as a stepping stone to understanding: interrogating the model to reveal which parameters have more relevance and the weights of their interactions. But the idea that having a tool that addresses a symptom of your ignorance means you are no longer ignorant is folly.
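The sensitivity to starting conditions mentioned above can be shown in a few lines. This is a sketch under stated assumptions: the logistic map with r = 4 is a standard toy chaotic system standing in for the weather, not a hurricane model, and the function names and the 1e-10 perturbation are made up for illustration. It covers only the "small differences, different outcomes" half of the argument, not the claim about what drives hurricanes.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def separation(x0=0.2, eps=1e-10, steps=100):
    """Distance at each step between two orbits that start eps apart."""
    a = logistic_orbit(x0, steps)
    b = logistic_orbit(x0 + eps, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    gaps = separation()
    # Print the gap at a few steps to watch it grow from ~1e-10 to order 1.
    for n in (0, 10, 20, 30, 40, 50):
        print(n, gaps[n])
```

Both orbits stay inside the same interval and follow the same rule, so knowing the rule tells you what kind of behaviour to expect even though the exact trajectory is lost after a few dozen steps.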

BobbyTables2 2 months ago

I think the “Hitchhiker’s Guide to the Galaxy” passage about the train crashes from a broken clock was extremely prescient.
I feel like enormous models will end up this way…

ihumanable 2 months ago

In the author's own analogy of blacksmithing and metallurgy, I see an interesting parallel.

Humans worked metal for a long time and you can make better and better forges without knowing the metallurgy of why the result is better. If I make the fire hotter the metal comes out better, and I can get to work making forges that produce hotter and hotter fire.

LLMs could in this analogy be the forge. We can make them bigger and bigger and get better and better answers out, in the same way a pre-metallurgy human could make their forges hotter and hotter and get better and better metal out.

But the hottest forge doesn't mean you get metallurgy.

ashton314 2 months ago

The core of this little essay seems to be this:

> Instead of "I understand the causal mechanism and can predict what happens if I change X," you get something more like "I have a sufficiently rich model that I can simulate what happens if I change X, with probabilistic confidence." The answers are distributions, not deterministic outputs. That's a different kind of knowing.

At the beginning this sounded like, "hard problems are complex, machine learning can help us manage complexity, therefore we will be able to solve hard problems with machine learning", which betrays a shallowness of understanding. I think what this essay argues here is a little deeper than that trite tech-bro hype meme.

But I disagree with this conclusion: I don't know that we can build these models to begin with, or that our new LLM/transformer-powered tools can help solve these problems. If simulation were the answer to everything, why would new ML tools make a significant difference in ways that existing simulation tools do not?

Stuff like AlphaFold is amazing—I'm not saying that better medical results won't come about from ML—but I feel like there's some substance missing and that even this level of excitement that the author expresses here needs more and better backing.

bbor 2 months ago

> There's a parallel in linguistics. Chomsky showed that all human languages share deep recursive structure. True, and essentially irrelevant to the language modeling that actually learned to do something with language.

...this is so absurdly and blatantly wrong that it's hard to move past. Has the author ever heard of programming languages??

bigbuppo 2 months ago

Maybe I missed the point, but this read like Big Think Thought Leadership that would make a good TED talk but not much else. I'll just put it on the big pile over there.

meltyness 2 months ago

... but then why not a model model to perform that outer analysis and overcome the representational shortcomings of an encoder network?

gnarlouse 2 months ago

AI slop DNR

xikrib 2 months ago

Let's gather authors of 15 different world languages together in a room and see if they can collaboratively write a short story. Surely their inability to do so will prove their inadequacy in their native language. /s

Simplicity brings us closer to truth — Occam's razor has underpinned the development of our species for centuries. It's enterprise, empire, and capital that feed off of complexity.

We're entering a period of human history where engineers and businesspeople drive academic discourse, rather than scientists or philosophers. The result is intellectual chicken scratch like this article.

gozucito 2 months ago

>Simplicity brings us closer to truth — Occam's razor has underpinned the development of our species for centuries.
I keep thinking of emergent complexity. Even starting with very simple rules and components, the amount of complexity that arises as a consequence of ever rising interactions can boggle the mind and seems to validate our current predilection for elegant and succinct laws of physics to be enough to model the universe.
Coincidentally, LLMs being so good at coding that it became the #1 source of income for Anthropic is one such example of emerging complexity from deceptively simple ingredients:
A giant pile of matrix multiplies, next-token prediction, and enough data somehow climbs the ladder from autocomplete to writing code well enough that people will pay $20-200/month per seat for it. It is completely bonkers.
