Situational Sanity - Steelmen for Slow AI Adoption

As someone deep into AI research and Twitter, I often get pretty existential about how AI progress will affect our lives in the short to medium term. This post is a rough attempt to put across two of the best arguments (steelmen) I can think of for why things aren’t actually going to change as quickly as I and lots of AI researchers in labs might think.

To be clear, AI is changing the world. What I’m discussing here is the often accepted idea in AI research circles that AI will lead to very rapid societal and economic shifts in the next 5-10 years, particularly in knowledge work, and most significantly as and when we start to reach human-level general intelligence (AGI).

AI progress relies to a significant degree on evals and data, and we’ve reached a point where the general evals and the general text data are becoming increasingly exhausted. So what’s left still to tackle before we see AI rapidly take over all of human knowledge and digital work?

The answer is: all the outputs and niche skills that sustain the billions of small, barely recorded processes and tasks, developed in some cases over hundreds of years, running in millions of private companies, institutions, and communities, all interacting in a largely decentralised way through the capitalist market system. Only a tiny fraction of these make it into the training datasets of AI models.

Therefore, one steelman argument I find somewhat convincing is that going from getting machines to do well at maths tests, answer individual questions (even with deep research), use some tools, and do multiple choice exams, to building general machines that can reliably automate these real world outputs is very much not a linear progression in difficulty.

An analogy in software would be going from asking a team of top engineers to build a demo LLM chatbot website, to asking them to build a genuine alternative to CUDA. They could do the first in a few hours (or a few minutes with AI), but they would laugh you out of the room if you proposed the second. While both look superficially like similar skills (coding) in a similar domain (AI), the difference in difficulty is many orders of magnitude. For this reason, Nvidia’s CUDA is often called a ‘swamp’, and not just any swamp. It’s a swamp that’s worth trillions of dollars because it provides a messy, complicated, impenetrable barrier to would-be competitors trying to sell GPUs to do the same thing.

Worryingly for us AI researchers, the web of human processes required for any complex real world knowledge output (a legal case, investment decision, product design, etc.) looks quite a lot like a CUDA. These economic processes are messy, organic, highly optimised and specialised systems, iteratively developed over many years. Except in the case of knowledge work things are even worse. It's as if you also told your engineers that they aren’t allowed to use CUDA to find out the functionality (companies don’t generally share their trade secrets), Nvidia had never put any documentation online, and the person in Nvidia who wrote the code left 5 years ago and was then unfortunately hit by a bus.

To add some insult, it also isn’t just one nice standalone swampy CUDA for any one knowledge output. You often also need to deal with the potentially thousands of other CUDA-like intermediary outputs (finance, legal, regulators, etc.) that your process depends on, all of which have undocumented dependencies on one another.

So if replicating CUDA is like trying to cross a swamp, genuinely replicating, in an end-to-end automated way, a human process involved in creating knowledge outputs is like trying to cross a swamp blindfolded in a hurricane with both arms tied behind your back.

From this perspective, solving knowledge work doesn't look like a few impressive releases from a top AI lab that suddenly change the world, but rather an incredibly long and arduous endeavour of millions of engineers, bit by bit, draining small ponds across the massive unmapped swamp plain we call the knowledge economy.

Now there’s always a chance that these processes are only so complicated and swampy because we didn’t have machines that could help, and we had to rely on messy limited humans. We need to recognise, though, that in addition to the bet that we will get AGI, there is an additional implicit bet many people in AI are making when they foresee rapid economic change in knowledge work. This bet is that there is some hidden bridge over this swamp that only machines can take. Some commonality in knowledge work that capitalist human systems haven’t been able to exploit. This might be true… but it also might well not be true.

So when people rather grandly say their AGI is going to automate knowledge work in 5-10 years, maybe politely suggest that they should just focus on one (relatively) tiny piece of knowledge work (CUDA) in one (relatively) tiny and tractable domain (GPU accelerators) in the next, say, 4 years, and make do with a measly few trillion dollars of net worth.

This line of argument I think also explains a lot of the disconnect between economists who see AI as a small bump up in long run GDP growth (gradually draining the swamp), and AI researchers worrying it will have overtaken everything by 2030 (one release from an AI lab). If scaling raw general intelligence (which we seem to be able to do) naturally leads us to find easy bridges over this mess of human activity then the economists might be wrong. Yet intelligence isn’t the only input required for real world processes. Economists think in terms of limiting factors and marginal gains, and if intelligence isn’t the primary factor that constrains these processes, then incredible AI advances might be disappointingly boring in the real world. Remember Germany still uses fax machines.
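
One hedged way to make the limiting-factor point concrete is an Amdahl's-law-style calculation. This framing is an assumption added for illustration, not a model taken from the economists mentioned above: it supposes a process splits cleanly into a share that is bottlenecked by intelligence and a share that is not (waiting on regulators, fax machines, other people). The function name and the example shares are hypothetical.

```python
def overall_speedup(intelligence_share: float, ai_speedup: float) -> float:
    """Amdahl-style overall speedup of a process.

    intelligence_share: fraction of the process time that is limited by
        intelligence (hypothetical; between 0 and 1).
    ai_speedup: how many times faster AI makes that fraction (> 0).
    The remaining (1 - intelligence_share) is assumed not to speed up at all.
    """
    if not 0.0 <= intelligence_share <= 1.0:
        raise ValueError("intelligence_share must be between 0 and 1")
    if ai_speedup <= 0:
        raise ValueError("ai_speedup must be positive")
    untouched = 1.0 - intelligence_share
    return 1.0 / (untouched + intelligence_share / ai_speedup)


if __name__ == "__main__":
    # Illustrative shares only; none of these numbers come from real data.
    for share in (0.1, 0.5, 0.9):
        for speedup in (10, 1000):
            total = overall_speedup(share, speedup)
            print(f"share={share:.1f}  ai_speedup={speedup:>4}x  overall={total:.2f}x")
```

Under these assumptions the overall gain can never exceed 1 / (1 - intelligence_share), however large the AI speedup gets, which is the 'disappointingly boring' outcome in miniature.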

Lots of AI researchers point to the human brain as the existence proof for general intelligence and take inspiration from it - for everything from the structure of neural nets to reinforcement learning algorithms. While these analogies are often tenuous, they give us some theoretical basis for believing our current approaches could scale to AGI or at least that AGI is within the set of possibilities.

Fewer AI researchers like to note that the existence proof for learning general intelligence is evolution, not backpropagation. This is not just unfortunate because we lack a nice analogy. The reason this should linger at the back of the mind of every ML researcher who currently ‘feels the AGI’ (like myself), is that the evolutionary process required an entirely different scale of compute from any pre-training run on the foreseeable horizon. Whilst it’s difficult to calculate the compute required to simulate all the trillions of organisms over billions of years that preceded human general intelligence, we are very plausibly talking tens of orders of magnitude more compute than the largest pre-training runs today (and it would be comically larger if we included properly simulating the environment and multi-agent interactions).

Now the obvious answer that assuages the ML researcher’s fears is that evolution just couldn’t do large scale unsupervised backpropagation… because there wasn’t any data. In other words, the default position is that we can safely ignore the scale of compute that was required for v1 (evolution) as a staggeringly inefficient but necessary way to get to the first general intelligence. What makes our AI researcher even more chill is that current AI capabilities suggest that for general text and image based tasks, this position seems demonstrably true.

Putting my steel hat on though, I think writing off the scale of compute implied by evolution could be slightly premature. The reason for this comes back to Moravec's paradox and the swampy nature of human activity discussed previously.

Moravec’s paradox observes that what humans find hard is easy to teach machines, and what humans find easy is hard to teach machines. Some of the most common items cited on the list of things humans find easy but machines find hard are tasks in the physical world, like motor control, reflexes, dynamic audio visual perception etc. It’s this paradox that underpins the weird time we live in where currently an AI would trivially beat any human chess player but would fail miserably to help tidy up the tables and chairs after the match.

However, when discussing this paradox, a much less commonly mentioned item in the human-trivial, machine-hard column (potentially because we find it so natural) is just interacting with one another, delegating tasks, collaborating, competing, and the basic long horizon planning necessary for living. This means that whilst it might sound stupid, it remains possible that one of the most difficult things a top Nvidia computer scientist does isn’t writing an optimised CUDA kernel, it’s deciding whether Jeff or Charlotte from devops should be invited to the next meeting.

Now this wouldn’t be such a problem for AI adoption if these quasi-social tasks weren’t that important for getting things done. But as discussed with regard to the swamp, these tasks are worryingly intrinsic to the vast majority of human economic activity, particularly in knowledge work. They are also intrinsic to the operation of capitalism and the incredible power of the market economy. The invisible hand pushes humans; it may or may not push AI in the same way.

Compounding this problem is that the field of AI has recently pretty rapidly dropped many of the touted claims about how far our most efficient training methods will get us. We have switched from ‘scaling laws predictably hold so pre-training is all you need to get AGI’ to the rather less convincing ‘scaling RL hopefully will be enough but maybe we’ll also need to scale pre-training or have other architectures, let’s find out’. The thing is, our touted saviour RL, whilst still backpropagation, looks an awful lot more like evolution than an AI researcher might like. You have ‘rollouts’ which are basically lives. You have success or failure, which is very analogous to survival. Then you have updates between rollouts, which if you squint can take the place of mutations. Even more concerningly, RL is also notoriously inefficient compared to the other training methods, much harder to steer in the right direction, and often comes up with unintended solutions… which is well… exactly like evolution.

The fact that the way we train models is starting to look a lot more like evolution (RL) also touches on another angle of the paradox. Namely, that there seems to be a reasonably high correlation between things that evolution iterated on for a really really long time (motor control, multi-agent interaction, planning, i.e. the machine-hard column) and the things that we are now having to start to rely on RL for. Meanwhile, things that evolution spent relatively little time iterating on (symbolic reasoning, written language, coding, i.e. the machine-easy column) generally seem to be the things that supervised and unsupervised learning have solved.

As of today then, to get to AGI we are increasingly faced with tasks that seem to be on the wrong side of Moravec's paradox, and at the same time are having to resort to training methods that both look a lot more like evolution and are much less efficient. So when you hear an AI researcher say we ‘just need to do long horizon multi-agent RL’… the ‘just’ part of that statement could be doing an incredibly large amount of heavy lifting. Even if RL is still 1-5 orders of magnitude better at learning these long horizon social interaction and planning skills than evolution was… you might ‘just’ be talking about 10+ orders of magnitude more compute than anyone has access to today.
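
The arithmetic behind that last sentence can be sketched in a few lines. Everything numeric here is a placeholder: the essay only says evolution plausibly needed 'tens of orders of magnitude' more compute than today's largest pre-training runs, so the gaps of 15 and 20 below are illustrative assumptions rather than estimates, and `remaining_gap_oom` is a hypothetical helper name.

```python
def remaining_gap_oom(evolution_gap_oom: int, rl_efficiency_oom: int) -> int:
    """Orders of magnitude of compute still missing after RL's efficiency gain.

    evolution_gap_oom: assumed gap, in orders of magnitude, between the compute
        evolution 'used' and today's largest training runs (placeholder value).
    rl_efficiency_oom: how many orders of magnitude more efficient RL is
        assumed to be than evolution (the essay's range is 1 to 5).
    """
    return evolution_gap_oom - rl_efficiency_oom


if __name__ == "__main__":
    # Placeholder evolution gaps; the source gives no precise figure.
    for evolution_gap in (15, 20):
        for rl_gain in range(1, 6):
            gap = remaining_gap_oom(evolution_gap, rl_gain)
            print(
                f"evolution gap 10^{evolution_gap}, RL gain 10^{rl_gain} "
                f"-> still need 10^{gap} times today's compute"
            )
```

Working in exponents keeps the point visible: subtracting 1 to 5 from a number in the tens still leaves a gap of ten or more orders of magnitude under these assumed inputs.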

Viewed from this perspective, the bet that AI researchers aren’t merely going to match the efficiency of our existence proof for learning general intelligence but instead will be better by numerous orders of magnitude starts to look like not quite such a certain bet.

The thing that helps me sleep at night then, is that despite all the stunning progress there still seems to be a credible worst case that the capabilities left to reach AGI (e.g. long horizon multi-agent interaction) might require far more compute than we have today or will have in the foreseeable future. Plus, even if we continue to be wildly more efficient than evolution and do get to AGI soon, it may well still take decades to adopt AGI within the swamp we inhabit because most of the human activity that matters isn’t ‘general’, it’s incredibly specialised, undocumented, and context specific.
