The robots are coming


Around this time last year I wrote “An Ode to Physical AI”.

I was convinced that while the world was mesmerized by parlor tricks performed by large “vision-language” models, most people were not paying enough attention to an even more transformational inflection point for “Physical AI” which deals in atoms & molecules, rather than staying confined to the world of bits. I’m talking about AI models whose I/O ports are directly plugged into physical sensors and actuators.

Now almost a year has gone by, and two things have changed:

  1. I’m even more convinced that Physical AI — specifically the field of robotics — is accelerating. I’ve come to believe that the rise of robotics is going to be one of the most impactful, global mega-trends of this century.

  2. The world appears to be waking up to this phenomenon.

Hence, I’ve decided to reprise the robotics section of my prior “Ode”.

(So, this is a partial repost with substantial updates.)

tl;dr — The robots are coming. Get ready.

Humans are still a material species, living in a material world, which means that our biggest problems — from security, to climate change, to disease — are still mostly material in nature. These problems can only be addressed obliquely by hordes of digital AI agents confined to a computer screen.

This is especially true in the case of climate and energy. In a recent post, I shared five reasons why the modern energy sector is such difficult terrain for anyone attempting to build a revolutionary technology company. All five stem from the fact that the industry is defined by big, physical systems, and the global flow of commodity materials.

There’s only so much that an LLM can do to overcome these barriers, no matter how many PhD level questions it can answer.

In fact, this principle extends well beyond energy. Across the economy, the biggest bottlenecks to growth are generally not PhD-level bottlenecks. The digital revolution of the past 25 years has already made anyone working primarily with data & ideas — i.e. “knowledge workers” — tremendously more productive. As discussed above, LLMs are continuing this trend. During that same period, however, labor productivity in many “blue collar” fields has stagnated. In manufacturing and construction — which are the foundations of our ability to shape the physical world — labor productivity in the United States has been flat to declining for the past two decades.

Source: US Census, Total Factor Productivity and Related Measures, Major Sectors

Compounding this problem is a related macro trend: demographic change. This trend has been an easy one to see coming, but has proven extremely difficult to counter. Demographics are now a challenge for essentially all technologically advanced, wealthy nations.

The last time the US embarked on a major national infrastructure spree, following World War II, the nation had an ideal demographic profile for such an endeavor. There was a bumper crop of children — courtesy of the famous “baby boom” — which produced a steady supply of prime working-age adults for the next few decades. But today, our demographic profile is decidedly less favorable for a campaign of heavy lifting. Our pool of young people with strong backs is much shallower; and new restrictions on immigration are set to shrink the pool even further.

We’re already beginning to see these macro trends play out at a micro level in many individual sectors. Take solar power, for example. Back in 2022, the consulting firm McKinsey surveyed a group of project developers and utilities setting out to build large solar power portfolios, and found that this group’s top concern — even more so than inflation, or clogged interconnection queues — was the availability of labor.

Source: “Build together: Rethinking solar project delivery”, McKinsey & Company, June 2023

Meanwhile, the US Bureau of Labor Statistics has projected that solar installers and wind turbine service technicians are going to be two of the fastest growing job categories for the next decade. Although these labor forces are still relatively small today, the prospect of 40-50% compound annual growth rates has got to be keeping hiring managers awake at night.

And the problem is not just localized to the renewable energy sector, which is still relatively new. Pretty much all of the established trades that are crucial for “The Energy Transition” are in extremely high demand, and growing. Electricians, HVAC technicians, utility line workers — these roles are starting from a much larger base of workers, and are still projected to grow at 7-9% per year. Compounded, that means roughly doubling the number of workers in these fields within ten years.

Source: US Bureau of Labor Statistics
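If you want to sanity-check that compounding claim, the arithmetic takes a couple of lines of Python (the 7-9% figures are the projected annual growth rates cited above):

```python
# Quick sanity check: what sustained annual growth does to a workforce in a decade.
for annual_growth in (0.07, 0.08, 0.09):
    ten_year_multiple = (1 + annual_growth) ** 10
    print(f"{annual_growth:.0%}/yr for 10 years -> {ten_year_multiple:.2f}x the workforce")
# 7%/yr -> 1.97x, 8%/yr -> 2.16x, 9%/yr -> 2.37x
```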

All this is to say: I can only see one plausible option for addressing our society’s existential energy needs, and remaining globally competitive: We Americans (and Europeans, and Japanese…) need to take the next leap forward in physical labor productivity. That means we’re also going to need to embed AI in machines capable of interacting directly with the physical world.

Yes, I’m talking about ROBOTS.

So, forget all those vague proclamations that “AI is coming…” Robots are really coming — in fact, millions of them are already here — and it doesn’t require much imagination to foresee their revolutionary impact on society.

One way to understand the state of robotics is to picture a spectrum. This spectrum spans the full range of physical tasks which a robotic system might be called on to perform.

At one end of the spectrum are the easiest forms of labor to automate: discrete, repetitive tasks with minimal variability and limited manual dexterity requirements. These are tasks which can be fully programmed as a series of concrete steps: Picture a robotic arm in a factory, which spends its days picking up uniform metal parts from the end of a conveyor belt, rotating them, and then stacking them in a pile.

In fact, picture these robotic arms. This is what practically all of the robots performing real work in the economy today actually look like:

Globally, there are roughly five million robots like this in service. They operate mostly in the automotive, electronics, and metal fabrication sectors. Sales have roughly quintupled since 2010, while the average price has fallen by about 50% to less than $20,000 an arm! If you’re a tinkerer, and you want to mess around with a robotic arm at home, you can easily find dozens of much cheaper, albeit less robust versions on Amazon.
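To make “fully programmed as a series of concrete steps” concrete: the control logic for one of these arms is essentially a fixed script. Here is a minimal sketch in Python, where the Arm class and its named poses are hypothetical stand-ins rather than any vendor's actual API:

```python
# A minimal sketch of the fixed script such an arm runs all day. The Arm
# class and its named poses are hypothetical placeholders, not a real API.
class Arm:
    def move_to(self, pose):
        print(f"moving to {pose}")

    def close_gripper(self):
        print("gripper closed")

    def open_gripper(self):
        print("gripper opened")

    def rotate_wrist(self, degrees):
        print(f"wrist rotated {degrees} degrees")


arm = Arm()
for part_number in range(100):            # the same loop, thousands of times a day
    arm.move_to("conveyor_pickup_pose")   # every part arrives in the same spot
    arm.close_gripper()
    arm.rotate_wrist(90)                  # reorient the uniform metal part
    arm.move_to(f"stack_slot_{part_number % 10}")
    arm.open_gripper()                    # add it to the pile
```

Note what's missing: no cameras, no decisions, no recovery logic. The program simply assumes the world never deviates, which is exactly why this end of the spectrum was automated first.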

Of course, as in many other industries, China is the biggest player in this sector. China has become the largest buyer of industrial robots, by far, and has nurtured a domestic industry which is now challenging even the top suppliers in Europe, Japan, and Korea. (For more on this point, check out “A tale of two energy superpowers”.)

Meanwhile, at the other end of The Spectrum, we find the most difficult forms of human labor to automate. The jobs to be done in this domain typically share a number of common attributes:

  1. They tend to be clusters of tasks that are nearly impossible to disaggregate — which means that they tend to be performed by a single human worker or a group of workers in concert.

  2. They’re extremely variable. No two situations will be exactly alike.

  3. They sometimes involve interaction with the general population, not just specialized robot handlers.

  4. They may also involve very fine levels of object manipulation — which has long been a notoriously difficult problem in the field of robotics. In the words of the distinguished roboticist Brad Porter:

“The challenge is that the space of ideas, semantic concepts, words… is actually constrained. There are fewer words in the English language than there are valid positions of just a single hand. The space of physical interactions in the world is unbounded.”

My colleague Anil Achyuta and I were inspired to run some numbers on this:

The human hand has 27 joints. Because the physical world is continuous, not discrete, each of those joints could theoretically be positioned in an infinite number of ways. Yet even if we simplify the world by collapsing an infinite set of positions into just ten distinct positions per joint, we still end up with 10^27 unique hand arrangements.

Compare that number (an “octillion”) to the number of words in the Oxford English Dictionary. The OED contains about 500,000 words, but we use far fewer in regular conversation.

There’s more. According to the National Institutes of Health, the human hand also has about 17,000 touch receptors, which works out to approximately 40 receptors per square centimeter. Additionally, our hands have internal nerve clusters which contribute to “proprioception” (one of my favorite uncommonly used words in the OED) — which is the sensation of how our bodies are positioned. I’m not even going to try to calculate the number of distinct sensory combinations that we can perceive.
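If you'd like to reproduce the arithmetic, a few lines of Python suffice (the inputs are just the figures quoted above):

```python
# Re-running the back-of-the-envelope numbers from the paragraphs above.
joints = 27
positions_per_joint = 10                  # the deliberately crude discretization
hand_configurations = positions_per_joint ** joints

oed_words = 500_000
print(f"hand configurations: {hand_configurations:.0e}")               # 1e+27
print(f"ratio to OED entries: {hand_configurations / oed_words:.0e}")  # 2e+21

touch_receptors = 17_000
receptors_per_cm2 = 40
# The two receptor figures together imply a hand surface area of ~425 cm^2.
print(f"implied hand surface area: ~{touch_receptors / receptors_per_cm2:.0f} cm^2")
```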

Try making an affordable robotic hand with 40 touch receptors per square centimeter.

This is why my mental model for the kind of job at the furthest end of the robotics spectrum is a plumber.

Consider the variability that a typical plumber encounters on a typical household job site. For starters, he probably meets an upset customer (another human) with a poorly defined problem — e.g. “My ceiling is leaking, and I’m not quite sure where the water is coming from”. Then he enters a messy work environment — perhaps under a sink, or in a basement crawl space — and encounters a messy system, with components installed over the course of multiple decades. He typically needs to diagnose the problem with just a few observations. Then, finally, he needs to get to work on a solution, which probably requires a combination of agility, brute strength, and fine motor skills. For example, the plumber may need to snake his arm into an awkward space — while holding a wrench — in order to gently tighten a valve by just the right amount.

Hence, we’re still very, very far away from replacing human plumbers with robots.

However, in just the past five years, we have made a lot of progress towards that end of The Spectrum. The frontier of robotics has been propelled forward dramatically by a number of interrelated trends.

On the robotics spectrum, the next big step past assembly lines has been warehouses. Warehouses are also relatively controlled environments, but much less so than factories. They tend to require greater mobility, and superior agility in order to navigate changing circumstances and unplanned interactions with humans. For some warehouse tasks — e.g. “picking & placing” goods — robots require refined grasping, which takes us back to the established challenge of building dexterous robotic hands. Nevertheless, in the past decade we’ve seen warehouse robots grow from an experiment into a cornerstone of the industry.

Take Amazon, for example. The company embarked on a robotics program beginning in 2012 with the acquisition of a warehouse automation startup, Kiva Systems. Now, Amazon may soon have more discrete robot workers than human employees; and while the number of robots is rising, the number of human employees has begun to decline.

Source: Multiple news sources including public statements and filings by Amazon.
Source: Wall Street Journal, “Amazon Is on the Cusp of Using More Robots Than Humans in Its Warehouses”, June 2025.

At Energy Impact Partners, we’ve invested in an emerging player in this sector, RobustAI, which was founded by two robotics veterans focused on how robots can best augment human labor. Their product, “Carter”, appears to be just another humble warehouse cart, but this thing is actually an elegantly designed “cobot” built for collaboration with human workers.

Moving further along the spectrum, we arrive at drones. By “drones”, I mean those small, buzzing, electric, multi-rotor vehicles which you’ve probably been annoyed by in a park somewhere. You know the type — they’ve become mainstays of wedding photography. Through the 2010s, this particular form factor became an extremely popular one because: A) It flies, B) It’s versatile, and C) It’s cheap — or rather, it has become cheap as Chinese manufacturers have turned their attention to the sector.

At first, these things were flown by remote control, but businesses quickly began to identify use cases which would benefit from some degree of autonomy. And so, drones became a kind of robotics playground. In the energy sector, for example, drones now serve as efficient tools for inspecting big, tall, and remote infrastructure — e.g. wind turbines, solar farms, and transmission towers — with help from machine vision software like GridVision, which was developed by one of our portfolio companies at EIP. Now, drones can even help construct and maintain these types of assets. For example, another portfolio company, Infravision, has developed a drone-based system for installing transmission lines.

Source: Infravision.

The acronym “LLM” has made it into common parlance, but the more appropriate term for most of today’s large, multimodal models, like ChatGPT and Claude, is “VLM”, which is short for “vision-language model”. It’s hard to overemphasize how much the field of robotics is benefiting from the existence of this VLM foundation.

The next step is yet another acronym which I suspect you’ll be hearing a lot more in the years to come: “VLA”, or “vision-language-action” models.

Let me explain by way of analogy: a human baby. When babies are born, as far as we can tell, they have practically no comprehension of the world around them. They have zero context for the rush of sensations they’re experiencing. They have practically no proprioceptive control over their own muscles. All they have are a few very basic instincts: sucking, gripping, kicking.

If you were building any kind of autonomous physical system five years ago, this would be a pretty good description of your starting point: a tabula rasa just beginning to gather data about both the environment and its own capabilities. Hence, anyone seeking to build an autonomous system — whether a passenger vehicle or a robotic arm — needed to gather massive amounts of data from real-world deployments. Waymo, for example, had to drive around for millions of miles in order to take the first baby steps towards autonomy. The company took six years to reach a million fully autonomous miles. (The next million miles took just sixteen months.)

This game has now completely changed. Today, roboticists can instantly upload internet-scale comprehension of the world, and full semantic capabilities, into the minds of their newborns. Imagine a human infant with Claude’s understanding of everything he’s seeing and hearing, and the ability to converse with his parents…

… Except for that last part. Not right away. Speaking requires proprioception of the vocal cords, tongue, and mouth, and lots of practice orchestrating those muscles in concert. This is the loop that needs to be closed to turn comprehension into autonomy. There’s still new training data that needs to be collected in order to turn “vision-language” into “vision-language-action”.

But the point is that we’re no longer starting from a blank slate. Building a VLA is much easier when you start with the “VL” already in place. This approach cuts many years and many millions of dollars (sometimes billions) out of the process of developing a new robotic solution.
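To make the idea concrete, here is a heavily simplified sketch of what a VLA control loop looks like. Every name in it (VLAPolicy, Robot, the method signatures) is a hypothetical placeholder for illustration, not the API of any particular model or lab:

```python
import random

# A deliberately oversimplified sketch of a VLA control loop. All names here
# are hypothetical placeholders, not any actual model's or lab's API.

class VLAPolicy:
    """Stands in for a pretrained vision-language backbone (the "VL") with an
    action head (the "A") fine-tuned on robot demonstration data."""

    def predict_action(self, image, instruction):
        # A real model maps (pixels, text) to a short chunk of joint commands;
        # here we fake it with small random joint deltas for a 7-DoF arm.
        return [random.uniform(-0.05, 0.05) for _ in range(7)]


class Robot:
    def camera_frame(self):
        return "raw RGB pixels"                  # placeholder observation

    def apply(self, joint_deltas):
        print(f"applying joint deltas: {joint_deltas}")


policy = VLAPolicy()
robot = Robot()
instruction = "pick up the red mug and put it on the shelf"

# The perception-action loop: observe, ask the model, act, repeat.
for step in range(3):
    frame = robot.camera_frame()
    action = policy.predict_action(frame, instruction)
    robot.apply(action)
```

The structural point: the expensive “VL” backbone arrives pre-trained on internet-scale data, so the new work concentrates in the action head and the robot data used to train it.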

Google kicked this trend off back in 2023 with a landmark model called “PaLM-E”. Per Google:

We began with PaLM, a powerful large language model, and “embodied” it (the “E” in PaLM-E), by complementing it with sensor data from the robotic agent. This is the key difference from prior efforts to bring large language models to robotics — rather than relying on only textual input, with PaLM-E we train the language model to directly ingest raw streams of robot sensor data. The resulting model not only enables highly effective robot learning, but is also a state-of-the-art general-purpose visual-language model, while maintaining excellent language-only task capabilities.

Source: Google

Google has continued to stay at the forefront of VLA development, but the field has quickly expanded. There are now at least a half dozen well-funded companies building generalist robotic “foundation models”, such as Physical Intelligence and Field AI. Additionally, the machine learning ecosystem has always had a strong open-source current, and this trend is continuing in the new era of Physical AI. NVIDIA, at least, certainly has every incentive to support open-source tools as a means of stimulating demand for its chips in billions of robots.

In a recent post, “Autonomy is real now”, I discussed how autonomous vehicles have finally reached a commercial inflection point — mostly thanks to the pluck and determination of just one company: Waymo.


After more than fifteen years of effort, Waymo has reached escape velocity: its autonomous taxi service is active in ten cities, with immediate plans to expand into over a dozen more. (That list includes Boston, which my Bostonian father insists is going to be much harder than they think.)

This feat is particularly remarkable given that Waymo opened its service to the general public less than three years ago. Most importantly, the company is now armed with data on over 200 million miles of autonomous driving, and a safety record which conclusively demonstrates its superiority over human drivers. This record should help accelerate Waymo’s expansion into new cities, by alleviating the public’s understandable safety concerns.

Source: Swiss Re & Waymo, “Do Autonomous Vehicles Outperform Latest-Generation Human-Driven Vehicles? A Comparison to Waymo's Auto Liability Insurance Claims at 25 Million Miles”, Dec 2024.

Hence, on the Spectrum of Robotic Capabilities, Waymo has planted a flag much further to the right than any autonomous system that’s come before… and much further than most people I meet fully comprehend. While the range of specific tasks that any given AV needs to perform is fairly narrow — “accelerate”, “decelerate”, “turn right”, “turn left” — city streets are some of the most unpredictable environments imaginable. The range of unusual situations a vehicle might encounter is effectively infinite. And of course, the consequences of even small mistakes can be catastrophic.

Frankly, I’m still shocked by how little this phenomenon has made it into everyday conversation. SELF-DRIVING CARS ARE REAL NOW! We can barely begin to predict the second and third order effects of this technology.

In order to reach this extraordinary inflection point, Waymo had to invest about $6 billion, while its competitors have invested tens of billions more. It should come as no surprise that such a flood of capital pouring into this field has generated positive ripple effects for adjacent sectors.

And already, the robotics ecosystem is looking like one of the biggest beneficiaries.

For example: AV development has bequeathed to the world a substantial cohort of “Physical AI” engineers who have now cut their teeth at the leading edge. Unsurprisingly, some of these engineers are now following the great Silicon Valley tradition of setting out to launch new companies. For example, see Bedrock Robotics, a hot autonomous construction company founded by Waymo alumnus Boris Sofman.

This level of talent dispersion is a very strong tailwind.

AV development has also been an important catalyst for investment in LIDAR, a mode of three-dimensional sensing with important advantages over purely optical sensors for machine vision. Thanks to the AV ecosystem, the cost of LIDAR sensors has fallen by more than 90% while performance has greatly improved.

And LIDAR is not the only common robotics component which has become a lot more affordable in the past decade.

  • Regular Steel For Fuel readers are certainly familiar with the plummeting cost of lithium-ion batteries, which is important, because electricity tends to be the optimal choice for powering autonomous systems.

  • The price of servo motors, which nearly always serve as the joints and muscles of robotic appendages, has fallen by more than 50%.

  • And the price of GPUs and related microchips, which make up the brains of these systems, has famously fallen by more than 95%.

And so, the board is set. All of the building blocks are now in place.

What will they look like?

There are a number of very well-funded operations pursuing “humanoid” form factors. The thesis behind humanoids is pretty easy to understand: Most of our world was built by humans, for humans — hence, bipedal robots with two arms, two hands, and opposable thumbs should fit right in. Humanoid robotic models are also naturally well suited for training via “imitation learning” — at one point Tesla was reportedly hiring workers for nearly $50 an hour to wear motion capture suits in order to train the company’s humanoid robots.

Plus, decades of science fiction have conditioned us to look for recognizable features in our creations…

Perhaps humanoids will indeed be necessary if we’re ever going to reach “Plumber” on the Spectrum of Robotic Capabilities. But personally, I’m not so sure. I can see more value being created by a diverse panoply of robots, embodied in a wide range of form factors, each engineered for a distinct set of tasks. This is the course that nature took, after all, as life evolved to fill millions of disparate ecological niches. I see no more reason why a humanoid would be the ideal form factor for installing solar panels or coating electric transmission lines than for traversing a coral reef.

If I’m correct, then we should all be preparing for a Cambrian explosion of robots entering nearly every corner of the physical economy.

