Stanford released new research that demonstrates an easier way to control and train humanoid robots.
What’s up: A new paper, titled “HumanPlus: Humanoid Shadowing and Imitation from Humans,” demonstrates tracking the real-time body and hand pose of a teleoperator using only a single RGB camera, then feeding that pose directly to a humanoid robot that mimics the human. This pioneers a much simpler method for collecting the data needed to train an imitation learning model for autonomous operation. The Stanford lab used a slightly modified Unitree H1 humanoid robot.
What it means: Current methods for training robot imitation learning models require collecting training data while the robot completes the tasks that are intended to be automated. In many cases, this means teleoperating the robot, either by puppeteering a miniature version of it or by using a VR headset to control the gripper position. Humanoid robots give us the unique opportunity to control the robot with our own bodies and hands (versus, say, a 6-axis robot arm with no direct mapping between its joints and ours). That is the core of what the Stanford lab has pulled off, and it is far easier than using existing teleoperation rigs, such as those used by companies like Tesla (a rough code sketch of the shadowing loop is below).
Even though this method may be easier and faster, it has significant shortcomings: it lacks touch/force feedback for the operator, which makes it difficult to complete tasks requiring fine motor skills, and it requires that the operator have a full, real-time view of the robot during teleoperation so they can see its whole body position relative to the real world.
Looking at the bigger picture, this could help solve one of the biggest problems in robotics - DATA. It hints at a sort of Holy Grail opportunity: transferring learning from videos of humans to humanoid robots and avoiding the need for massive teleoperated data collection. Imagine using all the videos on YouTube to train robots…
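For the technically curious, here is a minimal sketch of what a shadowing loop like this might look like. Everything below (function names, joint counts, the control interface) is a hypothetical placeholder to illustrate the idea, not the HumanPlus implementation: a monocular pose estimate is retargeted to robot joint targets each tick, and the resulting observation/action pairs double as imitation-learning data.

```python
# Minimal sketch of a shadowing loop in the spirit of HumanPlus: estimate the
# teleoperator's body pose from a single RGB frame, retarget it to humanoid
# joint targets, and stream those targets to the robot. All names and shapes
# here are hypothetical placeholders, not the paper's actual code.

import numpy as np

NUM_ROBOT_JOINTS = 19  # assumption: rough body joint count for an H1-class humanoid


def estimate_human_pose(rgb_frame: np.ndarray) -> np.ndarray:
    """Placeholder for a monocular body/hand pose estimator.

    A real system would run a learned model here; we return zeros so the
    sketch executes end to end.
    """
    return np.zeros(NUM_ROBOT_JOINTS)


def retarget_to_robot(human_pose: np.ndarray) -> np.ndarray:
    """Map human joint angles onto the robot's joint space.

    With a humanoid, this can be close to an identity mapping plus joint
    limits; a non-anthropomorphic arm would need full inverse kinematics.
    """
    return np.clip(human_pose, -np.pi, np.pi)


def shadow_step(rgb_frame: np.ndarray, send_command) -> np.ndarray:
    """One control tick: camera frame in, joint-position command out."""
    q_human = estimate_human_pose(rgb_frame)
    q_robot = retarget_to_robot(q_human)
    send_command(q_robot)  # e.g., a position-control interface on the robot
    return q_robot         # log (observation, action) pairs for imitation learning


if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy camera frame
    logged_action = shadow_step(frame, send_command=lambda q: None)
    print(logged_action.shape)
```

The key design point is that a humanoid’s joint space is close enough to ours that the retargeting step can be nearly one-to-one, which is what makes whole-body shadowing from a single camera practical.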
A new open-source robotic control model, brought to you by, well, everyone.
What’s up: A group of top research labs released “OpenVLA: An Open-Source Vision-Language-Action Model” last week. Building on the success of other open-source robotic models like ACT, OpenVLA is a generalist policy trained on a large variety of data collected across tasks, robots, and environments (all from the Open X-Embodiment dataset). The results show a step forward in performance, with the model generalizing better than state-of-the-art alternatives such as RT-1-X, Octo, and RT-2-X.
What it means: It’s exciting to see that there’s lots of “juice left to squeeze” from existing robot training data through improvements in the underlying model. Beyond that, it is encouraging that this was made open source and will now likely be improved by the broader robotics industry. One small step forward on the journey to build intelligent, capable robots.
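To make the “generalist policy” idea concrete, here is a hedged, conceptual sketch of the vision-language-action interface a model like OpenVLA exposes: an image and a language instruction go in, a low-level action comes out. The class, method, and action dimensions below are illustrative stand-ins, not the actual OpenVLA API; consult the released code for real usage.

```python
# Conceptual sketch of a vision-language-action (VLA) policy interface:
# an image observation plus a language instruction map to a low-level action
# (e.g., an end-effector delta plus a gripper command). The names here are
# illustrative stand-ins, not the OpenVLA library's API.

from dataclasses import dataclass
import numpy as np


@dataclass
class VLAObservation:
    image: np.ndarray   # RGB camera frame, H x W x 3
    instruction: str    # e.g. "put the carrot in the bowl"


class DummyVLAPolicy:
    """Stand-in for a pretrained vision-language-action model."""

    def predict_action(self, obs: VLAObservation) -> np.ndarray:
        # A real VLA would tokenize the image and text, run the backbone,
        # and decode action tokens. We return a zero action so the sketch runs.
        return np.zeros(7)  # assumption: 7-DoF action (xyz, rpy, gripper)


policy = DummyVLAPolicy()
obs = VLAObservation(
    image=np.zeros((224, 224, 3), dtype=np.uint8),
    instruction="pick up the red block",
)
action = policy.predict_action(obs)
print(action)  # a controller would execute this, then query the policy again
```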
Harvard and DeepMind trained a virtual rodent brain, and the resulting virtual “brain waves” were similar to the real ones.
What’s up: In a new study, Harvard and DeepMind used reinforcement learning to train a virtual rodent, living in a physics simulator, to move like a real rodent moving about in the real world. They then compared the brain activity of the real rodent to the activity of the trained neural network controlling the virtual rodent’s movements, and found it was a much better match than models that try to predict neural activity purely from the rodent’s motion or body position.
What it means: This establishes a promising method for understanding how the brain controls motion, which could inform future brain-machine interfaces and even robotics. Will we one day be able to remotely ‘log in’ to a robot and control it with just our minds? We’ll see.
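For a rough sense of how such a comparison can be made: fit an encoding model that predicts the recorded neural activity from the policy network’s hidden activations, do the same from pure movement features, and compare held-out fit. The sketch below uses random placeholder data and a simple ridge-regression setup as an assumed stand-in for the paper’s actual analysis.

```python
# Hedged sketch of the comparison idea: predict recorded neural activity from
# (a) the policy network's hidden activations and (b) a purely kinematic
# representation of the movement, then compare cross-validated R^2.
# Data arrays are random placeholders; the regression setup is an assumption.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
T = 2000                                      # number of time bins
neural = rng.normal(size=(T, 50))             # placeholder: 50 recorded neurons
net_activations = rng.normal(size=(T, 128))   # placeholder: policy hidden units
kinematics = rng.normal(size=(T, 20))         # placeholder: joint angles/velocities


def heldout_r2(features: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-validated R^2 of a ridge encoding model, averaged over neurons."""
    scores = [
        cross_val_score(Ridge(alpha=1.0), features, targets[:, i],
                        cv=5, scoring="r2").mean()
        for i in range(targets.shape[1])
    ]
    return float(np.mean(scores))


print("network features  :", heldout_r2(net_activations, neural))
print("kinematic features:", heldout_r2(kinematics, neural))
# In the study, features from the trained controller explained neural activity
# better than kinematics alone; with random placeholder data both scores sit near zero.
```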
Luma AI launched DreamMachine, a new, free-to-use text-to-video model.
What’s up: The generative AI video space is growing fast, with incumbents like OpenAI/Sora, Pika, and Runway. Luma is a startup that has launched a variety of 3D-focused products over the past couple of years, and last week it showed it can compete here too, launching a compelling new option in the text-to-video space - DreamMachine.
What it means: It’s not hard to imagine that these models are only a few years away from producing YouTube-quality video from text instructions. Consider that these models are effectively encoding the laws of physics and simulating reality. They may eventually be useful for robotics’ biggest problem - creating DATA - and for fine-tuning a robot foundation model to specific hardware, environments, or tasks.
Elon Musk talks about Optimus progress at the Tesla investor meeting.
What’s up: At the 2024 Annual Tesla Stockholder Meeting, Elon shared his future vision and some details around Optimus. Tesla already has two Optimus robots in action in a Tesla factory and plans to deploy 1,000 by next year. After dogfooding the product in its own factories, Elon sees a huge market ahead.
What it means: Elon is making the case that the humanoid robot market is a $20 trillion opportunity. That is almost 10x the ~$2.5T global automotive market today. If you believe this, humanoid robots are likely to become the next… car. Tesla is well positioned as an established industrial player to manage the supply chain necessary to make billions of robots. We may be decades away from that point, but no matter the discount factor, maybe Tesla stock is cheap again?
This seems like a brilliant move and puts Tesla way ahead of the rest of the auto industry. It is also telling that Optimus sits under the Tesla brand instead of Elon’s xAI startup, which suggests he views this as more of a manufacturing play than a research project.
Collaborative Robotics, aka Cobot, made a key hire and expanded to the Seattle area.
What’s up: Cobot, a startup building a non-humanoid mobile robot for general task automation, announced that it has hired a former Amazon AI leader and is opening a new Seattle office after raising a $100M funding round.
What it means: In the midst of the ongoing AI talent war, robotics startups can now make financially attractive and exciting offers to big-tech talent looking to take part in the robotics renaissance. We expect this trend to continue as high-profile engineers and entrepreneurs jump into the robotics space.