Gemini Robotics


Gemini Robotics models allow robots of any shape and size to perceive, reason, use tools and interact with humans. They can solve a wide range of complex real-world tasks – even those they haven’t been trained to complete.

Gemini Robotics 1.5 is designed to reason through complex, multi-step tasks and to make the decisions needed to form a plan of action. It then carries out each step autonomously.


Generality

Understands the physical world, and adapts and generalizes its behaviour to fit new situations. Breaks down goals into manageable steps to make longer-term plans and overcome unexpected problems.

Agentic

Assesses complex challenges, natively calls tools – like Google Search – to look up information, and creates detailed step-by-step plans to overcome them.

Thinking

Enables robots to think before acting, improving the quality of their actions, and making their decisions more transparent in natural language.

Interactivity

Understands and responds to everyday commands. Can explain its approach while taking action. Users can redirect it at any point, without using technical language. It also adjusts to any changes in its environment.

Dexterity

Enables robots to tackle complex tasks requiring fine motor skills and precise manipulation – like folding origami, packing a lunch box, or preparing a salad.

Multiple embodiments

Adapts to a diverse array of robot forms, from bi-arm static robotic platforms like ALOHA and Bi-arm Franka, to humanoid robots like Apptronik’s Apollo. A single model can be used across all these robots, in turn accelerating its learning across multiple embodiments.



Agentic capabilities

Uses digital tools autonomously to solve complex tasks.

Thinking while acting

Solves longer, multi-step tasks – without needing new instructions after each step.

Learning across embodiments

Transfers learned motions across robots of different sizes and shapes, helping robots to become more useful.

Embodied reasoning

Understands its environment and how to complete a task.

Generality in action

Generalizes across novel situations and solves a vast range of tasks.

Dynamic interactions

Responds to natural conversation and adapts rapidly to changing environments.

Partnering with Apptronik

Helping to build the next generation of humanoid robots.

Dexterous skills

Performs tasks that require fine motor skills and coordination.



Gemini Robotics 1.5

Our most capable vision-language-action (VLA) model. It can ‘see’ (vision), ‘understand’ (language) and ‘act’ (action) within the physical world. It processes visual inputs and user prompts, and learns across different embodiments, which improves its ability to generalize when solving problems.

Gemini Robotics-ER 1.5

Our state-of-the-art embodied reasoning model. It specializes in understanding physical spaces, planning, and making logical decisions relating to its surroundings. It doesn’t directly control robotic limbs – but provides high-level insights to help the VLA model decide what to do next.
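The division of labour described above, with a reasoning model that plans and a VLA model that acts, can be pictured with a small sketch. This is only an illustration under stated assumptions: `Orchestrator`, `Step`, `toy_plan` and `toy_act` are hypothetical names invented here, not part of any Gemini Robotics API, and the toy functions stand in for the real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One natural-language instruction passed from the reasoner to the actor."""
    instruction: str


@dataclass
class Orchestrator:
    # Hypothetical stand-in for an embodied reasoning model: goal -> ordered steps.
    plan: Callable[[str], List[Step]]
    # Hypothetical stand-in for a VLA model: attempts one step, returns success.
    act: Callable[[Step], bool]
    log: List[str] = field(default_factory=list)

    def run(self, goal: str) -> bool:
        """Plan once, then execute each step in order, stopping at the first failure."""
        for step in self.plan(goal):
            ok = self.act(step)
            self.log.append(f"{'done' if ok else 'failed'}: {step.instruction}")
            if not ok:
                return False
        return True


def toy_plan(goal: str) -> List[Step]:
    # Toy planner (assumption): splits the goal on the word " then ".
    return [Step(part.strip()) for part in goal.split(" then ")]


def toy_act(step: Step) -> bool:
    # Toy executor (assumption): any non-empty instruction succeeds.
    return bool(step.instruction)


orchestrator = Orchestrator(plan=toy_plan, act=toy_act)
succeeded = orchestrator.run("pick up the apple then place it in the lunch box")
```

The planner never touches the robot in this sketch. It only hands over instructions, mirroring the statement that the reasoning model does not directly control robotic limbs.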

Gemini Robotics On-Device

This versatile iteration of our VLA model is optimized to run locally on robotic devices, and robotics developers can adapt it to improve performance on their own applications.


Gemini Robotics SDK



Trusted Testers

Agility Robotics

Boston Dynamics

Agile Robots

Enchanted Tools

PAL Robotics

Rainbow Robotics

Collaborative Robotics

Universal Robots

Experience Gemini Robotics

If you're interested in testing our models, please share a few details to join the waitlist.