DROID: A Large-Scale In-the-Wild
Robot Manipulation Dataset
DROID Dataset Team
Getting Started
Dataset Quickstart (Colab)
import tensorflow_datasets as tfds

# Stream the dataset directly from the public GCS bucket.
ds = tfds.load("droid", data_dir="gs://gresearch/robotics", split="train")

for episode in ds.take(5):
    for step in episode["steps"]:
        image = step["observation"]["exterior_image_1_left"]   # third-person camera frame
        wrist_image = step["observation"]["wrist_image_left"]  # wrist camera frame
        action = step["action"]                                 # robot action for this step
        instruction = step["language_instruction"]              # natural language task description
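The fields returned by TFDS are TensorFlow tensors; for use outside TensorFlow they can be converted to NumPy arrays and Python strings. Below is a minimal sketch of that conversion for a single step (field names as in the quickstart above; the exact tensor shapes are not guaranteed here):

import numpy as np
import tensorflow_datasets as tfds

ds = tfds.load("droid", data_dir="gs://gresearch/robotics", split="train")

# Grab the first step of the first episode and convert its fields.
episode = next(iter(ds.take(1)))
step = next(iter(episode["steps"]))

image_np = step["observation"]["exterior_image_1_left"].numpy()    # (H, W, 3) uint8 image
action_np = np.asarray(step["action"])                              # action vector
instruction = step["language_instruction"].numpy().decode("utf-8")  # bytes -> str
print(image_np.shape, action_np.shape, instruction)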
Abstract
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350h of interaction data, collected across 564 scenes and 86 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance, greater robustness, and improved generalization ability. We open source the full dataset, code for policy training, and a detailed guide for reproducing our robot hardware setup.
✨ Updates ✨
- April 2025: We provide improved camera calibrations for 36k episodes of the DROID dataset on HuggingFace -- check our updated paper for how we computed these calibration values!
- December 2024: We provide an updated set of DROID language annotations on HuggingFace -- 3 natural language annotations for 95% of all successful DROID episodes (75k episodes)
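For reference, one way to fetch such auxiliary files is via the huggingface_hub client. The repository id and filename below are hypothetical placeholders, not the actual locations of the DROID annotations; substitute the ones linked from this page:

import json
from huggingface_hub import hf_hub_download

# NOTE: repo_id and filename are placeholders -- replace them with the
# actual DROID annotation repository and file name.
path = hf_hub_download(
    repo_id="droid-dataset/annotations",   # placeholder repo id
    filename="language_annotations.json",  # placeholder file name
    repo_type="dataset",
)
with open(path) as f:
    annotations = json.load(f)  # e.g. a mapping from episode id to annotations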
The DROID Robot Platform
DROID uses the same hardware setup across all 13 institutions to streamline data collection while maximizing portability and flexibility. The setup consists of a Franka Panda 7DoF robot arm, two adjustable Zed 2 stereo cameras, a wrist-mounted Zed Mini stereo camera, and an Oculus Quest 2 headset with controllers for teleoperation. Everything is mounted on a portable, height-adjustable desk for quick scene changes.
DROID Dataset Analysis
Third-person camera viewpoints in DROID (subsampled). DROID episodes cover a total of 1417 camera viewpoints along with intrinsic and extrinsic stereo camera calibration. Brighter colors indicate regions of higher viewpoint density.
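As an illustration of how the released calibration can be used, the sketch below projects a point given in the robot base frame into a camera image. The 4x4 camera-to-base extrinsic and 3x3 pinhole intrinsic matrices are assumed inputs; their exact storage format and frame conventions in the dataset are not shown here:

import numpy as np

def project_point(point_base, T_cam_to_base, K):
    """Project a 3D point (robot base frame) into pixel coordinates.

    point_base: (3,) point in the robot base frame.
    T_cam_to_base: (4, 4) camera pose in the base frame (assumed convention).
    K: (3, 3) pinhole intrinsics.
    """
    T_base_to_cam = np.linalg.inv(T_cam_to_base)   # base -> camera transform
    p_h = np.append(point_base, 1.0)               # homogeneous point
    p_cam = (T_base_to_cam @ p_h)[:3]              # point in camera frame
    uv = K @ p_cam
    return uv[:2] / uv[2]                          # pixel coordinates (u, v)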
Visualization of 3D interaction points relative to the robot base. We visualize the 3D location at which the gripper first closes in each trajectory, since closing the gripper often indicates meaningful object interactions. DROID’s interactions cover a larger part of the robot’s workspace, since the robot is moved freely between collection sessions instead of being placed in front of repetitive tabletop scenes.
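A rough sketch of how such interaction points could be extracted from the episodes is given below. The observation field names (gripper_position, cartesian_position) and the gripper-closed threshold are assumptions and may not match the released schema exactly:

import numpy as np
import tensorflow_datasets as tfds

ds = tfds.load("droid", data_dir="gs://gresearch/robotics", split="train")

interaction_points = []
for episode in ds.take(100):
    for step in episode["steps"]:
        obs = step["observation"]
        # Assumed fields: gripper_position in [0, 1], cartesian_position = (x, y, z, rx, ry, rz).
        gripper = np.asarray(obs["gripper_position"]).reshape(-1)[0]
        if gripper > 0.5:                                   # gripper considered closed
            xyz = np.asarray(obs["cartesian_position"])[:3]
            interaction_points.append(xyz)                  # first closure in this episode
            break

points = np.stack(interaction_points) if interaction_points else np.empty((0, 3))
print("collected", len(points), "interaction points")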
Number of scenes per scene type. DROID has an order of magnitude more scenes than other large robot manipulation datasets, spanning a much wider range of scene types.
Distribution of verbs and objects in DROID
Top: Distribution of skills in DROID. DROID features a long tail of diverse verb classes that is only matched by Bridge V2, while the RH20T and RT-1 datasets have a more constrained set of skills.
Bottom: Distribution of interacted objects in DROID, grouped by category. The robot interacts with a wide range of everyday objects.
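A crude way to approximate the verb distribution from the language instructions is to count the first word of each instruction. This is only a sketch and does not reproduce the annotation pipeline behind the figure above:

from collections import Counter
import tensorflow_datasets as tfds

ds = tfds.load("droid", data_dir="gs://gresearch/robotics", split="train")

verb_counts = Counter()
for episode in ds.take(500):
    for step in episode["steps"]:
        instr = step["language_instruction"].numpy().decode("utf-8").lower().strip()
        if instr:
            verb_counts[instr.split()[0]] += 1  # first word as a rough proxy for the verb
        break  # the instruction is constant within an episode

print(verb_counts.most_common(20))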
Experiments
We investigate whether DROID can be used to boost policy performance and robustness across a wide spectrum of robot manipulation tasks and environments. To this end, we train policies across 6 tasks in 4 different locations, including lab, office, and household settings, to reflect the diversity of real-world robotic research use cases. All experiments use representative, state-of-the-art robot policy learning approaches. Across the board, we find that DROID improves policy success rate while increasing robustness to scene changes such as distractors or novel object instances.
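Co-training here means mixing DROID with a smaller in-domain dataset during policy training. A minimal sketch of such a mixture with tf.data is shown below; the in-domain dataset name and the mixture weights are placeholders, not the values used in the paper:

import tensorflow as tf
import tensorflow_datasets as tfds

# DROID plus a hypothetical in-domain dataset collected for the target task.
droid = tfds.load("droid", data_dir="gs://gresearch/robotics", split="train")
in_domain = tfds.load("my_target_task_dataset", split="train")  # placeholder name

# Flatten episodes into individual steps before mixing.
droid_steps = droid.flat_map(lambda ep: ep["steps"])
in_domain_steps = in_domain.flat_map(lambda ep: ep["steps"])

# Sample steps from each source; weights are illustrative only.
mixture = tf.data.Dataset.sample_from_datasets(
    [droid_steps, in_domain_steps], weights=[0.5, 0.5]
).shuffle(10_000).batch(256)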
Qualitative Comparison
Qualitatively, we find that policies that leverage DROID during training are noticeably smoother and more precise than the comparison policies.
Qualitative Comparison (OOD Evaluations)
We also find policies co-trained with DROID to be more robust to distractors and novel object instances.
Quantitative Comparison
Robot setups for policy evaluation. We cover a wide range of tasks and scenes, from lab evaluations to offices and real households, to reflect the diversity of use cases in real robot research.
Does DROID Improve Policy Performance and Robustness?
We find that across all our evaluation tasks, co-training with DROID significantly improves both in-distribution and out-of-distribution (OOD) performance over both no co-training and co-training with the Open-X dataset. We compare success rates averaged across all tasks with standard error, and find that DROID outperforms the next best method by 22% absolute success rate in-distribution and by 17% out-of-distribution.
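For reference, the aggregate numbers reported here are a mean over per-task success rates with a standard error. A small sketch of that computation follows; the success-rate values are made-up placeholders, not results from the paper:

import numpy as np

# Per-task success rates for one method (placeholder values, not paper results).
success_rates = np.array([0.8, 0.6, 0.7, 0.9, 0.5, 0.75])

mean = success_rates.mean()
stderr = success_rates.std(ddof=1) / np.sqrt(len(success_rates))  # standard error of the mean
print(f"success rate: {mean:.2f} +/- {stderr:.2f}")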
BibTeX
@article{khazatsky2024droid,
title = {DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset},
author = {Alexander Khazatsky and Karl Pertsch and Suraj Nair and Ashwin Balakrishna and Sudeep Dasari and Siddharth Karamcheti and Soroush Nasiriany and Mohan Kumar Srirama and Lawrence Yunliang Chen and Kirsty Ellis and Peter David Fagan and Joey Hejna and Masha Itkina and Marion Lepert and Yecheng Jason Ma and Patrick Tree Miller and Jimmy Wu and Suneel Belkhale and Shivin Dass and Huy Ha and Arhan Jain and Abraham Lee and Youngwoon Lee and Marius Memmel and Sungjae Park and Ilija Radosavovic and Kaiyuan Wang and Albert Zhan and Kevin Black and Cheng Chi and Kyle Beltran Hatch and Shan Lin and Jingpei Lu and Jean Mercat and Abdul Rehman and Pannag R Sanketi and Archit Sharma and Cody Simpson and Quan Vuong and Homer Rich Walke and Blake Wulfe and Ted Xiao and Jonathan Heewon Yang and Arefeh Yavary and Tony Z. Zhao and Christopher Agia and Rohan Baijal and Mateo Guaman Castro and Daphne Chen and Qiuyu Chen and Trinity Chung and Jaimyn Drake and Ethan Paul Foster and Jensen Gao and Vitor Guizilini and David Antonio Herrera and Minho Heo and Kyle Hsu and Jiaheng Hu and Muhammad Zubair Irshad and Donovon Jackson and Charlotte Le and Yunshuang Li and Kevin Lin and Roy Lin and Zehan Ma and Abhiram Maddukuri and Suvir Mirchandani and Daniel Morton and Tony Nguyen and Abigail O'Neill and Rosario Scalise and Derick Seale and Victor Son and Stephen Tian and Emi Tran and Andrew E. Wang and Yilin Wu and Annie Xie and Jingyun Yang and Patrick Yin and Yunchu Zhang and Osbert Bastani and Glen Berseth and Jeannette Bohg and Ken Goldberg and Abhinav Gupta and Abhishek Gupta and Dinesh Jayaraman and Joseph J Lim and Jitendra Malik and Roberto Martín-Martín and Subramanian Ramamoorthy and Dorsa Sadigh and Shuran Song and Jiajun Wu and Michael C. Yip and Yuke Zhu and Thomas Kollar and Sergey Levine and Chelsea Finn},
year = {2024},
}