# Robot Training Data Augmentation
This repo contains a plug-and-play tool that massively multiplies robot training datasets by augmenting training episodes with new scenes. Currently, RLDS-formatted datasets are supported, and Open-X-Embodiment is used as the example in this repo.

Images from training episodes are transformed into new scenes while leaving critical visual aspects, such as trajectories and object interactions, unchanged. Currently, this is powered by RunwayML's Gen4-Aleph.
Tools are provided for:
- Downloading the dataset.
- Extracting videos from the dataset.
- Generating new videos.
- (coming soon) Writing back new episodes to the dataset.
## Quickstart

### Prerequisites
- Docker is required - this tool is packaged as a container.
- Put your Replicate API key in `.env` (video-to-video generative model APIs from Replicate are used):

  ```
  REPLICATE_API_TOKEN=XXXXXXXXXXX
  ```
### Build

```shell
docker build -f tool/Dockerfile -t oxe-tool .
```

### Run
- Make directories for inputs and outputs:

  ```shell
  mkdir oxe-datasets videos
  ```
- Download episodes from a sample dataset from Open-X-Embodiment:

  ```shell
  docker run --rm \
    -v "$(pwd)/oxe-datasets:/datasets" \
    oxe-tool download_dataset \
    --dataset bridge \
    --max_episodes 50
  ```

- Export videos:
  ```shell
  docker run --rm \
    -v "$(pwd)/oxe-datasets:/datasets:ro" \
    -v "$(pwd)/videos:/videos" \
    oxe-tool export_video \
    --dataset bridge \
    --max_episodes 50 \
    --fps 24 \
    --info
  ```
- Generate a transformed video:

  ```shell
  docker run --rm \
    --env-file .env \
    -v "$(pwd)/videos:/videos" \
    oxe-tool generate_video \
    --dataset bridge \
    --video-name ep00021.mp4 \
    --prompt "Re-light the scene with a bright white spotlight" \
    --seed 1234
  ```
## CLI Overview

Run help:

```shell
docker run --rm oxe-tool --help
docker run --rm oxe-tool download_dataset --help
docker run --rm oxe-tool export_video --help
docker run --rm oxe-tool generate_video --help
```
Subcommands and key options:
- `download_dataset`
  - Downloads to `/datasets` (mounted via Docker `-v`).
  - `--dataset` (repeatable), or `--datasets` (comma/space-separated). Defaults to `bridge` if not provided.
  - `--max_episodes`: optional integer to limit episodes downloaded per dataset (default: download all episodes).
- `export_video`
  - Reads from `/datasets`, writes to `/videos` (both mounted via Docker `-v`).
  - `--dataset` (repeatable), or `--datasets` (comma/space-separated). Defaults to `bridge` if not provided.
  - `--split` (default `train`), `--max_episodes` (default `5`), `--fps` (default `24`), `--display_key` (default `image`), `--info`.
  - `--image_key_choice`: pre-select the image key (1-based index) for datasets with multiple camera views, avoiding interactive prompts.
  - For interactive selection in Docker, add `-it` flags: `docker run --rm -it ...`
- `generate_video`
  - Reads and writes under `/videos` (mounted via Docker `-v`).
  - `--dataset`: dataset name (matches the directory name in the video structure).
  - `--video-name`: video filename (e.g., `ep00001.mp4`).
  - `--prompt`: text prompt for the model.
  - `--seed`: optional integer for reproducible generations.
  - Input video must be 24 fps, ≤5 s, and ≤1 MB; aspect ratio must be one of `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`.
  - Output saved to `/videos/{dataset}/generated/{video-name}_generated-{number}.mp4`.
  - Requires `REPLICATE_API_TOKEN` in the environment (e.g., `--env-file .env`).
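The input constraints for `generate_video` can be pre-checked locally before spending an API call. The sketch below is a hypothetical helper, not part of the tool: it assumes you have already read the video's fps, duration, size, and dimensions (e.g., via ffprobe) and validates them against the limits listed above.

```python
from fractions import Fraction

# Aspect ratios accepted by the generation backend (from the list above).
ALLOWED_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}

def check_video(fps: float, duration_s: float, size_bytes: int,
                width: int, height: int) -> list:
    """Return a list of requirement violations (empty list means the video is OK)."""
    problems = []
    if round(fps) != 24:
        problems.append(f"fps is {fps}, expected 24")
    if duration_s > 5:
        problems.append(f"duration {duration_s:.2f}s exceeds 5s")
    if size_bytes > 1_000_000:
        problems.append(f"file size {size_bytes} bytes exceeds 1MB")
    # Fraction reduces automatically, e.g. 1280/720 -> 16/9.
    ratio = Fraction(width, height)
    if f"{ratio.numerator}:{ratio.denominator}" not in ALLOWED_RATIOS:
        problems.append(f"aspect ratio {width}x{height} is not supported")
    return problems
```

For example, `check_video(24, 4.0, 500_000, 1280, 720)` returns an empty list, while a 30 fps, 6-second clip would report two violations.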
Future commands (coming soon):

- `augment_dataset`: write the augmented episodes back to a dataset.
## Notes

- The tool downloads the Open-X-Embodiment dataset from the public mirror `gs://gresearch/robotics/`.
- File organization:
  - `export_video` creates dataset-specific subdirectories: `{video-dir}/{dataset}/ep{N}.mp4`
  - `generate_video` writes organized output: `{video-dir}/{dataset}/generated/{video}_generated-{N}.mp4`
  - Generated videos use automatic numbering to prevent overwrites.
- Video requirements for AI generation: 24 fps, ≤5 seconds duration, ≤1 MB file size, supported aspect ratios only.
- Video-to-video AI model: RunwayML's Gen4-Aleph.
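The automatic numbering behavior can be illustrated with a small sketch. This is an assumption about the scheme based only on the `{video}_generated-{N}.mp4` pattern documented above, not the tool's actual implementation:

```python
from pathlib import Path

def next_generated_path(video_dir: str, dataset: str, video_name: str) -> Path:
    """Pick the first unused {stem}_generated-{N}.mp4 name so reruns never overwrite."""
    out_dir = Path(video_dir) / dataset / "generated"
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(video_name).stem  # "ep00021.mp4" -> "ep00021"
    n = 1
    while (out_dir / f"{stem}_generated-{n}.mp4").exists():
        n += 1
    return out_dir / f"{stem}_generated-{n}.mp4"
```

With an empty output directory this yields `ep00021_generated-1.mp4`; once that file exists, the next call yields `ep00021_generated-2.mp4`.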