# Robot Training Data Augmentation
This repo contains a plug-and-play tool that massively multiplies robot training datasets by augmenting training episodes with new scenes. Currently, RLDS-formatted datasets are supported, and Open-X-Embodiment is used as the example in this repo.

Images from training episodes are transformed into new scenes while leaving critical visual aspects, such as trajectories and object interactions, unchanged. Currently, this is powered by RunwayML's Gen4-Aleph.
Tools are provided for:
- Downloading the dataset.
- Extracting videos from the dataset.
- Generating new videos.
- (coming soon) Writing back new episodes to the dataset.
## Quickstart

### Prerequisites
- Docker is required - this tool is packaged as a container.
- Put your Replicate API key in `.env` (video-to-video generative model APIs from Replicate are used):

  ```
  REPLICATE_API_TOKEN=XXXXXXXXXXX
  ```
### Build

```shell
docker build -f tool/Dockerfile -t oxe-tool .
```

### Run
- Make directories for inputs and outputs:

  ```shell
  mkdir oxe-datasets videos
  ```
- Download episodes from a sample dataset from Open-X-Embodiment:

  ```shell
  docker run --rm \
    -v "$(pwd)/oxe-datasets:/datasets" \
    oxe-tool download_dataset \
    --dataset bridge \
    --max_episodes 50
  ```

- Export videos:
  ```shell
  docker run --rm \
    -v "$(pwd)/oxe-datasets:/datasets:ro" \
    -v "$(pwd)/videos:/videos" \
    oxe-tool export_video \
    --dataset bridge \
    --max_episodes 50 \
    --fps 24 \
    --info
  ```
- Generate a transformed video:

  ```shell
  docker run --rm \
    --env-file .env \
    -v "$(pwd)/videos:/videos" \
    oxe-tool generate_video \
    --dataset bridge \
    --video-name ep00021.mp4 \
    --prompt "Re-light the scene with a bright white spotlight" \
    --seed 1234
  ```
## CLI Overview

Run help:

```shell
docker run --rm oxe-tool --help
docker run --rm oxe-tool download_dataset --help
docker run --rm oxe-tool export_video --help
docker run --rm oxe-tool generate_video --help
```
Subcommands and key options:
- `download_dataset`
  - Downloads to `/datasets` (mounted via Docker `-v`).
  - `--dataset` (repeatable), or `--datasets` (comma/space-separated). Defaults to `bridge` if not provided.
  - `--max_episodes`: optional integer to limit episodes downloaded per dataset (default: download all episodes).
- `export_video`
  - Reads from `/datasets`, writes to `/videos` (both mounted via Docker `-v`).
  - `--dataset` (repeatable), or `--datasets` (comma/space-separated). Defaults to `bridge` if not provided.
  - `--split` (default `train`), `--max_episodes` (default `5`), `--fps` (default `24`), `--display_key` (default `image`), `--info`.
  - `--image_key_choice`: pre-select the image key (1-based index) for datasets with multiple camera views, avoiding interactive prompts.
  - For interactive selection in Docker, add `-it` flags: `docker run --rm -it ...`
- `generate_video`
  - Reads and writes under `/videos` (mounted via Docker `-v`).
  - `--dataset`: dataset name (matches the directory name in the video structure).
  - `--video-name`: video filename (e.g., `ep00001.mp4`).
  - `--prompt`: text prompt for the model.
  - `--seed`: optional integer for reproducible generations.
  - Input video must be 24 fps, ≤5 s, and ≤1 MB; aspect ratio must be one of `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`.
  - Output saved to `/videos/{dataset}/generated/{video-name}_generated-{number}.mp4`.
  - Requires `REPLICATE_API_TOKEN` in the environment (e.g., `--env-file .env`).
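The input constraints for `generate_video` can be pre-checked locally before spending an API call. The sketch below is a hypothetical helper, not part of the tool: it assumes you have already read the video's fps, duration, size, and dimensions (e.g., via ffprobe) and validates them against the limits listed above.

```python
from fractions import Fraction

# Aspect ratios accepted by the generation backend (from the list above).
ALLOWED_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}

def check_video(fps: float, duration_s: float, size_bytes: int,
                width: int, height: int) -> list:
    """Return a list of requirement violations (empty list means the video is OK)."""
    problems = []
    if round(fps) != 24:
        problems.append(f"fps is {fps}, expected 24")
    if duration_s > 5:
        problems.append(f"duration {duration_s:.2f}s exceeds 5s")
    if size_bytes > 1_000_000:
        problems.append(f"file size {size_bytes} bytes exceeds 1MB")
    # Fraction reduces automatically, e.g. 1280/720 -> 16/9.
    ratio = Fraction(width, height)
    if f"{ratio.numerator}:{ratio.denominator}" not in ALLOWED_RATIOS:
        problems.append(f"aspect ratio {width}x{height} is not supported")
    return problems
```

For example, `check_video(24, 4.0, 500_000, 1280, 720)` returns an empty list, while a 30 fps, 6-second clip would report two violations.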
Future commands (coming soon):

- `augment_dataset`: write the augmented episodes back to a dataset.
## Notes

- The tool downloads the Open-X-Embodiment dataset from the public mirror `gs://gresearch/robotics/`.
- File organization:
  - `export_video` creates dataset-specific subdirectories: `{video-dir}/{dataset}/ep{N}.mp4`
  - `generate_video` writes organized output: `{video-dir}/{dataset}/generated/{video}_generated-{N}.mp4`
  - Generated videos use automatic numbering to prevent overwrites.
- Video requirements for AI generation: 24 fps, ≤5 seconds duration, ≤1 MB file size, supported aspect ratios only.
- Video-to-video AI model: RunwayML's Gen4-Aleph.
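The automatic numbering behavior can be illustrated with a small sketch. This is an assumption about the scheme based only on the `{video}_generated-{N}.mp4` pattern documented above, not the tool's actual implementation:

```python
from pathlib import Path

def next_generated_path(video_dir: str, dataset: str, video_name: str) -> Path:
    """Pick the first unused {stem}_generated-{N}.mp4 name so reruns never overwrite."""
    out_dir = Path(video_dir) / dataset / "generated"
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(video_name).stem  # "ep00021.mp4" -> "ep00021"
    n = 1
    while (out_dir / f"{stem}_generated-{n}.mp4").exists():
        n += 1
    return out_dir / f"{stem}_generated-{n}.mp4"
```

With an empty output directory this yields `ep00021_generated-1.mp4`; once that file exists, the next call yields `ep00021_generated-2.mp4`.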