uCO3D: UnCommon Objects in 3D
[paper] [project page]
This repository contains download scripts and tooling for working with the UnCommon Objects in 3D (uCO3D) dataset.
uCO3D contains ~170,000 turn-table videos capturing objects from the LVIS taxonomy of object categories.
The dataset is described in our paper "UnCommon Objects in 3D".
Updates:
- The uCO3D dataset is now also available on Hugging Face!
Main features
- 170,000 videos scanning diverse objects from all directions.
- Objects come from the LVIS taxonomy of ~1000 categories, grouped into 50 super-categories.
- Unlike CO3Dv2, uCO3D releases full original videos instead of frames.
- Each video is annotated with object segmentation, camera poses, and 3 types of point clouds.
- The dataset additionally contains a 3D Gaussian Splat reconstruction for each video.
- Each scene comes with a long and a short caption obtained with a large video-language model.
- Significantly improved annotation quality and dataset size compared to CO3Dv2.
Download & Install
The full dataset (processed version) takes ~19.3 TB of disk space. We distribute it in chunks of up to 20 GB and provide an automated way of downloading and decompressing the data.
First, clone the repository and install the package, which also takes care of the dependencies:
```bash
git clone git@github.com:facebookresearch/uco3d.git
cd uco3d
pip install -e .
```
Then run the download script (make sure to change <DESTINATION_FOLDER>):
python dataset_download/download_dataset.py --download_folder <DESTINATION_FOLDER> --checksum_check
Option 2: Using the standard download script with the HuggingFace option

You can also use the original download script with the HuggingFace option:
```bash
python dataset_download/download_dataset.py \
    --download_folder <DESTINATION_FOLDER> \
    --use_huggingface
```
Download a small subset of the dataset
We allow users to download a small 52-video subset of uCO3D for preview and debugging purposes. The videos represent a random sample of categories. The small subset takes ~9.6 GB, i.e. more than 1000x smaller than the full dataset.
To download the small subset, run the following:
python dataset_download/download_dataset.py --download_small_subset --download_folder <SMALL_SUBSET_DESTINATION_FOLDER>
Download specific subsets of the dataset
As detailed below, we allow users to download only specific subsets of the dataset (e.g. only the Gaussian Splats and RGB videos of specific object categories), which greatly decreases the amount of required disk space. Note that this functionality is not supported when downloading the small subset - the small subset is always downloaded in full.
Downloading specific modalities
Setting --download_modalities to a comma-separated list of specific modality names will download only a subset of available modalities.
For instance
python dataset_download/download_dataset.py --download_folder <DESTINATION_FOLDER> --download_modalities "rgb_videos,point_clouds"
will download only the RGB videos and point clouds.
Execute python dataset_download/download_dataset.py -h for the list of all downloadable modalities.
Dataset size per modality
The following table lists the total size of the data for each modality:
| Modality | Size (TB) |
| --- | --- |
| rgb_videos | 7.59 |
| mask_videos | 0.16 |
| depth_maps | 9.69 |
| gaussian_splats | 1.18 |
| point_clouds | 0.57 |
| segmented_point_clouds | 0.04 |
| sparse_point_clouds | 0.04 |
| Total | 19.27 |
Downloading specific super-categories
Setting --download_super_categories will instruct the script to download only a subset of the available super-categories.
For instance
python dataset_download/download_dataset.py --download_folder <DESTINATION_FOLDER> --download_super_categories "vegetables_and_legumes,stationery"
will download only the vegetables&legumes and stationery super-categories.
Note that --download_modalities can be combined with --download_super_categories to select any possible subset of the dataset.
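For example, the following command downloads only the RGB videos and Gaussian Splats of the stationery super-category:
python dataset_download/download_dataset.py --download_folder <DESTINATION_FOLDER> --download_super_categories "stationery" --download_modalities "rgb_videos,gaussian_splats"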
Run python dataset_download/download_dataset.py -h for the full list of options.
API Quick Start and Examples
- Set up the dataset root environment variable, pointing it to the root folder with the uCO3D dataset:

export UCO3D_DATASET_ROOT=<DESTINATION_FOLDER>
- Create the dataset object and fetch its data:
```python
from uco3d import UCO3DDataset, UCO3DFrameDataBuilder
from uco3d.dataset_utils.utils import get_dataset_root
import os

# Get the dataset root folder and check that
# all required metadata files exist.
dataset_root = get_dataset_root(assert_exists=True)

# Get the "small" subset list containing a small subset
# of the uCO3D categories. For loading the whole dataset
# use "set_lists_all-categories.sqlite".
subset_lists_file = os.path.join(
    dataset_root,
    "set_lists",
    "set_lists_3categories-debug.sqlite",
)

dataset = UCO3DDataset(
    subset_lists_file=subset_lists_file,
    subsets=["train"],
    frame_data_builder=UCO3DFrameDataBuilder(
        apply_alignment=True,
        load_images=True,
        load_depths=False,
        load_masks=True,
        load_depth_masks=True,
        load_gaussian_splats=True,
        gaussian_splats_truncate_background=True,
        load_point_clouds=True,
        load_segmented_point_clouds=True,
        load_sparse_point_clouds=True,
        box_crop=True,
        box_crop_context=0.4,
        load_frames_from_videos=True,
        image_height=800,
        image_width=800,
        undistort_loaded_blobs=True,
    ),
)

# query the dataset object to obtain a single video frame of a sequence
frame_data = dataset[100]

# obtain the RGB image of the frame
image_rgb = frame_data.image_rgb

# obtain the 3D gaussian splats reconstructing the whole scene
gaussian_splats = frame_data.sequence_gaussian_splats

# render the scene gaussian splats into the camera of the loaded frame
# NOTE: This requires the 'gsplat' library. You can install it with:
# > pip install git+https://github.com/nerfstudio-project/gsplat.git@v1.3.0
from uco3d import render_splats
render_colors, render_alphas, render_info = render_splats(
    cameras=frame_data.camera,
    splats=gaussian_splats,
    render_size=[512, 512],
)
```
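As a quick sanity check, the loaded frame can be written to disk. The snippet below is a minimal sketch, assuming image_rgb is a float tensor of shape (3, H, W) with values in [0, 1] (the CO3D-style FrameData convention) and that Pillow is installed:

```python
from PIL import Image

# convert the (3, H, W) float tensor in [0, 1] into an (H, W, 3) uint8 array
image_np = (image_rgb.clamp(0.0, 1.0) * 255).byte().permute(1, 2, 0).cpu().numpy()
Image.fromarray(image_np).save("frame_100_rgb.png")
```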
Examples
The examples folder contains Python scripts demonstrating how to use the dataset.
Tests
The tests folder contains various tests that check the correctness of the implementation and also visualize the loadable modalities, such as point clouds or 3D Gaussian Splats.
To run the tests, execute the following:
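A typical invocation is sketched below; it assumes the tests are standard unittest modules located inside the tests folder (the exact command may differ):

```bash
cd tests
python -m unittest
```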
Visualisation tests
The 3D Gaussian Splat and point-cloud tests contain many scripts for loading and visualizing all of the point-cloud and 3D Gaussian Splat data that the dataset contains. Make sure to explore these to familiarize yourself with the dataset interface.
Dataset format
The dataset is organized in the filesystem as follows:
├── metadata.sqlite
├── set_lists
│ ├── set_lists_3categories-debug.sqlite
│ ├── set_lists_all-categories.sqlite
│ ├── set_lists_<subset_lists_name_2>.sqlite
│ ├── ...
├── <super_category_1>
│ ├── <category_1>
│ │ ├── <sequence_name_1>
│ │ │ ├── depth_maps.h5
│ │ │ ├── gaussian_splats
│ │ │ ├── mask_video.mkv
│ │ │ ├── rgb_video.mp4
│ │ │ ├── point_cloud.ply
│ │ │ ├── segmented_point_cloud.ply
│ │ │ └── sparse_point_cloud.ply
│ │ ├── <sequence_name_2>
│ │ │ ├── depth_maps.h5
│ │ │ ├── gaussian_splats
│ │ │ ├── mask_video.mkv
│ │ │ ├── rgb_video.mp4
│ │ │ ├── point_cloud.ply
│ │ │ ├── segmented_point_cloud.ply
│ │ │ └── sparse_point_cloud.ply
│ │ ├── ...
│ │ ├── <sequence_name_S>
│ ├── ...
│ ├── <category_C>
├── ...
├── <super_category_S>
Data stored as videos
Note that, unlike CO3Dv2, frame-level data such as images or depth maps is released solely in the form of videos or HDF5 files to save space. The provided UCO3DFrameDataBuilder then extracts the RGB/depth/mask frames from the loaded videos on the fly.
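For illustration only, the sketch below shows how a single frame could be sought directly from an RGB video with OpenCV; the path is a placeholder and the UCO3DFrameDataBuilder already performs this kind of frame seeking for you (assumes opencv-python is installed):

```python
import cv2

# placeholder path to one sequence's RGB video
video_path = "<UCO3D_DATASET_ROOT>/<super_category>/<category>/<sequence_name>/rgb_video.mp4"

capture = cv2.VideoCapture(video_path)
capture.set(cv2.CAP_PROP_POS_FRAMES, 42)  # seek to frame number 42
ok, frame_bgr = capture.read()  # frame_bgr is an H x W x 3 uint8 array in BGR order
capture.release()
```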
Per-sequence data
Each sequence-specific folder <super_category>/<category>/<sequence_name> contains the following files:
- rgb_video.mp4: The original crowd-sourced video capturing the object from the visual category <category> and super-category <super_category>.
- mask_video.mkv: Segmentation video of the same length as rgb_video.mp4 containing the video-segmentation of the foreground object. The latter was obtained using LangSAM in combination with a video segmentation refiner based on XMem.
- depth_maps.h5: hdf5 file containing a depth map for each of the 200 frames sampled equidistantly from the input video. We first run DepthAnythingV2 and align the resulting depth maps' scale with the scene sparse point cloud from sparse_point_cloud.ply. Hence, the depth maps have a consistent scale within each scene, although they do not achieve strict pixel-wise consistency across multiple views. We are working on improving this and should provide more consistent depth maps in the future.
- gaussian_splats: 3D Gaussian Splat reconstruction of the scene obtained with the gsplat library (v1.3.0). The splats are compressed using the standard gsplat compression method, which sorts the Gaussians using Self-Organizing Gaussian Grids followed by PNG compression.
- point_cloud.ply: A dense colored 3D point cloud reconstructing the scene. Obtained using VGGSfM.
- segmented_point_cloud.ply: Same as point_cloud.ply but restricted only to points covering the foreground object.
- sparse_point_cloud.ply: Sparse geometrically-accurate scene point cloud used to reconstruct the scene cameras. Obtained using VGGSfM.
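The dataset API from the quick start above loads all of these modalities for you; the sketch below only shows how to peek at the raw per-sequence files. It assumes h5py and trimesh are installed and uses placeholder paths; the internal layout of depth_maps.h5 is not documented here, so the sketch merely lists its keys:

```python
import h5py
import trimesh

# placeholder path to one sequence folder
sequence_dir = "<UCO3D_DATASET_ROOT>/<super_category>/<category>/<sequence_name>"

# list the datasets stored inside the depth-map HDF5 file
with h5py.File(f"{sequence_dir}/depth_maps.h5", "r") as f:
    print(list(f.keys()))

# load the dense colored scene point cloud
point_cloud = trimesh.load(f"{sequence_dir}/point_cloud.ply")
print(point_cloud.vertices.shape)  # (N, 3) array of 3D point positions
```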
Metadata database
The $UCO3D_DATASET_ROOT/metadata.sqlite file contains a database of all frame-level and video-level metadata, such as the paths to individual RGB/mask videos or the camera poses of each frame. We opted for an SQLite database since it provides fast access times without the need to store all metadata in memory (loading all metadata into memory usually takes minutes to hours for the whole dataset) and is widely supported.
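For example, per-sequence frame counts can be queried directly from the database. The sketch below assumes pandas is installed and uses the frame_annots table referenced later in this README:

```python
import os
import sqlite3

import pandas as pd

metadata_file = os.path.join(os.environ["UCO3D_DATASET_ROOT"], "metadata.sqlite")
with sqlite3.connect(metadata_file) as con:
    frames_per_sequence = pd.read_sql_query(
        "SELECT sequence_name, COUNT(*) AS num_frames "
        "FROM frame_annots GROUP BY sequence_name",
        con,
    )
print(frames_per_sequence.head())
```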
PyTorch3D camera convention
The provided camera annotations follow the PyTorch3D convention and are represented in the PyTorch3D NDC space. Note that PyTorch3D is only an optional dependency which enables extra functionalities and tests within the codebase.
Converting uCO3D Cameras to PyTorch3D PerspectiveCameras
Note that, if PyTorch3D is installed, the Cameras objects loaded using the UCO3DDataset object can be converted to corresponding PyTorch3D PerspectiveCameras objects using the Cameras.to_pytorch3d_cameras function.
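A minimal sketch of this conversion, assuming PyTorch3D is installed and frame_data was obtained as in the quick start above:

```python
# convert the uCO3D camera of the loaded frame to a PyTorch3D PerspectiveCameras object
p3d_camera = frame_data.camera.to_pytorch3d_cameras()
# the result can be used with standard PyTorch3D functionality, e.g. projecting 3D points:
# p3d_camera.transform_points_screen(points_3d, image_size=frame_data.image_size_hw[None])
```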
Converting uCO3D Cameras to OpenCV cameras
We also provide a conversion to the OpenCV (cv2) camera format:
```python
from uco3d import UCO3DDataset, UCO3DFrameDataBuilder
# import the camera conversion function:
from uco3d import opencv_cameras_projection_from_uco3d

# instantiate the dataset
dataset = UCO3DDataset(...)

# query the dataset object to obtain a single video frame of a sequence
frame_data = dataset[100]

R, tvec, camera_matrix = opencv_cameras_projection_from_uco3d(
    frame_data.camera,
    image_size=frame_data.image_size_hw[None],
)
# R, tvec, camera_matrix follow OpenCV's camera definition
```
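The converted parameters can then be used with standard OpenCV routines. The snippet below is a minimal sketch, assuming R, tvec and camera_matrix are tensors batched along the first dimension (one entry per camera) and that points_3d is a hypothetical (N, 3) array of world-space points:

```python
import cv2
import numpy as np

# OpenCV expects a Rodrigues rotation vector rather than a 3x3 rotation matrix
rvec, _ = cv2.Rodrigues(R[0].numpy().astype(np.float64))

points_2d, _ = cv2.projectPoints(
    points_3d.astype(np.float64),  # hypothetical (N, 3) array of world-space points
    rvec,
    tvec[0].numpy().astype(np.float64),
    camera_matrix[0].numpy().astype(np.float64),
    None,  # no distortion coefficients
)
```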
3D Gaussian Splat convention
uCO3D also contains a 3D Gaussian Splat (3DGS) reconstruction in each sequence folder. Our Gaussian Splat reconstructions were obtained using gsplat (v1.3.0). gsplat is an optional dependency that allows fast rendering of the provided 3DGS reconstructions.
Installing gsplat
The easiest way to install the supported version of gsplat is to use pip+git:
pip install git+https://github.com/nerfstudio-project/gsplat.git@v1.3.0
Rendering with gsplat
Note that we also provide functions for rendering the loaded splats:
```python
from uco3d import UCO3DDataset, UCO3DFrameDataBuilder
from uco3d import render_splats

# instantiate the dataset
dataset = UCO3DDataset(...)

# query the dataset object to obtain a single video frame of a sequence
frame_data = dataset[100]

# render the scene gaussian splats into the camera of the loaded frame
render_colors, render_alphas, render_info = render_splats(
    cameras=frame_data.camera,
    splats=frame_data.sequence_gaussian_splats,
    render_size=[512, 512],
)
```
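To inspect the rendering, it can be written to disk. The snippet below is a minimal sketch, assuming render_colors is a float tensor of shape (1, H, W, 3) with values in [0, 1] and that Pillow is installed:

```python
from PIL import Image

# drop the batch dimension and convert to an (H, W, 3) uint8 image
render_np = (render_colors[0].clamp(0.0, 1.0) * 255).byte().cpu().numpy()
Image.fromarray(render_np).save("splat_render.png")
```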
Subset lists
The subset list files:
$UCO3D_DATASET_ROOT/set_lists/set_lists_<SETLIST_NAME>.sqlite
define dataset splits. Specifically, each file contains a list of frames (identified by their sequence_name and frame_number) in the "train" and "val" subsets of the dataset.
In order to select a specific subset of the dataset, pass the corresponding subset list path and the subset name to the constructor of UCO3DDataset.
For instance
```python
dataset = UCO3DDataset(
    subset_lists_file="<UCO3D_DATASET_ROOT>/set_lists/set_lists_all-categories.sqlite",
    subsets=["train"],
    frame_data_builder=...,
)
```
will load the "train" subset of the set_lists_all-categories.sqlite subset list which contains the whole uCO3D dataset.
Available subset lists
The folder <UCO3D_DATASET_ROOT>/set_lists/ provides the following subset lists:
- set_lists_3categories-debug.sqlite - A small split for debugging purposes. It loads very quickly, allowing fast iteration on code that uses the dataset.
- set_lists_all-categories.sqlite - Contains the whole dataset.
- set_lists_static-categories.sqlite - Contains the videos of all rigid categories of uCO3D.
- set_lists_static-categories-accurate-reconstruction.sqlite - Contains the videos of all rigid categories of uCO3D with high-quality reconstructions.
- set_lists_dynamic-categories.sqlite - Contains the videos of all flexible categories (e.g. animals) of uCO3D.
Creating custom subset lists
Subset lists are stored as sqlite tables. The easiest way to create one is to use pandas to select a subset of the main <UCO3D_DATASET_ROOT>/metadata.sqlite table. The following example builds a subset list with 100 training and 30 validation sequences by taking the first 130 sequences from the metadata:
```python
import pandas as pd, sqlite3, os

# read the main metadata table (takes a long time)
metadata_file = os.path.join(UCO3D_DATASET_ROOT, "metadata.sqlite")
frame_annots = pd.read_sql_table("frame_annots", f"sqlite:///{metadata_file}")

# pick dataset sequence names by taking unique sequence names from frame annotations
seqs = frame_annots["sequence_name"].unique()

# choose the list of train/val sequences
train_seqs, val_seqs = seqs[:100], seqs[100:130]

# training setlist
setlists_train = frame_annots[frame_annots["sequence_name"].isin(train_seqs)][
    ["frame_number", "sequence_name"]
]
setlists_train["subset"] = "train"

# validation setlist
setlists_val = frame_annots[frame_annots["sequence_name"].isin(val_seqs)][
    ["frame_number", "sequence_name"]
]
setlists_val["subset"] = "val"

# concatenate train and val
setlists = pd.concat([setlists_train, setlists_val])

# the new setlists will be stored as the set_lists_130.sqlite file in the original root
setlist_file = os.path.join(UCO3D_DATASET_ROOT, "set_lists", "set_lists_130.sqlite")

# store the new table
with sqlite3.connect(setlist_file) as con:
    setlists.to_sql("set_lists", con, if_exists="replace", index=False)
```
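The new subset list can then be used exactly like the provided ones (the file name below comes from the example above; configure the frame_data_builder as in the quick start):

```python
dataset_130 = UCO3DDataset(
    subset_lists_file=os.path.join(UCO3D_DATASET_ROOT, "set_lists", "set_lists_130.sqlite"),
    subsets=["train"],
    frame_data_builder=...,  # a UCO3DFrameDataBuilder configured as in the quick start
)
```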
License
The data are released under the CC BY 4.0 license.
Third-Party Code
This project uses code from other sources, which is licensed under the respective licenses of those sources.
Reference
If you use our dataset, please use the following citation:
```bibtex
@inproceedings{liu24uco3d,
  Author = {Liu, Xingchen and Tayal, Piyush and Wang, Jianyuan and Zarzar, Jesus and Monnier, Tom and Tertikas, Konstantinos and Duan, Jiali and Toisoul, Antoine and Zhang, Jason Y. and Neverova, Natalia and Vedaldi, Andrea and Shapovalov, Roman and Novotny, David},
  Booktitle = {arXiv},
  Title = {UnCommon Objects in 3D},
  Year = {2025},
}
```

