Abstract
Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. Rather than relying on a single model component, we develop a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.
Asset Harvester turns real-world driving logs into complete, simulation-ready 3D assets — from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
Pipeline Overview
NCore V4 Data ─► NCore Parsing ─► Multiview Diffusion + Gaussian Lifting ─► metadata.yaml (required for NuRec Object Insertion)
*(Demo table: input view → multiview diffusion, 2 of 16 generated views shown → 3D Gaussian lifting. See docs/assets/ for the demo images.)*
User Guide
For end-to-end asset harvesting from recorded driving sessions, see our Full End-to-End Workflow ✨!
Setup
Prerequisites
- conda (Miniconda or Miniforge)
- NVIDIA driver >= 570 (CUDA 12.8 compatible)
- GCC 10–13 (tested with GCC 12.3)
- GPU with ~16 GB VRAM (add `--offload_model_to_cpu` to offload unused models to CPU for lower VRAM)
Note: Initial setup takes ~20 minutes to complete.
```bash
git clone https://github.com/NVIDIA/asset-harvester.git
cd asset-harvester
bash setup.sh
conda activate asset-harvester
```

Option note: the environment name and Python version can be overridden:

```bash
bash setup.sh --env-name asset-harvester --python 3.10
```

The `setup.sh` script handles the full environment setup for this repo.
If you need a manual install from a checkout, preinstall the pinned gsplat build first, then install the repo with the extras you need:

```bash
pip install --no-cache-dir --no-build-isolation \
    "git+https://github.com/nerfstudio-project/gsplat.git@b60e917c95afc449c5be33a634f1f457e116ff5e"
pip install --extra-index-url https://download.pytorch.org/whl/cu128 \
    -e ".[ncore-parser,multiview_diffusion,tokengs,camera-estimator]"
```
Download Model Checkpoints

```bash
pip install "huggingface_hub[cli]"
hf auth login
hf download nvidia/asset-harvester --local-dir checkpoints
```
Or download the checkpoints manually from Hugging Face.
This places the following files in `checkpoints/`:

```
checkpoints/
├── AH_multiview_diffusion.safetensors
├── AH_tokengs_lifting.safetensors
├── AH_camera_estimator.safetensors
└── AH_object_seg_jit.pt
```
Image-to-3D
Try Asset Harvester on our sample data (Multiview Diffusion + Gaussian Lifting).
Requires ~16 GB VRAM. If you run into VRAM OOM issues, add `--offload_model_to_cpu` to offload unused model components to CPU:

```bash
export DATA_ROOT=data_samples/rectified_AV_objects/
export CHECKPOINT_MV=checkpoints/AH_multiview_diffusion.safetensors
export CHECKPOINT_GS=checkpoints/AH_tokengs_lifting.safetensors
export OUTPUT_DIR=outputs/harvesting

python3 run_inference.py \
    --diffusion_checkpoint "${CHECKPOINT_MV}" \
    --data_root "${DATA_ROOT}" \
    --output_dir "${OUTPUT_DIR}" \
    --lifting_checkpoint "${CHECKPOINT_GS}"
```
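To see what a run produced, a minimal inspection sketch; the exact per-object filenames and asset format under the output directory depend on the release, so treat the layout assumed below as illustrative:

```python
# Walk the inference output directory and summarize what was written.
# A sketch -- the per-object layout under OUTPUT_DIR is an assumption here.
from pathlib import Path

output_dir = Path("outputs/harvesting")
for obj_dir in sorted(p for p in output_dir.iterdir() if p.is_dir()):
    files = sorted(f.name for f in obj_dir.rglob("*") if f.is_file())
    print(f"{obj_dir.name}: {len(files)} files")
    for name in files[:5]:  # show the first few entries per object
        print("   ", name)
```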
Alternatively, if you have a single-view image with the object centered and a foreground mask, resize both to 512x512 and place them in a folder with the following structure (a preparation sketch follows the layout):
```
YOUR_IMAGE_ROOT/
├── YOUR_IMAGE_NAME_0
│   ├── frame.jpeg
│   └── mask.png
└── YOUR_IMAGE_NAME_1
    ...
```
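If your source images are not already 512x512, a minimal preparation sketch using Pillow (the folder and file names follow the layout above; the white-on-black mask convention is an assumption):

```python
# Prepare one object folder in the layout above: a 512x512 frame.jpeg plus a
# matching mask.png. The mask convention (white = object) is an assumption.
from pathlib import Path
from PIL import Image

def prepare_object(src_image: str, src_mask: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    Image.open(src_image).convert("RGB").resize((512, 512), Image.LANCZOS) \
         .save(out / "frame.jpeg", quality=95)
    # NEAREST keeps the mask binary instead of introducing gray edge values.
    Image.open(src_mask).convert("L").resize((512, 512), Image.NEAREST) \
         .save(out / "mask.png")

prepare_object("car.jpg", "car_mask.png", "YOUR_IMAGE_ROOT/YOUR_IMAGE_NAME_0")
```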
If masks are not available, you can also use our image segmentation model to generate `mask.png` from `frame.jpeg` stored in the structure above:

```bash
export CHECKPOINT_SEG=checkpoints/AH_object_seg_jit.pt
export IMAGE_ROOT=data_samples/segmented_images

python -m asset_harvester.utils.image_segment \
    --checkpoint $CHECKPOINT_SEG \
    --image_folder $IMAGE_ROOT \
    --frame_name frame.jpeg \
    --mask_name mask.png
```
See the folder `data_samples/OOD_images` for an example.
After data preparation, run Asset Harvester with our built-in camera estimator:
```bash
export YOUR_IMAGE_ROOT=data_samples/OOD_images
export CHECKPOINT_MV=checkpoints/AH_multiview_diffusion.safetensors
export CHECKPOINT_GS=checkpoints/AH_tokengs_lifting.safetensors
export CHECKPOINT_CAM=checkpoints/AH_camera_estimator.safetensors
export OUTPUT_DIR=outputs/harvesting_with_camera_estimate

python3 run_inference.py \
    --diffusion_checkpoint "${CHECKPOINT_MV}" \
    --ahc_checkpoint "${CHECKPOINT_CAM}" \
    --image_dir "${YOUR_IMAGE_ROOT}" \
    --output_dir "${OUTPUT_DIR}" \
    --lifting_checkpoint "${CHECKPOINT_GS}"
```
Full End-to-End Workflow
For the complete step-by-step pipeline walkthrough — from raw NCore driving logs through NCore parsing, multiview diffusion, Gaussian lifting, metadata generation, and benchmark evaluation — see the End-to-End Guide.
Benchmark
Evaluate `run_inference.py` output by rendering the Gaussian splat assets from reserved-view cameras and comparing them to ground-truth frames. The script computes PSNR, LPIPS, and SSIM, plus optional DINOv3 embedding distances.
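For reference, the reported image metrics can be reproduced on a single rendered/ground-truth pair with off-the-shelf torchmetrics implementations. This is a sketch under that assumption; `benchmark/eval.py` is the authoritative implementation and may differ in details such as masking and color handling:

```python
# PSNR / SSIM / LPIPS for one rendered-vs-ground-truth image pair.
# A sketch using torchmetrics; benchmark/eval.py may differ in details.
import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

pred = torch.rand(1, 3, 512, 512)  # rendered splat view, values in [0, 1]
gt = torch.rand(1, 3, 512, 512)    # held-out ground-truth frame, in [0, 1]

psnr = PeakSignalNoiseRatio(data_range=1.0)(pred, gt)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)(pred, gt)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="vgg", normalize=True)(pred, gt)
print(f"PSNR {psnr:.2f}  SSIM {ssim:.4f}  LPIPS {lpips:.4f}")
```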
Setup
Install benchmark evaluation dependencies and download third-party model weights required for evaluation. The script requires Hugging Face access to facebook/dinov3-vith16plus-pretrain-lvd1689m and facebook/sam-3d-body-dinov3.
Note that the benchmark environment needs transformers>=4.56.0, while the main asset-harvester environment uses transformers==4.48.3. Therefore, we clone the environment into a separate benchmark environment and install the updated dependencies.
```bash
conda create --name av-object-benchmark --clone asset-harvester
conda activate av-object-benchmark
bash benchmark/install.sh
```
This installs benchmark-specific packages (pytorch-lightning, yacs, ninja, termcolor, transformers>=4.56.0), clones the SAM 3D Body repo, and downloads the SAM 3D Body and DINOv3 checkpoints. If the third-party checkpoints are unavailable (gated repo), the evaluation still runs but skips the embedding metrics.
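The optional embedding metric reduces to a distance between global image features. A minimal sketch using the gated DINOv3 checkpoint via transformers (assumptions: cosine distance on the CLS token; the benchmark's exact pooling and preprocessing may differ):

```python
# Cosine distance between DINOv3 global embeddings of two images.
# A sketch -- requires access to the gated facebook/dinov3 checkpoint and
# transformers >= 4.56; the benchmark's exact pooling may differ.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "facebook/dinov3-vith16plus-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    return model(**inputs).last_hidden_state[:, 0]  # CLS token as global feature

dist = 1.0 - torch.cosine_similarity(embed("render.png"), embed("gt.png"))
print(f"DINOv3 embedding distance: {dist.item():.4f}")
```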
Usage
```bash
python benchmark/eval.py --output_dir /path/to/run_inference_output --eval_output_dir benchmark/eval
```
See benchmark/README.md for the full list of options, input/output format, and metric descriptions.
Repository Structure
```
asset-harvester/
├── asset_harvester/
│   ├── camera_estimator/       # Camera pose estimator package
│   ├── multiview_diffusion/    # SparseViewDiT package
│   ├── ncore_parser/           # NCore parser package + CLI
│   ├── patches/                # Compatibility patches
│   ├── tokengs/                # TokenGS package + main.py training entry point
│   └── utils/                  # Shared utilities and CLI helpers
├── benchmark/                  # Evaluation tools
│   ├── eval.py
│   ├── embedding_metrics.py
│   ├── utils.py
│   ├── install.sh
│   └── README.md
├── data_samples/               # Bundled sample data for Quick Start
├── docs/
│   ├── assets/                 # Demo images, GIFs, and videos
│   ├── end_to_end_example.md
│   └── tokengs.md
├── scripts/
│   ├── check_license_headers.py
│   └── run_ncore_parser.sh     # Step 1 wrapper for the parser CLI
├── run_inference.py            # Main inference entry point
├── run.sh                      # Step 2: multiview diffusion + Gaussian lifting wrapper
├── setup.sh
├── pyproject.toml
├── CONTRIBUTING.MD
├── LICENSE.txt
├── THIRD_PARTY_LICENSE.txt
└── README.md
```
License
This project is licensed under the Apache License 2.0. See individual file headers for details.
Citation
If you find this work useful in your research, please consider citing:
```bibtex
@article{cao2026assetharvester,
  title  = {Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation},
  author = {Cao, Tianshi and Ren, Jiawei and Zhang, Yuxuan and Seo, Jaewoo and Huang, Jiahui and Solanki, Shikhar and Zhang, Haotian and Guo, Mingfei and Turki, Haithem and Li, Muxingzi and Zhu, Yue and Zhang, Sipeng and Gojcic, Zan and Fidler, Sanja and Yin, Kangxue},
  year   = {2026},
}
```
Disclaimer
Asset Harvester is trained on AV-domain data; results outside that domain are not guaranteed.
AI models generate responses and outputs based on complex algorithms and machine learning techniques, and those responses or outputs may be inaccurate or offensive. By downloading a model, you assume the risk of any harm caused by any response or output of the model. By using this software or model, you are agreeing to the terms and conditions of the license, acceptable use policy, and privacy policy as applicable.