Going from Anywhere to Everywhere
CVPR 2024
Starting from an arbitrary location (specified by either text or an image), WonderJourney generates a sequence of diverse yet coherently connected 3D scenes (i.e., a "wonderjourney") along a camera trajectory. We render a "wonderjourney" using a back-and-forth camera trajectory.

[Video gallery: 15 wonderjourneys, each starting from an input image or a real photo.]
WonderJourney can synthesize long "wonderjourneys".

[Video gallery: 7 long wonderjourneys, each starting from an input image or a real photo.]
Starting from the same location, WonderJourney can generate a diverse set of "wonderjourneys", ending at different destinations. We render each video below using a trajectory of camera poses.

[Video gallery: 11 starting locations (input images or real photos), each shown with three generated wonderjourneys.]
WonderJourney can also generate controlled wonderjourneys given a sequence of text descriptions, such as poems, haikus, and story abstracts.

[Video gallery: 8 controlled wonderjourneys, each generated from an input text.]
We introduce WonderJourney, a modularized framework for perpetual scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (specified by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys".
"No, no! The adventures first, explanations take such a dreadful time." (Alice's Adventures in Wonderland)
Our modular design does not require any training, allowing easy future improvements from the quick advances in vision and language models.
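To make the modular structure concrete, here is a minimal Python sketch of how the three components described above (an LLM that writes scene descriptions, a text-driven scene generator, and a VLM that verifies the result) might be chained into a loop. It is an illustration under stated assumptions, not the authors' implementation: the names `Scene`, `generate_journey`, `describe_next`, `generate_scene`, and `verify_scene` are hypothetical, the stub components stand in for real models, and the retry-then-accept policy for rejected scenes is an assumption the source does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Scene:
    """Hypothetical container for one generated scene."""
    description: str
    points: list = field(default_factory=list)  # placeholder for a point cloud


def generate_journey(
    start_description: str,
    describe_next: Callable[[List[str]], str],
    generate_scene: Callable[[str, Optional[Scene]], Scene],
    verify_scene: Callable[[Scene], bool],
    num_scenes: int = 3,
    max_retries: int = 2,
) -> List[Scene]:
    """Chain description, generation, and verification into a journey.

    Each component is passed in as a plain callable, so any one of them
    can be swapped for a better model without touching the loop.
    """
    # The first scene comes directly from the user-provided location.
    journey = [generate_scene(start_description, None)]
    while len(journey) < num_scenes:
        # Language model step: propose the next scene from the history so far.
        history = [scene.description for scene in journey]
        description = describe_next(history)
        candidate = None
        for _ in range(max_retries + 1):
            # Generation step: build a scene connected to the previous one.
            candidate = generate_scene(description, journey[-1])
            # Verification step: stop retrying once the scene is accepted.
            if verify_scene(candidate):
                break
        # Assumption: if every attempt is rejected, keep the last candidate
        # so the journey still advances.
        journey.append(candidate)
    return journey


# Stub components, standing in for an LLM, a scene generator, and a VLM.
def stub_describe_next(history: List[str]) -> str:
    return f"scene {len(history) + 1}"


def stub_generate_scene(description: str, previous: Optional[Scene]) -> Scene:
    return Scene(description=description)


def stub_verify_scene(scene: Scene) -> bool:
    return True


if __name__ == "__main__":
    journey = generate_journey(
        "a quiet village",
        stub_describe_next,
        stub_generate_scene,
        stub_verify_scene,
    )
    for scene in journey:
        print(scene.description)
```

Because the loop only depends on the three callables, replacing a stub with a stronger model changes nothing else in the sketch, which is the property the training-free modular design relies on.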
@inproceedings{yu2024wonderjourney,
  title={Wonderjourney: Going from Anywhere to Everywhere},
  author={Hong-Xing Yu and Haoyi Duan and Junhwa Hur and Kyle Sargent and Michael Rubinstein and William T. Freeman and Forrester Cole and Deqing Sun and Noah Snavely and Jiajun Wu and Charles Herrmann},
  booktitle={CVPR},
  year={2024}
}