WonderJourney

Going from Anywhere to Everywhere

CVPR 2024

Starting from an arbitrary location (specified by either text or an image), WonderJourney generates a sequence of diverse yet coherently connected 3D scenes (i.e., a "wonderjourney") along a camera trajectory. We render a "wonderjourney" using a back-and-forth camera trajectory.
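One plausible reading of a "back-and-forth" trajectory is that the camera visits a list of poses in order and then retraces them in reverse. The sketch below shows only that ordering. The function name `back_and_forth` and the use of plain list items as stand-ins for camera poses are assumptions made for illustration, not the authors' rendering code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def back_and_forth(poses: Sequence[T]) -> List[T]:
    """Return the poses in forward order, then in reverse order.

    The turning point (the last pose) is not repeated, so a path
    a, b, c becomes a, b, c, b, a.
    """
    forward = list(poses)
    # forward[-2::-1] walks from the second-to-last pose back to the first.
    return forward + forward[-2::-1]


# Integers stand in for camera poses here (hypothetical data).
trajectory = back_and_forth([0, 1, 2, 3])
```

Any pose representation works here, since the function only reorders the sequence it is given.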

[Videos: wonderjourneys generated from input images and input real photos]

WonderJourney can synthesize long "wonderjourneys".

Starting from the same location, WonderJourney can generate a diverse set of "wonderjourneys", ending at different destinations. We render each video below using a trajectory of camera poses.

[Videos: three generated wonderjourneys (1, 2, and 3) for each starting location]

WonderJourney can also generate controlled wonderjourneys given a sequence of text descriptions, such as poems, haikus, and story abstracts.

[Videos: wonderjourneys generated from input text]

We introduce WonderJourney, a modularized framework for perpetual scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (by a text description or an image), and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys".
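The three modules named above (an LLM that writes scene descriptions, a text-driven point cloud generator, and a VLM that verifies scenes) can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Scene`, `generate_journey`, `describe_next`, `generate_scene`, and `verify_scene` are all hypothetical, the retry-on-rejection policy is an assumption, and the toy stubs at the end stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Scene:
    description: str  # text produced by the description module
    points: list      # stand-in for a point-cloud representation


def generate_journey(
    start_description: str,
    describe_next: Callable[[List[str]], str],                # hypothetical LLM wrapper
    generate_scene: Callable[[str, Optional[Scene]], Scene],  # hypothetical generator
    verify_scene: Callable[[Scene], bool],                    # hypothetical VLM check
    num_scenes: int = 3,
    max_retries: int = 2,
) -> List[Scene]:
    """Chain scenes: describe, generate, verify, and retry rejected scenes."""
    scenes: List[Scene] = []
    history: List[str] = []
    previous: Optional[Scene] = None
    description = start_description
    for index in range(num_scenes):
        if index > 0:
            # Ask the description module for the next scene, given the journey so far.
            description = describe_next(history)
        history.append(description)
        for _attempt in range(max_retries + 1):
            candidate = generate_scene(description, previous)
            if verify_scene(candidate):
                break
        # Assumption: if every attempt is rejected, the last candidate is kept.
        scenes.append(candidate)
        previous = candidate
    return scenes


# Toy stubs so the sketch runs without any model (purely illustrative).
def toy_describe_next(history: List[str]) -> str:
    return f"scene {len(history) + 1} after: {history[-1]}"


def toy_generate_scene(description: str, previous: Optional[Scene]) -> Scene:
    return Scene(description=description, points=[])


def toy_verify(scene: Scene) -> bool:
    return True


journey = generate_journey(
    "a quiet village", toy_describe_next, toy_generate_scene, toy_verify
)
```

Passing the modules in as callables mirrors the modular framing: any one of them could be replaced without touching the loop.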

No, no! The adventures first, explanations take such a dreadful time. --- Alice's Adventures in Wonderland

Our modular design requires no training, so each module can be improved as vision and language models advance.

@inproceedings{yu2024wonderjourney,
  title={Wonderjourney: Going from Anywhere to Everywhere},
  author={Hong-Xing Yu and Haoyi Duan and Junhwa Hur and Kyle Sargent and Michael Rubinstein and William T. Freeman and Forrester Cole and Deqing Sun and Noah Snavely and Jiajun Wu and Charles Herrmann},
  booktitle={CVPR},
  year={2024}
}