CVPR 2026
¹KAUST, ²Space Research Institute NASU-SSAU
*Equal contribution
On in-the-wild high-resolution images, our method produces depth and normal maps with sharp boundaries and globally consistent geometry. Hover over the images to zoom in, and use the slider above to adjust the magnification level; note the fine-detail preservation and depth–normal consistency across all predictions.

Input RGB · Scene 1 (8K)

Our Depth Prediction · Scene 1 (8K)

Our Normal Prediction · Scene 1 (8K)
👀 Interactive Comparison
In-the-Wild Samples
Depth Estimation
Surface Normal Estimation
In-Domain Samples from UnrealStereo4K
Depth Estimation
Surface Normal Estimation
Method
We introduce a multi-patch framework for high-resolution monocular geometry estimation, delivering sharp and globally consistent depth and surface normals at any resolution (e.g., 2K, 4K, 8K) from a single RGB image.
The main ideas are:
- Reformulating high-resolution prediction as a multi-patch refinement task: we divide the input image into spatial patches, augment each patch with coarse depth and normal priors, and process all patches jointly with a unified transformer backbone.
- Employing cross-patch attention with global positional encoding to propagate information across distant regions, enforcing seamless boundaries and coherent geometry across the entire image.
- Introducing a Variable Multi-Patch Training (GridMix) strategy that samples different patch-grid configurations during training, improving robustness to image resolution and spatial layout and yielding strong zero-shot performance on real-world benchmarks.
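The patch decomposition underlying these ideas can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the paper's actual implementation: `split_into_patches` divides an image into a regular patch grid, and `sample_gridmix_config` mimics the GridMix idea of drawing a different grid layout per training batch. All function names and the candidate grid list are assumptions for illustration.

```python
import numpy as np

def split_into_patches(image, grid_hw):
    """Split a (C, H, W) image into a (gh*gw, C, ph, pw) stack of patches.

    Hypothetical helper illustrating the multi-patch decomposition; the
    real framework additionally concatenates coarse depth/normal priors
    to each patch before the transformer backbone.
    """
    C, H, W = image.shape
    gh, gw = grid_hw
    ph, pw = H // gh, W // gw
    # Crop so the grid divides the image evenly, then split H and W
    # into (grid index, within-patch index) axes.
    image = image[:, : gh * ph, : gw * pw]
    patches = image.reshape(C, gh, ph, gw, pw).transpose(1, 3, 0, 2, 4)
    return patches.reshape(gh * gw, C, ph, pw)

def sample_gridmix_config(rng, configs=((1, 1), (2, 2), (2, 4), (4, 4))):
    """GridMix-style sampling: draw a random patch-grid layout.

    Varying the grid during training exposes the model to different
    patch counts and aspect ratios, which is what yields robustness
    to test-time resolution. The candidate set here is illustrative.
    """
    return configs[rng.integers(len(configs))]
```

In a training loop one would sample a grid per batch, split the image and its coarse priors with the same grid, and feed all patches jointly to the backbone so cross-patch attention can enforce consistency.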

Framework
BibTeX
@inproceedings{cui2026resolutiongeometrymultiviewmultipatch,
  title     = {Any Resolution Any Geometry: From Multi-View To Multi-Patch},
  author    = {Cui, Wenqing and Li, Zhenyu and Lavreniuk, Mykola and Shi, Jian and Idoughi, Ramzi and Tang, Xiangjun and Wonka, Peter},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}