CVPR 2026
¹KAUST, ²Space Research Institute NASU-SSAU
*Equal contribution
On in-the-wild high-resolution images, our method produces depth and normal maps with sharp boundaries and globally consistent geometry. Hover over the images to zoom in, and use the slider above to adjust the magnification level; note the fine-detail preservation and depth–normal consistency across all predictions.

Input RGB · Scene 1 (8K)

Our Depth Prediction · Scene 1 (8K)

Our Normal Prediction · Scene 1 (8K)
👀 Interactive Comparison
In-the-Wild Samples
Depth Estimation
Surface Normal Estimation
In-Domain Samples from UnrealStereo4K
Depth Estimation
Surface Normal Estimation
Method
We introduce a multi-patch framework for high-resolution monocular geometry estimation, delivering sharp and globally consistent depth and surface normals at any resolution (e.g., 2K, 4K, 8K) from a single RGB image.
The main ideas are:
- Reformulating high-resolution prediction as a multi-patch refinement task: we divide the input image into spatial patches, augment each patch with coarse depth and normal priors, and process all patches jointly with a unified transformer backbone.
- Employing cross-patch attention with global positional encoding to propagate information across distant regions, enforcing seamless boundaries and coherent geometry across the entire image.
- Introducing a Variable Multi-Patch Training (GridMix) strategy that samples different patch-grid configurations during training, improving robustness to image resolution and spatial layout and yielding strong zero-shot performance on real-world benchmarks.
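The patch decomposition underlying these ideas can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the paper's actual implementation: `split_into_patches` divides an image into a regular patch grid, and `sample_gridmix_config` mimics the GridMix idea of drawing a different grid layout per training batch. All function names and the candidate grid list are assumptions for illustration.

```python
import numpy as np

def split_into_patches(image, grid_hw):
    """Split a (C, H, W) image into a (gh*gw, C, ph, pw) stack of patches.

    Hypothetical helper illustrating the multi-patch decomposition; the
    real framework additionally concatenates coarse depth/normal priors
    to each patch before the transformer backbone.
    """
    C, H, W = image.shape
    gh, gw = grid_hw
    ph, pw = H // gh, W // gw
    # Crop so the grid divides the image evenly, then split H and W
    # into (grid index, within-patch index) axes.
    image = image[:, : gh * ph, : gw * pw]
    patches = image.reshape(C, gh, ph, gw, pw).transpose(1, 3, 0, 2, 4)
    return patches.reshape(gh * gw, C, ph, pw)

def sample_gridmix_config(rng, configs=((1, 1), (2, 2), (2, 4), (4, 4))):
    """GridMix-style sampling: draw a random patch-grid layout.

    Varying the grid during training exposes the model to different
    patch counts and aspect ratios, which is what yields robustness
    to test-time resolution. The candidate set here is illustrative.
    """
    return configs[rng.integers(len(configs))]
```

In a training loop one would sample a grid per batch, split the image and its coarse priors with the same grid, and feed all patches jointly to the backbone so cross-patch attention can enforce consistency.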

Framework
BibTeX
@inproceedings{cui2026resolutiongeometrymultiviewmultipatch,
  title     = {Any Resolution Any Geometry: From Multi-View To Multi-Patch},
  author    = {Cui, Wenqing and Li, Zhenyu and Lavreniuk, Mykola and Shi, Jian and Idoughi, Ramzi and Tang, Xiangjun and Wonka, Peter},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}