Drop The GAN: In Defense of Patch Nearest Neighbors as Single Image Generative Models
Accepted to CVPR 2022
Niv Granot, Ben Feinstein, Assaf Shocher, Shai Bagon, Michal Irani
Abstract
Image manipulation dates back long before the deep learning era. The prevailing classical approaches were based on maximizing patch similarity between the input and the generated output. Recently, single-image GANs were introduced as a superior, more sophisticated solution to image manipulation tasks. Moreover, they offered the opportunity not only to manipulate a given image, but also to generate a large and diverse set of different outputs from a single natural image. This gave rise to new tasks which are considered "GAN-only". However, despite their impressive results, single-image GANs require long training times (usually hours) per image and per task, and often suffer from visual artifacts. In this paper we revisit the classical patch-based methods and show that, contrary to previous belief, classical methods can be adapted to tackle these novel "GAN-only" tasks. Moreover, they do so better and faster than single-image GAN-based methods. More specifically, we show that: (i) with slight modifications, classical patch-based methods can unconditionally generate diverse images based on a single natural image; (ii) the visual quality of their generated output exceeds that of single-image GANs by a large margin (confirmed both quantitatively and qualitatively); (iii) they are orders of magnitude faster (runtime reduced from hours to seconds).
Supplementary Material
1. Diverse image generation based on a single image
2. Conditional Inpainting
3. Structural Analogies
4. Retargeting
5. Collage
6. Editing
7. User-Study Images
8. Runtime and Memory
9. Implementation Details
1. Diverse Image Generation Based on a Single Image
- GPNN generates many realistic and visually appealing images based on a single image.
- We show 50 random generations per image for a few sample images, both for GPNN and SinGAN [23].
- The runtime of GPNN (ours) is ~2 sec, while the runtime of SinGAN [23] is ~1 hr.
[Image gallery: each single source image, followed by 50 random generations from GPNN (ours) and 50 from SinGAN [23].]
2. Conditional Inpainting
- We present more examples for the conditional inpainting task.
[Image gallery: source images and their conditional inpainting results.]
3. Structural Analogies
- Additional examples for Image-to-Image translation (structural analogies, sketch-to-image). GPNN results are compared with [3].
- The runtime of GPNN (ours) is ~10 sec, while the runtime of Structural Analogies [3] is ~8.5 hrs.
[Image gallery: A-to-B translation examples, comparing GPNN (ours) with Structural Analogies [3].]
4. Retargeting
- Additional examples for retargeting. GPNN results are compared with [24, 26].
- The runtime of GPNN (ours) is 10-30 sec, while the runtime of InGAN [24] is 1-4 hrs, and the runtime of Bidirectional Similarity [26] is ~3 min.
[Image gallery: source images and their retargeting results by GPNN (ours).]
5. Collage
- We present more examples for the collage task. Our results are compared with Bidirectional Similarity [26].
- The runtime of GPNN (ours) is ~20 sec, while the runtime of Bidirectional Similarity [26] is ~3 min.
6. Editing
- We present more examples for the editing task, compared with [23, 26].
7. User-Study Images
- Please view the user-study images and details in AMT.pdf
8. Runtime and Memory
- The computational bottleneck in GPNN is computing the distance matrix D_{i,j}, the normalized scores S_{i,j}, and finding the nearest neighbors (steps 2-4 in Fig. 3 in the main text). Given n query patches and m key-value patch pairs, these matrices are of size n×m. GPNN utilizes a GPU to compute their entries in parallel, for speed. Furthermore, GPNN exploits the fact that each operation uses only one row/column at a time, and hence stores only small parts of the matrix at any moment. This reduces the memory footprint of GPNN from O(n·m) to O(n+m). We acknowledge that approximate nearest-neighbor-search algorithms (e.g., PatchMatch [2]) have a better asymptotic complexity of O(n+m). While our current implementation has a runtime of O(n·m), we plan to incorporate approximate nearest-neighbor search into GPNN to further reduce its runtime.
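A minimal sketch of this row/column-chunked computation in PyTorch (not the authors' code): the function name, chunk size, and α value are illustrative assumptions, and the normalized score S_{i,j} = D_{i,j} / (α + min_ℓ D_{ℓ,j}) follows the definition in the main text.

```python
import torch

def chunked_nearest_neighbors(queries, keys, alpha=0.01, chunk=1024):
    """queries: (n, d) query patches; keys: (m, d) key patches (flattened).
    Returns the index of the nearest key for each query under the normalized
    score S = D / (alpha + column-min of D), without ever materializing the
    full n-by-m distance matrix D."""
    # Pass 1: per-key normalizer, i.e. each key's distance to its closest
    # query, accumulated over query chunks so memory stays O(n + m).
    min_per_key = torch.full((keys.shape[0],), float("inf"), device=keys.device)
    for q in queries.split(chunk):
        d = torch.cdist(q, keys)  # one (chunk, m) slice of D
        min_per_key = torch.minimum(min_per_key, d.min(dim=0).values)
    # Pass 2: for each query, pick the key minimizing the normalized score.
    nn_idx = torch.empty(queries.shape[0], dtype=torch.long, device=keys.device)
    for start in range(0, queries.shape[0], chunk):
        s = torch.cdist(queries[start:start + chunk], keys) / (alpha + min_per_key)
        nn_idx[start:start + chunk] = s.argmin(dim=1)
    return nn_idx
```

Only one (chunk × m) slice of D is alive at a time, plus the O(m) normalizer and O(n) output, which matches the O(n+m) memory footprint described above.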
9. Implementation Details
- The patch size is 7×7, fixed at all scales. Like [23], the pyramid depth N is chosen such that, at the coarsest scale, the patch size is half the image height. The downscaling factor r is 4/3, and all down/up-sampling operations use a bicubic kernel. PNN is applied 10 times at each scale. We set the noise std σ to 0.75.
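As a rough sketch, this coarse-to-fine schedule could look as follows in PyTorch; `pnn_step` is a hypothetical stand-in for the PNN module (steps 2-4 of Fig. 3 in the main text), and all names here are illustrative rather than the paper's actual code.

```python
import torch
import torch.nn.functional as F

PATCH, R, ITERS, SIGMA = 7, 4 / 3, 10, 0.75  # hyperparameters stated above

def build_pyramid(img):
    """img: (1, 3, H, W). Downscale by r = 4/3 (bicubic) until the 7x7 patch
    is about half the image height at the coarsest scale; coarsest first."""
    pyramid = [img]
    while pyramid[-1].shape[-2] / R >= 2 * PATCH:
        h, w = pyramid[-1].shape[-2:]
        pyramid.append(F.interpolate(pyramid[-1], size=(round(h / R), round(w / R)),
                                     mode="bicubic", align_corners=False))
    return pyramid[::-1]

def generate(img, pnn_step):
    """One diverse generation: noise seeds the coarsest scale, then each scale
    is refined by 10 applications of patch nearest-neighbor replacement."""
    pyramid = build_pyramid(img)
    out = pyramid[0] + SIGMA * torch.randn_like(pyramid[0])  # sigma = 0.75
    for level, src in enumerate(pyramid):
        if level > 0:  # upscale the previous output to the current resolution
            out = F.interpolate(out, size=src.shape[-2:], mode="bicubic",
                                align_corners=False)
        for _ in range(ITERS):
            out = pnn_step(query=out, key_value=src, patch=PATCH)  # PNN module
    return out
```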
Relevant references:
[2] Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Trans. Graph., 28(3):24, 2009.
[3] Sagie Benaim, Ron Mokady, Amit Bermano, and Lior Wolf. Structural Analogy from a Single Image Pair. In Computer Graphics Forum. Wiley Online Library, 2020.
[23] Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. SinGAN: Learning a Generative Model from a Single Natural Image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4570-4580, 2019.
[24] Assaf Shocher, Shai Bagon, Phillip Isola, and Michal Irani. InGAN: Capturing and Remapping the "DNA" of a Natural Image. arXiv preprint, 2019.
[26] Denis Simakov, Yaron Caspi, Eli Shechtman, and Michal Irani. Summarizing Visual Data Using Bidirectional Similarity. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2008.