
Drop The GAN: In Defense of Patch Nearest Neighbors as Single Image Generative Models
Accepted to CVPR 2022

Niv Granot, Ben Feinstein, Assaf Shocher, Shai Bagon, Michal Irani

[arxiv] [Code]


Abstract

Image manipulation dates back long before the deep learning era. The classical prevailing approaches were based on maximizing patch similarity between the input and generated output. Recently, single-image GANs were introduced as a superior and more sophisticated solution to image manipulation tasks. Moreover, they offered the opportunity not only to manipulate a given image, but also to generate a large and diverse set of different outputs from a single natural image. This gave rise to new tasks, which are considered "GAN-only". However, despite their impressiveness, single-image GANs require long training time (usually hours) for each image and each task and often suffer from visual artifacts. In this paper we revisit the classical patch-based methods, and show that – unlike previously believed – classical methods can be adapted to tackle these novel "GAN-only" tasks. Moreover, they do so better and faster than single-image GAN-based methods. More specifically, we show that: (i) by introducing slight modifications, classical patch-based methods are able to unconditionally generate diverse images based on a single natural image; (ii) the generated output visual quality exceeds that of single-image GANs by a large margin (confirmed both quantitatively and qualitatively); (iii) they are orders of magnitude faster (runtime reduced from hours to seconds).

Supplementary Material

1. Diverse image generation based on a single image
2. Conditional Inpainting
3. Structural Analogies
4. Retargeting
5. Collage
6. Editing
7. User-Study Images
8. Runtime and Memory
9. Implementation Details


1. Diverse Image Generation Based on a Single Image
  • GPNN generates many realistic and visually appealing images based on a single image.
  • We show 50 random generations per image for a few sample images, both for GPNN and SinGAN [23].
  • The runtime of GPNN (ours) is ~2sec, while the runtime of SinGAN [23] is ~1hr.
[Image grids: for each single source image, 50 randomly generated images by GPNN (Ours) and by SinGAN [23].]

2. Conditional Inpainting
  • We present more examples for the conditional inpainting task.
[Image grids: source images and the corresponding conditional inpainting results.]

3. Structural Analogies
  • Additional examples for Image-to-Image translation (structural analogies, sketch-to-image). GPNN results are compared with [3].
  • Please use the buttons on the right to flicker between the images (GPNN (Ours), Structural Analogies [3]).
  • The runtime of GPNN (ours) is ~10sec, while the runtime of Structural Analogies [3] is ~8.5hrs.





[Image grids: A-to-B translation results, GPNN (Ours) compared with Structural Analogies [3].]




4. Retargeting
  • Additional examples for Retargeting. GPNN results are compared with [24, 26].
  • Please use the buttons on the right to flicker between the results (GPNN (Ours), InGAN [24], Bidirectional Similarity [26]).
  • The runtime of GPNN (ours) is 10-30sec, while the runtime of InGAN [24] is 1-4hrs and the runtime of Bidirectional Similarity [26] is ~3min.

[Image grids: source images and retargeting results by GPNN (Ours), compared with InGAN [24] and Bidirectional Similarity [26].]

5. Collage
  • We present more examples for the collage task. Our results are compared with Bidirectional Similarity [26].
  • The runtime of GPNN (ours) is ~20sec, while the runtime of Bidirectional Similarity [26] is ~3min.

6. Editing
  • We present more examples for the editing task, compared with [23, 26].

7. User-Study Images
  • Please view the user-study images and details in AMT.pdf

8. Runtime and Memory
  • The computational bottleneck in GPNN is computing the distance matrix D_{i,j}, the normalized scores S_{i,j}, and finding nearest neighbors (steps 2-4 in Fig. 3 in the main text). Given n query patches and m key-value patch pairs, these matrices are of size n×m. For speed, GPNN uses a GPU to compute their entries in parallel. Furthermore, GPNN exploits the fact that each operation uses only one row/column at a time, and stores only small parts of the matrix at any moment. This reduces the memory footprint of GPNN from O(n·m) to O(n+m). We acknowledge that approximate nearest-neighbor-search algorithms (e.g., PatchMatch [2]) have a better asymptotic complexity of O(n+m). While our current implementation has a runtime of O(n·m), we plan to incorporate approximate nearest-neighbor search into GPNN to further reduce its runtime.
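  • Below is a minimal, illustrative sketch (not the authors' implementation) of this chunked computation in PyTorch: the full n×m matrix is never materialized; distances are computed one chunk of query rows at a time, keeping only the per-column minima needed for the normalized scores and the per-row argmins, so memory stays roughly O(n+m). The normalization S_{i,j} = D_{i,j} / (α + min_ℓ D_{ℓ,j}) is assumed here from the score definition in the main text; the function name and chunk size are arbitrary.

    import torch

    def chunked_pnn(queries, keys, alpha=0.01, chunk=4096):
        # queries: (n, d) flattened query patches; keys: (m, d) flattened key patches.
        # Returns, for every query patch, the index of its selected key patch.

        # Pass 1: per-column (per-key) minimum distance, computed chunk by chunk.
        col_min = torch.full((keys.shape[0],), float("inf"), device=keys.device)
        for q in queries.split(chunk):
            d = torch.cdist(q, keys)                      # only a (chunk, m) slice of D is alive
            col_min = torch.minimum(col_min, d.min(dim=0).values)

        # Pass 2: normalized scores and per-row argmin, again chunk by chunk.
        nn_idx = []
        for q in queries.split(chunk):
            d = torch.cdist(q, keys)
            s = d / (alpha + col_min)                     # completeness-encouraging normalization
            nn_idx.append(s.argmin(dim=1))
        return torch.cat(nn_idx)                          # (n,) nearest-neighbor indices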

9. Implementation Details
  • The patch size is 7×7 and is fixed at all scales. As in [23], the pyramid depth N is chosen such that, at the coarsest scale, the patch size is roughly half of the image height. The downscaling factor r is 4/3, and all down/up-sampling operations use a bicubic kernel. PNN is applied 10 times at each scale. We set the noise std σ to 0.75.
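  • For concreteness, the following sketch shows how these hyperparameters translate into a coarse-to-fine image pyramid. It is an illustrative reading of the details above (the function name and the use of torch.nn.functional.interpolate for bicubic resizing are our assumptions), not the authors' code.

    import math
    import torch
    import torch.nn.functional as F

    PATCH_SIZE = 7        # 7x7 patches, fixed at all scales
    SCALE_FACTOR = 4 / 3  # downscaling factor r between pyramid levels
    PNN_ITERS = 10        # PNN is applied 10 times at each scale
    NOISE_STD = 0.75      # std of the noise injected at the coarsest scale

    def build_pyramid(img):
        # img: (1, C, H, W) tensor. Returns the pyramid, coarsest level first.
        # The depth N is chosen so that at the coarsest scale the 7x7 patch
        # spans roughly half of the image height (coarsest height ~ 2 * PATCH_SIZE).
        h = img.shape[-2]
        n_scales = int(math.log(h / (2 * PATCH_SIZE), SCALE_FACTOR)) + 1
        pyramid = []
        for level in range(n_scales):
            s = SCALE_FACTOR ** (-(n_scales - 1 - level))
            pyramid.append(F.interpolate(img, scale_factor=s,
                                         mode="bicubic", align_corners=False))
        return pyramid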


Relevant references:
[2] Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Trans. Graph., 28(3):24, 2009.
[3] Sagie Benaim, Ron Mokady, Amit Bermano, and Lior Wolf. Structural Analogy from a Single Image Pair. In Computer Graphics Forum. Wiley Online Library, 2020.
[23] Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. SinGAN: Learning a Generative Model from a Single Natural Image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4570-4580, 2019.
[24] Assaf Shocher, Shai Bagon, Phillip Isola, and Michal Irani. InGAN: Capturing and Remapping the "DNA" of a Natural Image. In arXiv, 2019.
[26] Denis Simakov, Yaron Caspi, Eli Shechtman, and Michal Irani. Summarizing Visual Data Using Bidirectional Similarity. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2008.