Adversarial Generation of Continuous Images

CVPR 2021

Ivan Skorokhodov·Savva Ignatyev·Mohamed Elhoseiny

KAUST·Skoltech

# abstract

In most existing learning systems, images are typically viewed as 2D pixel arrays. However, in another paradigm gaining popularity, a 2D image is represented as an implicit neural representation (INR): an MLP that predicts an RGB pixel value given its \((x,y)\) coordinate. In this paper, we propose two novel architectural techniques for building INR-based image decoders, factorized multiplicative modulation and multi-scale INRs, and use them to build a state-of-the-art continuous image GAN. Previous attempts to adapt INRs for image generation were limited to MNIST-like datasets and did not scale to complex real-world data. Our proposed INR-GAN architecture improves the performance of continuous image generators severalfold, greatly reducing the gap between continuous image GANs and pixel-based ones. Apart from that, we explore several exciting properties of INR-based decoders, such as out-of-the-box superresolution, meaningful image-space interpolation, accelerated inference of low-resolution images, the ability to extrapolate outside of image boundaries, and a strong geometric prior.

Main idea

INR-based decoders (right) are structured differently from convolutional ones (left). They are composed of a hypernetwork (a neural network that generates parameters for another neural network) and an MLP that produces an RGB value from a pixel coordinate. In our work, we introduce two techniques, factorized multiplicative modulation and multi-scale INRs, that make this parametrization much more efficient.
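As a rough illustration of this structure, the sketch below wires a toy hypernetwork to a toy INR in plain Python. Everything in it is an assumption made for illustration: the names (`make_hypernetwork`, `hypernetwork`, `inr`, `render`), the sizes, the sine activation, and the use of a single random linear layer as the hypernetwork. It is not the INR-GAN architecture and uses neither of the two techniques from the paper; it only shows how one network can emit the weights of another, which is then queried once per pixel coordinate.

```python
import math
import random

HIDDEN = 8   # hidden width of the toy INR (assumed for illustration)
LATENT = 4   # size of the latent code z (assumed for illustration)

def n_inr_params(hidden=HIDDEN):
    # Layer 1 maps 2 -> hidden, layer 2 maps hidden -> 3 (weights plus biases).
    return (2 * hidden + hidden) + (hidden * 3 + 3)

def make_hypernetwork(seed=0):
    """Hypothetical hypernetwork: one random linear map from z to all INR parameters."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(LATENT)]
            for _ in range(n_inr_params())]

def hypernetwork(H, z):
    """Produce the flat parameter vector of the INR from the latent code z."""
    return [sum(h * zi for h, zi in zip(row, z)) for row in H]

def inr(theta, x, y, hidden=HIDDEN):
    """Toy INR: a two-layer MLP mapping one (x, y) coordinate to an RGB triple."""
    w1 = theta[:2 * hidden]
    b1 = theta[2 * hidden:3 * hidden]
    w2 = theta[3 * hidden:6 * hidden]
    b2 = theta[6 * hidden:]
    h = [math.sin(w1[2 * j] * x + w1[2 * j + 1] * y + b1[j]) for j in range(hidden)]
    out = [sum(w2[c * hidden + j] * h[j] for j in range(hidden)) + b2[c]
           for c in range(3)]
    # Sigmoid written via tanh so it cannot overflow; squashes each channel to [0, 1].
    return tuple(0.5 * (math.tanh(0.5 * v) + 1.0) for v in out)

def render(theta, size):
    """Evaluate the INR on a size x size grid of coordinates covering [0, 1]^2."""
    axis = [i / (size - 1) for i in range(size)]
    return [[inr(theta, x, y) for x in axis] for y in axis]

H = make_hypernetwork()
rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT)]
theta = hypernetwork(H, z)   # one latent code -> one set of INR weights
image = render(theta, 16)    # the INR is then queried independently per pixel
```

In this naive form the hypernetwork has to output every INR parameter directly, so its output size grows with the size of the INR.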

Properties

The main appeal of INR-based decoders lies in their properties. In our paper, we explore several of them: image extrapolation, superresolution, meaningful interpolation, a strong geometric prior, and others.

Our INR-based decoder is capable of extrapolating beyond image boundaries without being trained to do so. Originally, we thought we were the first to show this; after the submission, we found that the authors of the COCO-GAN paper had demonstrated the same property.
INR-GAN produces meaningful interpolations in image space (i.e., in the parameter space of the INRs).
The INR-based decoder can perform superresolution out of the box when evaluated on a denser coordinate grid.
We fitted a linear model to predict face keypoints from latent codes and observed that it achieves much better performance for INR-GAN than for StyleGAN2. This shows that the keypoints (and hence other geometric information) are encoded in a less entangled form in INR-GAN.
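
Several of these properties come down to which coordinates the INR is asked about. The sketch below makes that concrete under loud assumptions: `toy_inr` is a hand-written stand-in for a trained INR, and `make_grid` and `render` are hypothetical helpers, not functions from the INR-GAN code. It shows only the mechanics of changing the coordinate grid and says nothing about the visual quality a trained model would reach.

```python
import math

def toy_inr(x, y):
    """Stand-in for a trained INR: any function from (x, y) to RGB works here."""
    r = 0.5 + 0.5 * math.sin(6.0 * x)
    g = 0.5 + 0.5 * math.cos(6.0 * y)
    b = 0.5 + 0.5 * math.sin(3.0 * (x + y))
    return (r, g, b)

def make_grid(size, lo=0.0, hi=1.0):
    """Return `size` evenly spaced coordinates covering [lo, hi]."""
    step = (hi - lo) / (size - 1)
    return [lo + i * step for i in range(size)]

def render(inr, size, lo=0.0, hi=1.0):
    """Query the INR at every point of a size x size coordinate grid."""
    axis = make_grid(size, lo, hi)
    return [[inr(x, y) for x in axis] for y in axis]

base = render(toy_inr, 32)                    # nominal resolution on [0, 1]^2
dense = render(toy_inr, 128)                  # same region, denser grid: superresolution
small = render(toy_inr, 8)                    # sparser grid: fewer queries, cheaper inference
wide = render(toy_inr, 64, lo=-0.5, hi=1.5)   # coordinates outside [0, 1]: extrapolation
```

The same function is queried in all four calls; only the grid changes, which is why no retraining or separate upsampling module appears in the sketch.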

Related work

CIPS is a concurrent work that also builds a large-scale INR-based GAN for image generation.

# bibtex

@InProceedings{inr-gan,
    author    = {Skorokhodov, Ivan and Ignatyev, Savva and Elhoseiny, Mohamed},
    title     = {Adversarial Generation of Continuous Images},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {10753--10764}
}
@inproceedings{cips,
    title={Image generators with conditionally-independent pixel synthesis},
    author={Anokhin, Ivan and Demochkin, Kirill and Khakhulin, Taras and Sterkin, Gleb and Lempitsky, Victor and Korzhenkov, Denis},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={14278--14287},
    year={2021}
}
