Deep Automodulators

Ari Heljakka, Yuxin Hou, Juho Kannala, Arno Solin

TL;DR: Instantaneous semantic manipulation of existing images with scale-specific feature mixing.

The grand goal of Deep Generative Models is to automatically learn semantic “knobs” for content design and smart editing. In image space, this means e.g. letting you simply turn the “male-ness knob” of a face, instead of editing individual pixels. Similar knobs will eventually be available for designing 3D structures, proteins, artificial organs, etc.

Here, we present the automodulator: an autoencoder with scale-specific control of its outputs and the ability to mix several input images into a single fused image. You can do similar mixing with randomly sampled images in e.g. StyleGAN, but applying it to existing images is extremely slow: even a fast GPU needs several minutes of optimization for a single image. The automodulator does this instantaneously, because it is an autoencoder, not a GAN.
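To make the idea of scale-specific mixing concrete, here is a minimal numpy toy sketch (not the released PyTorch code; all names, shapes, and the decoder itself are illustrative). Each decoder layer is modulated by a latent vector, so mixing two images amounts to routing image A's latent to the coarse (early) layers and image B's latent to the fine (late) layers:

```python
import numpy as np

def modulated_layer(x, z, w):
    """One toy decoder layer: normalize the activations, rescale them by a
    factor predicted from the latent z, then apply a fixed nonlinearity."""
    x = (x - x.mean()) / (x.std() + 1e-8)
    return np.tanh(x * (w @ z))

def decode(latents, weights, x0):
    """Toy decoder: layer i is modulated by latents[i]. Feeding image A's
    latent to early layers and image B's to late layers fuses the two."""
    x = x0
    for z, w in zip(latents, weights):
        x = modulated_layer(x, z, w)
    return x

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=16), rng.normal(size=16)  # latents of inputs A and B
weights = [rng.normal(size=16) for _ in range(4)]    # 4 decoder "scales"
x0 = rng.normal(size=(8, 8))                         # starting feature map

# Coarse structure from A (first two scales), fine detail from B (last two).
fused = decode([z_a, z_a, z_b, z_b], weights, x0)
```

Because the latents only modulate the decoder (rather than being concatenated into it), swapping which latent drives which scale requires no re-training and no per-image optimization.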

The quality? See below and judge for yourself. Roughly speaking, about one third of typical face images work as well as the examples shown here.

Get started instantly with our Colab notebook and share your results!

Fig. Coarse features of the top-row 512×512 inputs applied to the inputs in the left-most column. The diagonal contains the ‘traditional’ reconstructions. For more examples (e.g. cars), see the Appendix of the paper; for attribute editing, see the Notebooks.

Abstract

We introduce a new family of generative neural network models called automodulators. These autoencoder-like networks can faithfully reproduce individual real-world input images like autoencoders, and also generate a fused sample from an arbitrary combination of several such images, allowing ‘style-mixing’ and other new applications. An automodulator decouples the data flow of decoder operations from statistical properties thereof and uses the latent vector to modulate the former by the latter, with a principled approach for mutual disentanglement of decoder layers. This is the first general-purpose model to successfully apply this principle on existing input images, whereas prior work has focused on random sampling in GANs. We introduce novel techniques for stable unsupervised training of the model on four high-resolution data sets. Besides style-mixing, we show state-of-the-art results in autoencoder comparison, and visual image quality nearly indistinguishable from state-of-the-art GANs. We expect the automodulator variants to become a useful building block for image applications and other data domains.
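The core mechanism described above, normalizing away the statistics of the decoder's data flow and re-injecting statistics predicted from the latent, can be sketched in a few lines. This is a hedged numpy illustration of the general modulation principle (akin to AdaIN-style modulation), not the paper's exact implementation; the affine maps `w_scale` and `w_shift` stand in for learned parameters:

```python
import numpy as np

def modulate(features, latent, w_scale, w_shift):
    """Strip the incoming per-channel statistics from a (C, H, W) feature
    map, then replace them with statistics predicted from the latent."""
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True) + 1e-8
    normalized = (features - mean) / std  # data flow, decoupled from stats
    # The latent modulates the flow via learned affine maps (illustrative).
    scale = (w_scale @ latent)[:, None, None]
    shift = (w_shift @ latent)[:, None, None]
    return scale * normalized + shift

rng = np.random.default_rng(0)
C, Z = 4, 16
features = rng.normal(size=(C, 8, 8))   # decoder activations
latent = rng.normal(size=Z)             # latent vector from the encoder
w_scale = rng.normal(size=(C, Z))
w_shift = rng.normal(size=(C, Z))
out = modulate(features, latent, w_scale, w_shift)
```

After this operation, the output's per-channel mean and standard deviation are determined entirely by the latent, regardless of the statistics of the incoming features, which is what lets different latents control different decoder scales independently.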

Materials

Paper
PyTorch Code
Pre-trained models
Colab Notebook
Jupyter Notebook

Support

For all correspondence, please contact ari.heljakka@aalto.fi.

Referencing

Please cite our work as follows:

@article{Heljakka+Solin+Kannala:2020,
  title   = {Deep Automodulators},
  author  = {Heljakka, Ari and Hou, Yuxin and Kannala, Juho and Solin, Arno},
  journal = {arXiv preprint arXiv:1912.10321},
  year    = {2020}
}