PIONEER is a generative neural network model that learns a well-structured representation of certain kinds of images, such as faces, completely unsupervised.

For demonstration, the model can be used to modify real input images in various smart ways as in figures below, without losing sharpness in the output. Importantly, the model is not in any way specialized for this kind of application. It first reconstructs the new faces from scratch to resemble the inputs, and then allows us to alter its internal representation to modify specific features.

This paper marks a jump in resolution, quality and preservation of identity in face images over the previous PIONEER incarnation, and makes the feature modification capability more explicit. While the results were demonstrated primarily on face data, the method is general. The model has a simple autoencoder architecture with no discriminator networks.

alt text alt text alt text alt text alt text alt text
alt text alt text alt text alt text alt text alt text
alt text alt text alt text alt text alt text alt text

Figure: For real input images (left), our model can change various features (reconstruct, smile on/off, switch sex, rotate, add sunglasses) that it has learnt in a fully unsupervised manner - no class information was used during training.

Abstract

We build on recent advances in progressively growing generative autoencoder models. These models can encode and reconstruct existing images, and generate novel ones, at resolutions comparable to Generative Adversarial Networks (GANs), while consisting only of a single encoder and decoder network. The ability to reconstruct and arbitrarily modify existing samples such as images separates autoencoder models from GANs, but the output quality of image autoencoders has remained inferior. The recently proposed PIONEER autoencoder can reconstruct faces in the 256x256 CelebAHQ dataset, but like IntroVAE, another recent method, it often loses the identity of the person in the process. We propose an improved and simplified version of PIONEER and show significantly improved quality and preservation of the face identity in CelebAHQ, both visually and quantitatively. We also show evidence of state-of-the-art disentanglement of the latent space of the model, both quantitatively and via realistic image feature manipulations. On the LSUN Bedrooms dataset, our model also improves the results of the original PIONEER. Overall, our results indicate that the PIONEER networks provide a way to photorealistic face manipulation.

Materials

Paper pre-print

Video show-casing how to gradually apply various transformations on new input images:

Code (PyTorch) with Pre-trained models: https://github.com/AaltoVision/balanced-pioneer

Support

For all correspondence, please contact ari.heljakka@aalto.fi.

Referencing

Please cite our work as follows:

@inproceedings{Heljakka+Solin+Kannala:2020,
      title = {Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders},
     author = {Heljakka, Ari and Solin, Arno and Kannala, Juho},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
       year = {2020}
}