Pose Manipulation with Identity Preservation



Keywords: pose manipulation, image generation, adaptive normalization, Generative Adversarial Network


This paper describes a new model that generates images of a human subject in novel poses, e.g. with altered facial expression and head orientation, from just a few instances of that subject. Unlike previous approaches, which require large datasets of a specific person for training, our approach can start from a scarce set of images, even a single one. To this end, we introduce the Character Adaptive Identity Normalization GAN (CainGAN), which uses spatial characteristic features extracted by an embedder and combined across source images. The identity information is propagated throughout the network by applying conditional normalization. After extensive adversarial training, CainGAN receives images of a certain individual's face and produces new ones while preserving the person's identity. Experimental results show that the quality of generated images scales with the size of the input set used during inference. Furthermore, quantitative measurements indicate that CainGAN outperforms other methods when training data is limited.
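The conditional normalization mentioned above can be sketched roughly as follows: identity features from several source images are pooled into a single code, which then supplies the per-channel scale and shift of an AdaIN-style normalization layer. This is a minimal illustration with hypothetical shapes and a simple mean-pooling rule; the actual CainGAN layers and feature combination may differ.

```python
import numpy as np

def adaptive_instance_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel of a feature map, then rescale it with
    identity-conditioned parameters (gamma, beta)."""
    # x: (C, H, W) generator activations; gamma, beta: (C,) from the embedder
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (x - mean) / (std + eps) + beta[:, None, None]

# Hypothetical identity codes extracted from K = 3 source images,
# combined here by simple averaging.
embeddings = np.random.randn(3, 8)
identity = embeddings.mean(axis=0)

# Split the combined code into a scale and a shift for C = 4 channels.
gamma, beta = identity[:4] + 1.0, identity[4:]

features = np.random.randn(4, 16, 16)  # intermediate generator activations
out = adaptive_instance_norm(features, gamma, beta)
```

After the layer, each channel's statistics match the identity-derived parameters (mean equal to `beta`, standard deviation close to `|gamma|`), which is how the identity code steers the generator's activations.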




