Pose Manipulation with Identity Preservation
Keywords: pose manipulation, image generation, adaptive normalization, Generative Adversarial Network
Abstract: This paper describes a new model that generates images of a human subject in novel poses, e.g. by altering facial expression and orientation, from just a few instances of that subject. Unlike previous approaches, which require large datasets of a specific person for training, our approach can start from a scarce set of images, even a single image. To this end, we introduce Character Adaptive Identity Normalization GAN (CainGAN), which uses spatial characteristic features extracted by an embedder and combined across source images. The identity information is propagated throughout the network by applying conditional normalization. After extensive adversarial training, CainGAN receives images of faces of a given individual and produces new ones while preserving the person’s identity. Experimental results show that the quality of the generated images scales with the size of the input set used during inference. Furthermore, quantitative measurements indicate that CainGAN outperforms other methods when training data is limited.
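The two mechanisms named in the abstract, combining embedder features across source images and propagating identity through conditional normalization, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, averaging as the combination rule, per-channel instance normalization as the base normalization, and the toy mapping from the identity map to the modulation parameters `gamma` and `beta`.

```python
import numpy as np

def combine_identity_features(features):
    """Combine per-image spatial feature maps (K, C, H, W) into one (C, H, W) map.

    Assumption: a plain average over the K source images.
    """
    return features.mean(axis=0)

def conditional_norm(x, gamma, beta, eps=1e-5):
    """Normalize x (C, H, W) per channel, then modulate it element-wise.

    gamma and beta have shape (C, H, W), so the modulation varies spatially.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
K, C, H, W = 3, 4, 8, 8

# Stand-in for the embedder outputs of K source images of one person.
source_features = rng.normal(size=(K, C, H, W))
identity = combine_identity_features(source_features)

# Hypothetical mapping from the identity map to modulation parameters;
# a trained model would learn this mapping instead.
gamma = 1.0 + 0.1 * identity
beta = 0.1 * identity

# Stand-in for an intermediate generator activation.
activations = rng.normal(size=(C, H, W))
out = conditional_norm(activations, gamma, beta)
```

Because the combination step is an average in this sketch, the same code accepts any number of source images K, including K = 1, which matches the single-image setting the abstract mentions.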
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.