Information Bottleneck in Deep Learning - A Semiotic Approach
Keywords: deep learning, information bottleneck, semiotics
The information bottleneck principle was recently proposed as a theory meant to explain some of the training dynamics of deep neural architectures. Via information plane analysis, patterns emerge in this framework, and two phases can be distinguished: fitting and compression. We take a step further and study the behaviour of the spatial entropy characterizing the layers of convolutional neural networks (CNNs), in relation to the information bottleneck theory. We observe pattern formations which resemble the information bottleneck fitting and compression phases. From the perspective of semiotics, the study of signs and sign-using behavior, the saliency maps of CNN layers exhibit aggregations: signs are aggregated into supersigns, a process called semiotic superization. Superization can be characterized by a decrease of entropy and interpreted as information concentration. We discuss the information bottleneck principle from the perspective of semiotic superization and uncover analogies related to the informational adaptation of the model. In a practical application, we introduce a modification of the CNN training process: we progressively freeze the layers with small entropy variation in their saliency map representation. Such layers can be stopped earlier from training without a significant impact on the performance (the accuracy) of the network, connecting the entropy evolution through time with the training dynamics of the network.
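The freezing criterion described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it approximates the entropy of a layer's saliency map by the Shannon entropy of its intensity histogram (the paper's spatial entropy measure may differ), and the names `spatial_entropy`, `layers_to_freeze`, and the `window`/`tol` parameters are illustrative assumptions.

```python
import numpy as np

def spatial_entropy(saliency_map, bins=16):
    """Shannon entropy (in bits) of the intensity distribution
    of a 2D saliency map, via a fixed-bin histogram."""
    hist, _ = np.histogram(saliency_map, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins; 0*log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def layers_to_freeze(entropy_history, window=3, tol=0.05):
    """Given per-layer lists of entropy values (one per epoch),
    return indices of layers whose entropy varied by less than
    `tol` over the last `window` epochs -- candidates for early
    freezing during training."""
    frozen = []
    for i, history in enumerate(entropy_history):
        recent = history[-window:]
        if len(recent) == window and max(recent) - min(recent) < tol:
            frozen.append(i)
    return frozen
```

In an actual training loop, the returned layer indices would have their parameters excluded from further gradient updates (e.g., by setting `requires_grad = False` in PyTorch), while the remaining layers continue training.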
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the terms of the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.