Information Bottleneck in Deep Learning - A Semiotic Approach


  • Bogdan Musat, Transilvania University of Brasov, Romania
  • Razvan Andonie, Central Washington University, USA



Keywords: deep learning, information bottleneck, semiotics


Abstract

The information bottleneck principle was recently proposed as a theory meant to explain some of the training dynamics of deep neural architectures. Via information plane analysis, patterns emerge in this framework, and two phases can be distinguished: fitting and compression. We take a step further and study the behaviour of the spatial entropy characterizing the layers of convolutional neural networks (CNNs), in relation to the information bottleneck theory. We observe pattern formations which resemble the information bottleneck fitting and compression phases. From the perspective of semiotics, the study of signs and sign-using behavior, the saliency maps of CNN layers exhibit aggregations: signs are aggregated into supersigns, a process called semiotic superization. Superization can be characterized by a decrease of entropy and interpreted as information concentration. We discuss the information bottleneck principle from the perspective of semiotic superization and uncover analogies related to the informational adaptation of the model. As a practical application, we introduce a modification of the CNN training process: we progressively freeze the layers whose saliency map representations show small entropy variation. Such layers can be stopped from training earlier without a significant impact on the performance (the accuracy) of the network, connecting the evolution of entropy over time with the training dynamics of the network.
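The freezing criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`saliency_entropy`, `should_freeze`), the plateau window, and the tolerance are all illustrative assumptions; in practice the entropy would be computed per layer from its saliency map at each epoch, and a plateaued layer's parameters would be excluded from further gradient updates.

```python
# Hedged sketch of entropy-based progressive layer freezing.
# All names and thresholds here are illustrative assumptions.
import numpy as np

def saliency_entropy(saliency_map: np.ndarray) -> float:
    """Shannon entropy (in bits) of a saliency map treated as a 2-D
    probability distribution over spatial locations."""
    p = saliency_map.ravel().astype(np.float64)
    p = p / p.sum()          # normalize to a probability mass function
    p = p[p > 0]             # drop zero cells (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def should_freeze(entropy_history, window=3, tol=0.01):
    """Hypothetical stopping rule: freeze a layer once its saliency-map
    entropy varies by less than `tol` bits over the last `window` epochs."""
    if len(entropy_history) < window:
        return False
    recent = entropy_history[-window:]
    return max(recent) - min(recent) < tol

# Toy usage: a layer whose per-epoch entropy has plateaued.
history = [4.12, 3.60, 3.21, 3.202, 3.199, 3.205]
print(should_freeze(history))  # plateaued -> True
```

In a training loop, a layer flagged by `should_freeze` would simply have its parameters removed from the optimizer (e.g., by setting `requires_grad=False` in PyTorch), so the remaining layers continue training normally.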

Author Biography

Razvan Andonie, Central Washington University, USA

Executive Editor


