Transfer Entropy in Deep Neural Networks
DOI: https://doi.org/10.15837/ijccc.2025.1.6904
Keywords: Transfer entropy, causality, deep learning, neural network explainability
Abstract
This paper explores the application of Transfer Entropy (TE) in deep neural networks as a tool to improve training efficiency and analyze causal information flow. TE is a measure of directed information transfer that captures nonlinear dependencies and temporal dynamics between system components. The study investigates the use of TE to optimize learning in Convolutional Neural Networks and Graph Convolutional Neural Networks. We present case studies that demonstrate reduced training times and improved accuracy. In addition, we apply TE within the framework of the Information Bottleneck theory, providing insight into the trade-off between compression and information preservation during the training of deep learning architectures. The results highlight TE's potential for identifying causal features, improving explainability, and addressing challenges such as oversmoothing in Graph Convolutional Neural Networks. Although computational overhead and complexity pose challenges, the findings emphasize the role of TE in creating more efficient and interpretable neural models.
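The directed measure described above can be illustrated with a minimal plug-in estimator for discrete time series, following Schreiber's definition TE(Y→X) = Σ p(x_{t+1}, x_t, y_t) log[ p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t) ] with one-step histories. This is a sketch for intuition only, not the estimator or history lengths used in the paper:

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, target, base=2):
    """Plug-in estimate of TE(source -> target) for discrete series
    with one-step histories. Positive TE means the source's past helps
    predict the target's next value beyond the target's own past."""
    # Empirical counts of (x_{t+1}, x_t, y_t), (x_t, y_t), (x_{t+1}, x_t), (x_t)
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_xy = Counter(zip(target[:-1], source[:-1]))
    pairs_xx = Counter(zip(target[1:], target[:-1]))
    singles = Counter(target[:-1])
    n = len(target) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n                              # p(x_{t+1}, x_t, y_t)
        p_cond_xy = c / pairs_xy[(x0, y0)]           # p(x_{t+1} | x_t, y_t)
        p_cond_x = pairs_xx[(x1, x0)] / singles[x0]  # p(x_{t+1} | x_t)
        te += p_joint * np.log(p_cond_xy / p_cond_x)
    return te / np.log(base)  # convert nats to bits (base 2)

# Toy check: if x copies y with a one-step lag, y's past fully determines
# x's next value, so TE(y -> x) should approach 1 bit for uniform binary y.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
x = np.concatenate(([0], y[:-1]))
print(transfer_entropy(list(y), list(x)))  # close to 1.0
```

An independent source series, by contrast, yields an estimate near zero (up to the small positive bias of the plug-in estimator). Practical TE analysis in the deep-learning setting additionally requires binning of continuous activations and a choice of history length, both of which affect the estimate.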
License
Copyright (c) 2024 Razvan ANDONIE, Angel Cataron, Adrian Moldovan
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.