Transfer Entropy in Deep Neural Networks
DOI: https://doi.org/10.15837/ijccc.2025.1.6904
Keywords: Transfer entropy, causality, deep learning, neural network explainability
Abstract
This paper explores the application of Transfer Entropy (TE) in deep neural networks as a tool to improve training efficiency and analyze causal information flow. TE is a measure of directed information transfer that captures nonlinear dependencies and temporal dynamics between system components. The study investigates the use of TE to optimize learning in Convolutional Neural Networks and Graph Convolutional Neural Networks. We present case studies that demonstrate reduced training times and improved accuracy. In addition, we apply TE within the framework of the Information Bottleneck theory, providing insight into the trade-off between compression and information preservation during the training of deep learning architectures. The results highlight TE's potential for identifying causal features, improving explainability, and addressing challenges such as oversmoothing in Graph Convolutional Neural Networks. Although computational overhead and complexity pose challenges, the findings emphasize the role of TE in creating more efficient and interpretable neural models.
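The directed measure described above can be illustrated with a minimal plug-in estimator for discrete time series, following Schreiber's definition TE(Y→X) = Σ p(x_{t+1}, x_t, y_t) log[ p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t) ] with one-step histories. This is a sketch for intuition only, not the estimator or history lengths used in the paper:

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, target, base=2):
    """Plug-in estimate of TE(source -> target) for discrete series
    with one-step histories. Positive TE means the source's past helps
    predict the target's next value beyond the target's own past."""
    # Empirical counts of (x_{t+1}, x_t, y_t), (x_t, y_t), (x_{t+1}, x_t), (x_t)
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_xy = Counter(zip(target[:-1], source[:-1]))
    pairs_xx = Counter(zip(target[1:], target[:-1]))
    singles = Counter(target[:-1])
    n = len(target) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n                              # p(x_{t+1}, x_t, y_t)
        p_cond_xy = c / pairs_xy[(x0, y0)]           # p(x_{t+1} | x_t, y_t)
        p_cond_x = pairs_xx[(x1, x0)] / singles[x0]  # p(x_{t+1} | x_t)
        te += p_joint * np.log(p_cond_xy / p_cond_x)
    return te / np.log(base)  # convert nats to bits (base 2)

# Toy check: if x copies y with a one-step lag, y's past fully determines
# x's next value, so TE(y -> x) should approach 1 bit for uniform binary y.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
x = np.concatenate(([0], y[:-1]))
print(transfer_entropy(list(y), list(x)))  # close to 1.0
```

An independent source series, by contrast, yields an estimate near zero (up to the small positive bias of the plug-in estimator). Practical TE analysis in the deep-learning setting additionally requires binning of continuous activations and a choice of history length, both of which affect the estimate.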
License
Copyright (c) 2024 Razvan ANDONIE, Angel Cataron, Adrian Moldovan
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.