Prognosis Prediction of Stroke based on Machine Learning and Explanation Model
Keywords: machine learning, stroke, prognosis prediction, explanation model
Predicting the prognosis of stroke is of great significance for its prevention and treatment. This paper uses machine learning to predict stroke prognosis and uses the SHAP method to analyze feature importance and to explain individual samples. First, after feature engineering, the Borderline-SMOTE algorithm is used to handle data imbalance and a Support Vector Machine (SVM) is used to build the prognostic prediction model, with Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR) used for comparison. After feature engineering, SVM performs better than the other models: its accuracy, specificity, F1 score, and AUC reach 0.8306, 0.8356, 0.8415, and 0.9140, respectively. Then, the model is analyzed for explainability, and the top three factors are found to be the Glasgow Coma Score, NIHSS, and atrial fibrillation. Finally, a single sample is analyzed: the patient is determined to be low-risk, and atrial fibrillation is the largest potential risk factor for that patient.
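The single-sample analysis described in the abstract rests on Shapley values, which SHAP approximates for real models. The following is a minimal sketch of the exact computation for one sample. It is not the paper's SVM or dataset: the function `toy_risk`, its coefficients, and the 0/1 feature encodings for `gcs`, `nihss`, and `afib` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def shapley_values(predict, sample, baseline):
    """Exact Shapley values for one sample.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from the baseline value to the sample value.
    Only practical for a handful of features (n! orderings).
    """
    names = list(sample)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the reference input
        previous = predict(current)
        for name in order:
            current[name] = sample[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    # Hypothetical risk score with an interaction term; NOT the paper's model.
    return (0.5 * x["afib"] + 0.25 * x["nihss"] - 0.25 * x["gcs"]
            + 0.25 * x["afib"] * x["nihss"])


# Hypothetical patient and reference point (all features encoded as 0 or 1).
patient = {"gcs": 1.0, "nihss": 1.0, "afib": 1.0}
reference = {"gcs": 0.0, "nihss": 0.0, "afib": 0.0}

contributions = shapley_values(toy_risk, patient, reference)
largest_risk_factor = max(contributions, key=contributions.get)
```

In this toy setting the contributions sum to the difference between the patient's score and the reference score, which is the property that lets a per-feature breakdown be read as an explanation of a single prediction. Libraries such as `shap` estimate the same quantities for models where enumerating all orderings is infeasible.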
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.