Prognosis Prediction of Stroke based on Machine Learning and Explanation Model


  • Qiuli Qin, Beijing Jiaotong University
  • Xuehan Zhou
  • Yong Jiang


machine learning, stroke, prognosis prediction, explanation model


Predicting the prognosis of stroke is of great significance for its prevention and treatment. This paper uses machine learning to predict stroke prognosis and applies the SHAP method for feature-importance and single-sample analysis. First, after feature engineering, the Borderline-SMOTE algorithm is used to handle data imbalance, a Support Vector Machine (SVM) is used to build the prognostic prediction model, and Random Forest (RF), Decision Tree (DT) and Logistic Regression (LR) models are built for comparison. The SVM with feature engineering outperforms the other models, reaching an accuracy of 0.8306, a specificity of 0.8356, an F1 score of 0.8415 and an AUC of 0.9140. Next, an explainability analysis of the model shows that the three most influential features are the Glasgow Coma Score, the NIHSS score and atrial fibrillation. Finally, a single-sample analysis classifies the example patient as low-risk and identifies atrial fibrillation as that patient's largest potential risk factor.
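Borderline-SMOTE (Han, Wang and Mao, 2005) oversamples only the minority samples that lie near the class boundary. The pure-Python sketch below follows the Borderline-SMOTE1 variant under stated assumptions: numeric feature vectors, Euclidean distance, and illustrative defaults for the neighbourhood sizes `m` and `k`, since the abstract does not give the paper's settings. The function name `borderline_smote1` and its parameters are hypothetical and are not the API of any library.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _nearest_indices(i: int, points: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    candidates = [j for j in range(len(points)) if j != i]
    candidates.sort(key=lambda j: math.dist(points[i], points[j]))
    return candidates[:k]


def borderline_smote1(
    X: Sequence[Vector],
    y: Sequence[int],
    minority: int = 1,
    m: int = 5,
    k: int = 5,
    per_danger: int = 1,
    seed: int = 0,
) -> List[Vector]:
    """Hypothetical sketch of Borderline-SMOTE1; returns only the new samples."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    minority_pts = [X[i] for i in minority_idx]

    synthetic: List[Vector] = []
    for pos, i in enumerate(minority_idx):
        # Count majority samples among the m nearest neighbours in the full set.
        neighbours = _nearest_indices(i, X, m)
        n_majority = sum(1 for j in neighbours if y[j] != minority)
        # DANGER: m/2 <= n_majority < m. Skip SAFE (< m/2) and NOISE (== m).
        if not (m / 2 <= n_majority < m):
            continue
        # k nearest neighbours of the DANGER sample within the minority class.
        minority_neighbours = _nearest_indices(pos, minority_pts, k)
        if not minority_neighbours:
            continue
        for _ in range(per_danger):
            nb = minority_pts[rng.choice(minority_neighbours)]
            gap = rng.random()  # uniform in [0, 1)
            # New point on the segment from the DANGER sample towards its neighbour.
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], nb)))
    return synthetic
```

On a toy one-dimensional set with majority points at 0, 1, 2, 3, 4 and minority points at 4.5, 5.5, 6.5, 7.5, calling `borderline_smote1(X, y, m=3, k=2, per_danger=3)` marks only 4.5 as a DANGER sample (two of its three nearest neighbours are majority points), so it returns three synthetic points between 4.5 and 6.5. For real use, a maintained implementation such as `imblearn.over_sampling.BorderlineSMOTE` would normally be preferred over this sketch.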
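The four reported metrics follow standard definitions. The stdlib sketch below computes them, assuming that poor prognosis is encoded as the positive class `1` and that the classifier also outputs a continuous score for the AUC; the abstract does not state the encoding, so both are assumptions. The function names `binary_metrics` and `roc_auc` are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, specificity and F1 from hard 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0, 1, 0]`, predictions `[1, 0, 0, 1, 1, 0]` and scores `[0.9, 0.4, 0.2, 0.6, 0.8, 0.1]`, this gives an accuracy of 4/6, a specificity of 2/3, an F1 of 2/3 and an AUC of 8/9. The pairwise AUC count is the easiest form to check by hand; a rank-based version scales better on large test sets.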
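SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory (Shapley, 1953; Lundberg and Lee, 2017). The sketch below computes exact Shapley values by enumerating feature coalitions, under the simplifying assumption that a single baseline input stands in for "feature absent". The `shap` library instead typically works with a background dataset and approximate or model-specific algorithms, so this illustrates the underlying quantity, not that library's API. The names `shapley_values` and `mean_abs_importance` are hypothetical. Per-patient values correspond to a single-sample analysis, and the mean absolute value across patients gives a global feature ranking.

```python
import itertools
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x), with absent features taken from `baseline`.

    Enumerates all 2^(n-1) coalitions per feature, so only suitable for small n.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition come from x, the rest from the baseline.
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(
    model: Model, samples: Sequence[Sequence[float]], baseline: Sequence[float]
) -> List[float]:
    """Global importance per feature: mean |phi_i| over all samples."""
    all_phi = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(len(baseline))]
```

For a linear model `f(z) = 2*z0 - 3*z1 + 0.5*z2 + 1` with an all-zero baseline, each value reduces to `w_i * (x_i - b_i)`. At `x = (1, 2, 3)` this gives `(2, -6, 1.5)`, which sums to `f(x) - f(baseline) = -2.5` (the additivity property). A positive value pushes that sample's prediction towards the positive class, so in a single-sample analysis the feature with the largest positive value is that patient's largest potential risk factor under the model.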

