Evaluating a machine learning models for predicting full recovery in stroke: A case study at Hatyai Hospital

Authors

  • Nisjara Kunaton, Hatyai Hospital, Songkhla

Keywords:

Machine Learning, Barthel Index, Stroke

Abstract

Stroke is a leading cause of death and disability worldwide, so predicting patient recovery is important for planning treatment and rehabilitation. This study applies machine learning (ML) to predict recovery in stroke patients, using data from 6,081 cases at Hatyai Hospital. Four ML models were compared: Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting. Data preparation involved label encoding, handling missing values, and balancing the dataset. The models underwent hyperparameter tuning and were evaluated with k-fold cross-validation. Random Forest performed best, with an accuracy of 88.61% and Precision, Recall, and F1-Score all at 0.91. Gradient Boosting followed closely with an accuracy of 88.50%. Logistic Regression and Decision Tree performed lower, with accuracies of 87.08% and 83.83%, respectively. The results indicate that ML techniques, particularly Random Forest and Gradient Boosting, can predict stroke recovery with high accuracy (about 88% on this dataset) and can inform treatment planning.
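The evaluation described in the abstract (k-fold cross-validation scored by accuracy, precision, recall, and F1) can be illustrated with a minimal standard-library sketch. It is not the study's code: the abstract does not state the number of folds, the software used, or the features, so the choice of k = 5, the synthetic one-feature data, and the helper names (binary_metrics, k_fold_splits, cross_validate, fit_threshold) are all assumptions made for illustration. The toy threshold rule stands in for a real classifier and is not one of the four models compared.

```python
import random
from statistics import mean


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx); each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(fit, X, y, k=5, seed=0):
    """Average fold-level metrics of a model built by fit(X_train, y_train)."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(X), k, seed):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(binary_metrics([y[i] for i in test_idx], y_pred))
    return {name: mean(s[name] for s in scores) for name in scores[0]}


def fit_threshold(X_train, y_train):
    """Hypothetical toy model: predict 1 when x exceeds the class midpoint."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x > cut else 0


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, balanced, one-feature data; NOT the Hatyai Hospital records.
    y = [i % 2 for i in range(200)]
    X = [rng.gauss(60 + 25 * label, 12) for label in y]
    print(cross_validate(fit_threshold, X, y, k=5))
```

Because F1 is the harmonic mean of precision and recall, equal precision and recall of 0.91 give an F1 of 0.91, which matches the figures reported for Random Forest.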

Published

2025-11-25

How to Cite

Kunaton, N. (2025). Evaluating a machine learning models for predicting full recovery in stroke: A case study at Hatyai Hospital. Journal of the Thai Medical Informatics Association, 11(2), 135–143. Retrieved from https://he03.tci-thaijo.org/index.php/jtmi/article/view/5056