Hybrid Stacked Ensemble Regression Model for Predicting Parkinson's Progression on Protein Data

K. Shastry Aditya; M. Mohan; K. Deepthi

doi:10.21512/commit.v19i1.12079

Authors

K. Shastry Aditya, Nitte Meenakshi Institute of Technology
M. Mohan, Amity University
K. Deepthi, Nitte Meenakshi Institute of Technology

Keywords:

Parkinson’s Disease, Hybrid Stacked Ensemble Regression, Movement Disorder Society-Sponsored Revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Scores, Protein and Peptide Data, Predictive Modeling

Abstract

Parkinson’s Disease (PD) is a progressive neurological disorder marked by both motor and nonmotor symptoms. Accurate prediction of disease progression is critical for effective patient management. This research presents a Hybrid Stacked Ensemble Regression (HSER) model that predicts PD progression from protein and peptide measurements, using the Movement Disorder Society-Sponsored Revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) scores as targets. The researchers integrate three datasets (clinical, protein, and peptide data) into a single feature-engineered dataset. The dataset is split into training and testing sets in four configurations, one for each of the four UPDRS scores: updrs 1, updrs 2, updrs 3, and updrs 4. The hybrid approach combines stacking and blending. Ridge regression, gradient boosting, and extra trees serve as the base models. A ridge regression meta-model is trained on the base models’ out-of-fold predictions. The final predictions are obtained by averaging the base models’ predictions on the test data. The proposed HSER model performs better than the baseline models. These results underscore the promise of the hybrid model for predicting PD progression and for informing personalized treatment strategies. Future research can focus on refining model weights and exploring additional biomarkers to improve predictive accuracy.
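The stacking and blending steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the clinical, protein, and peptide features, and the function name `fit_hybrid_stack`, the hyperparameters, and the five-fold split are all hypothetical choices. Because the abstract does not spell out how the meta-model output and the averaged base predictions are combined, the sketch returns both.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, train_test_split


def fit_hybrid_stack(X_train, y_train, X_test, n_splits=5, seed=0):
    """Stack three base regressors and also blend them by simple averaging."""
    # Base models named in the abstract; hyperparameters are assumptions.
    base_models = [
        Ridge(alpha=1.0),
        GradientBoostingRegressor(random_state=seed),
        ExtraTreesRegressor(n_estimators=50, random_state=seed),
    ]
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros((X_train.shape[0], len(base_models)))
    test_preds = np.zeros((X_test.shape[0], len(base_models)))

    for j, model in enumerate(base_models):
        # Out-of-fold predictions: each training row is predicted by a
        # model that never saw that row during fitting.
        for fit_idx, hold_idx in kfold.split(X_train):
            fold_model = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
            oof[hold_idx, j] = fold_model.predict(X_train[hold_idx])
        # Refit on the full training set to predict the test set.
        full_model = clone(model).fit(X_train, y_train)
        test_preds[:, j] = full_model.predict(X_test)

    # Stacking: a ridge meta-model learns from the out-of-fold predictions.
    meta_model = Ridge(alpha=1.0).fit(oof, y_train)
    stacked = meta_model.predict(test_preds)
    # Blending: unweighted mean of the base models' test predictions.
    blended = test_preds.mean(axis=1)
    return {"oof": oof, "test_preds": test_preds,
            "stacked": stacked, "blended": blended}


# Synthetic stand-in for one target (for example, a single UPDRS score).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
result = fit_hybrid_stack(X_train, y_train, X_test)
```

In a setting like the one the abstract describes, this routine would be run once per target, giving four separate fits for the four UPDRS scores.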

Author Biographies

K. Shastry Aditya, Nitte Meenakshi Institute of Technology

Department of Information Science and Engineering

M. Mohan, Amity University

Department of Computer Science and Engineering

K. Deepthi, Nitte Meenakshi Institute of Technology

Department of Information Science and Engineering
