Investigating Prospective Athletic Athletes: Classifiers, Benchmarking, and Post-Hoc XAI Analysis

Authors

  • Ibnu Febry Kurniawan, Department of Data Science, Universitas Negeri Surabaya
  • A'yunin Sofro, Department of Actuarial Science, Universitas Negeri Surabaya, https://orcid.org/0000-0003-2603-4092
  • Danang Ariyanto, Department of Actuarial Science, Universitas Negeri Surabaya
  • Junaidi Budi Prihanto, Department of Sport Education, Universitas Negeri Surabaya
  • Dimas Avian Maulana, Department of Actuarial Science, Universitas Negeri Surabaya

DOI:

https://doi.org/10.21512/comtech.v17i1.13224

Keywords:

machine learning, sports science, explainable AI, Post-Hoc analysis, benchmark

Abstract

Identifying high-potential athletes is a critical yet inherently challenging process that requires comprehensive analysis of diverse factors, including physiological attributes, demographic characteristics, and social influences. This multifaceted process demands careful evaluation of extensive datasets to ensure both accuracy and fairness in talent identification protocols. The complexity stems from the interconnected determinants of athletic performance, where physical capabilities intersect with psychological resilience, social support systems, and environmental factors. In recent years, machine learning (ML) algorithms have gained prominence in decision-making, offering new opportunities to uncover subtle patterns and relationships in athlete data that might otherwise remain hidden. This study systematically benchmarks several state-of-the-art ML classifiers on a novel, self-collected dataset of athlete candidates. In addition, an explainable AI (XAI) technique, Shapley Additive Explanations (SHAP), is applied to interpret model decisions and identify the key predictive factors. Experimental results show that Gradient Boosting achieves the best predictive performance, with a mean F1 score of 0.46 across the 10 cross-validation folds. SHAP analysis shows that anthropometric measurements and social group features strongly influence the predictions. Together, these findings underscore the substantial potential of ML to transform talent identification in sports, while emphasizing the importance of model interpretability in fostering trust in and acceptance of AI-driven decision-making.
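
The evaluation and explanation steps summarized in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' pipeline, and several parts are assumptions: the data are synthetic, a nearest-centroid classifier stands in for Gradient Boosting, the folds are stratified (the abstract only states 10-fold cross-validation), and every function name (`stratified_kfold`, `f1_binary`, `cross_validated_f1`, `fit_nearest_centroid`, `exact_shapley`, `score`) is hypothetical. The Shapley part computes exact values by enumerating every feature coalition and filling absent features from a single baseline vector. The SHAP library instead relies on faster estimators such as TreeSHAP or KernelSHAP with background data, so this brute-force version only illustrates the definition and scales exponentially with the number of features.

```python
import itertools
import math
import random
from statistics import mean


def stratified_kfold(labels, k=10, seed=0):
    """Split indices into k test folds, dealing each class round-robin so every
    fold keeps roughly the overall class ratio (hypothetical helper)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, t in enumerate(labels) if t == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def f1_binary(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validated_f1(X, y, fit, k=10, seed=0):
    """Train on k-1 folds, score F1 on the held-out fold; return (mean, per-fold)."""
    scores = []
    for test_idx in stratified_kfold(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(X[i]) for i in test_idx]
        scores.append(f1_binary([y[i] for i in test_idx], preds))
    return mean(scores), scores


def fit_nearest_centroid(X, y):
    """Toy stand-in classifier (NOT Gradient Boosting): predict the nearest class mean."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their
    baseline value (an assumption; SHAP's estimators use background data)."""
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for subset in itertools.combinations(others, r):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Synthetic, imbalanced toy data (30 positives, 70 negatives); not the study's dataset.
rng = random.Random(42)
y = [1] * 30 + [0] * 70
X = [[rng.gauss(1.0 if t else 0.0, 1.0), rng.gauss(0.0, 1.0)] for t in y]

mean_f1, fold_f1 = cross_validated_f1(X, y, fit_nearest_centroid, k=10)
print(f"mean F1 over 10 folds: {mean_f1:.2f}")


def score(z):
    """Hypothetical logistic score used only to demonstrate the attribution."""
    return 1.0 / (1.0 + math.exp(-(1.5 * z[0] - 0.5 * z[1])))


# Explain the first candidate's score relative to the feature means.
baseline = [mean(col) for col in zip(*X)]
phi = exact_shapley(score, X[0], baseline)
print("Shapley values:", [round(v, 3) for v in phi])
```

Keeping the per-fold scores in `fold_f1` next to their mean shows how much a summary figure such as a mean F1 of 0.46 varies from fold to fold. Because Shapley values satisfy the efficiency property, the attributions in `phi` sum to the difference between the candidate's score and the baseline score.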

Published

2026-01-29

How to Cite

Kurniawan, I. F., Sofro, A., Ariyanto, D., Prihanto, J. B., & Maulana, D. A. (2026). Investigating Prospective Athletic Athletes: Classifiers, Benchmarking, and Post-Hoc XAI Analysis. ComTech: Computer, Mathematics and Engineering Applications, 17(1). https://doi.org/10.21512/comtech.v17i1.13224