Investigating Prospective Athletic Athletes: Classifiers, Benchmarking, and Post-Hoc XAI Analysis
DOI: https://doi.org/10.21512/comtech.v17i1.13224
Keywords: machine learning, sports science, explainable AI, post-hoc analysis, benchmark
Abstract
Identifying high-potential athletes is a critical yet inherently challenging process that requires analysis of diverse factors, including physiological attributes, demographic characteristics, and social influences. It calls for careful evaluation of large datasets so that talent identification is both accurate and fair. The difficulty stems from the interconnected determinants of athletic performance, where physical capabilities intersect with psychological resilience, social support systems, and environmental factors. In recent years, machine learning (ML) algorithms have gained prominence in decision-making, offering new opportunities to uncover subtle patterns and relationships in athlete data that might otherwise remain hidden. This study systematically benchmarks several state-of-the-art ML classifiers on a novel, self-collected dataset of athlete candidates. In addition, an explainable AI (XAI) technique, SHapley Additive exPlanations (SHAP), is applied to interpret model decisions and identify the key predictive factors. Experimental results show that Gradient Boosting achieves the best predictive performance among the benchmarked classifiers, with a mean F1 score of 0.46 across 10-fold cross-validation. SHAP analysis shows that anthropometric measurements and social group features are among the most influential features for the predictions. Together, these findings point to the potential of ML to improve talent identification in sports, while underscoring the role of model interpretability in fostering trust in and acceptance of AI-driven decision-making.
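SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. The following is a minimal pure-Python sketch of exact Shapley values for a single candidate, shown only to illustrate what SHAP estimates. It assumes a single reference ("baseline") candidate: features outside a coalition take the baseline's values. The `toy_model` scoring function and its feature names (`height`, `arm_span`, `in_club`) are hypothetical and are not the study's dataset or trained model. Exact enumeration costs 2^n model calls. The shap library instead uses faster model-specific algorithms, for example for tree ensembles such as Gradient Boosting.

```python
from itertools import combinations
from math import factorial, isclose
from typing import Callable, Dict, FrozenSet, List, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature for the single instance `x`.

    The value of a coalition S is the model output when features in S take
    their values from `x` and all other features take them from `baseline`
    (an assumed single-reference value function). This needs 2**n model
    evaluations, so it only suits a handful of features.
    """
    n = len(x)
    cache: Dict[FrozenSet[int], float] = {}

    def value(coalition: FrozenSet[int]) -> float:
        # Memoise: each coalition's output is reused by several features.
        if coalition not in cache:
            z = [x[j] if j in coalition else baseline[j] for j in range(n)]
            cache[coalition] = predict(z)
        return cache[coalition]

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of a coalition that excludes feature i
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical score: two additive terms plus one interaction term."""
    height, arm_span, in_club = z
    return 0.5 * height + 0.3 * arm_span + 0.2 * height * in_club


if __name__ == "__main__":
    x = [1.0, 2.0, 1.0]         # hypothetical standardised candidate
    baseline = [0.0, 0.0, 0.0]  # hypothetical reference candidate
    phi = exact_shapley(toy_model, x, baseline)
    print([round(v, 3) for v in phi])  # [0.6, 0.6, 0.1]
    # Efficiency property: attributions sum to f(x) - f(baseline).
    print(isclose(sum(phi), toy_model(x) - toy_model(baseline)))  # True
```

The interaction term `0.2 * height * in_club` is split equally between `height` and `in_club`. This is one way a group-membership feature can receive credit even when it only acts in combination with an anthropometric one.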
License
Copyright (c) 2026 Ibnu Febry Kurniawan, A'yunin Sofro, Danang Ariyanto, Junaidi Budi Prihanto, Dimas Avian Maulana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License - Share Alike that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.