Binary Classification of Asthma for the CAPS Pediatric Dataset in Malawi Using Machine Learning
DOI:
https://doi.org/10.21512/emacsjournal.v7i3.14108
Keywords:
classification, lung, asthma, machine learning, child, health, logistic regression, random forest, XGBoost
Abstract
Childhood asthma poses a significant public health challenge, especially in low- and middle-income countries. Early intervention is essential for effective management and prevention of childhood asthma. This study aims to develop a predictive model for childhood asthma by applying machine learning (ML) techniques. The dataset includes self-reported information on respiratory symptoms, anthropometric measurements, spirometry data, and personal carbon monoxide (CO) exposure among children aged 6–8 years in rural Malawi. We employed a supervised ML approach with three classification algorithms, Random Forest, Logistic Regression, and XGBoost, and addressed the imbalanced outcome explicitly. To do so, the study applied the Synthetic Minority Over-sampling Technique (SMOTE), which creates synthetic samples of the minority class to balance the distribution of the outcome variable in the training data. Data preprocessing involved handling missing values, feature selection, and normalization to ensure data quality and model performance. Models were evaluated using cross-validation and the performance metrics precision, recall, and F1-score. Among the evaluated models, Logistic Regression emerged as the most balanced approach, offering strong precision and the highest F1-score while maintaining a reasonable recall rate. This balance reduces the likelihood of overdiagnosis while still capturing a significant proportion of true positives, making the model suitable for early screening applications. Moreover, Logistic Regression, with its simple mathematical structure, provides more transparency and explainability, which are vital for clinical adoption and for gaining practitioner trust.
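As a rough illustration of two ideas named in the abstract, the sketch below shows how SMOTE-style oversampling interpolates between minority-class points and how precision, recall, and F1-score follow from prediction counts. It is a minimal standard-library sketch under stated assumptions, not the authors' implementation: the abstract does not say which software was used, and the function names `smote_oversample` and `precision_recall_f1`, the parameter `k=3`, and the toy data are all hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy values invented for illustration; they are NOT from the CAPS dataset.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_oversample(minority, n_new=6)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

As the abstract notes, oversampling is applied to the training data only; in a cross-validation setting this would mean generating synthetic samples inside each training fold so that held-out folds contain only real observations.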
License
Copyright (c) 2025 Jaffarus Sodiq, Syarifah Diana Permai

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.