Binary Classification of Asthma for the CAPS Pediatric Dataset in Malawi Using Machine Learning

Authors

  • Jaffarus Sodiq, Bina Nusantara University
  • Syarifah Diana Permai, Bina Nusantara University

DOI:

https://doi.org/10.21512/emacsjournal.v7i3.14108

Keywords:

classification, lung, asthma, machine learning, child, health, logistic regression, random forest, XGBoost

Abstract

Childhood asthma poses a significant public health challenge, especially in low- and middle-income countries. Early intervention is essential for effective management and improved prevention of childhood asthma. This study aims to develop a predictive model for childhood asthma by applying machine learning (ML) techniques. The dataset includes self-reported information on respiratory symptoms, anthropometric measurements, spirometry data, and personal carbon monoxide (CO) exposure among children aged 6–8 years in rural Malawi. We employed a supervised ML approach, comparing three classification algorithms (Random Forest, Logistic Regression, and XGBoost) while accounting for the imbalanced outcome. To handle that imbalance, this study applied the Synthetic Minority Over-sampling Technique (SMOTE), which creates synthetic samples of the minority class to balance the distribution of the outcome variable in the training data. Data preprocessing involved handling missing values, feature selection, and normalization to ensure data quality and model performance. Models were evaluated using cross-validation and performance metrics including precision, recall, and F1-score. Among the evaluated models, Logistic Regression emerged as the most balanced approach, offering strong precision and the highest F1-score while maintaining a reasonable recall rate. This balance reduces the likelihood of overdiagnosis while still capturing a significant proportion of true positives, making it suitable for early screening applications. Moreover, Logistic Regression, with its simple mathematical structure, provides greater transparency and explainability, which are vital for clinical adoption and gaining practitioner trust.
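
The two ideas the abstract relies on most, SMOTE oversampling and the precision/recall/F1 metrics, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `smote` and `precision_recall_f1`, the neighbour count `k`, and the toy points and labels are all assumptions made for illustration, and none of the numbers come from the CAPS dataset. In practice a library implementation would normally be used instead.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Hypothetical helper: create n_new synthetic minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy minority-class points (made up, two features each).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
synthetic_points = smote(minority_points, n_new=6)

# Toy labels and predictions (made up): 1 = asthma, 0 = no asthma.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

As the abstract states, the oversampling is applied to the training data only, so the evaluation folds keep the original class distribution. The metric function also shows why precision and recall pull in different directions: every false positive lowers precision (overdiagnosis), while every false negative lowers recall (missed cases).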

Author Biographies

Jaffarus Sodiq, Bina Nusantara University

Statistics Department, School of Computer Science

Syarifah Diana Permai, Bina Nusantara University

Statistics Department, School of Computer Science

Published

2025-09-30

How to Cite

Sodiq, J., & Syarifah Diana Permai. (2025). Binary Classification of Asthma for the CAPS Pediatric Dataset in Malawi Using Machine Learning. Engineering, MAthematics and Computer Science Journal (EMACS), 7(3), 337–342. https://doi.org/10.21512/emacsjournal.v7i3.14108