Cost-Sensitive Learning with LightGBM for Class Imbalance in Intrusion Detection Systems
DOI: https://doi.org/10.21512/emacsjournal.v7i2.13435
Keywords: LightGBM, Imbalanced Dataset, KDD99, Cybersecurity
Abstract
Imbalanced data is a common challenge in classification problems: standard models tend to be biased toward the majority classes, which leads to poor detection of minority instances. This paper presents a comparative study of Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost) models, each enhanced with cost-sensitive learning to address class imbalance at the algorithmic level. The objective is to evaluate how cost-sensitive loss adjustments affect model performance across several evaluation metrics. Experimental results show that both LightGBM and XGBoost achieved cross-validation and test accuracies above 99.9%. However, only cost-sensitive LightGBM achieved perfect scores in precision, recall, and F1-score, indicating that it identifies minority classes effectively. In contrast, XGBoost exhibited lower recall and F1-score despite similar accuracy, reflecting limited sensitivity to minority instances. Models trained without cost-sensitive learning showed further drops on the minority-related metrics. The findings suggest that cost-sensitive LightGBM is a more robust solution for imbalanced classification tasks, outperforming both its own baseline and the cost-sensitive XGBoost variant. This approach is particularly beneficial for critical applications such as fraud detection, cybersecurity, and medical diagnostics, where class imbalance is prevalent and misclassification costs are high.
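The gap between accuracy and recall described above can be illustrated with a minimal sketch. It is not the authors' code: the toy labels, the helper names, and the inverse-frequency ("balanced") weighting formula are all assumptions chosen for illustration, and the abstract does not state which weighting scheme the study used.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed weighting scheme for illustration; not taken from the paper.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the classifier actually found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical toy data: 990 majority samples (0) and 10 minority samples (1).
y_true = [0] * 990 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_majority = [0] * 1000

weights = balanced_class_weights(y_true)
print("class weights:", weights)
print("accuracy:", accuracy(y_true, y_majority))
print("minority recall:", recall(y_true, y_majority))
```

In this toy setting the always-majority classifier reaches 99% accuracy while finding none of the minority samples, which is why the abstract reports precision, recall, and F1-score alongside accuracy. Weights of this kind are one common way to make misclassifying a minority sample more costly during training; for example, LightGBM's scikit-learn wrapper `LGBMClassifier` accepts a `class_weight` argument, and XGBoost exposes `scale_pos_weight` for binary problems.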
License
Copyright (c) 2025 Andien Dwi Novika, Almuzhidul Mujhid

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.