Cost-Sensitive Learning with LightGBM for Class Imbalance in Intrusion Detection Systems

Authors

  • Andien Dwi Novika, Bina Nusantara University
  • Almuzhidul Mujhid, Bina Nusantara University

DOI:

https://doi.org/10.21512/emacsjournal.v7i2.13435

Keywords:

LightGBM, Imbalanced Dataset, KDD99, Cybersecurity

Abstract

Imbalanced data is a common challenge in classification problems: standard models tend to be biased toward the majority classes, which leads to poor detection of minority instances. This paper presents a comparative study of Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost) models, each enhanced with cost-sensitive learning to address class imbalance at the algorithmic level. The objective is to evaluate how cost-sensitive loss adjustments affect model performance across several evaluation metrics. Experimental results show that both LightGBM and XGBoost exceeded 99.9% accuracy in cross-validation and on the test set. However, only cost-sensitive LightGBM achieved perfect precision, recall, and F1-score, indicating that it identifies minority classes effectively. In contrast, XGBoost showed lower recall and F1-score despite similar accuracy, reflecting limited sensitivity to minority instances. Models trained without cost-sensitive learning performed worse still on the minority-related metrics. The findings suggest that cost-sensitive LightGBM is a more robust solution for imbalanced classification tasks, outperforming both its own baseline and the cost-sensitive XGBoost variant. This approach is particularly beneficial for critical applications such as fraud detection, cybersecurity, and medical diagnostics, where class imbalance is prevalent and misclassification costs are high.
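
The gap between accuracy and the minority-class metrics described above can be illustrated with a small sketch. The abstract does not state which misclassification costs the authors used, so the weighting below is an assumption: it applies the common "balanced" heuristic, n_samples / (n_classes * class_count), and the toy labels and predictions are hypothetical rather than taken from the paper. The function names `balanced_class_weights` and `binary_metrics` are illustrative only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is an assumed cost scheme, not the one reported by the authors.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 990 majority (0) and 10 minority (1) samples.
y_true = [0] * 990 + [1] * 10
# A hypothetical classifier that misses 6 of the 10 minority samples.
y_pred = [0] * 990 + [0] * 6 + [1] * 4

weights = balanced_class_weights(y_true)
metrics = binary_metrics(y_true, y_pred)

print("class weights:", weights)
print("metrics:", metrics)
```

In this toy case accuracy stays above 99% while recall on the minority class is only 0.4, which is the pattern the abstract attributes to models that are not sensitive to minority instances. Per-class weights of this kind could plausibly be supplied to a gradient boosting model, for example through the `class_weight` argument of LightGBM's scikit-learn wrapper, although whether the authors did so is not stated here.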

Author Biographies

Andien Dwi Novika, Bina Nusantara University

Computer Science Program, Computer Science Department, School of Computer Science

Almuzhidul Mujhid, Bina Nusantara University

Computer Science Program, Computer Science Department, School of Computer Science

Published

2025-05-31