Hybrid Stacking Model for Web Attack Classification Using LightGBM, Random Forest, and MLP

Fadli Dony Pradana; Farikhin; Budi Warsito

doi:10.21512/commit.v20i1.13718

Authors

Fadli Dony Pradana Universitas Diponegoro
Farikhin Universitas Diponegoro
Budi Warsito Universitas Diponegoro

Keywords:

Stacking Hybrid, Web Attack Classification, Intrusion Detection System (IDS), Light Gradient Boosting Machine (LightGBM), Random Forest, Multi-Layer Perceptron (MLP)

Abstract

This research presents a stacking-based hybrid intrusion detection framework for web application attacks. It addresses a persistent limitation of conventional Intrusion Detection Systems (IDS): minority classes, including Brute Force, Cross-Site Scripting (XSS), and Structured Query Language (SQL) Injection, are frequently underdetected because of severe class imbalance. The proposed architecture combines LightGBM and Random Forest as base learners, with a Multi-Layer Perceptron (MLP) as the meta-learner. The framework is supported by rigorous preprocessing, ANOVA F-test-based feature selection, and domain-informed augmentation of critical traffic features, such as Flow Inter-Arrival Time (IAT) Min, Init Win bytes forward, and Backward (Bwd) Packets/s, through optimized weighting strategies. Evaluation on the CICIDS-2017 web attack subset using 10-fold stratified cross-validation shows that the proposed model improves the macro F1-Score from 0.62 ± 0.004 to 0.76 ± 0.003 and achieves a binary accuracy of 99.67% with a macro F1 of 0.94. The observed performance gains are statistically significant (p < 0.001), confirming the robustness of the framework. These findings indicate that targeted feature engineering and heterogeneous stacking substantially improve minority-attack detection while preserving majority-class performance. In addition, the framework demonstrates sub-millisecond inference time, highlighting its practical suitability for real-time IDS deployment in resource-constrained and high-throughput operational cybersecurity environments. The proposed design also offers methodological generalizability for broader anomaly detection tasks in dynamic network environments, where reliable recognition of low-frequency but high-impact attack patterns remains critically important.
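The abstract reports macro F1 alongside accuracy because, under severe class imbalance, accuracy can stay high while minority attack classes go undetected. The following is a minimal sketch of that effect, not the authors' evaluation code. The labels are hypothetical toy data rather than CICIDS-2017 records, and the helper names (`per_class_f1`, `macro_f1`, `accuracy`) are illustrative assumptions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (assumption): 96 benign flows and 4 attack flows.
y_true = ["Benign"] * 96 + ["SQLi"] * 2 + ["XSS"] * 2
# A degenerate classifier that labels every flow as benign.
y_pred = ["Benign"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {mf1:.3f}")
```

In this toy case the degenerate classifier is right on 96 of 100 flows, yet its macro F1 is only about a third, because the two attack classes each contribute an F1 of zero. This is why an increase in macro F1 is read as evidence of better minority-class detection.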

Author Biographies

Fadli Dony Pradana, Universitas Diponegoro

Master Program of Information Systems, Postgraduate School

Farikhin, Universitas Diponegoro

Master Program of Information Systems, Postgraduate School

Budi Warsito, Universitas Diponegoro

Master Program of Information Systems, Postgraduate School
