Comparative Analysis of Decision Tree, Random Forest, and XGBoost for Student Category Prediction

Authors

  • Rayson Calvianto Lim, Bina Nusantara University
  • Harvianto Harvianto, Bina Nusantara University

DOI:

https://doi.org/10.21512/ijcshai.v3i1.14936

Keywords:

Machine Learning, Student category prediction, Random Forest, XGBoost, Decision Tree

Abstract

This research develops and evaluates a lightweight machine learning framework for predicting student performance categories, as a foundation for personalized curriculum design in a mid-sized school context. The study compares three baseline algorithms (Decision Tree, Random Forest, and XGBoost), implemented in an end-to-end workflow covering data preprocessing, feature engineering, model training, and evaluation. A dataset of anonymized student academic and behavioral attributes was prepared through cleaning, encoding, normalization, and stratified splitting to ensure consistency and reliability. Each model was assessed using accuracy, precision, recall, and F1-score to determine its predictive effectiveness. The experimental results show that the Random Forest model achieved the highest overall performance, demonstrating stronger generalization than Decision Tree and XGBoost. Medium-performing students were classified most reliably, while Low-performing students showed greater variability, indicating that more comprehensive data are needed to improve sensitivity toward at-risk learners. The originality of this study lies in its focus on an accessible, resource-efficient predictive pipeline suitable for schools with limited technological capacity. The findings provide evidence that practical machine learning approaches can support the early stages of data-driven curriculum planning and help educators make more informed instructional decisions. The study also highlights opportunities for future work, including expanding the data sources and adopting more advanced algorithms to improve predictive accuracy and support broader educational applications.
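
The abstract reports accuracy, precision, recall, and F1-score, and notes that reliability differed between the Low and Medium categories. As a minimal sketch of how such per-category figures are typically computed, the Python below derives them from true and predicted labels. The function name per_class_metrics, the third category name "High", and the sample labels are all hypothetical illustrations; they are not the authors' code or data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision, recall, and F1."""
    # Count each (true label, predicted label) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in labels)  # everything predicted as c
        actual = sum(pairs[(c, p)] for p in labels)     # everything truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Made-up labels for illustration only; NOT data from the study.
labels = ["Low", "Medium", "High"]
y_true = ["Low", "Low", "Medium", "Medium", "Medium", "High", "High", "Medium"]
y_pred = ["Low", "Medium", "Medium", "Medium", "Medium", "High", "Medium", "Medium"]

accuracy, report = per_class_metrics(y_true, y_pred, labels)
for name in labels:
    print(name, report[name])
```

In this toy example the Low category has lower recall than Medium, the kind of per-class gap that overall accuracy alone would hide. Reporting metrics per category is what makes reduced sensitivity toward at-risk learners visible.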

Author Biographies

Rayson Calvianto Lim, Bina Nusantara University

Computer Science Department, School of Computer Science

Harvianto Harvianto, Bina Nusantara University

Computer Science Department, School of Computer Science

Published

2026-03-29

How to Cite

Lim, R. C., & Harvianto, H. (2026). Comparative Analysis of Decision Tree, Random Forest, and XGBoost for Student Category Prediction. International Journal of Computer Science and Humanitarian AI, 3(1), 9–19. https://doi.org/10.21512/ijcshai.v3i1.14936