SMOTE Effectiveness and various Machine Learning Algorithms to Predict Self-Esteem Levels of Indonesian Student

Mochammad Anshori; Risqy Siwi Pradini; Wahyu Teja Kusuma

doi:10.21512/emacsjournal.v7i2.13521

Authors

Mochammad Anshori Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW
Risqy Siwi Pradini Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW
Wahyu Teja Kusuma Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW

Keywords:

self-esteem, machine learning, psychoinformatics, health informatics

Abstract

Self-esteem plays a crucial role in students' psychological well-being, influencing their academic performance and personal development. Despite its importance, self-esteem is difficult to measure because it is abstract and subjective. This study aims to develop a predictive model that classifies students' self-esteem levels as high or low using machine learning on tabular questionnaire data. A dataset comprising 47 student responses, with 19 features covering social, emotional, and demographic aspects, was analyzed. Five machine learning models were evaluated: Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, and Support Vector Machine (SVM). To address the class imbalance in the dataset, the study applied SMOTE for data balancing and min-max normalization for feature scaling. Model performance was assessed using accuracy and F1-score. The results show that SVM, particularly with an RBF kernel, outperformed the other models in all scenarios. On raw data, SVM achieved 66% accuracy and an F1-score of 57.3%. After applying SMOTE, performance improved to 80% accuracy and an F1-score of 79.9%. Adding normalization gave the best performance, with SVM achieving 83.33% accuracy and an F1-score of 83.3%. These results indicate that preprocessing, here SMOTE followed by min-max normalization, improves the performance of machine learning models on imbalanced datasets. The proposed SVM-based model offers promising applications in educational and psychological settings, enabling early interventions to support students' mental health.
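
The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which this page does not give: the function names `smote` and `min_max_scale`, the neighbour count `k`, and the toy two-feature rows are all assumptions made for illustration. The sketch uses only the Python standard library and shows the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) followed by min-max scaling of each feature to [0, 1].

```python
import math
import random


def min_max_scale(rows):
    """Rescale every feature column to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has zero range; divide by 1.0 instead to avoid an error.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: two questionnaire-style features per student.
high = [[4.0, 3.0], [5.0, 4.0], [4.5, 3.5], [5.0, 3.0], [4.0, 4.0], [3.5, 3.0]]
low = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]  # minority class

# Oversample the minority class until both classes have the same size.
new_low = smote(low, n_new=len(high) - len(low), k=2)
balanced = high + low + new_low
labels = ["high"] * len(high) + ["low"] * (len(low) + len(new_low))

# Scale the balanced feature matrix to [0, 1].
scaled = min_max_scale(balanced)
```

In practice, both steps are usually fitted on the training portion of each split only, so that synthetic samples and scaling ranges do not leak information into the evaluation data. Whether the study did so is not stated in the abstract.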
