SMOTE Effectiveness and various Machine Learning Algorithms to Predict Self-Esteem Levels of Indonesian Student
DOI: https://doi.org/10.21512/emacsjournal.v7i2.13521
Keywords: self-esteem, machine learning, psychoinformatics, health informatics
Abstract
Self-esteem plays a crucial role in students' psychological well-being, influencing their academic performance and personal development. Despite its importance, self-esteem is difficult to measure because of its abstract and subjective nature. This study aims to develop a predictive model that classifies students' self-esteem levels as high or low using machine learning on tabular data collected through questionnaires. A dataset of 47 student responses with 19 features covering social, emotional, and demographic aspects was analyzed. Five machine learning models were evaluated: Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, and Support Vector Machine (SVM). To address class imbalance in the dataset, the study applied SMOTE (Synthetic Minority Over-sampling Technique) for data balancing and min-max normalization for feature scaling. Model performance was assessed using accuracy and F1-score. The results show that SVM, particularly with an RBF kernel, outperformed the other models across all scenarios. On raw data, SVM achieved 66% accuracy and an F1-score of 57.3%. After applying SMOTE, performance improved to 80% accuracy and a 79.9% F1-score. Applying normalization in addition to SMOTE yielded the best performance, with SVM reaching 83.33% accuracy and an F1-score of 83.3%. These results indicate that preprocessing methods such as oversampling and normalization can substantially improve machine learning performance on imbalanced datasets. The proposed SVM-based model offers promising applications in educational and psychological settings, enabling early interventions to support students' mental health.
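The abstract names two preprocessing steps, min-max normalization and SMOTE oversampling, without giving implementation details. The following is a minimal, dependency-free Python sketch of both, not the authors' code. It assumes that scaling is applied before oversampling, that the neighbour count `k` defaults to 5 (the usual SMOTE default, also used by imbalanced-learn's `SMOTE` class), and that the toy feature values and the helper names (`min_max_fit`, `min_max_transform`, `smote`, `balance_with_smote`) are hypothetical illustrations rather than anything from the study.

```python
import math
import random


def min_max_fit(rows):
    """Compute per-feature (min, max) bounds from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_transform(rows, bounds):
    """Rescale every feature to [0, 1]; a constant feature is mapped to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(base, m),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform draw in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Oversample a binary dataset until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    if n_needed <= 0:
        return list(X), list(y)
    new = smote(minority, n_needed, k=k, seed=seed)
    return list(X) + new, list(y) + [minority_label] * len(new)


# Hypothetical toy data (two features, five "high" vs three "low" labels);
# these values are illustrative only, not the study's questionnaire data.
X_train = [[1.0, 20.0], [2.0, 24.0], [3.0, 30.0], [4.0, 33.0], [5.0, 40.0],
           [1.5, 21.0], [2.5, 23.0], [1.2, 22.0]]
y_train = ["high"] * 5 + ["low"] * 3

bounds = min_max_fit(X_train)                    # fit bounds on training data only
X_scaled = min_max_transform(X_train, bounds)
X_bal, y_bal = balance_with_smote(X_scaled, y_train, minority_label="low", k=2)
print(y_bal.count("high"), y_bal.count("low"))  # prints: 5 5
```

Because each synthetic point is a convex combination of two scaled minority samples, it stays inside the [0, 1] range produced by min-max scaling. In a train/test evaluation, both steps are normally fitted on the training split only, so that test samples do not influence the scaling bounds or the synthetic points.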
License
Copyright (c) 2025 Mochammad Anshori, Risqy Siwi Pradini, Wahyu Teja Kusuma

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
User Rights
All articles published Open Access are immediately and permanently free for everyone to read and download. We continuously work with our author communities to select the best license options; the license currently defined for this journal is Creative Commons Attribution-ShareAlike (CC BY-SA).