Comparison of IndoBERT and SVM Algorithm to Perform Aspect Based Sentiment Analysis using Hierarchical Dirichlet Process
DOI: https://doi.org/10.21512/emacsjournal.v7i3.13493
Keywords: Hierarchical Dirichlet Process, SVM, IndoBERT, SMOTE, Aspect Based Sentiment Analysis
Abstract
This study analyzes the performance of SVM and IndoBERT models for aspect-based sentiment analysis of fashion reviews on the Tokopedia e-commerce platform. Because the original data are imbalanced, the study applies the SMOTE technique. Aspects are determined with the Hierarchical Dirichlet Process (HDP) model, which yields satisfactory results with an adequate coherence score. The HDP model effectively grouped related terms into aspects such as “Material,” “Shipping,” and “Colour,” enhancing interpretability in sentiment classification. In the comparison for aspect-based sentiment analysis, IndoBERT achieved an accuracy of 87%, precision of 91%, recall of 93%, and F1-score of 92%, while SVM attained an accuracy of 96%, precision of 100%, recall of 92%, and F1-score of 96%. SVM therefore outperforms IndoBERT on accuracy, precision, and F1-score, while IndoBERT has marginally higher recall. On this basis, the SVM model was chosen for implementation on a website that lets users view aspect-based sentiment analysis of e-commerce products. The website enables users to analyze product sentiment interactively, providing actionable insights that help both sellers and customers assess product quality and service satisfaction more efficiently.
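As a quick consistency check on the figures above, the F1-score can be recomputed from the reported precision and recall. The sketch below is illustrative only: it assumes each reported F1-score is the harmonic mean of the precision and recall quoted next to it (which holds for a single class or for micro-averaged metrics, but not necessarily for macro-averaged ones), and the function and variable names are hypothetical, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1-score as reported in the abstract.
reported = {
    "IndoBERT": {"precision": 0.91, "recall": 0.93, "f1": 0.92},
    "SVM": {"precision": 1.00, "recall": 0.92, "f1": 0.96},
}

# Recompute F1 from precision and recall (assumption: same class/averaging).
recomputed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, m in reported.items():
    print(
        f"{name}: reported F1 = {m['f1']:.2f}, "
        f"recomputed F1 = {recomputed[name]:.4f}"
    )
```

Under that assumption, both recomputed values round to the reported F1-scores, so the metrics quoted for each model are internally consistent.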
License
Copyright (c) 2025 Sheila Prima Octarini, Alfi Yusrotis Zakiyyah, Kartika Purwandari

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.