Impact of Statistical and Semantic Features Extraction for Emotion Detection on Indonesian Short Text Sentences
DOI: https://doi.org/10.21512/commit.v19i1.11680
Keywords: Emotion Detection, Semantic Features, Statistical Features, Machine Learning, Short Texts
Abstract
Detecting emotions in short texts is important. Emotions expressed on platforms such as Twitter can offer insight into social trends and public responses to specific events, and emotions in product reviews help companies understand customer sentiment so that they can improve the quality of their products and services. Most research on emotion detection in Indonesian relies on statistical feature extraction, and the relative impact of statistical and semantic feature extraction has received little discussion. This research therefore detects emotions in short texts and analyzes the impact of statistical and semantic features on detection. Such an analysis is needed to identify the most effective approaches, improve detection accuracy, and ensure that the resulting systems can handle the variety and complexity of informal language. The data are public datasets of Twitter texts and e-commerce product review texts. The research uses Term Frequency-Inverse Document Frequency (TF-IDF) as the statistical feature and Bidirectional Encoder Representations from Transformers (BERT) as the semantic feature. The evaluation results show that semantic features significantly improve emotion detection performance on short texts, by 13–24% over statistical features. Deep Learning (DL) algorithms based on neural networks were also shown to outperform Machine Learning (ML) algorithms in detecting emotions in short texts. Finally, the experimental results point to potential directions for future development.
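The abstract contrasts statistical features (TF-IDF) with semantic features (BERT). The minimal Python sketch below, using only the standard library, computes TF-IDF vectors with one common weighting variant: length-normalized term frequency multiplied by idf(t) = log(N / df(t)). This variant is an assumption; the paper's exact weighting and preprocessing may differ. The three Indonesian example sentences and the function names `tokenize`, `tfidf_vectors`, and `cosine` are hypothetical and chosen for illustration only. The sketch shows one limitation that motivates semantic features: because TF-IDF treats every word as an independent dimension, the synonyms "senang" and "gembira" (both meaning happy) share no weight, so the "happy" and "joyful" sentences come out exactly as dissimilar as the "happy" and "sad" ones.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    # Real Indonesian preprocessing (slang normalization, stemming) is omitted.
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks)
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [
        "aku senang sekali",   # "I am very happy"
        "aku gembira sekali",  # "I am very joyful" (synonym of senang)
        "aku sedih sekali",    # "I am very sad"
    ]
    vocab, vecs = tfidf_vectors(docs)
    print(vocab)                    # ['aku', 'gembira', 'sedih', 'sekali', 'senang']
    print(cosine(vecs[0], vecs[1]))  # 0.0
    print(cosine(vecs[0], vecs[2]))  # 0.0
```

Because "aku" and "sekali" occur in every sentence, their idf is log(3/3) = 0, and the only remaining weights sit on three different emotion words, so both similarities are 0.0. A contextual embedding model such as BERT would typically place the first two sentences closer together than the first and third, which is one plausible reason semantic features help on short, informal texts; this sketch does not reproduce the paper's experiments.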
License
Copyright (c) 2025 Amelia Devi Putri Ariyanto, Fari Katul Fikriah, Arif Fitra Setyawan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently defined for this journal as follows: Creative Commons Attribution-ShareAlike (CC BY-SA).