Impact of Statistical and Semantic Features Extraction for Emotion Detection on Indonesian Short Text Sentences

Amelia Devi Putri Ariyanto; Fari Katul Fikriah; Arif Fitra Setyawan

doi:10.21512/commit.v19i1.11680

Authors

Amelia Devi Putri Ariyanto, Widya Husada University
Fari Katul Fikriah, Widya Husada University
Arif Fitra Setyawan, Widya Husada University

Keywords:

Emotion Detection, Semantic Features, Statistical Features, Machine Learning, Short Texts

Abstract

Detecting emotions in short texts is important: interpreting emotions on platforms such as Twitter offers insight into social trends and responses to specific events, and examining emotions in product reviews helps companies understand customer sentiment and improve the quality of their products and services. Most research on emotion detection in Indonesian relies on statistical feature extraction, with limited discussion of the impact of both statistical and semantic feature extraction. This research therefore detects emotions in short texts and analyzes the impact of statistical and semantic features. Such an analysis is needed to identify the most effective approaches, improve detection accuracy, and ensure that the resulting systems can better handle the variety and complexity of informal language. The data are publicly available and drawn from Twitter texts and e-commerce product reviews. The research uses statistical features such as Term Frequency-Inverse Document Frequency (TF-IDF) and semantic features such as Bidirectional Encoder Representations from Transformers (BERT). The evaluation results show that semantic features significantly improve emotion detection performance on short texts, by 13-24% compared with statistical features. Deep Learning (DL) algorithms based on neural networks also outperform Machine Learning (ML) algorithms in detecting emotions in short texts. The experimental results also outline potential directions for future development.
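As an illustration of what a statistical feature such as TF-IDF measures, the following is a minimal sketch in plain Python. It assumes whitespace tokenization, length-normalized term frequency, and idf = ln(N / df), which is only one common variant and not necessarily the weighting used in the paper. The function name `tfidf` and the example sentences are hypothetical and are not taken from the paper's datasets. Semantic features such as BERT embeddings require a pretrained model and are not covered here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical short texts, for illustration only.
docs = ["barang bagus sekali", "barang rusak kecewa", "senang sekali hari ini"]
weights = tfidf(docs)
```

In this sketch a term that occurs in only one document (such as "bagus") receives a higher weight than a term shared across documents (such as "barang"), and a term present in every document receives a weight of zero. The weights depend only on counts, so words with related meanings are treated as unrelated, which is the gap that contextual embeddings aim to fill.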

Author Biographies

Amelia Devi Putri Ariyanto, Widya Husada University

Information Systems and Technology Study Program

Fari Katul Fikriah, Widya Husada University

Information Systems and Technology Study Program

Arif Fitra Setyawan, Widya Husada University

Information Systems and Technology Study Program
