Advancing Indonesian Audio Emotion Classification: A Comparative Study Using IndoWaveSentiment

Muhammad Rizki Nur Majiid; Karli Eka Setiawan; Prayoga Yudha Pamungkas; Taufiq Annas; Nicholas Lorenzo Setiawan

doi:10.21512/emacsjournal.v7i2.13415

Authors

Muhammad Rizki Nur Majiid, Bina Nusantara University
Karli Eka Setiawan, Bina Nusantara University
Prayoga Yudha Pamungkas, Bina Nusantara University
Taufiq Annas, Bina Nusantara University
Nicholas Lorenzo Setiawan, Bina Nusantara University


Keywords:

Speech Emotion Recognition, Indonesian speech, IndoWaveSentiment, ensemble learning, acoustic features

Abstract

This study addresses a critical gap in Indonesian Speech Emotion Recognition (SER) by evaluating machine learning models on the IndoWaveSentiment dataset, a novel corpus of 300 high-fidelity recordings from native speakers capturing five emotions (neutral, happy, surprised, disgusted, and disappointed). The research aims to identify the most effective classification techniques and acoustic features for Indonesian SER, given the language's distinctive linguistic characteristics and the scarcity of annotated resources. Six classifiers, namely Logistic Regression, K-Nearest Neighbors (KNN), Gradient Boosting, Random Forest, Naive Bayes, and Support Vector Classifier (SVC), were trained on 45 acoustic features extracted with Librosa, including spectral contrast, Mel-frequency cepstral coefficients (MFCCs), and zero crossing rate. Random Forest performed best (90% accuracy), followed by Gradient Boosting (85%) and Logistic Regression (75%), while spectral contrast (contrast2, contrast7) and MFCC1 emerged as the most discriminative features. The findings highlight the efficacy of ensemble methods in capturing nuanced emotional cues in Indonesian speech, with results exceeding those reported by prior studies on locally sourced datasets. Practical implications include applications in customer service analytics and mental health tools, although limitations such as the dataset's controlled recording conditions and fixed sentence structure call for caution in real-world deployment. Future work should expand the dataset to include regional dialects and spontaneous speech and should explore hybrid architectures such as CNN-LSTMs. This study establishes foundational benchmarks for Indonesian SER and advocates culturally informed models to enhance human-computer interaction in underrepresented linguistic contexts.
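The comparison described in the abstract (one tabular feature matrix, six classifiers scored by accuracy, and Random Forest feature importances) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the feature matrix is synthetic data from `make_classification`, sized like IndoWaveSentiment (300 clips, 45 features, 5 classes); the stratified 80/20 split and all hyperparameters are assumptions; and the column layout in `FEATURE_NAMES` is hypothetical. The accuracies it prints therefore say nothing about the paper's reported results.

```python
"""Minimal sketch: compare six scikit-learn classifiers on a 45-column feature table.

Assumption: X is a synthetic stand-in for per-clip acoustic features.
It is NOT IndoWaveSentiment data, so the printed numbers do not replicate the paper.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["neutral", "happy", "surprised", "disgusted", "disappointed"]

# Hypothetical 45-column layout (7 contrast bands + 20 MFCCs + ZCR + 12 chroma
# + 5 scalar descriptors); the paper's exact feature list may differ.
FEATURE_NAMES = (
    [f"contrast{i}" for i in range(1, 8)]
    + [f"mfcc{i}" for i in range(1, 21)]
    + ["zcr"]
    + [f"chroma{i}" for i in range(1, 13)]
    + ["centroid", "bandwidth", "rolloff", "rms", "flatness"]
)


def make_stand_in_data(seed=0):
    """Synthetic stand-in sized like the corpus: 300 clips, 45 features, 5 balanced classes."""
    return make_classification(
        n_samples=300,
        n_features=len(FEATURE_NAMES),
        n_informative=12,
        n_classes=len(EMOTIONS),
        random_state=seed,
    )


def build_models(seed=0):
    """The six model families named in the abstract; hyperparameters are assumed defaults."""
    return {
        # Distance- and margin-based models get feature standardization.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "Naive Bayes": GaussianNB(),
        "SVC": make_pipeline(StandardScaler(), SVC()),
    }


def compare(X, y, seed=0):
    """Fit every model on a stratified 80/20 split (an assumed protocol); return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        scores[name] = accuracy_score(y_te, model.predict(X_te))
    return scores


def top_features(X, y, names, k=3, seed=0):
    """Rank features by Random Forest impurity-based importance, highest first."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    importances = rf.feature_importances_
    order = importances.argsort()[::-1][:k]
    return [(names[i], float(importances[i])) for i in order]


if __name__ == "__main__":
    X, y = make_stand_in_data()
    for name, acc in sorted(compare(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:<20} {acc:.2f}")
    print(top_features(X, y, FEATURE_NAMES))
```

In a real run, `X` would be replaced by per-clip Librosa features, for example frame-averaged outputs of `librosa.feature.spectral_contrast`, `librosa.feature.mfcc`, and `librosa.feature.zero_crossing_rate`. Standardization is applied only to the scale-sensitive models (Logistic Regression, KNN, SVC); the tree ensembles are invariant to per-feature scaling, and Gaussian Naive Bayes is largely insensitive to it.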


Author Biographies

Muhammad Rizki Nur Majiid, Bina Nusantara University

Computer Science Department, Semarang Campus, School of Computer Science

Karli Eka Setiawan, Bina Nusantara University

Computer Science Department, School of Computer Science

Prayoga Yudha Pamungkas, Bina Nusantara University

Industrial Engineering Department, Faculty of Engineering

Taufiq Annas, Bina Nusantara University

Visual Communication Design, Semarang Campus, School of Design

Nicholas Lorenzo Setiawan, Bina Nusantara University

Computer Science Department, Semarang Campus, School of Computer Science

References

Aini, Y. K., Santoso, T. B., & Dutono, T. (2021). Pemodelan CNN Untuk Deteksi Emosi Berbasis Speech Bahasa Indonesia [CNN modeling for speech-based emotion detection in Indonesian]. Jurnal Komputer Terapan, 7(1). https://doi.org/10.35143/jkt.v7i1.4623

Akçay, M. B., & Oğuz, K. (2020). Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Communication, 116. https://doi.org/10.1016/j.specom.2019.12.001

Akinpelu, S., & Viriri, S. (2023). Speech emotion classification using attention based network and regularized feature selection. Scientific Reports, 13(1). https://doi.org/10.1038/s41598-023-38868-2

Bustamin, A., Rizky, A. M., Warni, E., Areni, I. S., & Indrabayu. (2024). IndoWaveSentiment: Indonesian audio dataset for emotion classification. Data in Brief, 57, 111138. https://doi.org/10.1016/j.dib.2024.111138

Caschera, M. C., Grifoni, P., & Ferri, F. (2022). Emotion Classification from Speech and Text in Videos Using a Multimodal Approach. Multimodal Technologies and Interaction, 6(4). https://doi.org/10.3390/mti6040028

Choudhary, R. R., Meena, G., & Mohbey, K. K. (2022). Speech Emotion Based Sentiment Recognition using Deep Neural Networks. Journal of Physics: Conference Series, 2236(1). https://doi.org/10.1088/1742-6596/2236/1/012003

Hidajat, M., Supria, Luwinda, F. A., & Sanjaya, H. (2019). Emotional Speech Classification Application Development Using Android Mobile Applications. 2019 International Conference on Information Management and Technology (ICIMTech), 400–403. https://doi.org/10.1109/ICIMTech.2019.8843816

Kumala, O. U., & Zahra, A. (2021). Indonesian Speech Emotion Recognition using Cross-Corpus Method with the Combination of MFCC and Teager Energy Features. International Journal of Advanced Computer Science and Applications, 12(4). https://doi.org/10.14569/IJACSA.2021.0120422

Parra-Gallego, L. F., & Orozco-Arroyave, J. R. (2023). Classification of Emotions and Evaluation of Customer Satisfaction from Speech in Real World Acoustic Environments. International Journal For Multidisciplinary Research, 5(3). https://doi.org/10.36948/ijfmr.2023.v05i03.4166

Minor, K. (2025). Developing Algorithm of Music Concepts and Operations Using The Modular Arithmetic. Engineering, Mathematics and Computer Science Journal (EMACS), 7(1), 51–59. https://doi.org/10.21512/emacsjournal.v7i1.12562

Minor, K. A., & Kartowisastro, I. H. (2022). Automatic Music Transcription Using Fourier Transform for Monophonic and Polyphonic Audio File. Ingénierie des Systèmes d'Information, 27(4), 629–635. https://doi.org/10.18280/isi.270413

Nath, S., Shahi, A. K., Martin, T., Choudhury, N., & Mandal, R. (2024). A Comparative Study on Speech Emotion Recognition Using Machine Learning. https://doi.org/10.1007/978-981-99-5435-3_5

Wijaya, A. A., Yasmina, I., & Zahra, A. (2021). Indonesian Music Emotion Recognition Based on Audio with Deep Learning Approach. Advances in Science, Technology and Engineering Systems Journal, 6(2), 716–721. https://doi.org/10.25046/aj060283

Wunarso, N. B., & Soelistio, Y. E. (2017). Towards Indonesian speech-emotion automatic recognition (I-SpEAR). Proceedings of the 2017 4th International Conference on New Media Studies (CONMEDIA 2017). https://doi.org/10.1109/CONMEDIA.2017.8266038

Zahra, H. N., Ibrohim, M. O., Fahmi, J., Adelia, R., Nur Febryanto, F. A., & Riandi, O. (2020). Speech emotion recognition on Indonesian YouTube web series using deep learning approach. 2020 5th International Conference on Informatics and Computing (ICIC 2020). https://doi.org/10.1109/ICIC50835.2020.9288650
