An Explainable AI Model for Hate Speech Detection on Indonesian Twitter

Muhammad Amien Ibrahim; Samsul Arifin; I Gusti Agung Anom Yudistira; Rinda Nariswari; Abdul Azis Abdillah; Nerru Pranuta Murnaka; Puguh Wahyu Prasetyo

doi:10.21512/commit.v16i2.8343

Authors

Muhammad Amien Ibrahim Bina Nusantara University
Samsul Arifin Bina Nusantara University
I Gusti Agung Anom Yudistira Bina Nusantara University
Rinda Nariswari Bina Nusantara University
Abdul Azis Abdillah Politeknik Negeri Jakarta
Nerru Pranuta Murnaka STKIP Surya
Puguh Wahyu Prasetyo Universitas Ahmad Dahlan

Keywords:

Artificial Intelligence Model, Hate Speech, Indonesian Twitter

Abstract

To prevent disputes among citizens, hate speech on social media platforms such as Twitter must be detected automatically. Current research on Indonesian Twitter focuses on developing better hate speech detection models; however, few studies examine the explainability of hate speech detection. This research addresses issues that previous studies have not examined in detail and attempts to remedy their shortcomings. The dataset contains 13,169 tweets with labels such as "hate speech" and "abusive language". It also provides binary labels indicating whether the hate speech targets an individual, a group, religion, race, physical disability, or gender. In this research, classification is performed with traditional machine learning models, and their predictions are explained with Explainable AI techniques such as Local Interpretable Model-Agnostic Explanations (LIME), which let users understand why a tweet is regarded as hateful. However, some models that perform well in classification treat the wrong words as contributing to hate speech, which makes them unsuitable for real-world deployment. In this investigation, XGBoost combined with LIME produces the most logical explanations. The use of Explainable AI highlights the importance of choosing the right model and of maintaining users' trust in the deployed model.
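The abstract describes a two-step pipeline: a classifier scores each tweet, and LIME explains that score by perturbing the tweet and fitting a simple local surrogate model. The following is a minimal, dependency-free Python sketch of the LIME idea for text, not the authors' implementation. It assumes the following: `toy_classifier` is a hypothetical stand-in for a trained model such as XGBoost, built on a made-up two-word lexicon; perturbations drop random subsets of word positions; each perturbed sample is weighted by an exponential kernel on its cosine distance to the original tweet; and the surrogate is a weighted ridge regression whose coefficients serve as word contributions. All function names and default parameter values are illustrative.

```python
import math
import random


def toy_classifier(tokens):
    """Hypothetical stand-in for a trained model (e.g. XGBoost): returns P(hate | tokens)."""
    lexicon = {"stupid": 2.0, "hate": 1.5}  # hypothetical weights, not learned from the dataset
    z = -2.0 + sum(lexicon.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_explain(text, predict, num_samples=500, kernel_width=0.25, ridge=1.0, seed=0):
    """LIME-style word attributions: perturb, weight by proximity, fit a weighted linear surrogate."""
    rng = random.Random(seed)
    words = text.split()
    d = len(words)
    rows, targets, weights = [], [], []
    for i in range(num_samples):
        # The first sample is the original tweet; the others drop a random subset of words.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        kept = [w for w, keep in zip(words, mask) if keep]
        # Cosine similarity between the binary mask and the all-ones (original) vector.
        similarity = math.sqrt(sum(mask) / d)
        distance = 1.0 - similarity
        rows.append([1.0] + [float(v) for v in mask])  # leading 1.0 is the intercept column
        targets.append(predict(kept))
        # Exponential kernel: perturbations closer to the original tweet get larger weights.
        weights.append(math.sqrt(math.exp(-(distance ** 2) / kernel_width ** 2)))
    # Weighted ridge normal equations: (X^T W X + lambda I) beta = X^T W y, intercept unpenalised.
    p = d + 1
    xtwx = [[sum(w * r[j] * r[k] for r, w in zip(rows, weights)) for k in range(p)] for j in range(p)]
    xtwy = [sum(w * r[j] * y for r, w, y in zip(rows, weights, targets)) for j in range(p)]
    for j in range(1, p):
        xtwx[j][j] += ridge
    beta = solve(xtwx, xtwy)
    # Rank words by the absolute size of their surrogate coefficient.
    return sorted(zip(words, beta[1:]), key=lambda pair: -abs(pair[1]))
```

Calling `lime_explain("they are stupid and i hate them", toy_classifier)` ranks "stupid" first and "hate" second, both with positive weights, while the remaining words receive weights close to zero. A trained model whose top-ranked words are unrelated to hate would be the kind of model the abstract describes as unsuitable for deployment. In practice, the `lime` package's `LimeTextExplainer.explain_instance`, applied to a fitted model's `predict_proba`, would normally replace this hand-rolled version.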

Author Biographies

Muhammad Amien Ibrahim, Bina Nusantara University

Computer Science Department, School of Computer Science

Samsul Arifin, Bina Nusantara University

Statistics Department, School of Computer Science

I Gusti Agung Anom Yudistira, Bina Nusantara University

Statistics Department, School of Computer Science

Rinda Nariswari, Bina Nusantara University

Statistics Department, School of Computer Science

Abdul Azis Abdillah, Politeknik Negeri Jakarta

Department of Mechanical Engineering

Nerru Pranuta Murnaka, STKIP Surya

Department of Mathematics Education

Puguh Wahyu Prasetyo, Universitas Ahmad Dahlan

Mathematics Education Department