AN EXPLAINABLE AI MODEL TO HATE SPEECH DETECTION ON INDONESIAN TWITTER

Authors

  • Muhammad Amien Ibrahim Bina Nusantara University
  • Samsul Arifin Binus University
  • I Gusti Agung Anom Yudistira Bina Nusantara University
  • Rinda Nariswari Bina Nusantara University
  • Abdul Azis Abdillah Politeknik Negeri Jakarta
  • Nerru Pranuta Murnaka STKIP Surya
  • Puguh Wahyu Prasetyo Universitas Ahmad Dahlan

Keywords:

Hate Speech, Twitter, Explainable, Artificial Intelligence

Abstract

To prevent disputes among citizens, hate speech on social media platforms such as Twitter must be detected automatically. Current research on Indonesian Twitter has focused on developing better hate speech detection models; however, few studies address the explainability of hate speech detection. The fundamental concepts for this challenge come from the field of Explainable AI (XAI), which is widely regarded as a critical attribute for the deployment of AI models. In this work, classification was performed using traditional machine learning models, and the predictions were examined using an Explainable AI method, LIME, so that users can understand why a tweet is regarded as a hateful message. According to our findings, models that perform well in classification can still treat incorrect words as contributing to hate speech. Such models would therefore not be suitable for real-world deployment. In our investigation, the combination of XGBoost and LIME produced the most logical explanations. The use of such an Explainable AI model highlights the importance of choosing the right model while maintaining users' trust in the deployed model.
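The idea behind a LIME-style explanation can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation and does not use the LIME or XGBoost libraries: the classifier `toy_classifier`, its word weights, the distance measure, and all parameter names are hypothetical stand-ins chosen for illustration. The sketch perturbs a tweet by randomly dropping words, queries the black-box classifier on each perturbed version, and fits a proximity-weighted linear model whose coefficients indicate how much each word pushes the prediction toward the hate speech class.

```python
import math
import random


def toy_classifier(tokens):
    """Hypothetical stand-in for a trained model: returns P(hate) for a token list."""
    # Assumed toy weights, not taken from the paper.
    weights = {"benci": 2.0, "bodoh": 1.5, "sangat": 0.3, "kamu": 0.0}
    score = sum(weights.get(t, 0.0) for t in tokens) - 1.5
    return 1.0 / (1.0 + math.exp(-score))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain(tokens, predict, n_samples=500, kernel_width=0.75, ridge=1e-3, seed=0):
    """Return (token, weight) pairs from a local weighted linear surrogate."""
    rng = random.Random(seed)
    d = len(tokens)
    rows, targets, sample_weights = [], [], []
    for i in range(n_samples):
        # The first sample is the unperturbed tweet; the rest drop words at random.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        kept = [t for t, keep in zip(tokens, mask) if keep]
        # Assumed distance: fraction of words removed, passed through an
        # exponential kernel so that samples close to the original count more.
        distance = 1.0 - sum(mask) / d
        sample_weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append(mask + [1])  # trailing 1 is the intercept column
        targets.append(predict(kept))
    # Weighted ridge regression: (X^T W X + ridge * I) beta = X^T W y
    p = d + 1
    xtwx = [
        [
            sum(w * r[i] * r[j] for r, w in zip(rows, sample_weights))
            + (ridge if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]
    xtwy = [
        sum(w * r[i] * y for r, w, y in zip(rows, sample_weights, targets))
        for i in range(p)
    ]
    beta = solve(xtwx, xtwy)
    # Rank words by the magnitude of their contribution; drop the intercept.
    return sorted(zip(tokens, beta[:d]), key=lambda kv: -abs(kv[1]))


if __name__ == "__main__":
    for word, weight in explain(["kamu", "sangat", "bodoh", "benci"], toy_classifier):
        print(f"{word:>8s}  {weight:+.3f}")
```

A reader inspecting the ranked words can judge whether the model relies on sensible terms. This is the kind of check the abstract describes when it notes that a model with good classification scores may still attribute hate speech to the wrong words.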

 

 

Author Biographies

Muhammad Amien Ibrahim, Bina Nusantara University

Department of Computer Science, School of Computer Science

I Gusti Agung Anom Yudistira, Bina Nusantara University

Department of Statistics, School of Computer Science

Rinda Nariswari, Bina Nusantara University

Department of Statistics, School of Computer Science

Abdul Azis Abdillah, Politeknik Negeri Jakarta

Department of Mechanical Engineering

Nerru Pranuta Murnaka, STKIP Surya

Department of Mathematics Education

Puguh Wahyu Prasetyo, Universitas Ahmad Dahlan

Mathematics Education Department

Published

2022-06-08