AN EXPLAINABLE AI MODEL FOR HATE SPEECH DETECTION ON INDONESIAN TWITTER
Keywords: Hate Speech, Twitter, Explainable, Artificial Intelligence
To avoid citizen disputes, hate speech on social media such as Twitter must be detected automatically. Current research on Indonesian Twitter has focused on developing better hate speech detection models, but few studies address the explainability of hate speech detection. The fundamental concepts for this challenge come from the field of Explainable AI (XAI), and explainability is generally regarded as a critical attribute for deploying AI models. In this work, classification was performed using traditional machine learning models, and the predictions were evaluated using an Explainable AI model such as LIME so that users can understand why a tweet is regarded as a hateful message. According to our findings, models that perform well in classification perceive incorrect words as contributing to hate speech. Such models would therefore not be suitable for deployment in the real world. In our investigation, the combination of XGBoost and LIME produced the most logical explanations. The use of such an Explainable AI model highlights the importance of choosing the right model while maintaining users' trust in the deployed model.
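The idea of checking which words drive a prediction can be illustrated with a minimal sketch. It is not the authors' pipeline and it does not use LIME itself: it swaps in a simpler leave-one-word-out attribution, which removes each word in turn and records how much the predicted probability drops. The classifier, its word weights, and the names `TOY_WEIGHTS`, `predict_hate_proba`, and `word_contributions` are all hypothetical stand-ins for a trained model such as XGBoost.

```python
import math

# Hypothetical toy classifier: these word weights are invented for
# illustration and stand in for a trained model such as XGBoost.
TOY_WEIGHTS = {"benci": 2.0, "bodoh": 1.5, "kamu": 0.1}
BIAS = -1.0


def predict_hate_proba(text: str) -> float:
    """Return a pseudo-probability that `text` is hate speech."""
    score = BIAS + sum(TOY_WEIGHTS.get(w, 0.0) for w in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))


def word_contributions(text: str, predict=predict_hate_proba):
    """Leave-one-word-out attribution (a simplification, not LIME).

    Each word's contribution is the drop in predicted probability
    when that single word is removed from the text.
    """
    words = text.split()
    base = predict(text)
    contributions = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        contributions.append((word, base - predict(reduced)))
    # Most influential words first.
    return sorted(contributions, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word, delta in word_contributions("saya benci kamu"):
        print(f"{word}: {delta:+.3f}")
```

If the top-ranked words are ones a human reader would not consider hateful, that is the kind of mismatch the abstract describes: a model can score well on classification while relying on the wrong words. LIME goes further than this sketch by sampling many perturbed versions of the text and fitting a weighted local linear model to them.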
Copyright (c) 2022 Samsul Arifin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.