Comparison of the TF-IDF Method with the Count Vectorizer to Classify Hate Speech
DOI:
https://doi.org/10.21512/emacsjournal.v5i2.9978
Keywords:
TF-IDF, Count Vectorizer, Support Vector Machine, Sentiment Analysis
Abstract
Hate speech is a form of expression used to spread hatred and to commit acts of violence and discrimination against a person or group of people for various reasons. Hate speech is very common on social media, including Twitter. This study aims to build a system that classifies a tweet as hate speech (HS) or non-hate speech (NONHS). The classifier is a Support Vector Machine, and the study compares two feature extraction methods, TF-IDF and Count Vectorizer, using accuracy, precision, recall, and F1-score. Overall, the Support Vector Machine achieves higher scores with TF-IDF features than with Count Vectorizer features: 88.77% accuracy, 87.45% precision, 88.77% recall, and an F1-score of 87.81%.
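To make the comparison concrete, the following is a minimal pure-Python sketch of the two feature representations and the four metrics named in the abstract. It is not the authors' implementation: the toy tweets, the labels, and the helper names (`count_vector`, `tfidf_vectors`, `weighted_scores`) are made up for illustration, and the smoothed IDF with L2 normalisation is an assumption borrowed from the default behaviour of scikit-learn's `TfidfVectorizer`. The abstract also does not state how per-class scores were averaged. The sketch assumes support-weighted averaging, which is at least consistent with the reported recall being equal to the reported accuracy, since weighted recall always equals accuracy.

```python
import math
from collections import Counter

# Toy, made-up tweets (NOT from the paper's dataset).
docs = [
    "i hate you people",
    "you people are great",
    "hate hate hate",
    "have a great day",
]

# Shared vocabulary, sorted so that column order is deterministic.
vocab = sorted({tok for d in docs for tok in d.split()})


def count_vector(doc):
    """Raw term counts per vocabulary entry (the Count Vectorizer idea)."""
    counts = Counter(doc.split())
    return [counts[t] for t in vocab]


def tfidf_vectors(corpus):
    """Counts scaled by a smoothed IDF, then L2-normalised per document.

    Assumption: idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    n = len(corpus)
    df = Counter(tok for d in corpus for tok in set(d.split()))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for d in corpus:
        raw = [c * idf[t] for c, t in zip(count_vector(d), vocab)]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        rows.append([x / norm for x in raw])
    return rows


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    prec = rec = f1 = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        pred_c = sum(p == c for p in y_pred)
        true_c = sum(t == c for t in y_true)
        p_c = tp / pred_c if pred_c else 0.0
        r_c = tp / true_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = true_c / n  # class weight = share of true samples in class c
        prec += w * p_c
        rec += w * r_c
        f1 += w * f_c
    return acc, prec, rec, f1


tfidf = tfidf_vectors(docs)

# Hypothetical labels and predictions, for illustration only.
y_true = ["HS", "HS", "NONHS", "NONHS", "NONHS"]
y_pred = ["HS", "NONHS", "NONHS", "NONHS", "HS"]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)

if __name__ == "__main__":
    print("vocab:", vocab)
    print("counts:", count_vector(docs[0]))
    print("tf-idf:", [round(x, 3) for x in tfidf[0]])
    print("acc, prec, rec, f1:", acc, prec, rec, f1)
```

In the first toy tweet every word occurs once, so the count vector treats all four words alike, while the TF-IDF vector gives the rarer word "i" more weight than "hate", which appears in two documents. Either matrix could then be passed to a Support Vector Machine classifier, which is the step this sketch leaves out.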
License
Copyright (c) 2023 Engineering, MAthematics and Computer Science (EMACS) Journal
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License - Share Alike that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows: Creative Commons Attribution-Share Alike (CC BY-SA)