Comparison of the TF-IDF Method with the Count Vectorizer to Classify Hate Speech

Authors

  • Kristien Margi Suryaningrum, Bina Nusantara University

DOI:

https://doi.org/10.21512/emacsjournal.v5i2.9978

Keywords:

TF-IDF, Count Vectorizer, Support Vector Machine, Sentiment Analysis

Abstract

Hate speech is a form of expression used to spread hatred and commit acts of violence and discrimination against a person or group of people for various reasons. Hate speech is very common on social media, including Twitter. This study aims to build a system that classifies a tweet on Twitter as either hate speech (HS) or non-hate speech (NONHS). The classifier is a Support Vector Machine, and two feature extraction methods, TF-IDF and Count Vectorizer, are compared on accuracy, precision, recall, and F1-score. Overall, the Support Vector Machine achieves better results with TF-IDF features than with Count Vectorizer features, reaching 88.77% accuracy, 87.45% precision, 88.77% recall, and an F1-score of 87.81%.
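The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes scikit-learn, a linear-kernel SVM (`LinearSVC`), default vectorizer settings, and a tiny hypothetical set of English sentences in place of the labelled tweets. It also assumes weighted averaging for precision, recall, and F1-score, which would be consistent with the reported recall matching the reported accuracy, although the abstract does not state the averaging scheme.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for labelled tweets (not the paper's dataset).
train_texts = [
    "i hate those people they should disappear",
    "those people are disgusting and worthless",
    "get out of our country you filthy group",
    "what a lovely morning for a walk",
    "congratulations on the new job my friend",
    "the match last night was really exciting",
]
train_labels = ["HS", "HS", "HS", "NONHS", "NONHS", "NONHS"]
test_texts = [
    "you people are worthless and disgusting",
    "lovely walk with my friend this morning",
]
test_labels = ["HS", "NONHS"]


def evaluate(vectorizer):
    """Train a linear SVM on the given features and return the four metrics."""
    model = make_pipeline(vectorizer, LinearSVC(random_state=0))
    model.fit(train_texts, train_labels)
    predicted = model.predict(test_texts)
    # Weighted averaging is an assumption; the abstract does not specify it.
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(test_labels, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Same classifier and data; only the feature extraction step differs.
results = {
    "TF-IDF": evaluate(TfidfVectorizer()),
    "Count Vectorizer": evaluate(CountVectorizer()),
}

for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Holding the classifier and the train/test split fixed while swapping only the vectorizer keeps the comparison about the features themselves. On a toy set this small the two feature types may well score identically, so the numbers it prints say nothing about the results reported above.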

Author Biography

Kristien Margi Suryaningrum, Bina Nusantara University

Software Engineering Program, Computer Science Department, School of Computer Science

Published

2023-05-31

Section

Articles