Comparison of the TF-IDF Method with the Count Vectorizer to Classify Hate Speech

Kristien Margi Suryaningrum

doi:10.21512/emacsjournal.v5i2.9978

Authors

Kristien Margi Suryaningrum, Bina Nusantara University

DOI:

https://doi.org/10.21512/emacsjournal.v5i2.9978

Keywords:

TF-IDF, Count Vectorizer, Support Vector Machine, Sentiment Analysis

Abstract

Hate speech is a form of expression used to spread hatred and to commit acts of violence and discrimination against a person or group of people for various reasons. Hate speech is very common on social media, including Twitter. The goal of this work is to build a system that classifies a tweet on Twitter as hate speech (HS) or non-hate speech (NONHS). The method used is a Support Vector Machine, comparing TF-IDF features with Count Vectorizer features. The two feature types are compared on accuracy, precision, recall, and F1-score. Overall, the Support Vector Machine algorithm performs better with TF-IDF features than with Count Vectorizer features, reaching an accuracy of 88.77%, a precision of 87.45%, a recall of 88.77%, and an F1-score of 87.81%.
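
The difference between the two feature types named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the whitespace tokenizer, the function names, and the choice of a smoothed inverse document frequency with L2 normalization are all assumptions made for illustration. The classifier step (the Support Vector Machine) is left out; the sketch only shows how raw counts and TF-IDF weights differ for the same tweet.

```python
import math
from collections import Counter

# Hypothetical toy corpus, not data from the paper.
docs = [
    "you are all awful people",
    "you are all welcome here",
    "awful weather today",
]

def tokenize(text):
    # Assumption: lowercase and split on whitespace; real preprocessing is richer.
    return text.lower().split()

def count_features(docs):
    """Raw term counts per document, as a count vectorizer would produce."""
    return [Counter(tokenize(d)) for d in docs]

def tfidf_features(docs):
    """Term counts reweighted by a smoothed inverse document frequency,
    then L2-normalized per document (an assumed, commonly used variant)."""
    counts = count_features(docs)
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    weighted = []
    for c in counts:
        w = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in w.values()))
        weighted.append({t: v / norm for t, v in w.items()})
    return weighted

counts = count_features(docs)
tfidf = tfidf_features(docs)

# In the first document every term occurs once, so the counts are identical,
# but TF-IDF gives the rarer term "people" more weight than "you".
print(counts[0]["people"], counts[0]["you"])
print(tfidf[0]["people"] > tfidf[0]["you"])
```

With count features, every word in the first toy tweet carries the same weight. With TF-IDF, words that appear in fewer documents are weighted more heavily, which is one plausible reason the two feature types can lead a classifier to different results.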

Author Biography

Kristien Margi Suryaningrum, Bina Nusantara University

Software Engineering Program, Computer Science Department, School of Computer Science

Published

2023-05-31

How to Cite

Suryaningrum, K. M. (2023). Comparison of the TF-IDF Method with the Count Vectorizer to Classify Hate Speech. Engineering, MAthematics and Computer Science Journal (EMACS), 5(2), 79–83. https://doi.org/10.21512/emacsjournal.v5i2.9978

Issue

Vol. 5 No. 2 (2023): EMACS

Section

Articles

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License - Share Alike that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

USER RIGHTS

All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows: Creative Commons Attribution-Share Alike (CC BY-SA)
