Deciphering Digital Discourse: Detecting Cyberbullying Patterns in Filipino Tweets Using Machine Learning

Authors

  • January F. Naga MSU-Iligan Institute of Technology
  • Rabby Q. Lavilles MSU-Iligan Institute of Technology

DOI:

https://doi.org/10.21512/commit.v18i2.11094

Keywords:

Digital Discourse, Cyberbullying Patterns, Social Media, Machine Learning

Abstract

The research addresses the escalating challenge of cyberbullying in the Philippines, a concern magnified by widespread social media use. A dataset of 146,661 tweets is analyzed using a pre-trained natural language processing model tailored to detect derogatory Filipino terms. The methodology preprocesses the data for clarity and analyzes derogatory phrases, using 23 key terms as indicators of cyberbullying. Through quantitative analysis, specific patterns of derogatory term co-occurrence are uncovered. The research focuses specifically on Filipino digital discourse, uncovering patterns of derogatory language usage that are unique to this context. Combining data mining and machine learning techniques, including Frequent Pattern (FP)-growth for pattern identification, cosine similarity for phrase correlation, and classification techniques, the research achieves an accuracy rate of 97.91%. A 10-fold cross-validation is used to assess the model’s reliability and precision. Moreover, an examination of specific tweets highlights the alignment between automated classifications and human judgment. The co-occurrence of derogatory terms, identified through methods like FP-growth and cosine similarity, reveals underlying cyberbullying narratives that are not immediately obvious. This approach validates the high accuracy of the models and emphasizes the importance of a comprehensive framework for detecting cyberbullying in a linguistically and culturally specific context. The findings substantiate the effectiveness of the targeted approach, providing essential insights for developing cyberbullying prevention strategies. Furthermore, the research enriches the literature on digital discourse analysis and online harassment prevention by addressing cyberbullying patterns and behaviors. Importantly, the research offers valuable guidance for policymakers in crafting more effective online safety measures in the Philippines.
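As a rough illustration of the phrase-correlation step mentioned in the abstract, the sketch below computes cosine similarity between binary term-occurrence vectors, so that terms appearing in the same tweets score close to 1. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the whitespace tokenization, and the placeholder documents and terms are all hypothetical, and the study's actual preprocessing and tooling may differ.

```python
import math
from itertools import combinations


def occurrence_vectors(docs, terms):
    """Map each term to a binary vector: 1 if the term occurs in that document.

    Assumption: simple lowercase whitespace tokenization, for illustration only.
    """
    tokenized = [set(doc.lower().split()) for doc in docs]
    return {t: [1 if t in tokens else 0 for tokens in tokenized] for t in terms}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a term that never occurs is treated as similar to nothing
    return dot / (norm_u * norm_v)


def pairwise_term_similarity(docs, terms):
    """Cosine similarity for every unordered pair of terms."""
    vectors = occurrence_vectors(docs, terms)
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(terms, 2)
    }


# Hypothetical placeholder data standing in for tweets and key terms.
docs = ["alpha beta gamma", "alpha beta", "gamma delta", "alpha delta"]
terms = ["alpha", "beta", "gamma"]
sims = pairwise_term_similarity(docs, terms)

for pair, score in sorted(sims.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

With these placeholder documents, the pair that co-occurs most often ranks first. In practice, a frequent-pattern miner such as FP-growth would typically narrow the candidate term sets before pairs are scored this way.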

Author Biographies

January F. Naga, MSU-Iligan Institute of Technology

Department of Information Technology, College of Computer Studies

Rabby Q. Lavilles, MSU-Iligan Institute of Technology

Department of Information Technology, College of Computer Studies

Published

2024-09-10