Deciphering Digital Discourse: Detecting Cyberbullying Patterns in Filipino Tweets Using Machine Learning

January F. Naga; Rabby Q. Lavilles

doi:10.21512/commit.v18i2.11094

Authors

January F. Naga, MSU-Iligan Institute of Technology
Rabby Q. Lavilles, MSU-Iligan Institute of Technology

Keywords:

Digital Discourse, Cyberbullying Patterns, Social Media, Machine Learning

Abstract

This research addresses the escalating challenge of cyberbullying in the Philippines, a concern magnified by widespread social media use. A dataset of 146,661 tweets is analyzed using a pre-trained natural language processing model tailored to detect derogatory Filipino terms. The methodology preprocesses the data for clarity and analyzes derogatory phrases, using 23 key terms as indicators of cyberbullying. Quantitative analysis uncovers specific patterns of derogatory term co-occurrence. The research focuses on Filipino digital discourse and on patterns of derogatory language usage specific to this context. Combining data mining and machine learning techniques, including Frequent Pattern (FP)-growth for pattern identification, cosine similarity for phrase correlation, and classification techniques, the research achieves an accuracy rate of 97.91%. A 10-fold cross-validation is used to assess the model's reliability and precision. Moreover, an examination of specific tweets highlights the alignment between automated classifications and human judgment. The co-occurrence of derogatory terms, identified through FP-growth and cosine similarity, reveals underlying cyberbullying narratives that are not immediately obvious. This approach validates the high accuracy of the models and emphasizes the importance of a comprehensive framework for detecting cyberbullying in a linguistically and culturally specific context. The findings substantiate the effectiveness of the targeted approach and provide essential insights for developing cyberbullying prevention strategies. Furthermore, the research enriches the literature on digital discourse analysis and online harassment prevention by addressing cyberbullying patterns and behaviors. Finally, it offers guidance for policymakers in crafting more effective online safety measures in the Philippines.
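The co-occurrence and cosine-similarity ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the term list is a hypothetical placeholder (the 23 terms are not listed here), tokenization is plain whitespace splitting, and simple pair counting with a minimum-support threshold stands in for FP-growth, which would also mine larger itemsets.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical placeholder terms; the paper's 23 derogatory terms are not listed here.
TERMS = ["term_a", "term_b", "term_c"]


def term_sets(tweets, terms):
    """Return, for each tweet, the set of tracked terms it contains."""
    vocab = set(terms)
    # Assumption: lowercase + whitespace tokenization, far simpler than real preprocessing.
    return [vocab & set(tweet.lower().split()) for tweet in tweets]


def frequent_pairs(transactions, min_support):
    """Map each co-occurring term pair to its support, keeping support >= min_support.

    This is plain pair counting, a simplification of FP-growth limited to 2-itemsets.
    """
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


def cosine_similarity(transactions, a, b):
    """Cosine similarity between the binary per-tweet occurrence vectors of two terms."""
    vec_a = [1 if a in items else 0 for items in transactions]
    vec_b = [1 if b in items else 0 for items in transactions]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = sqrt(sum(vec_a)) * sqrt(sum(vec_b))  # entries are 0/1, so sum equals squared norm
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    sample = ["term_a term_b hello", "term_a term_b", "term_c only", "term_a here"]
    tx = term_sets(sample, TERMS)
    print(frequent_pairs(tx, min_support=0.5))
    print(cosine_similarity(tx, "term_a", "term_b"))
```

Pairs with high support and high cosine similarity are candidates for the kind of co-occurrence patterns the abstract describes; a library FP-growth implementation would be the natural replacement for `frequent_pairs` on a dataset of realistic size.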

Author Biographies

January F. Naga, MSU-Iligan Institute of Technology

Department of Information Technology, College of Computer Studies

Rabby Q. Lavilles, MSU-Iligan Institute of Technology

Department of Information Technology, College of Computer Studies
