Classifying Viral Twitter with Transformer Models and Multi-Layer Perceptron

Jeffrey Junior Tedjasulaksana; Alexander Agung Santoso Gunawan

doi:10.21512/emacsjournal.v7i1.11580

Authors

Jeffrey Junior Tedjasulaksana, Bina Nusantara University
Alexander Agung Santoso Gunawan, Bina Nusantara University


Keywords:

Cost-Sensitive, Multi-Layer Perceptron, Twitter, Virality Classification, XLNet

Abstract

This research explores the classification of virality levels in Indonesian tweets using natural language processing techniques and machine learning algorithms. It uses Transformer models, namely RoBERTa for sentiment analysis and XLNet for text embedding, together with Multi-Layer Perceptron (MLP) classifiers to predict tweet virality. The model incorporates emotion features and applies cost-sensitive methods to handle class imbalance, and it performs robustly. Sentiment analysis and emotion detection reveal correlations between tweet sentiment, emotion distribution, and virality levels. The findings show that XLNet captures contextual nuances better than BERTweet. Integrating emotion features and cost-sensitive methods further improves the model's predictive accuracy, which offers insights for marketers and businesses seeking to optimize their social media strategies. The proposed model achieves an accuracy of 95% and an F1-Score of 59%.
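
The abstract does not say which cost-sensitive scheme the model uses. As a minimal sketch of the general idea, the example below assumes inverse-frequency class weights (a common heuristic, n_samples / (n_classes * class_count)) applied to a cross-entropy loss. The function names, the two-class toy labels, and the logits are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this heuristic stands in for the paper's unspecified
    cost-sensitive method.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of the per-sample negative log-likelihoods."""
    weighted_loss = 0.0
    weight_sum = 0.0
    for logits, label in zip(logits_batch, labels):
        prob_true = softmax(logits)[label]
        weighted_loss += weights[label] * -math.log(prob_true)
        weight_sum += weights[label]
    return weighted_loss / weight_sum


# Hypothetical toy data: class 0 = not viral (majority), class 1 = viral (minority).
labels = [0] * 9 + [1]
weights = class_weights(labels)

# A classifier that always leans toward the majority class.
logits_batch = [[2.0, 0.0] for _ in labels]

plain = weighted_cross_entropy(logits_batch, labels, {0: 1.0, 1: 1.0})
costed = weighted_cross_entropy(logits_batch, labels, weights)

print(weights)
print(f"unweighted loss: {plain:.4f}, cost-sensitive loss: {costed:.4f}")
```

With these toy values the cost-sensitive loss is higher than the unweighted one, because the single misclassified minority example now counts as much as the nine majority examples together. A classifier that ignores the minority class is penalized accordingly, which is the behaviour cost-sensitive training relies on when viral tweets are rare.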


Author Biographies

Jeffrey Junior Tedjasulaksana, Bina Nusantara University

Computer Science Department, BINUS Graduate Program - Master of Computer Science

Alexander Agung Santoso Gunawan, Bina Nusantara University

Computer Science Department, School of Computer Science
