Classifying Viral Twitter with Transformer Models and Multi-Layer Perceptron
DOI: https://doi.org/10.21512/emacsjournal.v7i1.11580
Keywords: Cost-Sensitive, Multi-Layer Perceptron, Twitter, Virality Classification, XLNet
Abstract
This research explores the classification of virality levels in Indonesian tweets using natural language processing techniques and machine learning algorithms. To predict tweet virality, it combines transformer models (RoBERTa for sentiment analysis and XLNet for text embedding) with Multi-Layer Perceptron (MLP) classifiers. The model also incorporates emotion features and applies cost-sensitive methods to handle class imbalance. Sentiment analysis and emotion detection reveal correlations between tweet sentiment, emotion distribution, and virality levels. Our findings show that XLNet captures contextual nuances more effectively than BERTweet, and that adding emotion features and cost-sensitive methods improves the model's predictive accuracy, offering insights for marketers and businesses seeking to optimize their social media strategies. The proposed model achieves an accuracy of 95% and an F1-score of 59%.
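The abstract does not specify the network architecture, the feature layout, or which cost-sensitive scheme is used. The following is a minimal NumPy sketch, under stated assumptions, of one common way to combine these pieces: per-tweet features (a text embedding, sentiment probabilities and emotion scores, here random stand-ins rather than real XLNet or RoBERTa outputs) are concatenated, and a one-hidden-layer MLP is trained with a cross-entropy loss in which each example is weighted by the inverse frequency of its class. The names `build_features`, `CostSensitiveMLP`, `inverse_frequency_weights` and `macro_f1`, the hidden size, the learning rate and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def inverse_frequency_weights(y, n_classes):
    """'Balanced' class weights: n_samples / (n_classes * count_of_class)."""
    counts = np.bincount(y, minlength=n_classes).astype(float)
    return len(y) / (n_classes * np.maximum(counts, 1.0))


def build_features(text_emb, sentiment_probs, emotion_probs):
    """Concatenate per-tweet feature groups column-wise (hypothetical layout)."""
    return np.hstack([text_emb, sentiment_probs, emotion_probs])


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


class CostSensitiveMLP:
    """One-hidden-layer MLP trained with class-weighted softmax cross-entropy."""

    def __init__(self, n_in, n_hidden, n_classes, class_weights, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, n_classes))
        self.b2 = np.zeros(n_classes)
        self.class_weights = np.asarray(class_weights, dtype=float)
        self.lr = lr

    def _forward(self, X):
        h = np.maximum(0.0, X @ self.W1 + self.b1)  # ReLU hidden layer
        z = h @ self.W2 + self.b2
        z = z - z.max(axis=1, keepdims=True)  # stabilise the softmax
        p = np.exp(z)
        return h, p / p.sum(axis=1, keepdims=True)

    def loss(self, X, y):
        """Weighted mean cross-entropy: sum_i w_i * CE_i / sum_i w_i."""
        _, p = self._forward(X)
        w = self.class_weights[y]
        return float(np.sum(-w * np.log(p[np.arange(len(y)), y] + 1e-12)) / w.sum())

    def fit(self, X, y, epochs=300):
        Y = np.eye(len(self.class_weights))[y]  # one-hot targets
        w = self.class_weights[y]  # each example costs the weight of its true class
        for _ in range(epochs):
            h, p = self._forward(X)
            dz = (p - Y) * w[:, None] / w.sum()  # gradient of the loss w.r.t. logits
            dh = dz @ self.W2.T
            dh[h <= 0.0] = 0.0  # ReLU derivative
            # full-batch gradient descent step
            self.W2 -= self.lr * (h.T @ dz)
            self.b2 -= self.lr * dz.sum(axis=0)
            self.W1 -= self.lr * (X.T @ dh)
            self.b1 -= self.lr * dh.sum(axis=0)
        return self

    def predict(self, X):
        return self._forward(X)[1].argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = np.array([0] * 380 + [1] * 20)  # synthetic 95/5 class split
    emb = rng.normal(size=(400, 16)) + 1.5 * y[:, None]  # stand-in text embeddings
    sent = rng.dirichlet(np.ones(3), size=400)  # stand-in sentiment probabilities
    emo = rng.dirichlet(np.ones(4), size=400)  # stand-in emotion scores
    X = build_features(emb, sent, emo)
    model = CostSensitiveMLP(X.shape[1], 32, 2, inverse_frequency_weights(y, 2)).fit(X, y)
    pred = model.predict(X)
    print("accuracy:", np.mean(pred == y), "macro-F1:", macro_f1(y, pred, 2))
```

Weighting the loss rather than resampling the data keeps every tweet in training while making mistakes on rare (highly viral) tweets cost more. The `macro_f1` helper also shows why accuracy and F1 can diverge on imbalanced labels: predicting the majority class for every item of `[0, 0, 0, 0, 1]` gives 80% accuracy but a macro-F1 of 4/9 (about 0.44), because the minority class contributes an F1 of zero.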
License
Copyright (c) 2025 Engineering, Mathematics and Computer Science Journal (EMACS)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently defined for this journal as follows: Creative Commons Attribution-ShareAlike (CC BY-SA).