The Comparison of Deep Learning Models for Indonesian Political Hoax News Detection

Oktavia Citra Resmi Rachmawati; Zakha Maisat Eka Darmawan

doi:10.21512/commit.v18i2.10929

Authors

Oktavia Citra Resmi Rachmawati, Department of Information and Computer Engineering
Zakha Maisat Eka Darmawan, Department of Creative Multimedia Technology


Keywords:

Deep Learning Model, Political Hoax News Detection, Text Classification

Abstract

Indonesia is the world's fourth most populous country and has a diverse sociopolitical landscape. Political fake news exacerbates existing social divisions and causes political polarization in Indonesian society. Hence, studying it as a specific challenge can contribute to broader discussions on the impact of fake news in different contexts. The researchers propose a hoax news detection system by developing deep learning models with various layers on a data set preprocessed using term frequency and token filtering to represent the most prominent words in each class. The researchers compare layers with the potential for high performance in predicting the falsity of Indonesian political news data by observing the models based on training history plots, model specifications, and performance metrics from the classification report module. The deep learning models include the One-Dimensional Convolutional Neural Network (1D CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). The news data are obtained from Kaggle and contain 41,726 rows. Based on experiments with text data preprocessed into vectors and with parameters specified before training, the results show that GRU achieves the highest accuracy, recall, precision, and F1 score. Although GRU is the model with the smallest file size, it is the slowest to generate predictions from news text. It also has a higher potential to overfit than a simple RNN because it has more parameters.
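
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `top_tokens_per_class` and `binary_metrics`, the whitespace tokenizer, the cut-off `k`, and the toy data are all assumptions made for illustration. The first function keeps the most frequent tokens in each class after filtering out unwanted tokens; the second computes accuracy, precision, recall, and F1 score from their standard definitions for a binary hoax/valid task.

```python
from collections import Counter


def top_tokens_per_class(texts, labels, k=3, stopwords=frozenset()):
    """Return the k most frequent tokens per class (hypothetical helper)."""
    counts = {}
    for text, label in zip(texts, labels):
        # Assumed tokenizer: lowercase and split on whitespace.
        tokens = [t for t in text.lower().split() if t not in stopwords]
        counts.setdefault(label, Counter()).update(tokens)
    return {label: [tok for tok, _ in c.most_common(k)]
            for label, c in counts.items()}


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data, invented for illustration only.
texts = ["hoax hoax vote", "vote budget budget"]
labels = ["hoax", "valid"]
prominent = top_tokens_per_class(texts, labels, k=1)

# 1 = hoax, 0 = valid; the predictions are made up.
scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a real pipeline the filtered tokens would be converted to vectors before being passed to the 1D CNN, LSTM, or GRU model, and the metrics would be computed on held-out data.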


Author Biographies

Oktavia Citra Resmi Rachmawati, Department of Information and Computer Engineering

The Electronic Engineering Polytechnic Institute of Surabaya

Zakha Maisat Eka Darmawan, Department of Creative Multimedia Technology

The Electronic Engineering Polytechnic Institute of Surabaya
