K-Means Clustering to Identify Twitter Build Operate Transfer (BOT) on Influential Accounts

M. Khairul Anam; Ike Yunia Pasa; Kartina Diah Kusuma Wardhani; Lusiana Efrizoni; Muhammad Bambang Firdaus

doi:10.21512/comtech.v14i2.10620

Authors

M. Khairul Anam, STMIK Amik Riau
Ike Yunia Pasa, Universitas Muhammadiyah Purworejo
Kartina Diah Kusuma Wardhani, Politeknik Caltex Riau
Lusiana Efrizoni, STMIK Amik Riau
Muhammad Bambang Firdaus, Universitas Mulawarman

Keywords:

K-Means clustering, Twitter accounts, Build Operate Transfer (BOT), influential accounts

Abstract

Twitter is a popular social media platform with hundreds of millions of users, but not all of them are human. About 48 million accounts, representing up to 15% of all accounts, are created by Build Operate Transfer (BOT). BOTs are created for various purposes, one of which is to post news automatically. However, BOTs have also been abused, for example to spread hoaxes or to influence public perception of a topic. The research aimed to determine which Twitter accounts could be identified as BOT accounts based on predefined attributes. The research used tweet data from 213 Twitter accounts. The accounts used as test data were influential accounts. The data were then clustered with K-Means on the following attributes: retweets + replies count, followers count, account age, friends count, status count, digits count in name, username length, name similarity, name ratio, and likes count. The results show that the optimal number of clusters is k = 3 according to the Sum of Squared Errors (SSE) evaluation and the Elbow method, whereas the silhouette coefficient indicates the best cluster quality and strength at k = 2. The results also show that the clusters with the highest number of members on each attribute hold the accounts with high BOT scores across several BOT score types.
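The evaluation described in the abstract (K-Means, SSE with the Elbow method, and the silhouette coefficient) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic two-feature points standing in for the ten account attributes, the features are z-scored before clustering, and centroids are seeded with a simple deterministic farthest-first rule. The numbers it prints therefore say nothing about the paper's results.

```python
import math
import random
import statistics

def standardize(rows):
    """Z-score each column so large-valued attributes do not dominate distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(tuple(statistics.fmean(c) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(scores)

# Synthetic stand-in data: three hypothetical groups of 30 "accounts" each.
rng = random.Random(42)
centers = [(0, 0), (12, 0), (6, 10)]
raw = [(rng.gauss(cx, 1), rng.gauss(cy, 1)) for cx, cy in centers for _ in range(30)]
points = standardize(raw)

sse_by_k, sil_by_k = {}, {}
for k in range(1, 7):
    labels, centroids = kmeans(points, k)
    sse_by_k[k] = sse(points, labels, centroids)
    if k >= 2:  # the silhouette coefficient is undefined for a single cluster
        sil_by_k[k] = silhouette(points, labels)

for k in sse_by_k:
    print(k, round(sse_by_k[k], 3), round(sil_by_k.get(k, float("nan")), 3))
```

Reading the printed table, the Elbow method looks for the k after which SSE stops dropping sharply, while the silhouette coefficient picks the k with the highest mean score. The two criteria need not agree, as the abstract reports (k = 3 versus k = 2 on the real data).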

Author Biographies

M. Khairul Anam, STMIK Amik Riau

Department of Informatics Engineering

Ike Yunia Pasa, Universitas Muhammadiyah Purworejo

Department of Information Technology, Faculty of Engineering

Kartina Diah Kusuma Wardhani, Politeknik Caltex Riau

Department of Informatics Engineering

Lusiana Efrizoni, STMIK Amik Riau

Department of Informatics Engineering

Muhammad Bambang Firdaus, Universitas Mulawarman

Department of Informatics, Faculty of Engineering