K-Means Clustering to Identify Twitter Build Operate Transfer (BOT) on Influential Accounts
DOI:
https://doi.org/10.21512/comtech.v14i2.10620
Keywords:
K-Means clustering, Twitter accounts, Build Operate Transfer (BOT), influential accounts
Abstract
Twitter is a popular social media platform with hundreds of millions of users, but not all of them are human. About 48 million accounts, up to 15% of all accounts, are created by Build Operate Transfer (BOT). BOTs are created for various purposes, one of which is to post news automatically. However, BOTs have also been abused, for example to spread hoaxes or to influence public perception of a topic. The research aimed to determine which Twitter accounts could be identified as BOT accounts based on predefined attributes. The research used tweet data from 213 Twitter accounts. The accounts used as test data were influential accounts. The data were then clustered with k-means using the following attributes: retweets + replies count, followers count, account age, friends count, status count, digits count in name, username length, name similarity, name ratio, and likes count. The results show that the optimal number of clusters is k = 3 according to the Sum of Squared Errors (SSE) evaluation and the Elbow method, whereas the silhouette coefficient gives the best cluster quality and strength at k = 2. The results also show that, for each attribute, the cluster with the most members contains the accounts with high BOT scores across several BOT score types.
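The evaluation procedure described in the abstract (k-means, then SSE for the Elbow method and the silhouette coefficient for each candidate k) can be illustrated with a minimal sketch. This is not the authors' implementation. The three columns (followers count, status count, account age in days) are only a subset of the attributes listed above, the nine rows are invented toy values rather than data from the study, and the z-score scaling step is an assumption that the abstract does not mention.

```python
import random
from math import dist, sqrt


def standardize(rows):
    """Z-score each column so large-valued features do not dominate distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (0.0 if fewer than 2 clusters)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            d = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                 if l == c and j != i]
            return sum(d) / len(d) if d else 0.0
        if labels.count(labels[i]) == 1:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist(labels[i])                                   # cohesion
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented toy rows: (followers_count, statuses_count, account_age_days).
rows = [
    (120, 300, 2000), (150, 280, 2100), (90, 350, 1900),
    (50000, 12000, 3000), (52000, 11000, 3200), (48000, 13000, 2900),
    (300, 90000, 100), (250, 95000, 120), (400, 88000, 90),
]
points = standardize(rows)

for k in range(2, 5):
    labels, centroids = kmeans(points, k)
    print(k, round(sse(points, labels, centroids), 3),
          round(silhouette(points, labels), 3))
```

When reading the printed values, the Elbow method looks for the k after which SSE stops dropping sharply, while the silhouette coefficient favors the k with the highest mean score. As in the abstract, the two criteria do not have to agree.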
License
Copyright (c) 2023 M. Khairul Anam, Ike Yunia Pasa, Kartina Diah Kusuma Wardhani, Lusiana Efrizoni, Muhammad Bambang Firdaus
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License - Share Alike that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.