Machine Learning for Predicting Personality using Facebook-Based Posts

Authors

  • Derwin Suhartono, Bina Nusantara University
  • Marcella Marella Ciputri, Bina Nusantara University
  • Stefanny Susilo, Bina Nusantara University

DOI:

https://doi.org/10.21512/emacsjournal.v6i1.10748

Keywords:

Personality Prediction, Big Five Personality, Social Media, Machine Learning, Facebook

Abstract

Social media plays a large role in people's lives: through it, people share their thoughts as text, photos, and voice. The information shared on social media can be useful in many areas, including personality research. Personality is usually assessed with a personality test. In this research, we build models that predict personality from Facebook posts, without requiring users to take a personality test. The models target the Big Five personality traits and use five machine learning algorithms: Support Vector Machine (SVM), Multinomial Naive Bayes, Decision Tree, K-Nearest Neighbor, and Logistic Regression. Data augmentation was used to balance the class distribution of the dataset, and the models were trained and evaluated with stratified 10-fold cross-validation. The highest F1 score, 82.31%, was obtained for Openness with the Multinomial Naive Bayes algorithm, and the highest average F1 score was 68.62%. Of the five supervised machine learning algorithms compared, Multinomial Naive Bayes was therefore the best at predicting Big Five personality traits from users' Facebook posts.
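The abstract names the main steps (class balancing by data augmentation, a Multinomial Naive Bayes classifier, stratified 10-fold cross-validation, and F1 scoring) but not how they were implemented. The following is a minimal, dependency-free Python sketch of that workflow for a single trait, under assumptions that are not taken from the paper: each trait is treated as a binary label (1 = high, 0 = low), features are raw lowercase word counts, and augmentation uses two EDA-style operations (random swap and random deletion, after Wei and Zou's EDA) as a stand-in for whatever method the authors used. All function names are hypothetical. The sketch augments only the training part of each fold, so altered copies of a test post never appear in training; the abstract does not say at which stage the authors applied augmentation.

```python
import math
import random
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes on word counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            counts[y].update(text.lower().split())  # assumed preprocessing: lowercase + whitespace split
        self.vocab = set().union(*counts.values())
        self.log_lik = {}
        for c in self.classes:
            denom = sum(counts[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[c][w] + self.alpha) / denom) for w in self.vocab}
        return self

    def predict(self, texts):
        preds = []
        for text in texts:
            words = [w for w in text.lower().split() if w in self.vocab]  # unseen words are ignored
            scores = {c: self.log_prior[c] + sum(self.log_lik[c][w] for w in words)
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))
        return preds


def eda_augment(text, rng, p_delete=0.1):
    """EDA-style stand-in augmentation: one random word swap, then random deletion."""
    words = text.split()
    if len(words) >= 2:
        a, b = rng.sample(range(len(words)), 2)
        words[a], words[b] = words[b], words[a]
    kept = [w for w in words if rng.random() > p_delete]
    return " ".join(kept or words)  # never return an empty post


def balance_by_augmentation(texts, labels, rng):
    """Add augmented copies to each smaller class until every class matches the largest."""
    by_class = defaultdict(list)
    for text, y in zip(texts, labels):
        by_class[y].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = list(texts), list(labels)
    for y, items in by_class.items():
        for j in range(target - len(items)):
            out_texts.append(eda_augment(items[j % len(items)], rng))
            out_labels.append(y)
    return out_texts, out_labels


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx); indices of each class are dealt round-robin across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


def f1_binary(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(texts, labels, k=10, seed=0):
    """Mean F1 over stratified k folds; only the training part of each fold is augmented."""
    rng = random.Random(seed)
    scores = []
    for train, test in stratified_kfold(labels, k, seed):
        tr_texts, tr_labels = balance_by_augmentation(
            [texts[i] for i in train], [labels[i] for i in train], rng)
        model = MultinomialNB().fit(tr_texts, tr_labels)
        preds = model.predict([texts[i] for i in test])
        scores.append(f1_binary([labels[i] for i in test], preds))
    return sum(scores) / len(scores)
```

For example, `cross_validate(posts, openness_labels, k=10)` would return the mean F1 score for one trait (`posts` and `openness_labels` are hypothetical names). In practice, scikit-learn provides `StratifiedKFold`, `MultinomialNB`, and `f1_score` for the same steps, together with the other four classifiers the abstract compares.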

Author Biographies

Derwin Suhartono, Bina Nusantara University

Computer Science Department, School of Computer Science

Marcella Marella Ciputri, Bina Nusantara University

Computer Science Department, School of Computer Science

Stefanny Susilo, Bina Nusantara University

Computer Science Department, School of Computer Science

Published

2024-01-31

Section

Articles