Machine Learning for Predicting Personality using Facebook-Based Posts
DOI: https://doi.org/10.21512/emacsjournal.v6i1.10748
Keywords: Personality Prediction, Big Five Personality, Social Media, Machine Learning, Facebook
Abstract
Social media plays a large role in everyday life: people use it to share their thoughts through text, photos, and voice. The information it holds can be useful, including for personality research. Personality is usually measured with a personality test. In this research, we predict personality from Facebook posts without administering a personality test. We build models for the Big Five personality traits using five machine learning algorithms: Support Vector Machine (SVM), Multinomial Naive Bayes, Decision Tree, K-Nearest Neighbor, and Logistic Regression. Data augmentation was used to balance the dataset, and the models were trained and evaluated with stratified 10-fold cross-validation. The highest F1 score, 82.31%, was obtained for Openness with Multinomial Naive Bayes, and the highest average F1 score was 68.62%. Of the five supervised machine learning algorithms tested, Multinomial Naive Bayes was therefore the best at predicting Big Five personality traits from users' Facebook posts.
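The abstract names the evaluation pipeline (class balancing with data augmentation, Multinomial Naive Bayes, stratified 10-fold cross-validation, F1 score) but does not show it. The following is a minimal, from-scratch Python sketch of that kind of pipeline, not the authors' implementation. It assumes each trait is treated as its own binary task labeled "y" (trait present) or "n" (absent), replaces the paper's text preprocessing with a simple lower-case whitespace tokenizer, and swaps in EDA-style random deletion as the augmentation operation, since the abstract does not say which augmentation was used. All function names (`random_deletion`, `balance_by_augmentation`, `stratified_kfold`, `cross_val_f1`, and so on) are hypothetical.

```python
# Hypothetical sketch: binary "y"/"n" labels per trait and random-deletion augmentation are assumptions.
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a stand-in for real text preprocessing)."""
    return text.lower().split()


def random_deletion(doc, rng, p=0.2):
    """EDA-style random deletion: drop each word with probability p, keep at least one."""
    words = tokenize(doc)
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept or words[:1])


def balance_by_augmentation(docs, labels, rng):
    """Add augmented copies of minority-class posts until every class count matches."""
    counts = Counter(labels)
    target = max(counts.values())
    out_docs, out_labels = list(docs), list(labels)
    for c, n in counts.items():
        pool = [d for d, y in zip(docs, labels) if y == c]
        for _ in range(target - n):
            out_docs.append(random_deletion(rng.choice(pool), rng))
            out_labels.append(c)
    return out_docs, out_labels


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(tokenize(d))
        self.denom = {c: sum(self.counts[c].values()) + self.alpha * len(self.vocab)
                      for c in self.classes}
        return self

    def predict(self, doc):
        def log_posterior(c):
            # Unnormalised log posterior: log prior plus summed log word likelihoods;
            # words never seen in training are skipped.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + self.alpha) / self.denom[c])
                for w in tokenize(doc) if w in self.vocab)
        return max(self.classes, key=log_posterior)


def stratified_kfold(labels, k, seed=0):
    """Yield (train, test) index lists; each class is dealt round-robin over the k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds, pos = [[] for _ in range(k)], 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    for test in folds:
        held_out = set(test)
        yield [i for i in range(len(labels)) if i not in held_out], sorted(test)


def f1_score(y_true, y_pred, positive):
    """F1 of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_val_f1(docs, labels, k=10, positive="y", seed=0):
    """Mean F1 over stratified k folds; augmentation is applied to training folds only."""
    rng = random.Random(seed)
    scores = []
    for train, test in stratified_kfold(labels, k, seed):
        tr_docs, tr_labels = balance_by_augmentation(
            [docs[i] for i in train], [labels[i] for i in train], rng)
        model = MultinomialNB().fit(tr_docs, tr_labels)
        preds = [model.predict(docs[i]) for i in test]
        scores.append(f1_score([labels[i] for i in test], preds, positive))
    return sum(scores) / len(scores)
```

Augmentation is applied inside each training fold only, so augmented copies of a held-out post never reach its test fold; the abstract does not say at which stage the authors augmented. To score a trait, one would call `cross_val_f1(posts, labels)` with that trait's labels. Comparing the other four classifiers would mean substituting a model with the same `fit`/`predict` methods for `MultinomialNB` inside `cross_val_f1`.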
License
Copyright (c) 2024 Engineering, MAthematics and Computer Science (EMACS) Journal
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently defined for this journal as follows: Creative Commons Attribution-ShareAlike (CC BY-SA).