Comparison of Machine Learning Classification Models in Predicting The Titanic Survival Rate

Authors

  • Andika Elok Amalia, Bina Nusantara University
  • Cindy Rahayu, Bina Nusantara University

DOI:

https://doi.org/10.21512/ijcshai.v2i1.12163

Keywords:

classification, XGBoost, random forest, extra trees, logistic regression

Abstract

The sinking of the Titanic in 1912 has long drawn interest, particularly in analyzing the factors that influenced passenger survival. This study applies machine learning techniques to predict the survival of Titanic passengers from their attributes. The dataset includes demographic and passenger-specific features: age, gender, ticket class, number of siblings/spouses aboard, number of parents/children aboard, ticket fare, and departure location (port of embarkation). An exploratory data analysis is conducted to understand patterns in the dataset, followed by data preprocessing, including handling missing values and encoding categorical variables. Six classification algorithms are then implemented and compared: Logistic Regression, Random Forest, Extra Trees, Decision Tree, LGBM Classifier, and XGBoost Classifier. The Random Forest model achieves the highest accuracy, 0.815, while the LGBM Classifier attains the highest cross-validation score, 0.821. Feature importance analysis highlights gender and ticket class as the most significant factors affecting survival probability. The study demonstrates that machine learning classification techniques are effective for analyzing historical data and predicting binary outcomes. The same approach can be applied to other classification tasks on historical data, such as risk assessment, medical prognosis, and social science research, where it offers a data-driven basis for decision-making in similar predictive modeling scenarios.
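The abstract reports two different figures, a single accuracy value and a cross-validation score, which need not rank models the same way. The following is a minimal sketch of that distinction and not the authors' pipeline: it uses only the Python standard library, a deliberately simple majority-vote-per-group classifier in place of the six models in the study, and a small invented toy dataset (`toy_rows`) with one feature. All function names, the 80/20 split, and k = 5 are assumptions made for illustration.

```python
import random


def train_majority_by_group(rows):
    """Learn the majority label (0 or 1) for each value of the single feature."""
    counts = {}
    for feature, label in rows:
        positives, total = counts.get(feature, (0, 0))
        counts[feature] = (positives + label, total + 1)
    # Fallback for feature values never seen in training: overall majority label.
    overall_positives = sum(label for _, label in rows)
    default = int(2 * overall_positives >= len(rows))
    table = {f: int(2 * pos >= tot) for f, (pos, tot) in counts.items()}
    return lambda feature: table.get(feature, default)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(f) == y for f, y in rows) / len(rows)


def holdout_accuracy(rows, test_fraction=0.2, seed=0):
    """Accuracy on one random train/test split (a single hold-out estimate)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return accuracy(train_majority_by_group(train), test)


def cross_val_accuracy(rows, k=5, seed=0):
    """Mean accuracy over k folds; each fold serves once as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train_majority_by_group(train), test))
    return sum(scores) / k


# Invented toy rows of (gender, survived); NOT the Titanic dataset.
toy_rows = (
    [("female", 1)] * 30 + [("female", 0)] * 10
    + [("male", 0)] * 45 + [("male", 1)] * 15
)

print("hold-out accuracy:", holdout_accuracy(toy_rows))
print("5-fold CV accuracy:", cross_val_accuracy(toy_rows, k=5))
```

The hold-out figure depends on which rows happen to land in the test split, whereas the cross-validated figure averages over every row being tested exactly once. That difference is one plausible reason a model can lead on one metric and not the other.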

Author Biographies

Andika Elok Amalia, Bina Nusantara University

Computer Science Department, School of Computer Science

Cindy Rahayu, Bina Nusantara University

Computer Science Department, School of Computer Science

Published

2025-02-20