Comparison of Machine Learning Classification Models in Predicting The Titanic Survival Rate
DOI:
https://doi.org/10.21512/ijcshai.v2i1.12163
Keywords:
classification, XGBoost, random forest, extra trees, logistic regression
Abstract
The tragic sinking of the Titanic in 1912 has been a subject of great interest, particularly in analyzing the factors that influenced passenger survival rates. This study applies machine learning techniques to predict the survival of Titanic passengers based on various attributes. The dataset used includes demographic details and passenger-specific features such as age, gender, ticket class, number of siblings/spouses, number of parents/children traveling, ticket fare, and departure location. An exploratory data analysis is conducted to understand patterns within the dataset, followed by data preprocessing steps, including handling missing values and encoding categorical variables. To develop the predictive model, multiple machine learning algorithms are implemented, including Logistic Regression, Random Forest, Extra Trees, Decision Tree, LGBM Classifier, and XGBoost Classifier. The results indicate that the Random Forest model achieves the highest accuracy at 0.815, while the LGBM Classifier attains the highest cross-validation score of 0.821. Feature importance analysis highlights gender and ticket class as the most significant factors affecting survival probability. This study demonstrates the effectiveness of machine learning classification techniques in analyzing historical data and predicting binary outcomes. The insights gained from this research can be applied to other domains involving historical data analysis and classification tasks, such as risk assessment, medical prognosis, and social science research. By leveraging machine learning, this approach provides a data-driven perspective on historical events, enabling better decision-making in similar predictive modeling scenarios.
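The workflow summarized in the abstract (imputing missing values, encoding categorical variables, then comparing classifiers by hold-out accuracy and cross-validation score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names follow the Kaggle Titanic data dictionary, `make_placeholder_data` is a hypothetical helper that generates random stand-in rows rather than the real passenger records, the 80/20 split and 5-fold setting are assumed, and the LGBM and XGBoost classifiers are left out so that the sketch depends only on scikit-learn, pandas, and NumPy.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Column names assumed from the Kaggle Titanic data dictionary.
NUMERIC = ["Pclass", "Age", "SibSp", "Parch", "Fare"]
CATEGORICAL = ["Sex", "Embarked"]


def make_placeholder_data(n=300, seed=0):
    """Hypothetical helper: random stand-in rows, NOT the real dataset."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Pclass": rng.integers(1, 4, n),
        "Sex": rng.choice(["male", "female"], n),
        "Age": rng.uniform(1, 70, n),
        "SibSp": rng.integers(0, 3, n),
        "Parch": rng.integers(0, 3, n),
        "Fare": rng.uniform(5, 100, n),
        "Embarked": rng.choice(["S", "C", "Q"], n),
    })
    # Blank out roughly 20% of ages to simulate missing values.
    df.loc[rng.random(n) < 0.2, "Age"] = np.nan
    # Arbitrary label rule so the models have some signal to learn.
    p = 0.2 + 0.5 * (df["Sex"] == "female") + 0.1 * (df["Pclass"] == 1)
    y = (rng.random(n) < p).astype(int)
    return df, y


def build_pipeline(model):
    """Median-impute numeric columns, one-hot encode categoricals, add model."""
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("prep", preprocess), ("clf", model)])


def compare_models(X, y, seed=0):
    """Return {name: (hold-out accuracy, mean 5-fold CV accuracy)}."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Extra Trees": ExtraTreesClassifier(random_state=seed),
    }
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    results = {}
    for name, model in models.items():
        pipe = build_pipeline(model)
        holdout = pipe.fit(X_tr, y_tr).score(X_te, y_te)
        cv = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
        results[name] = (holdout, cv)
    return results


if __name__ == "__main__":
    X, y = make_placeholder_data()
    for name, (acc, cv) in compare_models(X, y).items():
        print(f"{name:20s} accuracy={acc:.3f}  cv={cv:.3f}")
```

Reporting both numbers side by side mirrors the abstract's distinction between the model with the best single-split accuracy and the model with the best cross-validation score; the two need not coincide, since a single split is more sensitive to which rows land in the test set. Scores produced from the placeholder rows say nothing about the real dataset.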
Copyright (c) 2025 Andika Elok Amalia; Cindy Rahayu

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.