Comparison of the Performance Results of C4.5 and Random Forest Algorithm in Data Mining to Predict Childbirth Process

Muhasshanah Muhasshanah; Mohammad Tohir; Dewi Andariya Ningsih; Neny Yuli Susanti; Astik Umiyah; Lia Fitria

doi:10.21512/commit.v17i1.8236

Authors

Muhasshanah Universitas Ibrahimy https://orcid.org/0000-0003-1539-9598
Mohammad Tohir Universitas Ibrahimy https://orcid.org/0000-0001-8342-0972
Dewi Andariya Ningsih Universitas Ibrahimy
Neny Yuli Susanti Universitas Ibrahimy
Astik Umiyah Universitas Ibrahimy
Lia Fitria Universitas Ibrahimy

Keywords:

C4.5 Algorithm, Random Forest Algorithm, Data Mining, Childbirth Process

Abstract

Advances in information technology have made it easier for many people to process data. Data mining is the process of extracting valuable information from large data sets. This research aims to determine how the C4.5 and random forest algorithms differ when used in data mining to predict the childbirth process of pregnant women, comparing the accuracy of the two algorithms on that task. Experimental research is conducted to classify the childbirth process in Situbondo, Indonesia, by applying both algorithms. The J48 decision tree algorithm is used as the implementation of C4.5. The two algorithms are compared on classification error and accuracy. The research uses 1,000 records for training and 200 records for testing. With 10-fold cross-validation, the C4.5 and random forest algorithms correctly classify 96% and 95% of the data, respectively. The Relative Absolute Error is the same for both algorithms, at 15%. On these performance results, the C4.5 algorithm does better than the random forest algorithm. Further research can add more data and use other algorithms to improve the accuracy of the analysis results.
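The two measures named in the abstract, the share of correctly classified records and the Relative Absolute Error, together with a 10-fold split, can be sketched in plain Python. This is an illustrative sketch only, not the authors' workflow: the function names are hypothetical, the toy values are made up for the demonstration, and the error measure is a simplified binary form (absolute error of predicted probabilities, relative to always predicting the mean outcome) that may differ in detail from what a data mining tool reports.

```python
from statistics import mean


def accuracy(actual, predicted):
    """Fraction of records whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def relative_absolute_error(actual, predicted):
    """Total absolute error divided by the error of always predicting the mean.

    Assumption: `actual` holds 0/1 outcomes and `predicted` holds the
    predicted probability of outcome 1 for each record.
    """
    baseline = mean(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(baseline - a) for a in actual)
    return model_error / baseline_error


def k_fold_indices(n_records, k=10):
    """Yield k (train, test) pairs of record indices; each record is tested once."""
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_records) if i not in held_out]
        yield train, test


if __name__ == "__main__":
    # Toy values, not data from the study.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(relative_absolute_error([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
    print(sum(1 for _ in k_fold_indices(1000, k=10)))
```

In a comparison like the one described, each algorithm would be trained on the `train` indices of every fold and scored on the matching `test` indices, and the per-fold scores would then be combined.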

Author Biographies

Muhasshanah, Universitas Ibrahimy

Information Technology Study Program

Mohammad Tohir, Universitas Ibrahimy

Mathematics Education Study Program

Dewi Andariya Ningsih, Universitas Ibrahimy

Midwifery Study Program

Neny Yuli Susanti, Universitas Ibrahimy

Midwifery Study Program

Astik Umiyah, Universitas Ibrahimy

Midwifery Study Program

Lia Fitria, Universitas Ibrahimy

Midwife Professional Education Study Program
