Performance of Fuzzy C-Means (FCM) and Fuzzy Subtractive Clustering (FSC) on Medical Data Imputation
DOI: https://doi.org/10.21512/comtech.v15i1.11002
Keywords: Fuzzy C-Means (FCM), Fuzzy Subtractive Clustering (FSC), medical data imputation
Abstract
Missing values and incomplete data are frequently encountered in medical records. They become a serious problem when the analysis requires complete data. The research aimed to evaluate the performance of the Fuzzy Subtractive Clustering (FSC) and Fuzzy C-Means (FCM) methods in solving imputation problems. Both methods were implemented on medical data. Previous research had used K-Means, a crisp clustering approach, for imputation. In the research, fuzzy clustering, a distinct methodology, was applied instead. The primary research contribution was the proposed fuzzy logic imputation method, which took uncertainty into consideration. The data sample consisted of patients who were at least 40 years old and had a history of hypertension, diabetes, heart disease, stroke, or chronic kidney disease. The test was carried out by randomly selecting portions of data from the entire medical record, with a selection probability of 10%–50%. The results of the ANOVA test show that the p-value is greater than α (= 0.05). It means that the imputed values do not differ significantly from the original values, whether the FSC or the FCM method is used. The algorithms' performance is evaluated using the Pearson correlation coefficient. According to the t-test results, the FCM method has a higher correlation coefficient than the FSC method. It implies that FCM is superior to FSC.
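The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the randomly selected entries are treated as missing, that missing entries are first filled with column means, and that each one is then replaced by the membership-weighted combination of the FCM cluster centers. The abstract specifies none of these details. The function names `fcm` and `fcm_impute`, the synthetic data, the cluster count, and the fuzzifier `m = 2` are all hypothetical choices made for illustration.

```python
import numpy as np

def fcm(X, n_clusters=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers so they match the final memberships
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U

def fcm_impute(X_missing, n_clusters=3, m=2.0):
    """Impute NaN entries (assumed rule: membership-weighted centers)."""
    mask = np.isnan(X_missing)
    col_means = np.nanmean(X_missing, axis=0)
    X_filled = np.where(mask, col_means, X_missing)   # initial mean fill
    centers, U = fcm(X_filled, n_clusters, m)
    estimates = U @ centers                           # one estimate per cell
    X_filled[mask] = estimates[mask]                  # observed cells untouched
    return X_filled

# Synthetic stand-in for the medical records (hypothetical data)
rng = np.random.default_rng(42)
offsets = np.repeat(np.array([[0.0], [5.0], [10.0]]), 50, axis=0)
X = rng.normal(size=(150, 4)) + offsets

# Mark entries as missing at one rate from the 10%-50% range
rate = 0.3
mask = rng.random(X.shape) < rate
X_missing = X.copy()
X_missing[mask] = np.nan

X_imputed = fcm_impute(X_missing, n_clusters=3)

# Pearson correlation between original and imputed values at masked cells
r = np.corrcoef(X[mask], X_imputed[mask])[0, 1]
print(f"missing rate = {rate:.0%}, Pearson r = {r:.3f}")
```

Repeating the last part for several rates between 0.1 and 0.5, and for a second imputation method, would give the kind of correlation coefficients that the abstract compares between FCM and FSC.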
License
Copyright (c) 2024 Sri Kusumadewi, Linda Rosita, Elyza Gustri Wahyuni
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.