Performance of Fuzzy C-Means (FCM) and Fuzzy Subtractive Clustering (FSC) on Medical Data Imputation

Authors

DOI:

https://doi.org/10.21512/comtech.v15i1.11002

Keywords:

Fuzzy C-Means (FCM), Fuzzy Subtractive Clustering (FSC), medical data imputation

Abstract

Missing values and incomplete data are frequently encountered in medical records. They become a serious problem when an analysis requires complete data. The research aimed to evaluate the performance of the Fuzzy Subtractive Clustering (FSC) and Fuzzy C-Means (FCM) methods for solving imputation problems. Both methods were implemented using medical data. Imputation had previously been conducted using K-Means, a crisp clustering approach. In the research, fuzzy clustering, a distinct methodology, was applied. The primary research contribution was the proposed fuzzy logic imputation method, which took uncertainty into account. The data sample consisted of patients who were at least 40 years old and had a history of hypertension, diabetes, heart disease, stroke, or chronic kidney disease. The test was carried out by taking random portions of data from the entire medical record, with a randomization probability of 10%–50%. The results of the ANOVA test show that the p-value is greater than α (= 0.05). This means that the imputed values do not differ significantly from the original values, whether the FSC or the FCM method is used. The algorithms' performance is evaluated using the Pearson correlation coefficient. According to the t-test results, the FCM method has a higher correlation coefficient than the FSC method, which suggests that FCM is superior to FSC.
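
The evaluation loop described in the abstract (mask values at random, impute them with a fuzzy clustering model, then compare imputed and original values with the Pearson correlation coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic data in place of the medical records, a plain NumPy Fuzzy C-Means with fuzzifier m = 2, mean-filled initialization, and imputation by membership-weighted cluster centers. The names `fcm` and `fcm_impute`, the cluster count, and the 20% masking probability are all hypothetical choices made for illustration.

```python
import numpy as np

def fcm(X, c=3, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means on a complete matrix X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per row
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def fcm_impute(X_missing, c=3, m=2.0, rounds=5, seed=0):
    """Assumed scheme: mean-fill, then refine missing cells from FCM output."""
    mask = np.isnan(X_missing)
    X = np.where(mask, np.nanmean(X_missing, axis=0), X_missing)
    for _ in range(rounds):
        centers, U = fcm(X, c=c, m=m, seed=seed)
        estimate = U @ centers                 # membership-weighted centers
        X[mask] = estimate[mask]               # observed cells stay untouched
    return X

# Synthetic stand-in for the medical records (assumption: three groups).
rng = np.random.default_rng(42)
true_centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)
X_full = np.vstack([ctr + rng.normal(scale=0.5, size=(100, 4)) for ctr in true_centers])

p = 0.2                                        # masking probability (abstract: 10%-50%)
mask = rng.random(X_full.shape) < p
X_missing = X_full.copy()
X_missing[mask] = np.nan

X_imputed = fcm_impute(X_missing, c=3)

# Pearson correlation between original and imputed values at the masked cells.
r = np.corrcoef(X_full[mask], X_imputed[mask])[0, 1]
print(f"Pearson r on masked cells: {r:.3f}")
```

Repeating the run for several masking probabilities, and for a second imputation method, would yield the sets of correlation coefficients that a t-test could then compare, as the abstract describes.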

Published

2024-05-22