The Implementation of the Fuzzy C-Means Method in Handling Outlier Data in the 2021 Village Potential Data of Bengkulu Province

Intan Juliana Panjaitan; Indahwati Indahwati; Farit Mochamad Afendi

doi:10.21512/comtech.v16i1.12274

Authors

Intan Juliana Panjaitan IPB University
Indahwati IPB University
Farit Mochamad Afendi IPB University


Keywords:

Fuzzy C-Means (FCM) method, outlier data, village potential data

Abstract

Clustering aims to form groups whose members are similar to one another and dissimilar from the members of other groups. The research evaluated the effectiveness of the Fuzzy C-Means (FCM) method in clustering large datasets containing outliers, focusing on the 2021 Village Potential data from Bengkulu Province. The dataset, comprising 1,514 observations from villages and urban villages, provided a comprehensive resource for understanding regional development. Outliers, a common challenge in cluster analysis, were detected using univariate and multivariate methods, revealing substantial variability. Principal Component Analysis (PCA) was applied to address multicollinearity among variables, which improved clustering quality. The results show that the fuzzifier (w) parameter in the FCM method plays a crucial role in controlling the degree of membership of data points in clusters, which can potentially reduce the impact of outliers and enhance clustering robustness and accuracy. The FCM method effectively produced clusters with high intra-cluster homogeneity and inter-cluster heterogeneity. Using the Elbow method, three clusters were identified as optimal. Cluster 1, dominated by villages in Bengkulu City, is the most advanced, with superior infrastructure and services, but has the fewest village business units, necessitating economic empowerment. Cluster 2, comprising villages in North Bengkulu Regency, demonstrates moderate development but suffers from poor transportation access, requiring improvements to support socio-economic activities. Cluster 3, dominated by villages in Kaur Regency, is the least developed, with limited basic services and infrastructure, highlighting the need for substantial investments in governance and essential services. These findings provide actionable insights for village development in Bengkulu Province, supporting targeted policies tailored to each cluster's unique characteristics.
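To illustrate the role of the fuzzifier described above, the following is a minimal sketch of the textbook FCM updates in NumPy. It is not the authors' implementation: the function names (`fcm_memberships`, `fcm_centers`, `fuzzy_c_means`), the random initialization, and the stopping rule are assumptions made for illustration, and the symbol `w` simply follows the abstract's notation for the fuzzifier.

```python
import numpy as np


def fcm_memberships(X, centers, w=2.0, eps=1e-12):
    """Membership update: u_ik is proportional to d_ik^(-2/(w-1)), rows sum to 1."""
    # Euclidean distance from every observation (rows) to every center (columns).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against a point sitting exactly on a center
    inv = d ** (-2.0 / (w - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fcm_centers(X, U, w=2.0):
    """Center update: mean of the observations weighted by u_ik^w."""
    Uw = U ** w
    return (Uw.T @ X) / Uw.sum(axis=0)[:, None]


def fuzzy_c_means(X, c, w=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate the two updates until the centers stop moving (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers at c distinct, randomly chosen observations.
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        U = fcm_memberships(X, centers, w)
        new_centers = fcm_centers(X, U, w)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(X, centers, w)
```

For fixed centers, a larger `w` pushes the memberships of a distant observation toward an even split across clusters, so its weight `u**w` in the center update shrinks relative to observations that sit close to one center. This is one way to read the abstract's statement that the fuzzifier can reduce the impact of outliers; the sketch does not reproduce the PCA step, the outlier detection, or the Elbow analysis reported in the study.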


Author Biographies

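Intan Juliana Panjaitan, IPB University

Department of Statistics, Faculty of Mathematics and Natural Sciences (FMIPA)

Indahwati, IPB University

Department of Statistics, Faculty of Mathematics and Natural Sciences (FMIPA)

Farit Mochamad Afendi, IPB University

Department of Statistics, Faculty of Mathematics and Natural Sciences (FMIPA)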
