Variable Selection in Clustering for Sanitation Access Analysis in East Java Supporting SDG 6
DOI: https://doi.org/10.21512/emacsjournal.v8i1.14729
Keywords: Clustering, Variable Selection, Sanitation Access, SDG 6
Abstract
Achieving adequate sanitation requires attention to several factors that affect public health and sustainable development. This study analyzes sanitation access in East Java by identifying the factors that best characterize how communities use sanitation. Clustering was applied to group the cities and districts of East Java according to their sanitation performance. The study began with six variables covering the five pillars of community-based total sanitation (STBM). After variable selection, the retained variables were awareness of open defecation (SBS), awareness of hand washing with soap (CPTM), and drinking water and food management (PAMMRT). The analysis produced three distinct clusters, each reflecting a different level of sanitation across the cities and districts of East Java. Nevertheless, the excluded variables remain valuable as indicators established by the government. By demonstrating how variable selection can be applied in a clustering context, this research is expected to serve as a resource for policymakers, offering a framework for prioritizing specific areas in efforts to improve sanitation access and achieve sustainable development.
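The abstract does not state which clustering algorithm or selection criterion the study used. The following is therefore only a minimal sketch of one common approach, wrapper-style variable selection: cluster the regions on every candidate subset of variables and keep the subset with the best internal validity score. It assumes k-means with k = 3 (matching the three clusters reported), mean silhouette width as the criterion, and an exhaustive search over subsets. The variable names x1 to x6 and the data are hypothetical and synthetic. They are not the study's STBM indicators or results.

```python
"""Hypothetical sketch of wrapper-style variable selection for clustering.

Assumptions (not taken from the study): k-means with k = 3, mean silhouette
width as the selection criterion, exhaustive search over variable subsets,
and synthetic data in which only x1..x3 carry cluster structure.
"""
import itertools
import math
import random


def standardize(rows):
    """Z-score every column so no variable dominates the distances."""
    cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        cols.append([(v - mean) / sd for v in col])
    return [tuple(r) for r in zip(*cols)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stopped changing: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; singletons score 0, returns -1.0 if < 2 clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        # Mean distance from p to each cluster's members (excluding p itself).
        mean_to = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if d:
                mean_to[c] = sum(d) / len(d)
        if labels[i] not in mean_to:
            continue  # p is a singleton cluster: its silhouette counts as 0
        a = mean_to[labels[i]]
        b = min(v for c, v in mean_to.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def select_variables(rows, names, k=3, min_vars=2):
    """Return (best silhouette, chosen variable names) over all subsets."""
    z = standardize(rows)
    best = (-math.inf, [])
    for size in range(min_vars, len(names) + 1):
        for subset in itertools.combinations(range(len(names)), size):
            pts = [tuple(r[j] for j in subset) for r in z]
            score = silhouette(pts, kmeans(pts, k))
            if score > best[0]:
                best = (score, [names[j] for j in subset])
    return best


def synthetic_regions(seed=42, per_group=10):
    """Made-up rows: x1..x3 separate three groups, x4..x6 are pure noise."""
    rng = random.Random(seed)
    rows = []
    for level in (0.0, 10.0, 20.0):
        for _ in range(per_group):
            signal = [rng.gauss(level, 0.5) for _ in range(3)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
            rows.append(tuple(signal + noise))
    return rows


if __name__ == "__main__":
    names = ["x1", "x2", "x3", "x4", "x5", "x6"]
    score, chosen = select_variables(synthetic_regions(), names)
    print(f"selected {chosen} with mean silhouette {score:.2f}")
```

With six candidate variables there are only 57 subsets of two or more variables, so an exhaustive search is cheap. Silhouette values computed in different numbers of dimensions are not strictly comparable, so a criterion like this is a heuristic. Model-based alternatives, such as comparing Gaussian mixture models fitted with and without each variable, address that issue more formally.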
License
Copyright (c) 2026 Mohammad Dian Purnama

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We continuously work with our author communities to select the best licensing options; the license currently defined for this journal is Creative Commons Attribution-ShareAlike (CC BY-SA).


