Discovering the Optimal Number of Crime Cluster Using Elbow, Silhouette, Gap Statistics, and NbClust Methods
Keywords: Crime Clustering, Elbow method, Silhouette method, Gap Statistics method, NbClust method
In recent years, analyzing and tracking crime has become critical for identifying trends and associations in crime patterns and activities. Such analysis typically aims to find the areas or locations where crime is high or low by applying clustering methods, including k-means clustering. Although the k-means algorithm is widely used because of its simplicity, convergence speed, and high efficiency, it requires the number of clusters to be chosen in advance, and finding the optimal number is difficult. Determining the correct number of clusters for crime analysis is critical to improving current crime resolution rates, avoiding future incidents, saving time for new officers, and increasing activity quality. To estimate the number of clusters in the crime domain without human intervention, the research applies the Elbow, Silhouette, Gap Statistics, and NbClust methods to the Major Crime Indicators (MCI) datasets for 2014–2019. The crime datasets were processed in four stages: data understanding, data preparation, cluster modelling, and cluster validation. The first two stages were performed in the R Studio environment and the last two in Azure Studio. In the experiments, the Elbow, Silhouette, and NbClust methods all suggest the same optimal number of clusters, namely two. After validating this result with the average Silhouette method, the research considers two clusters the best choice for the dataset. The Silhouette visualization shows a value of 0.73, which indicates that the observations are well grouped and placed in the correct clusters.
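As an illustration of how the Elbow and average Silhouette criteria select a cluster count, the following minimal sketch runs k-means for several values of k and reports both measures. It is not the authors' R Studio or Azure Studio workflow: the data are synthetic two-dimensional points standing in for the MCI records, and the helper names (`kmeans`, `wcss`, `mean_silhouette`) and the farthest-first initialisation are assumptions made for this example only.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def wcss(points, labels, centers):
    """Within-cluster sum of squares, the quantity plotted in the Elbow method."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


def mean_silhouette(points, labels, k):
    """Average silhouette width s = (b - a) / max(a, b) over all points."""
    clusters = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for j, c in enumerate(clusters) if j != l and c)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in data (an assumption): two well-separated groups of points.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(40)]

elbow = {}        # k -> WCSS
silhouette = {}   # k -> average silhouette width (defined for k >= 2)
for k in range(1, 7):
    labels, centers = kmeans(points, k)
    elbow[k] = wcss(points, labels, centers)
    if k >= 2:
        silhouette[k] = mean_silhouette(points, labels, k)

best_k = max(silhouette, key=silhouette.get)
for k in elbow:
    print(k, round(elbow[k], 1), round(silhouette.get(k, float("nan")), 3))
print("k with the highest average silhouette:", best_k)
```

In the Elbow method the chosen k is where the WCSS curve stops dropping sharply, which is read from a plot rather than computed here. The average silhouette width gives a direct numeric criterion, and values close to 1 indicate that observations sit well inside their assigned cluster.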
Copyright (c) 2022 Noviyanti Sagala, Alexander A S Gunawan
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.