Dynamic Time Warping Techniques for Time Series Clustering of Covid-19 Cases in DKI Jakarta
Keywords: Dynamic Time Warping (DTW), time series, Covid-19
The number of positive Covid-19 cases in DKI Jakarta has become a national issue, reaching 25% of the total cases in Indonesia. The research examined and modeled the distribution pattern of positive Covid-19 cases across the 44 districts of DKI Jakarta, which are spread over six administrative areas. The data cover positive Covid-19 cases in DKI Jakarta over one year, from April 2020 to April 2021. The distribution patterns in the 44 districts were analyzed by time series clustering using Dynamic Time Warping (DTW) distances and agglomerative hierarchical methods. The effectiveness of the clustering was then evaluated by comparing city-level forecasts of Covid-19 cases for the next 14 days, obtained with and without clustering, using the Autoregressive Integrated Moving Average (ARIMA) model. The results group the 44 districts into 6 optimal clusters based on the pattern of positive Covid-19 cases in each district. The highest distribution rate is in cluster A, and the lowest is in cluster F. Clusters A, B, E, and F also show geographical characteristics. The Mean Absolute Percentage Error (MAPE) of the clustering model ranges from 16% to 20%. Its MAPE values differ from those of the non-clustering model by 5% to 6%, which implies that the forecasting accuracy of the two approaches is not far apart.
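The clustering and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the absolute-difference local cost, the average linkage, the function names, and the toy series are all assumptions made for illustration, and the ARIMA forecasting step is left out.

```python
from itertools import combinations


def dtw_distance(a, b):
    """DTW distance with absolute difference as the local cost (an assumption)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest warping cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def agglomerative_clusters(series, n_clusters):
    """Agglomerative clustering on pairwise DTW distances.

    Average linkage is assumed here; the abstract does not state the linkage.
    Returns a sorted list of clusters, each a sorted list of series indices.
    """
    dist = {
        (i, j): dtw_distance(series[i], series[j])
        for i, j in combinations(range(len(series)), 2)
    }

    def linkage(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        # Merge the two closest clusters, then repeat.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no zero actuals."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Made-up toy series, not the DKI Jakarta district data.
toy = [
    [0, 0, 1, 2, 3],
    [0, 1, 2, 3, 3],
    [10, 10, 11, 12, 13],
    [10, 11, 12, 13, 13],
]
groups = agglomerative_clusters(toy, n_clusters=2)
```

In the toy data the first two series have the same shape shifted in time, so DTW treats them as identical and they fall into one group; the same holds for the last two. The quadratic DTW loop is adequate for a few dozen series of daily counts but would need a windowed or library implementation for larger data.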
Copyright (c) 2022 Meicheil Yohansa, Khairil Anwar Notodiputro, Erfiani
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.