Dynamic Time Warping Techniques for Time Series Clustering of Covid-19 Cases in DKI Jakarta

Authors

  • Meicheil Yohansa IPB University
  • Khairil Anwar Notodiputro IPB University
  • Erfiani IPB University

DOI:

https://doi.org/10.21512/comtech.v13i2.7413

Keywords:

Dynamic Time Warping (DTW), time series, Covid-19

Abstract

The number of positive Covid-19 cases in DKI Jakarta has become a national concern, reaching 25% of the total cases in Indonesia. The research examined and modeled the distribution pattern of positive Covid-19 cases across the 44 districts of DKI Jakarta, which are spread over six administrative areas. The data covered positive Covid-19 cases in DKI Jakarta over one year, from April 2020 to April 2021. The district-level patterns were analyzed by time series clustering using Dynamic Time Warping (DTW) distances and agglomerative hierarchical methods. The effectiveness of the clustering was then evaluated by comparing city-level forecasts of Covid-19 cases for the next 14 days, produced with the Autoregressive Integrated Moving Average (ARIMA) model, with and without clustering. The results grouped the 44 districts into six optimal clusters based on the pattern of positive Covid-19 cases in each district. The highest distribution rate was in cluster A, and the lowest was in cluster F. Clusters A, B, E, and F also showed geographical characteristics. The Mean Absolute Percentage Error (MAPE) of the clustering model ranged from 16% to 20%. The difference in MAPE from the non-clustering model was in the range of 5% to 6%, which implies that the forecasting accuracy of the two approaches is not far apart.
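As a rough illustration of the pipeline the abstract describes, the following Python sketch computes DTW distances between short case series, merges them with an agglomerative procedure, and evaluates a forecast with MAPE. It is a minimal sketch under stated assumptions, not the authors' implementation: the absolute-difference local cost, the average linkage, the function names, and the toy district series are all hypothetical choices, since the abstract does not specify them.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic DTW distance with absolute-difference local cost (assumed)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def agglomerative(series, k):
    """Merge the two closest clusters (average linkage, assumed) until k remain."""
    names = list(series)
    # Pairwise DTW distances between all named series
    dist = {frozenset(p): dtw_distance(series[p[0]], series[p[1]])
            for p in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; actual values must be nonzero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(actual)


# Toy daily-case series for four hypothetical districts (made-up numbers).
toy = {
    "d1": [1, 2, 4, 8, 6, 3],
    "d2": [1, 1, 2, 4, 8, 6],  # same shape as d1, shifted by one day
    "d3": [5, 5, 5, 5, 5, 5],
    "d4": [5, 5, 6, 5, 5, 5],
}
groups = agglomerative(toy, k=2)
print(groups)
```

On this toy input the shifted series d1 and d2 end up in one group and the flat series d3 and d4 in the other, which is the behavior DTW is meant to give: series with similar shape are close even when they are offset in time. For real data, an established library implementation of DTW and hierarchical clustering would be a more sensible choice than this quadratic-time sketch.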

Author Biographies

Meicheil Yohansa, IPB University

Department of Statistics, Faculty of Mathematics and Natural Sciences

Khairil Anwar Notodiputro, IPB University

Department of Statistics, Faculty of Mathematics and Natural Sciences

Erfiani, IPB University

Department of Statistics, Faculty of Mathematics and Natural Sciences

Published

2022-11-23

Section

Articles