Forecasting Food Prices in East Java Using Stacking Ensemble Learning via K-MEANS

Authors

  • Aviolla Terza Damaliana Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Amri Muhaimin Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Nabilah Selayanti Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Shafira Amanda Putri Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Muhammad Nasrudin Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur

DOI:

https://doi.org/10.21512/comtech.v17i1.14218

Keywords:

clustering, ensemble, food commodity, price, time-series

Abstract

Food commodities are essential in developing countries such as Indonesia, where the government regulates food commodity prices in every province. However, price instability persists in certain provinces, which makes effective policy control difficult. Data science and statistical techniques can support the government's efforts to monitor and manage food commodity prices. This study proposes the Stackelberg-K-Means method to predict the commodity price index in East Java. The method is a collaborative framework that combines cluster analysis with stacking ensemble learning for time-series prediction. Cluster analysis is performed first, using Dynamic Time Warping (DTW) as the distance measure because it suits time-series data; this yields two clusters for each commodity: rice, oil, and flour. The stacking model consists of base learners and a meta-learner: the base learners are Ridge Regression, Random Forest, and Support Vector Regression, and the meta-learner is Light Gradient Boosting (LightGBM). Parameters are optimized by grid search, and the proposed method is compared with AutoARIMA, implemented in Python, on both training and testing data. The proposed method outperforms the AutoARIMA baseline on all three error metrics: MAPE, MAE, and RMSE. For flour, the scores (proposed method versus AutoARIMA) are 0.042% versus 0.328%, 4.715 versus 37.57, and 6.34 versus 523.99, respectively. For rice, they are 0.261% versus 0.392%, 31.585 versus 48.142, and 41.92 versus 56.068. For oil, they are 0.185% versus 0.250%, 33.02 versus 47.571, and 39.35 versus 56.060.
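The clustering step described above relies on Dynamic Time Warping, which aligns two series by letting one time point match several points of the other before summing the local differences. The minimal Python sketch below computes classic DTW and uses it to split a set of series into two groups. It uses an absolute-difference local cost and no warping window; both are assumptions, because the abstract does not specify them. The arithmetic mean is not, in general, the DTW barycenter, so the sketch swaps in k-medoids (each cluster is represented by one of its own member series) rather than reproducing the authors' K-Means variant. Libraries such as tslearn offer `TimeSeriesKMeans(metric="dtw")` for a DTW-aware K-Means. The function names and toy series are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW: cumulative absolute difference along the cheapest alignment path."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids_dtw(series_list, k=2, max_iter=20):
    """Partition series into k clusters with k-medoids under the DTW distance."""
    n = len(series_list)
    dist = [[dtw_distance(s, t) for t in series_list] for s in series_list]

    # Deterministic farthest-first initialisation (an assumption, for reproducibility).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][c] for c in medoids)))

    def assign(meds):
        # Index of the nearest medoid for every series.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total DTW distance to the others.
            new_medoids.append(min(members, key=lambda p: sum(dist[p][q] for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


series = [
    [1, 2, 3, 4, 5, 6],
    [1, 1, 2, 3, 4, 5, 6],
    [2, 3, 4, 5, 6, 7],
    [6, 5, 4, 3, 2, 1],
    [6, 6, 5, 4, 3, 2, 1],
    [7, 6, 5, 4, 3, 2],
]
labels, medoids = k_medoids_dtw(series, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

On the toy data the three rising series and the three falling series land in separate clusters, even though some of them differ in length. A point-by-point Euclidean distance could not compare those series at all.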
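The three error metrics reported in the abstract (MAPE, MAE, RMSE) have standard definitions. A small pure-Python sketch of them follows. MAPE is returned as a percentage, matching how the abstract reports it, and it assumes no actual value is zero, which is reasonable for prices. The sample numbers are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error; squaring weights large misses more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100, 200, 300]
predicted = [110, 190, 300]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

With these numbers, MAE is about 6.67, RMSE is about 8.16, and MAPE is about 5%. RMSE is never smaller than MAE on the same errors. The gap widens when a few errors are much larger than the rest.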

Published

2026-02-02

How to Cite

Aviolla Terza Damaliana, Amri Muhaimin, Nabilah Selayanti, Shafira Amanda Putri, & Muhammad Nasrudin. (2026). Forecasting Food Prices in East Java Using Stacking Ensemble Learning via K-MEANS. ComTech: Computer, Mathematics and Engineering Applications, 17(1). https://doi.org/10.21512/comtech.v17i1.14218