Forecasting Food Prices in East Java Using Stacking Ensemble Learning via K-MEANS

Authors

  • Aviolla Terza Damaliana Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Amri Muhaimin Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Nabilah Selayanti Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Shafira Amanda Putri Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur
  • Muhammad Nasrudin Faculty of Computer Science, Universitas Pembangunan Nasional "Veteran" Jawa Timur

Keywords:

clustering, ensemble, food commodity, price, time-series

Abstract

Food commodities are essential in developing countries such as Indonesia. The government regulates food commodity prices in every province, yet issues still arise in certain provinces. Data science and statistical techniques can help the government control these prices. This study applies the proposed STACKEL K-MEANS method to predict the commodity price index in East Java. The method is a combined framework that performs cluster analysis and then uses stacking ensemble learning for prediction. Cluster analysis comes first and uses Dynamic Time Warping (DTW), a distance measure suited to time series data. Two clusters are formed for each commodity (rice, oil, and flour). The stacking model consists of base learners and a meta learner: the base learners are Ridge Regression, Random Forest, and Support Vector Regression, and the meta learner is the Light Gradient Boosting Machine (LightGBM). The model parameters are tuned with a grid search. For evaluation, the proposed method is compared with auto ARIMA in Python. On both training and testing data, the proposed method outperforms the ARIMA model on all three error metrics (MAPE, MAE, and RMSE). For flour, the proposed method scores 0.042% versus 0.328% for ARIMA (MAPE), 4.715 versus 37.57 (MAE), and 6.34 versus 523.99 (RMSE). For rice, the scores are 0.261% versus 0.392%, 31.585 versus 48.142, and 41.92 versus 56.068. For oil, the scores are 0.185% versus 0.250%, 33.02 versus 47.571, and 39.35 versus 56.060.
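
The first stage described above groups price series by their Dynamic Time Warping distance. The sketch below uses only numpy and is not the authors' code. It implements the classic DTW recurrence with an absolute-difference local cost. For clustering, it swaps in k-medoids (the hypothetical helper `dtw_kmedoids`) for DTW k-means. A true k-means under DTW needs DTW barycenter averaging (DBA) to compute centroids, whereas medoids are simply existing series. Libraries such as tslearn offer DTW k-means through `TimeSeriesKMeans(metric="dtw")`. The farthest-first initialisation and the toy series are illustrative assumptions, not the paper's data.

```python
import numpy as np


def dtw_distance(a, b):
    """DTW distance with absolute-difference local cost, O(len(a) * len(b))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the best of: step in a only, step in b only, step in both
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def dtw_kmedoids(series, k=2, n_iter=20):
    """Hypothetical helper: k clusters from DTW distances with k-medoids updates."""
    dist = np.array([[dtw_distance(s, t) for t in series] for s in series])
    # Farthest-first initialisation: start from the most central series,
    # then repeatedly add the series farthest from every chosen medoid.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # new medoid: the member with the smallest total DTW distance to its cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)  # labels consistent with final medoids
    return labels, medoids


# Toy data: three phase-shifted waves and three offset ramps (synthetic).
t = np.linspace(0, 2 * np.pi, 30)
waves = [np.sin(t + shift) for shift in (0.0, 0.3, 0.6)]
ramps = [0.1 * np.arange(30) + offset for offset in (0.0, 0.1, 0.2)]
labels, medoids = dtw_kmedoids(waves + ramps, k=2)
print(labels)
```

The waves land in one cluster and the ramps in the other. The diagonal alignment is always an admissible warping path, so for equal-length series the DTW distance never exceeds the summed absolute difference. Warping lowers the cost only when the series are shifted in time, as the phase-shifted waves are here.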
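
The comparison above rests on three error metrics. The numpy sketch below gives their standard definitions: MAPE is the mean of |y - ŷ| / |y| times 100, MAE is the mean of |y - ŷ|, and RMSE is the square root of the mean of (y - ŷ)². The function names are illustrative. MAPE is returned in percent, matching how the abstract reports it, and it assumes no actual value is zero.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero actuals)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the series."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def rmse(y_true, y_pred):
    """Root mean squared error; weights large errors more heavily than MAE."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(mape(y_true, y_pred))  # about 5.0 (errors of 10%, 5%, 0%)
print(mae(y_true, y_pred))   # about 6.667 (20 / 3)
print(rmse(y_true, y_pred))  # about 8.165 (sqrt(200 / 3))
```

RMSE is never smaller than MAE on the same errors. A gap between the two indicates that a few large errors dominate.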

Published

2026-02-02

How to Cite

Aviolla Terza Damaliana, Amri Muhaimin, Nabilah Selayanti, Shafira Amanda Putri, & Muhammad Nasrudin. (2026). Forecasting Food Prices in East Java Using Stacking Ensemble Learning via K-MEANS. ComTech: Computer, Mathematics and Engineering Applications, 17(1). Retrieved from https://journal.binus.ac.id/index.php/comtech/article/view/14218