Forecasting Food Prices in East Java Using Stacking Ensemble Learning via K-MEANS
DOI:
https://doi.org/10.21512/comtech.v17i1.14218
Keywords:
clustering, ensemble, food commodity, price, time-series
Abstract
Food commodities are essential in developing countries such as Indonesia, where the government regulates food commodity prices in every province. However, prices remain unstable in some provinces, which makes effective policy control difficult. Data science and statistical techniques can support the government’s efforts to monitor and manage food commodity prices. This study proposes the Stackelberg-K-Means method to predict the commodity price index in East Java. The method combines cluster analysis with stacking ensemble learning for time-series prediction. Cluster analysis is conducted first, using Dynamic Time Warping (DTW) as the distance measure because it suits time-series data, and yields two clusters for each of the three commodities: rice, oil, and flour. The stacking model consists of base learners (Ridge Regression, Random Forest, and Support Vector Regression) and a meta-learner (Light Gradient Boosting). Parameters are optimized using grid search, and the proposed method is evaluated against AutoARIMA, implemented in Python, on both training and testing data. The results show that the proposed method outperforms the ARIMA model on all three error metrics: MAPE, MAE, and RMSE. Each pair below reports the proposed method first and ARIMA second. For flour, MAPE is 0.042% versus 0.328%, MAE is 4.715 versus 37.57, and RMSE is 6.34 versus 523.99. For rice, MAPE is 0.261% versus 0.392%, MAE is 31.585 versus 48.142, and RMSE is 41.92 versus 56.068. For oil, MAPE is 0.185% versus 0.250%, MAE is 33.02 versus 47.571, and RMSE is 39.35 versus 56.060.
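To make two of the building blocks named in the abstract concrete, the following is a minimal sketch in plain Python of a Dynamic Time Warping distance and of the three error metrics (MAPE, MAE, RMSE). It is not the authors' implementation: the function names, the absolute-difference local cost, and the toy series are all assumptions chosen for illustration, and MAPE is assumed to be expressed in percent because that is how the abstract reports it.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences (assumed local cost: |x - y|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


# Hypothetical toy series: series_b has the same shape as series_a, delayed by one step.
series_a = [10.0, 11.0, 12.0, 11.0, 10.0]
series_b = [10.0, 10.0, 11.0, 12.0, 11.0, 10.0]

# Hypothetical actual and predicted prices for the metric functions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

if __name__ == "__main__":
    print("DTW:", dtw_distance(series_a, series_b))
    print("MAPE (%):", mape(actual, predicted))
    print("MAE:", mae(actual, predicted))
    print("RMSE:", rmse(actual, predicted))
```

Because DTW may align one point with several points of the other series, the two toy series, which differ only by a one-step delay and have different lengths, receive a distance of zero. A point-by-point distance could not even be computed for them, which illustrates why DTW is a common choice for clustering price series whose movements are similar but shifted in time.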
Copyright (c) 2026 Aviolla Terza Damaliana, Amri Muhaimin, Nabilah Selayanti, Shafira Amanda Putri, Muhammad Nasrudin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.