Forecasting Food Prices in East Java Using Stacking Ensemble Learning via K-MEANS
Keywords:
clustering, ensemble, food commodity, price, time-series
Abstract
Food commodities are essential in developing countries such as Indonesia. The government regulates food commodity prices in every province, yet pricing problems persist in some provinces. Data science and statistical techniques can help the government control food commodity prices. This study proposes the STACKEL K-MEANS method to predict the commodity price index in East Java. The method is a collaborative framework that combines cluster analysis with stacking ensemble learning. Cluster analysis is performed first, using Dynamic Time Warping, a distance measure suited to time series data. Two clusters are formed for each commodity (rice, oil, and flour). The stacking model then consists of base learners and a meta learner: the base learners are Ridge Regression, Random Forest, and Support Vector Regression, and the meta learner is the Light Gradient Boosting Machine (LightGBM). The model parameters are optimized with a grid search. For evaluation, we compare the proposed method with auto ARIMA in Python. On the training and testing data, the proposed method yields lower errors than the ARIMA model across all three error metrics: MAPE, MAE, and RMSE. For flour, the respective scores (proposed method compared to ARIMA) are 0.042% compared to 0.328%, 4.715 compared to 37.57, and 6.34 compared to 523.99. For rice, the scores are 0.261% compared to 0.392%, 31.585 compared to 48.142, and 41.92 compared to 56.068. For oil, the scores are 0.185% compared to 0.250%, 33.02 compared to 47.571, and 39.35 compared to 56.060.
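As an illustration of two building blocks named in the abstract, the following minimal Python sketch computes a Dynamic Time Warping distance between two series and the three error metrics (MAE, RMSE, MAPE). It is not the authors' implementation. The function names, the absolute-difference local cost, the unconstrained warping window, and the choice to report MAPE as a percentage are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two numeric sequences.

    Assumptions (not from the article): absolute difference as the
    local cost and no warping-window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    mse = sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed convention).

    Undefined when any actual value is zero.
    """
    total = sum(abs((y - p) / y) for y, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical toy price series, not data from the study.
    series_a = [1, 2, 3]
    series_b = [1, 2, 2, 3]  # same shape, one value held an extra step
    print(dtw_distance(series_a, series_b))

    actual = [100, 200, 400]
    predicted = [110, 190, 400]
    print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Because DTW lets one series pause while the other advances, two price series with the same shape but different timing stay close, which is why it suits clustering time series of unequal pace. RMSE is never smaller than MAE on the same errors, a property that the scores reported in the abstract also satisfy.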
License
Copyright (c) 2026 Aviolla Terza Damaliana, Amri Muhaimin, Nabilah Selayanti, Shafira Amanda Putri, Muhammad Nasrudin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.