Adaptive Fuel Subsidy Optimization Using Deep Q-Learning and Bandit-Based Policy Selection: A Simulation Study

Authors

  • Pandu Dwi Luhur Pambudi, Bina Nusantara University

DOI:

https://doi.org/10.21512/emacsjournal.v7i2.13419

Keywords:

Reinforcement Learning, Bandit Algorithm, Fuel Subsidy, Policy Simulation, Q-learning

Abstract

Designing effective fuel subsidy policies is a major challenge for governments seeking to balance energy affordability, fiscal sustainability, and environmental goals. This study introduces an adaptive simulation framework that combines Deep Q-Learning with a multi-armed bandit algorithm to model fuel consumption behavior and optimize subsidy distribution strategies. The framework simulates a dual-agent system in which a DQN-based consumer interacts with a bandit-driven government that selects among three subsidy policies: universal, quota-based, and targeted. Over 1,000 simulated episodes, the framework demonstrates how policy can adapt in real time to maximize social welfare and reduce inefficient spending. Results show that while universal subsidies often deliver the highest consumer satisfaction, they incur significant fiscal costs, whereas quota-based and targeted approaches can yield more balanced trade-offs. The study highlights the potential of reinforcement learning to enhance adaptive policymaking in complex economic systems. To strengthen the analysis, the simulation tracks both consumer and government rewards across scenarios, capturing the trade-off between satisfaction and fiscal burden; evaluation results reveal that targeted subsidies, though less popular, often provide more sustainable outcomes. The agent-based approach allows the system to adjust policy decisions dynamically in response to real-time feedback, reflecting the evolving nature of economic behavior. These findings underscore the usefulness of AI-driven simulations as decision-support tools for designing responsive and cost-efficient public policies.
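
The dual-agent loop described above can be sketched compactly. The minimal Python example below assumes an epsilon-greedy bandit for the government's choice among the three subsidy policies and substitutes a small tabular Q-learning consumer for the paper's DQN so the sketch stays dependency-free; the simulate_market reward function, the demand actions, and all hyperparameters are illustrative assumptions rather than the paper's calibration.

```python
# Sketch of the dual-agent simulation loop from the abstract:
# a government agent picks one of three subsidy policies with an
# epsilon-greedy multi-armed bandit, and a consumer agent adapts its
# fuel demand with Q-learning. The paper uses a Deep Q-Network for the
# consumer; a tabular Q-learner stands in here. Rewards and parameters
# are illustrative assumptions, not the paper's calibration.
import random
from collections import defaultdict

POLICIES = ["universal", "quota", "targeted"]   # the three bandit arms
ACTIONS = [0, 1, 2]                             # consumer fuel demand: low/medium/high
EPISODES = 1000
EPS_BANDIT, EPS_CONSUMER, ALPHA, GAMMA = 0.1, 0.1, 0.1, 0.9

# Government-side bandit statistics: running mean reward per policy arm.
arm_value = {p: 0.0 for p in POLICIES}
arm_count = {p: 0 for p in POLICIES}

# Consumer-side Q-table indexed by (policy, action).
q_table = defaultdict(float)

def simulate_market(policy, demand):
    """Toy environment (assumed): returns (consumer_reward, government_reward)."""
    subsidy_rate = {"universal": 1.0, "quota": 0.6, "targeted": 0.4}[policy]
    satisfaction = demand * subsidy_rate             # cheaper fuel pleases consumers
    fiscal_cost = demand * subsidy_rate * 1.5        # but strains the budget
    return satisfaction, satisfaction - fiscal_cost  # welfare minus fiscal burden

for episode in range(EPISODES):
    # Government: epsilon-greedy policy selection over the three arms.
    if random.random() < EPS_BANDIT:
        policy = random.choice(POLICIES)
    else:
        policy = max(POLICIES, key=lambda p: arm_value[p])

    # Consumer: epsilon-greedy action from the Q-table under the chosen policy.
    if random.random() < EPS_CONSUMER:
        demand = random.choice(ACTIONS)
    else:
        demand = max(ACTIONS, key=lambda a: q_table[(policy, a)])

    consumer_r, government_r = simulate_market(policy, demand)

    # Consumer Q-learning update (one-step; the policy label acts as the state).
    best_next = max(q_table[(policy, a)] for a in ACTIONS)
    q_table[(policy, demand)] += ALPHA * (
        consumer_r + GAMMA * best_next - q_table[(policy, demand)]
    )

    # Government bandit update: incremental mean of the observed policy reward.
    arm_count[policy] += 1
    arm_value[policy] += (government_r - arm_value[policy]) / arm_count[policy]

print({p: round(arm_value[p], 3) for p in POLICIES})
```

Running the sketch prints the bandit's estimated value per policy arm, mirroring the trade-off the abstract reports: the universal arm scores highest on consumer satisfaction but lowest on the government's net reward once fiscal cost is subtracted.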

Author Biography

Pandu Dwi Luhur Pambudi, Bina Nusantara University

Computer Science Department, BINUS Online Learning

Published

2025-05-31

How to Cite

Pambudi, P. D. L. (2025). Adaptive Fuel Subsidy Optimization Using Deep Q-Learning and Bandit-Based Policy Selection: A Simulation Study. Engineering, MAthematics and Computer Science Journal (EMACS), 7(2), 191–200. https://doi.org/10.21512/emacsjournal.v7i2.13419