Intelligent Resource Orchestration System for AI-Driven Digital Commerce Platforms Using Reinforcement Learning

amelia sholikhaq

doi:10.31004/riggs.v4i4.4137

Authors

amelia sholikhaq, Universitas Tangerang Raya


Keywords:

Reinforcement Learning, Resource Orchestration, Digital Commerce Platforms, Proximal Policy Optimization (PPO)

Abstract

This study proposes an intelligent resource orchestration system for AI-driven digital commerce platforms that uses a reinforcement learning (RL) framework to address growing challenges in dynamic workload management, latency reduction, and service efficiency. Building on recent advances in machine learning–based cloud orchestration, the research evaluates how effectively the Proximal Policy Optimization (PPO) algorithm optimizes resource allocation under complex, non-stationary platform conditions. A simulation-based experimental design, combining real-world platform logs with synthetic workload scenarios, was used to compare system responsiveness, throughput, and cost efficiency against heuristic and threshold-based baselines. The RL-driven orchestrator consistently outperformed these conventional methods, achieving larger latency reductions, more stable throughput, and better adaptation to peak-load fluctuations. The results further show that the agent learns optimal allocation policies despite environmental uncertainty, supporting the feasibility of model-free RL in large-scale digital commerce environments. Theoretically, the study extends sequential decision-making models to digital commerce orchestration; practically, it offers a scalable, autonomous solution for improving platform performance. Limitations include the controlled simulation environment and the focus on a single RL algorithm, which point to real-world deployment and the exploration of alternative RL variants as directions for future research. Overall, the study strengthens the case for adopting RL-based orchestration as a foundational architecture for next-generation intelligent digital commerce systems.
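For readers unfamiliar with PPO, the core of the algorithm named in the abstract is the clipped surrogate objective of Schulman et al. (2017): L_CLIP = mean over t of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), where r_t is the ratio between the updated and the data-collecting policy's probability of the action taken at step t, and A_t is an advantage estimate. The minimal sketch below computes this objective for a small batch of scaling decisions. The abstract does not describe the study's state, action, or reward design, so the action labels ("scale_up", "hold", "scale_down"), the probabilities, the advantages, and eps = 0.2 (a commonly used value) are hypothetical and purely illustrative.

```python
import math
from typing import Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ppo_clipped_objective(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized) over a batch.

    new_logprobs / old_logprobs: log pi(a_t | s_t) of the action actually taken
    at each step, under the updated policy and the data-collecting policy.
    advantages: advantage estimates A_t for those same actions.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("batch must not be empty")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # r_t = pi_new / pi_old
        unclipped = ratio * adv
        clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * adv
        # Pessimistic bound: keep the smaller term, so moving the policy
        # beyond the clip range earns no additional objective.
        total += min(unclipped, clipped)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical batch of orchestrator decisions (illustrative values only):
    # (action label, old-policy probability, new-policy probability, advantage)
    batch = [
        ("scale_up", 0.50, 0.75, 1.0),
        ("scale_down", 0.50, 0.25, -1.0),
        ("hold", 0.40, 0.40, 0.5),
    ]
    old_lp = [math.log(p_old) for _, p_old, _, _ in batch]
    new_lp = [math.log(p_new) for _, _, p_new, _ in batch]
    adv = [a for _, _, _, a in batch]
    print(ppo_clipped_objective(new_lp, old_lp, adv))
```

For this batch the objective is approximately 0.3. The first step's ratio of 1.5 is clipped to 1.2 (contributing 1.2), the second step's ratio of 0.5 yields the smaller clipped term of -0.8, and the unchanged third step contributes 0.5. The clipping is what keeps each policy update close to the previous policy, which is one reason PPO is often chosen for non-stationary control problems such as workload-driven resource allocation.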

