Integrating Machine Learning and Deep Reinforcement Learning for Anomaly Detection and Autonomous Resource Allocation in Distributed Microservice Ecosystems: A Comprehensive Analysis of System Reliability and Sustainability

Dr. Alistair Sterling, Department of Computer Science and Information Systems, University of Melbourne, Australia

Abstract

The rapid evolution of cloud computing, the transition toward 6G networking, and the proliferation of microservice architectures have necessitated a shift in how system reliability and resource efficiency are managed. Traditional monolithic monitoring systems are increasingly inadequate for the dynamic, decoupled nature of modern distributed environments. This research explores the integration of machine learning and deep reinforcement learning (DRL) for detecting execution anomalies and optimizing resource distribution. By synthesizing methodologies ranging from time-weighted control flow graph mining to recurrent neural network attention mechanisms, this study develops a theoretical framework for proactive system maintenance. We analyze the role of log-based anomaly detection methods, such as DeepLog and LogSed, in predicting failures before they affect service level agreements (SLAs). The paper also investigates the intersection of environmental sustainability and system performance, examining how self-adaptive approaches can harness renewable energy in cloud ecosystems, and it considers the complexities of multi-domain service deployment within 5G and 6G frameworks, emphasizing the need for reliability-aware algorithms. Through a review of existing literature and the proposal of a multi-level self-adaptation model, this article argues that the future of resilient distributed systems lies in the convergence of automated boundary detection, intelligent workload scheduling, and adaptive flow control mechanisms. The findings suggest that while deep learning offers unprecedented accuracy in anomaly diagnosis, the integration of human-centric design (moving from Industry 4.0 toward Society 5.0) remains critical for the ethical and practical deployment of autonomous IT infrastructures.

Keywords

Microservices, Anomaly Detection, Deep Reinforcement Learning

References

Brown, A.; Tuor, A.; Hutchinson, B.; Nichols, N. Recurrent neural network attention mechanisms for interpretable system log anomaly detection. In Proceedings of the First Workshop on Machine Learning for Computing Systems, Tempe, AZ, USA, 12 June 2018; pp. 1–8.

Cao B, Zhang L, Li Y, Feng D, Cao W (2019) Intelligent offloading in multi-access edge computing: a state-of-the-art review and framework. IEEE Commun Magaz 57(3):56–62.

Du, M.; Li, F.; Zheng, G.; Srikumar, V. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1285–1298.

Fu, Q.; Lou, J.G.; Wang, Y.; Li, J. Execution anomaly detection in distributed systems through unstructured log analysis. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, Miami Beach, FL, USA, 6–9 December 2009; pp. 149–158.

Giordani M, Polese M, Mezzavilla M, Rangan S, Zorzi M (2020) Toward 6g networks: use cases and technologies. IEEE Commun Magaz 58(3):55–61.

Hebbar, K. S. “Machine learning-assisted service boundary detection for modularizing legacy systems,” International Journal of Applied Engineering & Technology, vol. 04, no. 02, pp. 401–414, Sep. 2022, https://romanpub.com/resources/ijaet-v4-2-2022-48.pdf

Jia, T.; Yang, L.; Chen, P.; Li, Y.; Meng, F.; Xu, J. LogSed: Anomaly diagnosis through mining time-weighted control flow graph in logs. In Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), Honolulu, HI, USA, 25–30 June 2017; pp. 447–455.

Kibalya G, Serrat J, Gorricho J-L, Okello D, Zhang P (2020) A deep reinforcement learning-based algorithm for reliability-aware multi-domain service deployment in smart ecosystems, Neural Computing and Applications 1–23.

Kibalya G, Serrat J, Gorricho J-L, Pasquini R, Yao H, Zhang P (2019) A reinforcement learning based approach for 5g network slicing across multiple domains, In: 2019 15th International Conference on Network and Service Management (CNSM), IEEE, pp. 1–5.

Maier M (2021) 6g as if people mattered: From industry 4.0 toward society 5.0, In: 2021 International Conference on Computer Communications and Networks (ICCCN), IEEE, pp. 1–10.

Mao Y, You C, Zhang J, Huang K, Letaief KB (2017) A survey on mobile edge computing: the communication perspective. IEEE Commun Surveys Tutor 19(4):2322–2358.

Nandi, A.; Mandal, A.; Atreja, S.; Dasgupta, G.B.; Bhattacharya, S. Anomaly detection using program control flow graph mining from execution logs. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 215–224.

Sharma, B.; Jayachandran, P.; Verma, A.; Das, C.R. CloudPD: Problem determination and diagnosis in shared dynamic clouds. In Proceedings of the 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Budapest, Hungary, 24–27 June 2013; pp. 1–12.

Xu, M., Toosi, A. N., and Buyya, R. (2020). A Self-adaptive Approach for Managing Applications and Harnessing Renewable Energy for Sustainable Cloud Computing. IEEE Transactions on Sustainable Computing.

Yagoub, I.; Khan, M.A.; Jiyun, L. IT equipment monitoring and analyzing system for forecasting and detecting anomalies in log files utilizing machine learning techniques. In Proceedings of the 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 6–7 August 2018; pp. 1–6.

Yang, Z., Nguyen, P., Jin, H., and Nahrstedt, K. (2019). MIRAS: Model-based Reinforcement Learning for Microservice Resource Allocation over Scientific Workflows. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 122–132.

Yudong, L., Yuqing, Z., and Zhangbin, Z. (2020). Service Availability Guarantee with Adaptive Automatic Flow Control. In 2020 IEEE World Congress on Services (SERVICES), pp. 101–105.

Zang, X., Chen, W., Zou, J., Zhou, S., Lisong, H., and Ruigang, L. (2018). A Fault Diagnosis Method for Microservices Based on Multi-Factor Self-Adaptive Heartbeat Detection Algorithm. In 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2), pp. 1–6.

Zhang, S., Zhang, M., Ni, L., and Liu, P. (2019). A Multi-Level Self-Adaptation Approach For Microservice Systems. In 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), pp. 498–502.

Zheng, T., Wan, J., Zhang, J., and Jiang, C. (2022). Deep reinforcement learning-based workload scheduling for edge computing. J Cloud Comput 11(1):3.

Zheng, T., Zheng, X., Zhang, Y., Deng, Y., Dong, E., Zhang, R., and Liu, X. (2019). SmartVM: a SLA-aware microservice deployment framework. World Wide Web, 22(1):275–293.

How to Cite

Dr. Alistair Sterling. (2025). Integrating Machine Learning and Deep Reinforcement Learning for Anomaly Detection and Autonomous Resource Allocation in Distributed Microservice Ecosystems: A Comprehensive Analysis of System Reliability and Sustainability. American Journal of Applied Science and Technology, 5(09), 136–141. Retrieved from https://theusajournals.com/index.php/ajast/article/view/9273