Engineering Practices For Ensuring Resilience In Scalable Cloud Systems

Damir Rakhmaev

doi:10.37547/ijmef/Volume06Issue01-07

Articles | Open Access | https://doi.org/10.37547/ijmef/Volume06Issue01-07

Damir Rakhmaev, Staff software engineer, Russia

Published Date 2026-01-21

Pages 63-67

Abstract

The article systematizes engineering practices that ensure resilience in scalable cloud architectures: designing for failure, automated recovery, observability, reliability management through SLOs and error budgets, and experimental verification of stability using chaos engineering methods. It proposes a practice-oriented taxonomy of approaches and demonstrates how to link technical measures to manageable reliability goals.
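
As an illustration of how SLOs and error budgets can tie a technical measure (pausing releases) to a manageable reliability goal, a minimal sketch follows. It is not taken from the article: the class name ErrorBudget, its fields, the request-based SLI, and the policy of freezing releases once the budget is spent are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int    # requests observed in the window
    failed_requests: int   # requests that violated the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / allowed


def release_allowed(budget: ErrorBudget, freeze_threshold: float = 0.0) -> bool:
    # Assumed policy: allow releases only while some budget remains.
    return budget.remaining_fraction > freeze_threshold


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"remaining budget: {healthy.remaining_fraction:.2f}")
    print(f"release allowed: {release_allowed(healthy)}")
```

With a 99.9% target over one million requests, roughly 1,000 failures are permitted, so 400 observed failures leave about 60% of the budget. A team could raise freeze_threshold to stop risky changes before the budget is fully exhausted.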

Keywords

Resilience, reliability, cloud systems

Copyright License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Rakhmaev, D. (2026). Engineering Practices For Ensuring Resilience In Scalable Cloud Systems. International Journal Of Management And Economics Fundamental, 6(01), 63–67. https://doi.org/10.37547/ijmef/Volume06Issue01-07
