The Convergence of Machine Learning and Large Language Models in Software Architecture: A Comprehensive Analysis of Defect Prediction, Design Patterns, And Automated Recovery

Jiya Nelson, Department of Software Engineering, University of Melbourne, Australia

Abstract

The integration of Machine Learning (ML) and Large Language Models (LLMs) into the software development lifecycle has changed how architectural integrity and code quality are maintained. This article investigates the evolution of automated software engineering, from classical ML-based defect prediction and code smell detection to contemporary LLM-driven architectural recovery and design pattern adoption. Synthesizing evidence from systematic mapping studies and recent empirical evaluations, the study traces the transition from discriminative models, used for identifying package-level clones and software vulnerabilities, to generative architectures capable of end-to-end program repair and high-level component summarization. It further examines the efficacy of Chain-of-Thought (CoT) prompting and abstract syntax tree (AST) representations in enhancing the semantic understanding of complex codebases. Through a detailed analysis of cross-project defect prediction, modularization of legacy systems, and the extraction of architectural information from informal specifications, the article highlights the transformative potential of AI-driven practices. The findings suggest that while LLMs possess significant architectural knowledge, their integration requires precise probing techniques and formal rule learning to mitigate the risk of hallucinations and to maintain architectural conformance in continuous integration environments.

Keywords

Machine Learning, Large Language Models, Software Architecture, Defect Prediction


How to Cite

Jiya Nelson. (2024). The Convergence of Machine Learning and Large Language Models in Software Architecture: A Comprehensive Analysis of Defect Prediction, Design Patterns, And Automated Recovery. American Journal of Applied Science and Technology, 4(11), 106–113. Retrieved from https://theusajournals.com/index.php/ajast/article/view/9529