
Challenges of Syntactic Markup in the Uzbek Language
Abstract
This article examines the key challenges of syntactic markup in the Uzbek language. Because Uzbek is agglutinative and has relatively free word order, it poses significant difficulties for natural language processing (NLP). Particular attention is paid to three problems: morphological ambiguity, the lack of large-scale annotated corpora, and the insufficient adaptation of existing algorithms to the linguistic features of Uzbek. The proposed solutions include developing specialized annotated corpora, adapting existing machine learning models, and creating new markup algorithms tailored to the language.
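Morphological ambiguity can be made concrete with a small example. The Python sketch below enumerates every segmentation of a word that a tiny lexicon allows. It is a toy illustration under stated assumptions, not the method proposed in this article. The lexicon (ROOTS), the slot-ordered suffix inventory (SLOTS), the tag names, and the function analyses are all hypothetical. Real Uzbek morphology also involves vowel and consonant alternations, which this sketch ignores. The homographs "olma" ("apple", or the negative imperative "don't take" from "ol-", "take") and "yoz" ("summer", or the imperative "write") each receive two analyses. A long agglutinated form such as "kitoblarimizda" ("in our books") is split into a root followed by a chain of suffixes.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface root -> list of (part of speech, gloss).
# "olma" and "yoz" have two readings to illustrate homography.
ROOTS: Dict[str, List[Tuple[str, str]]] = {
    "kitob": [("NOUN", "book")],
    "olma": [("NOUN", "apple")],
    "ol": [("VERB", "take")],
    "yoz": [("NOUN", "summer"), ("VERB", "write")],
}

# Hypothetical suffix inventory: for each part of speech, an ordered list of
# slots; each slot maps a suffix's surface form to its tag. A suffix may only
# follow suffixes from earlier slots (a crude model of morphotactics).
SLOTS: Dict[str, List[Dict[str, str]]] = {
    "NOUN": [
        {"lar": "PL"},                 # plural
        {"imiz": "POSS.1PL"},          # possessive "our" (post-consonant form only)
        {"da": "LOC", "ni": "ACC"},    # case
    ],
    "VERB": [
        {"ma": "NEG"},                 # negation
    ],
}


def analyses(word: str) -> List[str]:
    """Return every segmentation of `word` licensed by the toy lexicon."""
    results: List[str] = []

    def walk(rest: str, slot: int, pos: str, parts: List[str]) -> None:
        # The whole word has been consumed: record this analysis.
        if not rest:
            results.append("+".join(parts))
            return
        # Try each remaining slot in order; used slots cannot be revisited.
        for i in range(slot, len(SLOTS[pos])):
            for surface, tag in SLOTS[pos][i].items():
                if rest.startswith(surface):
                    walk(rest[len(surface):], i + 1, pos, parts + [tag])

    # Every root that is a prefix of the word starts a candidate analysis.
    for root, readings in ROOTS.items():
        if word.startswith(root):
            for pos, gloss in readings:
                walk(word[len(root):], 0, pos, [f"{root}/{pos}:{gloss}"])
    return results


if __name__ == "__main__":
    for w in ["olma", "yoz", "olmalar", "kitoblarimizda"]:
        print(w, "->", analyses(w))
```

In this sketch, analyses("olma") returns both the noun and the negated-verb readings. By contrast, analyses("olmalar") returns only the plural noun, because the toy verb slots license no plural suffix after the negation. Ordering suffixes into slots is one simple way to rule out some readings. The ambiguity that survives, as in "olma" and "yoz", must still be resolved from syntactic context, which is where morphological ambiguity feeds into syntactic markup.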
Keywords
Syntactic markup, Uzbek language, morphology
Copyright License
Copyright (c) 2025 Dilnoza Khamdamova

This work is licensed under a Creative Commons Attribution 4.0 International License.