Evaluating the Impact of Domain Adaptation on Transformer-based Models for Low-Resource Purépecha-Spanish Translation
DOI: https://doi.org/10.61467/2007.1558.2026.v17i2.1265
Keywords: Purépecha, low-resource NMT, domain adaptation, Transformer, mBART-50, Marian (OPUS-MT), BLEU, ROUGE
Abstract
This work evaluates how domain adaptation affects Transformer-based neural machine translation (NMT) for the low-resource Purépecha–Spanish pair. Building on a system fine-tuned on a verse-aligned Bible corpus, we introduce an out-of-domain grammar-book dataset (1,626 sentence pairs: 1,297 used for adaptation, 329 held out for testing) to quantify (A) zero-shot transfer (Bible→G-test) versus (B) adaptation (Bible+G-train→G-test). Evaluated with BLEU and ROUGE, both models transfer poorly zero-shot, with Marian at BLEU = 0.2272 and mBART-50 at BLEU = 1.9992, revealing substantial domain mismatch. After adaptation, scores rise sharply: Marian reaches BLEU = 21.2699 and mBART-50 achieves BLEU = 28.8776, with parallel gains in ROUGE (e.g., mBART-50 ROUGE-L = 0.5791). Qualitatively, adaptation reduces repetitive or degenerate outputs and improves the handling of metalinguistic terminology and everyday constructions. These results show that multilingual pretrained Transformers, combined with lightweight in-domain data, yield strong improvements for low-resource NMT under domain shift, and they highlight the value of diverse domains and speaker-informed evaluation.
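To make condition (B) concrete, the sketch below shows how a Bible-trained mBART-50 checkpoint could be further fine-tuned on the 1,297 in-domain grammar-book pairs with the Hugging Face Transformers library cited in the references. This is a minimal illustration under stated assumptions, not the authors' released code: the checkpoint path, data file, column names, language-code workaround, and hyperparameters are all hypothetical.

```python
# Minimal sketch of condition (B): continue fine-tuning a Bible-trained
# mBART-50 checkpoint on the in-domain grammar-book training pairs.
# Paths, column names, language codes, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

CHECKPOINT = "bible-finetuned-mbart50"  # hypothetical local checkpoint
tokenizer = MBart50TokenizerFast.from_pretrained(CHECKPOINT)
model = MBartForConditionalGeneration.from_pretrained(CHECKPOINT)

# mBART-50 has no Purépecha language code, so both sides are tagged with
# the Spanish code here; the paper's actual choice may differ.
tokenizer.src_lang = "es_XX"
tokenizer.tgt_lang = "es_XX"

# Hypothetical JSONL file with one {"pua": ..., "es": ...} pair per line.
raw = load_dataset("json", data_files={"train": "grammar_train.jsonl"})

def preprocess(batch):
    # Tokenize Purépecha sources and Spanish targets for seq2seq training.
    return tokenizer(batch["pua"], text_target=batch["es"],
                     max_length=128, truncation=True)

train = raw["train"].map(preprocess, batched=True,
                         remove_columns=raw["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="adapted-mbart50",
    per_device_train_batch_size=8,
    learning_rate=3e-5,
    num_train_epochs=10,  # the corpus is small, so several passes are cheap
    save_strategy="epoch",
)

Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
).train()
```

The zero-shot condition (A) simply skips this adaptation step and decodes the grammar-book test set with the Bible-only checkpoint.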
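The reported numbers are corpus-level BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) over the 329 held-out pairs. A sketch of that scoring step follows; the choice of the sacrebleu and rouge_score packages and the file names are assumptions, since any faithful BLEU/ROUGE implementation would serve.

```python
# Sketch of the evaluation step: corpus BLEU and averaged ROUGE-L F1
# over the held-out grammar-book test set. File names are placeholders.
import sacrebleu
from rouge_score import rouge_scorer

with open("g_test.hyp", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]   # system outputs
with open("g_test.ref", encoding="utf-8") as f:
    references = [line.strip() for line in f]   # Spanish references

# sacrebleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.4f}")

# ROUGE-L F1 per sentence, averaged over the test set.
scorer = rouge_scorer.RougeScorer(["rougeL"])
rouge_l = sum(scorer.score(ref, hyp)["rougeL"].fmeasure
              for ref, hyp in zip(references, hypotheses)) / len(references)
print(f"ROUGE-L = {rouge_l:.4f}")
```

Under a setup like this, the adapted mBART-50 system corresponds to the abstract's reported BLEU = 28.8776 and ROUGE-L = 0.5791.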
References
Abrego-Mendoza, S., Angel, J., Meque, A. G. M., Maldonado-Sifuentes, C., Sidorov, G., & Gelbukh, A. (2023). Comparison of translation models for low-resource languages. In Mexican International Conference on Artificial Intelligence (MICAI 2023).
Aycock, S., Stap, D., Wu, D., Monz, C., & Sima’an, K. (2024). Can LLMs really learn to translate a low-resource language from one grammar book? arXiv. https://arxiv.org/abs/2409.19151
Bible.com. (n.d.). Bible.com. Retrieved February 19, 2024, from https://www.bible.com/
Chamoreau, C. (2009). Hablemos purépecha. Universidad Intercultural Indígena de Michoacán.
González-Servín, C., Maldonado-Sifuentes, C. E., Sidorov, G., Kolesnikova, O., & Nuñez-Prado, C. J. (2024). Neural approaches to translating Purépecha: A comprehensive study on indigenous language preservation using Transformer networks. Preprint.
Hernández, P. M. (2002). En torno a la traducción automática. Cervantes, 1(2), 101–117.
Huarcaya Taquiri, D. (2020). Traducción automática neuronal para lengua nativa peruana (Doctoral thesis, Universidad Peruana Unión).
Hugging Face. (2024). Hugging Face Transformers documentation. Retrieved February 6, 2024, from https://huggingface.co/docs/transformers/index
Instituto Nacional de Estadística, Geografía e Informática. (1996). Hablantes de lengua indígena: Perfil sociodemográfico. INEGI.
Joshi, R., Singla, K., Kamath, A., Kalani, R., Paul, R., Vaidya, U., Chauhan, S. S., Wartikar, N., & Long, E. (2024). Adapting multilingual LLMs to low-resource languages using continued pre-training and synthetic corpus. arXiv. https://arxiv.org/abs/2410.14815
Liao, Y.-C., Yu, C.-J., Lin, C.-Y., Yun, H.-F., Wang, Y.-H., Li, H.-M., & Fan, Y.-C. (2024). Learning-from-mistakes prompting for indigenous language translation. arXiv. https://arxiv.org/abs/2407.13343
Lin, C.-Y. (2004, July). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out (pp. 74–81). Association for Computational Linguistics. https://www.aclweb.org/anthology/W04-1013
Mager, M., & Meza, I. (2021). Retos en construcción de traductores automáticos para lenguas indígenas de México. Digital Scholarship in the Humanities, 36(Supplement_1), i43–i48. https://doi.org/10.1093/llc/fqz093
Merx, R., Mahmudi, A., Langford, K., de Araujo, L. A., & Vylomova, E. (2024). Low-resource machine translation through retrieval-augmented LLM prompting: A study on the Mambai language. arXiv. https://arxiv.org/abs/2404.04809
Nag, A., Mukherjee, A., Ganguly, N., & Chakrabarti, S. (2024). Cost performance optimization for processing low-resource language tasks using commercial LLMs. arXiv. https://arxiv.org/abs/2403.05434
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311–318). https://doi.org/10.3115/1073083.1073135
Parra Escartín, C. (2018). ¿Cómo ha evolucionado la traducción automática en los últimos años? La Linterna del Traductor.
Tonja, A. L., Kolesnikova, O., Arif, M., Gelbukh, A., & Sidorov, G. (2022). Improving neural machine translation for low-resource languages using mixed training: The case of Ethiopian languages. In MICAI 2022 (pp. 30–40). Springer.
Tonja, A. L., Kolesnikova, O., Gelbukh, A., & Sidorov, G. (2023). Low-resource neural machine translation improvement using source-side monolingual data. Applied Sciences, 13(2), 1201.
Tonja, A. L., Maldonado-Sifuentes, C., Mendoza Castillo, D. A., Kolesnikova, O., Castro-Sánchez, N., Sidorov, G., & Gelbukh, A. (2023). Parallel corpus for indigenous language translation: Spanish–Mazatec and Spanish–Mixtec. arXiv. https://arxiv.org/abs/2305.17404
Tonja, A. L., Nigatu, H. H., Kolesnikova, O., Sidorov, G., Gelbukh, A., & Kalita, J. (2023). Enhancing translation for indigenous languages: Experiments with multilingual models. arXiv. https://arxiv.org/abs/2305.17406
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 5998–6008).
License
Copyright (c) 2026 International Journal of Combinatorial Optimization Problems and Informatics

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.