Evaluating the Impact of Domain Adaptation on Transformer-based Models for Low-Resource Purépecha-Spanish Translation

Authors

González Servín, C., Maldonado Sifuentes, C. E., Kolesnikova, O., & Sidorov, G.

DOI:

https://doi.org/10.61467/2007.1558.2026.v17i2.1265

Keywords:

Purépecha, low-resource NMT, domain adaptation, Transformer, mBART-50, Marian (OPUS-MT), BLEU, ROUGE

Abstract

This work evaluates how domain adaptation affects Transformer-based neural machine translation (NMT) for the low-resource Purépecha–Spanish pair. Building on a system fine-tuned on a verse-aligned Bible corpus, we introduce an out-of-domain grammar-book dataset (1,626 sentence pairs: 1,297 used for adaptation, 329 held out for testing) to quantify (A) zero-shot transfer (Bible→G-test) versus (B) adaptation (Bible+G-train→G-test). Measured with BLEU and ROUGE, zero-shot performance is weak for both Marian (BLEU=0.2272) and mBART-50 (BLEU=1.9992), revealing substantial domain mismatch. After adaptation, scores rise sharply: Marian reaches BLEU=21.2699 and mBART-50 reaches BLEU=28.8776, with parallel gains in ROUGE (e.g., mBART-50 ROUGE-L=0.5791). Qualitatively, adaptation reduces repetitive, degenerate outputs and improves the handling of metalinguistic terminology and everyday constructions. These results show that multilingual pretrained Transformers combined with lightweight in-domain data provide strong improvements for low-resource NMT under domain shift, and they highlight the value of diverse domains and speaker-informed evaluation.
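
To make the A/B protocol concrete, the sketch below scores one system's outputs on the held-out grammar-book test set (G-test, 329 pairs) with corpus-level BLEU and averaged sentence-level ROUGE-L. This is a minimal illustration, assuming one sentence per line in plain-text files; the sacrebleu and rouge-score libraries and the file names are stand-ins, not the authors' confirmed tooling.

```python
# Minimal sketch of the A/B evaluation described in the abstract:
# score a system's G-test outputs with corpus BLEU and averaged
# sentence-level ROUGE-L. Library choices (sacrebleu, rouge-score)
# and file names are assumptions, not the authors' exact setup.
import sacrebleu
from rouge_score import rouge_scorer

def score_system(hyp_path: str, ref_path: str) -> dict:
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.strip() for line in f]
    with open(ref_path, encoding="utf-8") as f:
        refs = [line.strip() for line in f]

    # Corpus-level BLEU (Papineni et al., 2002) over the 329-pair G-test.
    bleu = sacrebleu.corpus_bleu(hyps, [refs])

    # ROUGE-L F-measure (Lin, 2004), averaged across sentences.
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
    rouge_l = sum(
        scorer.score(ref, hyp)["rougeL"].fmeasure
        for ref, hyp in zip(refs, hyps)
    ) / len(hyps)

    return {"BLEU": bleu.score, "ROUGE-L": rouge_l}

# Setup A (zero-shot): decodes from the Bible-only model.
# Setup B (adapted): decodes after continued fine-tuning on the
# 1,297 G-train pairs. File names below are hypothetical.
for setup, hyp_file in [("A: zero-shot", "hyp_zero_shot.txt"),
                        ("B: adapted", "hyp_adapted.txt")]:
    print(setup, score_system(hyp_file, "gtest_ref.txt"))
```

Note that sacreBLEU reports BLEU on a 0–100 scale, which matches the magnitudes quoted above (e.g., 21.2699 and 28.8776), while the ROUGE-L values (e.g., 0.5791) are F-measures on a 0–1 scale.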

References

Abrego-Mendoza, S., Angel, J., Meque, A. G. M., Maldonado-Sifuentes, C., Sidorov, G., & Gelbukh, A. (2023). Comparison of translation models for low-resource languages. In Mexican International Conference on Artificial Intelligence (MICAI 2023).

Aycock, S., Stap, D., Wu, D., Monz, C., & Sima’an, K. (2024). Can LLMs really learn to translate a low-resource language from one grammar book? arXiv. https://arxiv.org/abs/2409.19151

Bible.com. (n.d.). Bible.com. Retrieved February 19, 2024, from https://www.bible.com/

Chamoreau, C. (2009). Hablemos purépecha. Universidad Intercultural Indígena de Michoacán.

González-Servín, C., Maldonado-Sifuentes, C. E., Sidorov, G., Kolesnikova, O., & Nuñez-Prado, C. J. (2024). Neural approaches to translating Purépecha: A comprehensive study on indigenous language preservation using Transformer networks. Preprint.

Hernández, P. M. (2002). En torno a la traducción automática. Cervantes, 1(2), 101–117.

Huarcaya Taquiri, D. (2020). Traducción automática neuronal para lengua nativa peruana (Doctoral thesis, Universidad Peruana Unión).

Hugging Face. (2024). Hugging Face Transformers documentation. Retrieved February 6, 2024, from https://huggingface.co/docs/transformers/index

Instituto Nacional de Estadística, Geografía e Informática. (1996). Hablantes de lengua indígena: Perfil sociodemográfico. INEGI.

Joshi, R., Singla, K., Kamath, A., Kalani, R., Paul, R., Vaidya, U., Chauhan, S. S., Wartikar, N., & Long, E. (2024). Adapting multilingual LLMs to low-resource languages using continued pre-training and synthetic corpus. arXiv. https://arxiv.org/abs/2410.14815

Liao, Y.-C., Yu, C.-J., Lin, C.-Y., Yun, H.-F., Wang, Y.-H., Li, H.-M., & Fan, Y.-C. (2024). Learning-from-mistakes prompting for indigenous language translation. arXiv. https://arxiv.org/abs/2407.13343

Lin, C.-Y. (2004, July). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out (pp. 74–81). Association for Computational Linguistics. https://www.aclweb.org/anthology/W04-1013

Mager, M., & Meza, I. (2021). Retos en construcción de traductores automáticos para lenguas indígenas de México [Challenges in building machine translators for indigenous languages of Mexico]. Digital Scholarship in the Humanities, 36(Supplement 1), i43–i48. https://doi.org/10.1093/llc/fqz093

Merx, R., Mahmudi, A., Langford, K., de Araujo, L. A., & Vylomova, E. (2024). Low-resource machine translation through retrieval-augmented LLM prompting: A study on the Mambai language. arXiv. https://arxiv.org/abs/2404.04809

Nag, A., Mukherjee, A., Ganguly, N., & Chakrabarti, S. (2024). Cost performance optimization for processing low-resource language tasks using commercial LLMs. arXiv. https://arxiv.org/abs/2403.05434

Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311–318). https://doi.org/10.3115/1073083.1073135

Parra Escartín, C. (2018). ¿Cómo ha evolucionado la traducción automática en los últimos años? La Linterna del Traductor.

Tonja, A. L., Kolesnikova, O., Arif, M., Gelbukh, A., & Sidorov, G. (2022). Improving neural machine translation for low-resource languages using mixed training: The case of Ethiopian languages. In MICAI 2022 (pp. 30–40). Springer.

Tonja, A. L., Kolesnikova, O., Gelbukh, A., & Sidorov, G. (2023). Low-resource neural machine translation improvement using source-side monolingual data. Applied Sciences, 13(2), 1201.

Tonja, A. L., Maldonado-Sifuentes, C., Mendoza Castillo, D. A., Kolesnikova, O., Castro-Sánchez, N., Sidorov, G., & Gelbukh, A. (2023). Parallel corpus for indigenous language translation: Spanish–Mazatec and Spanish–Mixtec. arXiv. https://arxiv.org/abs/2305.17404

Tonja, A. L., Nigatu, H. H., Kolesnikova, O., Sidorov, G., Gelbukh, A., & Kalita, J. (2023). Enhancing translation for indigenous languages: Experiments with multilingual models. arXiv. https://arxiv.org/abs/2305.17406

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 5998–6008).

Published

2026-02-16

How to Cite

González Servín, C., Maldonado Sifuentes, C. E., Kolesnikova, O., & Sidorov, G. (2026). Evaluating the Impact of Domain Adaptation on Transformer-based Models for Low-Resource Purépecha-Spanish Translation. International Journal of Combinatorial Optimization Problems and Informatics, 17(2), 27–37. https://doi.org/10.61467/2007.1558.2026.v17i2.1265

Issue

Vol. 17, No. 2 (2026)

Section

CINIAI
