How to Extend Multilingual Machine Translation Models to New Languages

Extending Multilingual Machine Translation

In a paper published on November 14, 2023, researchers from the Ludwig Maximilian University of Munich proposed a novel approach called Imit-MNMT to extend large-scale multilingual neural machine translation (MNMT) models to new languages while, they claim, preserving the translation performance of the original language pairs.

The key challenge is to achieve this extension using only a parallel corpus between the new language and English, a scenario in which existing approaches tend to degrade performance on the original languages.

Wen Lai, Viktor Hangya, and Alexander Fraser explained that approximately 7,000 languages are spoken globally, and a substantial number of language pairs face challenges due to a shortage of resources essential for training machine translation (MT) models. “How to extend the existing MNMT models is a significant problem,” they highlighted.

Imit-MNMT applies imitation learning, a technique in which a learner mimics the behavior of an expert MNMT model. Imitation learning is widely used in research areas such as robot learning and computer vision, but remains less explored in natural language processing (NLP). The method constructs a pseudo multi-parallel corpus by pivoting through English and imitates the output distribution of the original MNMT model.
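The pivoting step can be illustrated with a minimal sketch: each (new-language, English) sentence pair is converted into pseudo (new-language, original-language) pairs by translating the English side with the existing MNMT model. The `translate` function and the lookup table below are hypothetical stand-ins for the frozen expert model, used only so the example runs end to end.

```python
# Sketch of pivot-based pseudo multi-parallel corpus construction
# (assumed reading of the paper's pivoting step, not the authors' code).

# Hypothetical stand-in for the frozen expert MNMT model: a toy lookup
# table mapping (English sentence, target language) -> translation.
TOY_EXPERT = {
    ("the cat sleeps", "de"): "die Katze schläft",
    ("the dog runs", "de"): "der Hund rennt",
}

def translate(sentence: str, target_lang: str) -> str:
    """Stand-in for expert_model.translate(sentence, target_lang)."""
    return TOY_EXPERT[(sentence, target_lang)]

def build_pseudo_corpus(new_eng_pairs, original_langs):
    """Pivot through English: for each (new-language, English) pair,
    translate the English side into each original language, yielding
    pseudo (new-language sentence, original-language sentence, lang)
    training triples."""
    pseudo = []
    for new_sent, eng_sent in new_eng_pairs:
        for lang in original_langs:
            pseudo.append((new_sent, translate(eng_sent, lang), lang))
    return pseudo

corpus = build_pseudo_corpus(
    [("new-1", "the cat sleeps"), ("new-2", "the dog runs")],
    ["de"],
)
```

With a real MNMT model in place of the lookup table, the same loop yields training pairs between the new language and every original language without requiring any direct parallel data between them.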

“This is the first work that extends the MNMT model using imitation learning,” the researchers emphasized.

Avoiding “Catastrophic Forgetting”

To avoid issues like catastrophic forgetting, where the training process prioritizes adaptation to the new language at the expense of the original language pairs, the authors used separate expert and learner models for generating pseudo-corpora and updating model parameters.

More specifically, the original MNMT model was treated as an expert and kept frozen. Instead of updating this expert model directly with the pseudo-corpus, a separate learner model was trained. The learner model develops the ability to translate between the new language and the original languages by weighting the importance of each language and imitating the translation behavior of the expert model.
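One common way to realize "imitating the output distribution" is a KL-divergence loss between the frozen expert's token distribution and the learner's. The sketch below shows that form of the objective; the paper's exact loss and its language-importance weighting may differ, so treat this as an assumed illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def imitation_loss(learner_logits: np.ndarray, expert_logits: np.ndarray) -> float:
    """KL(expert || learner) averaged over token positions.

    The expert distribution comes from the frozen original MNMT model
    (no gradients flow through it); the learner is pushed to match it
    position by position. Zero when the two distributions coincide.
    """
    p = softmax(expert_logits)           # expert distribution (frozen)
    log_q = np.log(softmax(learner_logits))
    cross_entropy = -(p * log_q).sum(axis=-1)
    entropy = -(p * np.log(p)).sum(axis=-1)
    return float((cross_entropy - entropy).mean())
```

Because only the learner's parameters are updated against this loss, the expert's knowledge of the original language pairs stays intact, which is the mechanism the authors credit for avoiding catastrophic forgetting.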

According to the authors, this separation of the expert and learner models can mitigate the impact of noise (i.e., inaccurate or irrelevant data in the pseudo-corpus for the new languages) and ensure effective learning for the new languages without compromising the performance of the original language pairs.

“Our experiments show that the use of separate expert and learner models is crucial to avoid catastrophic forgetting and achieve good learning performance on the new languages,” the researchers stated.

The results revealed that Imit-MNMT outperformed baselines, demonstrating improved translation performance without catastrophic forgetting. “Our approach outperforms several robust baseline systems, showcasing its superior performance,” the researchers said. 

Furthermore, it effectively addresses common issues in large-scale MNMT models, namely the copying problem (where certain words are excessively copied by the models from the source side to the target side instead of being accurately translated) and the off-target problem (when the MNMT model translates the text into an incorrect language). 
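Both failure modes can be quantified with simple diagnostics. Below is a minimal sketch (not from the paper): a copy rate that measures how many output tokens are lifted verbatim from the source, and an off-target rate that counts outputs whose detected language differs from the requested one. `detect_language` is a hypothetical stand-in for a real language identifier such as fastText's language-ID model.

```python
def copy_rate(source_tokens, output_tokens):
    """Fraction of output tokens that appear verbatim in the source.
    A high value signals the copying problem, where the model copies
    source words instead of translating them."""
    if not output_tokens:
        return 0.0
    src = set(source_tokens)
    return sum(tok in src for tok in output_tokens) / len(output_tokens)

def off_target_rate(outputs, expected_lang, detect_language):
    """Share of outputs whose detected language differs from the
    expected target language (the off-target problem).
    `detect_language` is a caller-supplied language identifier."""
    if not outputs:
        return 0.0
    return sum(detect_language(out) != expected_lang for out in outputs) / len(outputs)
```

Tracking both rates before and after extension is one way to verify a claim like Imit-MNMT's: that adding a new language does not push translations back toward copying or into the wrong language.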

The researchers concluded that Imit-MNMT is a “promising solution” for extending MNMT models to new languages, especially in scenarios with limited training resources.