Since at least 2019, BNP Paribas’ Chief Data Officer, Hugues Even, has discussed how the organization is adapting to the AI revolution. “We are especially proud of our intelligent translation engine,” he wrote at the time, because it “allows us to preserve one of the primary interests of our customers: protecting their data, while producing translations that meet our expectations.”
Fast forward three years: on October 26, 2022, computer scientists at BNP Paribas published the paper “Robust Domain Adaptation for Pre-trained Multilingual Neural Machine Translation Models.” One of the paper’s authors, Pirashanth Ratnamogan, Data Scientist at BNP Paribas, spoke to Slator about what motivated the study.
He told Slator that, as a global bank, BNP Paribas regularly needs to translate confidential documents, the contents of which are often so sensitive that no private information in them may be shared. Thus, using publicly available tools such as Google Translate or DeepL is out of the question.
In addition, the vocabulary used is very specific to the banking domain, so it was important for BNP Paribas to build a tool tailored to that domain and capable of handling its jargon, Ratnamogan said.
Domain Adaptation for Multilingual Models
Another, more recent challenge was the rise of multilingual models and how to incorporate them into a custom, internal workflow. “Based on open source packages (e.g., OpenNMT, fairseq), we first internally developed a model based on bilingual transformers, while recently, we switched to multilingual neural machine translation (mNMT) models for scalability purposes,” Ratnamogan explained.
Multilingual models have several advantages compared to their bilingual counterparts, such as reducing operational costs (i.e., a single model deployed for all language pairs) and improving translation quality, especially for low-resource languages, as the authors pointed out in the same paper.
These advantages make mNMT models ideal for real-world applications. However, they are not suitable for specialized industries that require domain-specific translation. As the BNP Paribas researchers pointed out, “Domain adaptation for multilingual models […] is key for real-world applications.”
Rare and Costly
They further noted that for many companies, it is “almost impossible” to train a model from scratch or to fine-tune all language pairs of a pre-trained mNMT model, because doing so calls for access to a large number of resources and specialized data. “In-domain data is rare and more costly to gather,” they said, making specialized models (and, even more so, specialized multilingual models) harder to train.
Nevertheless, it appears feasible to fine-tune a single pair of a pre-trained mNMT model in a specialized domain, the researchers said — although the risk of loss on generic domains and on other language pairs is still high.
The BNP Paribas team thus explored and presented how to fine-tune a pre-trained mNMT model on a single language pair in a specific domain without losing initial performance on other language pairs and on generic data.
Avoiding Overfitting and Aggressive Forgetting
The team found that the best-performing approach is “fine-tuning a pre-trained model with initial layers freezing, for a few steps and with a small learning rate.” This framework, they said, “effectively avoids overfitting and aggressive forgetting on out-of-domain generic data while quickly adapting to in-domain data.”
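The recipe described above can be sketched in code. The following is a minimal illustration, not the authors’ implementation: a toy PyTorch layer stack stands in for a pre-trained mNMT model, and all layer counts, learning rates, and step counts are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch (not the authors' code) of the fine-tuning recipe:
# freeze the initial layers, then train briefly with a small learning rate.
# A toy feed-forward stack stands in for a pre-trained mNMT model.
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pre-trained model: a stack of 6 layers.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(6)])

# Freeze the initial layers (here, the first 3 of 6) so fine-tuning
# only updates the later, more task-specific layers.
NUM_FROZEN = 3
for layer in model[:NUM_FROZEN]:
    for p in layer.parameters():
        p.requires_grad = False

frozen_before = copy.deepcopy(model[0].weight)

# Fine-tune for only a few steps with a small learning rate, which is
# what limits overfitting and forgetting on out-of-domain data.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
loss_fn = nn.MSELoss()
for _ in range(10):  # "a few steps"
    x = torch.randn(8, 16)  # stand-in for in-domain training batches
    loss = loss_fn(model(x), torch.randn(8, 16))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The frozen initial layers are bit-for-bit unchanged after fine-tuning.
assert torch.equal(frozen_before, model[0].weight)
```

In practice the same pattern applies to a real pre-trained multilingual model (e.g., M2M100 or mBART loaded from a checkpoint): iterate over the encoder’s initial layers, set `requires_grad = False`, and pass only the remaining parameters to the optimizer.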
The experiments relied on M2M100 and mBART using English-to-French data from the medical domain. As Ratnamogan told Slator, “We didn’t want to use our proprietary data to do these experiments.”
Although a financial-domain dataset, Sedar, is available, “it is super complicated to collect it — especially for a non-academic organization,” he said. So they decided to use open-source medical data, “because it contains a lot of jargon and is extremely specialized.”
“To the best of our knowledge,” the BNP Paribas researchers wrote, “our work is the first exploring domain adaptation in the context of recent pre-trained multilingual neural machine translation systems, while focusing on keeping the model performant in out-of-domain data in all languages.”
They concluded that their framework is one solution for the incremental adaptation of mNMT models and “a call for more research in domain adaptation for multilingual models as it is key for real-world applications.”