Machine translation (MT) research is hitting record highs, large tech companies are accelerating MT product launches, and even Silicon Valley’s venture capitalists have discovered MT as a potentially lucrative niche.
While big tech and venture capital make the headlines, the language industry is working behind the scenes to deploy the technology in operations. Among the leaders in this roll-out is Paris-based Ubiqus, whose CEO Vincent Nguyen is arguably one of the most knowledgeable industry CEOs when it comes to the nitty-gritty of how neural machine translation works and how it can be integrated into the translation supply chain.
Ubiqus generates annual revenues of around EUR 75m (USD 85.3m), providing translation, transcription, and summarization. Despite being one of Europe’s largest language service providers, the company has kept a relatively low profile since it was founded in France nearly three decades ago. That changed at SlatorCon Zurich 2018, where CEO Nguyen took advantage of his presentation slot to provide participants with an inside look at how they develop and implement neural MT in live projects and track performance gains.
While Nguyen has a knack for the intricacies of the technology, he understands that clients and linguists may not share his passion. “In the day to day work, people do not care about training and engines. They just want good quality output so that they can deliver the best possible document to the client,” he observed.
Opening his presentation with an introduction to the company, Nguyen explained that Ubiqus started in France in 1991 as a transcription and summarization company. Today, however, over 50% of the company’s work is in translation and interpretation.
Nguyen said that in recent years Ubiqus has been very active in research and development for natural language processing (NLP), which impacts its work in transcription, summarization, and translation. In January 2017, Ubiqus joined OpenNMT, the project led by Harvard NLP and Systran that develops an open source NMT engine currently used by companies like Booking.com for automated translation.
“It’s been 20 months now since [OpenNMT] launched,” Nguyen said. “It’s not a lot, but the improvements in those last 20 months have been so dramatic.”
He said that OpenNMT is the most popular open source NMT engine on the open source development platform GitHub. OpenNMT’s most recent implementation uses the transformer model because, according to Nguyen, it is currently the best architecture. “Even Facebook, who used to have their own architecture based on convolutional networks, switched to the transformer from Google and now everyone works with the same technology,” he said.
Nguyen went on to explain that deploying NMT in operations meant facing human and technical challenges.
The human challenge lay in convincing internal and external linguists to adopt the technology, and then introducing it seamlessly to clients. The technical challenge was making sure everything was integrated well and, in doing so, strengthening the case for NMT adoption across the organization. Nguyen explained that any disruption to daily production caused by introducing NMT could create friction and ultimately make it even harder to convince people of its value. For Ubiqus, once the technical work of building the NMT engine and training the model was done, the priority was integrating the technology in a way that minimized disruption and maintained business as usual.
“People are reluctant to change, everyone knows that,” he said. But it “makes things easier to accept if we put everything into the current workflow, in the way they work on a day to day basis. This is why we need to build a few extra layers and people will feel very at ease with the new processes.”
Devil in the Detail
Nguyen highlighted that a successful roll-out of cutting-edge technology is often at risk of failure due to seemingly minor issues: “I would just say that the devil is in the detail. For instance, everyone handles tags in documents. A tag for bold, for italic just within the document. [The question is] who will have to handle the tags? The translator would not want to reinsert all the tags, but the tags will not be in the output of the machine translation. [Researchers don’t care about tags] they will just say machine translation is plain text to plain text and that is the only way we judge machine translation.” By contrast, he said, “in production, it is completely different. We really have to adapt to the production environment and the workflow. So this is really a focus where we have to pay attention when we are in the business itself.”
Pushing NMT Adoption
Nguyen said that demonstrating NMT performance to people through system-generated reports is very important, as is listening to feedback when the output is less than satisfactory.
For linguists, this means going as far as to provide reports on NMT performance even when they choose not to use NMT for a specific project. This allows Ubiqus to see just how well NMT would have fared otherwise for the specific project, language pairs, vertical, or client.
Ubiqus also developed its own NMT output score that is “a blended calculation between BLEU score, translation error rate, and Levenshtein edit distance,” Nguyen said. He broadly explained that, based on this metric, NMT output that scores above 80% is good, and above 90% is nearly perfect.
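Nguyen did not disclose how the three components are combined, so the following is only a rough sketch of what such a blended score could look like. It assumes equal weights, compares raw MT output against a post-edited or reference translation, approximates the edit rate with a word-level edit distance (no block shifts), and uses a simplified, smoothed sentence-level BLEU. All function names and the weighting are illustrative assumptions, not Ubiqus’ actual formula.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_similarity(mt, ref):
    """1 minus the character-level Levenshtein distance, normalized by length."""
    longest = max(len(mt), len(ref))
    return 1.0 if longest == 0 else 1.0 - edit_distance(mt, ref) / longest


def word_edit_similarity(mt, ref):
    """1 minus a word-level edit rate (TER-like, but without block shifts)."""
    mt_tok, ref_tok = mt.split(), ref.split()
    if not ref_tok:
        return 1.0 if not mt_tok else 0.0
    return max(0.0, 1.0 - edit_distance(mt_tok, ref_tok) / len(ref_tok))


def simple_bleu(mt, ref, max_n=4):
    """Simplified sentence-level BLEU with add-one smoothing, in [0, 1]."""
    mt_tok, ref_tok = mt.split(), ref.split()
    if not mt_tok or not ref_tok:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        mt_ngrams = Counter(tuple(mt_tok[i:i + n])
                            for i in range(len(mt_tok) - n + 1))
        ref_ngrams = Counter(tuple(ref_tok[i:i + n])
                             for i in range(len(ref_tok) - n + 1))
        overlap = sum((mt_ngrams & ref_ngrams).values())  # clipped matches
        total = sum(mt_ngrams.values())
        log_sum += math.log((overlap + 1) / (total + 1))
    brevity = min(1.0, math.exp(1 - len(ref_tok) / len(mt_tok)))
    return brevity * math.exp(log_sum / max_n)


def blended_score(mt, ref, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three components, scaled to 0-100.

    The equal default weights are an assumption made for illustration.
    """
    parts = (simple_bleu(mt, ref),
             word_edit_similarity(mt, ref),
             char_similarity(mt, ref))
    total = sum(w * p for w, p in zip(weights, parts)) / sum(weights)
    return 100.0 * total


if __name__ == "__main__":
    raw = "The contract must be signed before the end of month ."
    edited = "The contract must be signed before the end of the month ."
    print(round(blended_score(raw, edited), 1))
```

With this sketch, output that the linguist leaves untouched scores 100, and the score falls as more post-editing is needed. The thresholds Nguyen cites (80% and 90%) belong to Ubiqus’ own metric and would not carry over to this sketch without calibration.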
Aside from the scoring system and the reporting, Ubiqus also had to let linguists see the difference for themselves. For external linguists, they offered the same rate of pay for a certain number of weeks or months during which they would use and assess NMT output.
“I can tell you that the NMT output itself is the best advocate of NMT,” he said.
Just Another CAT
As for clients, Ubiqus’ usual approach was to explain to them “NMT is just another CAT tool in our workflow.” Otherwise, Nguyen said, some would not easily understand the difference, while others would associate NMT with low-quality output.
There are nonetheless some clients who completely reject the idea of using MT, in which case, “to us, it’s a discussion that we need to initiate and we are just at the beginning and we need to educate clients,” Nguyen said. “We think that for the next two years, the knowledge of the clients and buyers will become better and better.”
Ultimately, it took Ubiqus one and a half years to deploy NMT in operations. Some subsidiaries have already completed adoption, while others are about halfway done. “What’s important is that in house acceptance is highly dependent on ownership by the management team,” Nguyen commented.