December 20, 2018
Amazon, Microsoft, And IBM Double Down on Custom Neural Machine Translation
Months after Google’s public beta release of its custom neural machine translation (NMT) offering with AutoML Translate in July 2018, other tech giants are pushing their own advances before the year ends.
While Amazon did not announce anything significant regarding Amazon Translate during its giant AWS Re:Invent 2018 conference, the AWS blog did post about the launch of Custom Terminology on November 27, 2018. As the name implies, Custom Terminology lets clients use company- and domain-specific vocabulary with Amazon’s NMT engines.
According to the announcement, when the Custom Terminology feature is used, Amazon Translate scans the client’s terminology files before providing the final output of a translation request. Whenever a terminology entry exactly matches a term in the source text, the output uses the user’s specified translation of that term.
The announcement also clarified that the feature is essentially a find-and-replace function: “at this point, the Custom Terminology feature is an override mechanism. It does NOT train a custom model based on your organization’s terminology.”
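The override behaviour described in the announcement can be pictured with a small sketch. This is not Amazon’s implementation, which the announcement does not detail; it is one simple way to get a comparable effect, and every name in it (`apply_terminology`, `fake_translate`, the `__TERMn__` placeholders) is hypothetical. Source terms that exactly match a terminology entry are swapped for placeholders before translation, and the placeholders are replaced with the client’s preferred translations afterwards.

```python
import re


def apply_terminology(source_text, terminology, translate):
    """Translate source_text, forcing exact-match terms to the client's wording.

    terminology maps a source term to its required target translation.
    translate is any callable that turns source text into target text.
    """
    placeholders = {}
    protected = source_text
    # Longest terms first, so a multi-word entry wins over a shorter one inside it.
    for i, term in enumerate(sorted(terminology, key=len, reverse=True)):
        token = f"__TERM{i}__"
        pattern = r"\b" + re.escape(term) + r"\b"  # exact, case-sensitive match
        protected, count = re.subn(pattern, token, protected)
        if count:
            placeholders[token] = terminology[term]
    translated = translate(protected)
    # Override step: no model is trained, the preferred term is simply substituted.
    for token, target in placeholders.items():
        translated = translated.replace(token, target)
    return translated


def fake_translate(text):
    """Hypothetical stand-in for an NMT engine: a tiny word-for-word lookup."""
    lookup = {"Open": "Ouvrez", "the": "le",
              "Dashboard": "Tableau de bord", "now": "maintenant"}
    return " ".join(lookup.get(word, word) for word in text.split())


result = apply_terminology("Open the Dashboard now",
                           {"Dashboard": "Dashboard"}, fake_translate)
print(result)  # Ouvrez le Dashboard maintenant
```

Without the terminology entry, the toy engine would render the product name as “Tableau de bord”; with it, the name passes through unchanged, which is the kind of company-specific override the feature targets.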
Microsoft Vs. Google in Customized NMT
Meanwhile, Microsoft announced on December 5, 2018, that Microsoft Custom Translator had been released into “general availability.” Custom Translator allows cloud service users to train the company’s stock NMT engine with their own data, creating customized, domain-adapted engines.
Custom Translator was originally announced in May 2018, during Microsoft’s Build 2018 event. However, the initial release was only a private beta, or what Microsoft calls a “preview.”
Microsoft’s pricing for Custom Translator follows its tiered strategy for NMT. Custom Translator is free for the first two million characters “of any combination of standard translation and custom training” per month. Beyond that, pricing follows a pay-as-you-go model with volume discounts starting at USD 10 per million characters of standard translation and USD 40 per million characters of custom translation.
Training is priced separately at USD 10 per million source and target characters of training data with a cap of USD 300 per training session. Meanwhile, it costs USD 10 per hosted custom translation model per region, per month. Microsoft Translator currently supports over 60 languages, with 41 marked as supported by NMT.
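To make these figures concrete, here is a rough cost sketch in Python built only from the rates quoted above. It is an approximation under stated assumptions: the function names are made up for illustration, volume discounts are ignored, and the free two-million-character allowance is left out because the article does not spell out how it is split between usage types.

```python
# Rates as quoted in the article (USD). Volume discounts and the free
# two-million-character allowance are deliberately ignored in this sketch.
STANDARD_PER_MILLION = 10.0
CUSTOM_PER_MILLION = 40.0
TRAINING_PER_MILLION = 10.0
TRAINING_CAP = 300.0
HOSTING_PER_MODEL_REGION = 10.0


def training_session_cost(source_chars, target_chars):
    """Cost of one training session, capped at USD 300."""
    uncapped = (source_chars + target_chars) / 1_000_000 * TRAINING_PER_MILLION
    return min(uncapped, TRAINING_CAP)


def monthly_running_cost(standard_chars, custom_chars, models=1, regions=1):
    """Translation plus model hosting for one month at the quoted list rates."""
    translation = (standard_chars / 1_000_000 * STANDARD_PER_MILLION
                   + custom_chars / 1_000_000 * CUSTOM_PER_MILLION)
    hosting = models * regions * HOSTING_PER_MODEL_REGION
    return translation + hosting


print(training_session_cost(12_000_000, 12_000_000))  # 240.0
print(training_session_cost(20_000_000, 20_000_000))  # 300.0 (cap reached)
print(monthly_running_cost(0, 5_000_000))             # 210.0
```

At these rates the training cap is reached once source and target data together exceed 30 million characters, so larger training sets cost no more per session.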
Custom Translator is a direct competitor of AutoML Translate. For comparison, Google’s Cloud Translation costs USD 20 per million characters for up to a billion characters monthly, while AutoML Translation costs USD 76 per hour of training time after the first two hours. AutoML Translation’s “prediction” capability costs USD 80 per million characters after the first 500,000 characters. AutoML Translate also supports 50 language pairs bidirectionally.
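The Google figures can be worked through the same way. The sketch below uses only the AutoML rates quoted above; the function names are illustrative, and it assumes the free training hours and free prediction characters are simply subtracted from usage, since the article does not say whether they reset monthly or apply once.

```python
# AutoML rates as quoted in the article (USD).
AUTOML_TRAINING_PER_HOUR = 76.0
AUTOML_FREE_TRAINING_HOURS = 2
AUTOML_PREDICTION_PER_MILLION = 80.0
AUTOML_FREE_PREDICTION_CHARS = 500_000


def automl_training_cost(hours):
    """Training cost: the first two hours are free, the rest is billed hourly."""
    billable_hours = max(0, hours - AUTOML_FREE_TRAINING_HOURS)
    return billable_hours * AUTOML_TRAINING_PER_HOUR


def automl_prediction_cost(chars):
    """Custom translation ("prediction") cost after the free 500,000 characters."""
    billable_chars = max(0, chars - AUTOML_FREE_PREDICTION_CHARS)
    return billable_chars / 1_000_000 * AUTOML_PREDICTION_PER_MILLION


print(automl_training_cost(5))            # 228.0
print(automl_prediction_cost(3_000_000))  # 200.0
```

On list prices alone, custom translation through AutoML costs twice Microsoft’s USD 40 per million characters, although Google bills training by the hour rather than by the volume of training data, so the totals depend on the workload.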
One difference between Custom Translator and AutoML Translate appears to be training data. AutoML Translate requires strictly parallel segments, while Custom Translator can take bilingual training data that has not been parallelized, as well as monolingual data to supplement the parallel training data. Allowing users to upload monolingual data may mean that Microsoft’s solution is particularly appealing to the many end-buyers of translation who do not themselves have access to a large corpus of parallel / bilingual data.
IBM Focuses on Format Preservation
While IBM has kept a relatively low profile in neural MT of late, the company does have its own NMT offering in Watson Language Translator. On December 6, IBM announced that Watson now preserves document formatting between source and target translations, a feature that should help NMT to be deployed in company operations.
The question of preserving formatting within machine translation output was also raised by Ubiqus CEO Vincent Nguyen during the speakers’ panel at SlatorCon Zurich 2018.
“As an LSP, you will focus on the details,” Nguyen said. “For instance, everyone handles tags in documents. A tag for bold, for italic just within the document. Who will have to handle the tags? The translator would not want to reinsert all the tags, but it will not be in the output of the machine translation,” he said, adding that researchers only focus on plain text output. “In a production area, it’s completely different. We really have to adapt to the production environment and the workflow,” the Ubiqus CEO concluded.
IBM’s announcement lists supported file types from Microsoft Office to OpenOffice, as well as miscellaneous file types including PDF and HTML. The announcement also indicated support for 22 languages.
As for pricing, IBM distinguishes between Lite, Standard, and custom pricing plans. Lite lets users translate one million characters free every month, while the Standard plan allows users to translate 250,000 characters free every month and charges USD 0.02 per thousand characters above that.
This pricing structure attracted some attention on Twitter, with Prof. Andy Way of the EU ADAPT Center wondering about the “odd” configuration: “the Standard plan is more expensive than Lite if you translate less than a million chars. Why not give you that for free, and start charging after that?”
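Way’s point is easy to check with a few lines of arithmetic. The sketch below uses only the plan figures quoted above (250,000 free characters on Standard, then 2 US cents per thousand characters); the function name is illustrative, and the Lite plan is treated simply as free up to one million characters, since the article does not say what happens beyond that.

```python
STANDARD_FREE_CHARS = 250_000
STANDARD_USD_PER_THOUSAND = 0.02  # 2 US cents per thousand characters
LITE_FREE_CHARS = 1_000_000       # Lite costs nothing up to this volume


def standard_plan_cost(chars):
    """Monthly Standard plan charge in USD for a given character volume."""
    billable_chars = max(0, chars - STANDARD_FREE_CHARS)
    return round(billable_chars / 1000 * STANDARD_USD_PER_THOUSAND, 2)


for chars in (250_000, 500_000, 1_000_000):
    print(chars, standard_plan_cost(chars))
# 250000 0.0
# 500000 5.0
# 1000000 15.0
```

A user translating exactly one million characters a month would pay USD 15 on Standard and nothing on Lite, which is the gap the tweet describes.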