Microsoft is playing the AI card. According to a recent blog post, Microsoft Translator now has new customization features that leverage the company’s vast AI resources. “Microsoft Translator is a direct implementation of Microsoft’s AI efforts in Machine Learning, with Deep Neural Networks and Long Short-Term Memory, and the [Microsoft Translator] Hub makes this accessible to anyone interested in building custom machine translation systems,” explained Chris Wendt. Wendt is Microsoft Research Machine Translation’s Principal Group Program Manager.
Slator reached out to Wendt to better understand the new AI-enabled customization features of Microsoft Translator. According to the announcement, the new features allow users to create category-trained statistical machine translation systems without a large existing corpus of previously translated material. Instead, users combine whatever corpus they have with Microsoft’s own, which consists of billions of words, according to Wendt.
There are now four general levels of customization available for Microsoft Translator API users:
- Standard category use: Users can pick a standard category that sets the context for translation. Right now the only categories are “tech” for computer-related content and “speech” for spoken text, and users can still choose the default category setting. The “speech” category was developed alongside the Skype Translator over the past 18 months. Microsoft plans to roll out more categories, but Wendt was not able to elaborate on future plans.
- Custom dictionary uploads: Users can upload their dictionaries to further train their systems for category-specific translation.
- Training material with 1,000 to 5,000 parallel sentences (pre-translated sentences from original to target language): Users can provide test sentences and Microsoft Translator will use 1,000 to 5,000 similar parallel sentences from its corpus to tune the internal parameters of their translation systems.
- Training material with 5,000 or more parallel sentences: A higher level of category-specific training. The Microsoft Translator Hub previously required 10,000 parallel sentences to train a system. Now the threshold has been lowered to 5,000.
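As a rough illustration of how these four levels relate to the amount of material a user has, here is a minimal sketch. The function name, the returned labels, and the treatment of exactly 5,000 sentences as full training are assumptions made for illustration. They are not part of the Microsoft Translator API or the Hub.

```python
def customization_level(parallel_sentences: int, has_dictionary: bool = False) -> str:
    """Map a user's available material to one of the four levels described above.

    Hypothetical helper: names and labels are illustrative only. The article
    lists "1,000 to 5,000" and "5,000" as separate levels; this sketch assumes
    that exactly 5,000 sentences falls into the higher (training) level.
    """
    if parallel_sentences < 0:
        raise ValueError("parallel_sentences must be non-negative")
    if parallel_sentences >= 5000:
        return "training"           # higher level of category-specific training
    if parallel_sentences >= 1000:
        return "tuning"             # tune internal parameters of the system
    if has_dictionary:
        return "dictionary"         # custom dictionary upload
    return "standard-category"      # "tech", "speech", or the default


if __name__ == "__main__":
    for count in (0, 1200, 5000, 10000):
        print(count, customization_level(count))
```

A user with only a glossary would land in the dictionary level, while one with a few thousand translated sentences would move into tuning.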
Wendt told Slator that “the training material [for the new customization features] is composed of web documents, mostly from the Bing Search index, from licensed material, and from internally created material.”
He added that the custom features allow users “to build customized systems with far fewer parallel sentences than was possible before,” but that users can still train systems with large amounts of documents.
“The custom systems of all sizes automatically benefit from the algorithmic improvements of the Microsoft Translator engine, for instance the introduction of deep neural networks,” he said.
Asked what efficiency improvements in translation workflows users can expect from these developments, Wendt said the new features effectively enable users “to build custom systems with less training material than was previously required, and still achieve a quality gain.” Speaking of quality gain, Wendt said Microsoft Translator improves steadily, month over month, through integrating new training material and improving algorithms for training and runtime.
He also noted, however, that quality gains differ by user scenario: “Some post-editors report productivity gains of more than 50% for the given domain and language pair, and required target quality. Some report no productivity gain at all.”
In the end though, Wendt said there is an improvement in quality. “Hub users achieve an average improvement of 10 BLEU points over the uncustomized system, measured on a user’s own test set,” Wendt said. BLEU (bilingual evaluation understudy) is an algorithm that evaluates machine-translated text by comparing it with human translation. “There is enough evidence that a higher BLEU score is representative of higher human perceived quality and of increased productivity in post-editing,” according to Wendt.
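To make the BLEU comparison concrete, the sketch below computes a simplified, sentence-level score against a single human reference. It uses clipped n-gram precision up to 4-grams and a brevity penalty, scaled to 0-100. This is an illustrative approximation only: the function names are hypothetical, tokenization is plain whitespace splitting, there is no smoothing, and production BLEU is normally computed over a whole test set rather than one sentence. It is not how Microsoft Translator Hub computes its scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, scaled to 0-100."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, times the brevity penalty.
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    human = "the cat sat on a mat"
    machine = "the cat sat on the mat"
    print(round(sentence_bleu(machine, human), 2))
```

Because the score runs from 0 to 100, an improvement of 10 BLEU points corresponds to, for example, a move from 30 to 40 on the same test set.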
“There are other self-service customization systems available,” Wendt said. “None of them with the breadth, depth, language, and domain coverage that come with the Microsoft Translator Hub by default.”
With these additional customization options, Microsoft’s Translator Hub may become a significant headache for a number of Moses-based statistical machine translation startups, whose unique selling point rests on their claim of extreme niche customization.