Machine Translation Assessment of Major Providers Causes Stir

Palo Alto, California, March 15th, 2017

The language services industry now has an array of fast, scalable machine translation (MT) options, which is good news for businesses with increasingly demanding localization requirements. However, the industry lacks a vendor-neutral clearinghouse, so public evaluations of machine translation quality are uncommon. Marketing claims are seldom backed by quantitative results.

Last month, Lilt announced the launch of Lilt Labs, a collaborative effort among computational linguists, scientists, and language professionals where anyone can publish quality work, present CAT and MT evaluations, blog about insights, and post academic research. Lilt Labs’ first article on Machine Translation Quality Evaluation ignited a conversation in the industry about the lack of quantitative assessments from major MT providers, while also highlighting the need for transparency in MT evaluation methods.

“An objective assessment of MT is long overdue. In 2005, Common Sense Advisory (CSA Research) analyzed that year’s NIST shoot-out, won by Google. We outlined the need for standardized metrics to benchmark products, and predicted that ‘vendors will start adding features to beat the tests – as we saw in the 1980s SQL database industry.’ What has happened since then is that the NIST comparison has gone by the wayside, MT developers publish their own results if and only if they beat competitors, and gaming of MT benchmarks such as BLEU has become common practice,” noted Don DePalma, Founder and Chief Strategist at CSA Research.

The Labs evaluation combined the standard evaluation protocol from the MT research community with a typical translation workflow in which a human translator progressively translates a document. It assessed not only baseline translation quality, but also the quality of adaptive systems that learn from human feedback. Systems from Google, Microsoft, SDL, SYSTRAN, and Lilt were included. Adaptation and neural models are the two most exciting recent developments in commercially available machine translation, and the Labs evaluation validated that both offer substantial translation quality improvements.

“We want to drastically increase translation productivity by uniting artificial intelligence and translators, in order to bring our customers into new markets,” said John DeNero, Co-Founder and Chief Scientist at Lilt, and Assistant Professor of Computer Science at UC Berkeley. “This benchmarking is an important step in showing the world that interactive systems are the way forward. We welcome the entire community to join us in publishing results and insights that will accelerate adoption of this powerful technology.”

Anyone wishing to join the discussion or register for more information can do so at

Press Contact: