‘Human Parity Achieved’ in Machine Translation — Unpacking Microsoft’s Claim

March 15, 2018

Technology ·

by Gino Diño

A year and a half ago, Google first claimed that its new neural machine translation (NMT) systems could produce some translations that were “nearly indistinguishable” from human output.

But while Google’s “nearly indistinguishable” claim was carefully hedged and buried deep on page 18 in the paper’s technical discussion, Microsoft came out guns blazing, saying in the very title of a new research paper that it had achieved “human parity” in Chinese-to-English translation, no less.

According to Microsoft’s March 14, 2018 research paper with the full title of “Achieving Human Parity on Automatic Chinese to English News Translation,” a few variations of a new NMT system they developed have achieved “human parity,” i.e. they were considered equal in quality to human translations (the paper defines human quality as “professional human translations on the WMT 2017 Chinese to English news task”).

Within 24 hours, mainstream tech outlets such as TechCrunch, GeekWire, TechRadar, and ZDNet picked up on the story, predictably taking the human parity claim at face value.

Microsoft came up with a new human evaluation system to come to this convenient conclusion, but first they had to make sure “human parity” was less nebulous and better defined.

Microsoft’s definition for human parity in their research is thus: “If a bilingual human judges the quality of a candidate translation produced by a human to be equivalent to one produced by a machine, then the machine has achieved human parity.”

In mathematical, testable terms, human parity is achieved “if there is no statistically significant difference between human quality scores for a test set of candidate translations from a machine translation system and the scores for the corresponding human translations.”
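
To make the testable definition concrete, here is a minimal sketch in Python. The article does not say which significance test Microsoft applied, so the permutation test on mean scores, the 0.05 threshold, the function names `permutation_test` and `at_parity`, and the sample scores are all assumptions chosen for illustration.

```python
import random


def permutation_test(scores_a, scores_b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Returns an approximate p-value. This is an illustrative choice of test,
    not necessarily the one used in the research paper.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n_a = len(scores_a)
    pooled = list(scores_a) + list(scores_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(scores_a) / n_a - sum(scores_b) / n_b)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


def at_parity(machine_scores, human_scores, alpha=0.05):
    """Apply the quoted definition: no statistically significant difference."""
    return permutation_test(machine_scores, human_scores) >= alpha


if __name__ == "__main__":
    # Hypothetical 0-100 quality scores, invented for this example.
    machine = [68, 72, 75, 70, 66, 74, 71, 69]
    human = [70, 73, 71, 69, 68, 75, 72, 67]
    print("p-value:", permutation_test(machine, human))
    print("at parity:", at_parity(machine, human))
```

Under this definition, parity means a test fails to find a difference, so the outcome depends on how many assessments are collected and on the threshold chosen.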

New Human Evaluation Methods

The research team used the news test set from the 2017 Conference on Machine Translation (WMT2017 newstest) for training and testing their new NMT system variants.

The Microsoft team used bilingual human evaluators and presented them with both source text and translation output from the WMT2017 newstest set, and asked them to score the translation from 0 to 100. The top performing engine in the WMT2017 conference was Sogou Inc’s Sogou Knowing NMT system. The researchers also had their evaluators assess the output of Sogou Knowing NMT.

Part of WMT2017’s newstest task. Chinese source and English target translations side-by-side. These are the reference human translations used in the conference.

They showed the evaluators output from nine systems. According to the research paper, there were around 2,000 assessments made per system (at least 1,827 per system).

Ranked from best to worst, according to Microsoft’s human evaluators:

  1. Microsoft’s new NMT engine variation (Combo-6)
  2. Reference human translations used for this research
  3. Microsoft’s new NMT engine variation (Combo-5)
  4. Microsoft’s new NMT engine variation (Combo-4)
  5. WMT2017’s reference translations that were post-edited machine translation
  6. Sogou Knowing NMT
  7. WMT2017’s reference human translations used in the conference
  8. Microsoft’s existing production NMT system
  9. Google’s existing production NMT system

According to Microsoft researchers, the first four are grouped together and are in parity with each other, i.e. their scores are so close as to be indistinguishable from each other.

Microsoft Versus Sogou

Curiously, Microsoft’s research paper also shows that using this new evaluation method, Sogou Knowing NMT’s score is so close to the score of WMT2017’s reference human translations that they are considered indistinguishable.

It appears that, using their new evaluation method, Microsoft also unintentionally showed that Sogou achieved human parity, at least in comparison to the WMT2017 reference human translations.

Meanwhile, both Microsoft’s and Google’s existing production NMT systems scored lowest.

See for yourself: English output of Microsoft’s highest-scoring NMT system variation, taken from their open source GitHub link. Judging from the content, the average sentence length is not very long, nor is the wording very complex.

They also used Bilingual Evaluation Understudy (BLEU) to measure gains over previous work that was also scored in BLEU points, including WMT2017’s rankings of participating NMT engines.

Most of Microsoft’s NMT model setups (10 out of 12, baseline included) reportedly bested Sogou Knowing NMT’s 26.40 BLEU points. Microsoft’s top-performing NMT variant beat the state of the art by 1 BLEU point, scoring 27.40, with all systems using the same training data from WMT2017.
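
For readers unfamiliar with how a BLEU score is put together, the sketch below computes a simplified corpus-level BLEU in Python: clipped n-gram precision up to 4-grams, combined by geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization and a single reference per sentence, and the function name `corpus_bleu` and the sample sentences are invented for illustration; it is not the scoring script used for WMT2017 or in Microsoft’s paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumes one reference per hypothesis and whitespace tokenization.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each n-gram to its reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing in this sketch

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize output that is shorter than the reference overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print("BLEU:", round(corpus_bleu(hyps, refs), 2))
    # The gap reported in the article, in BLEU points.
    print("gain:", round(27.40 - 26.40, 2))
```

Because BLEU only counts word overlap with a reference translation, a one-point gain says the output matches the references slightly more often; it is a different kind of measurement from the 0 to 100 human scores described earlier.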

Shiny New Tech and Training Methods

The research team developed new NMT engines for their experiment. They tried recurrent neural networks, convolutional networks, and transformers, and ultimately used the transformer engines reportedly due to better output.

Next, they also upgraded their training regimen.

They employed a recent technique called Dual Learning, which allows their model to learn from both the source-to-target and target-to-source directions of bilingual training data. They also used Deliberation Networks, which add a second decoder to “polish” the translations of the first decoder in an NMT system, much like an editor polishing a writer’s draft. Additionally, they employed joint training and agreement regularization.

They basically mixed and matched all these methods to iteratively improve translation output across several variations of the same NMT system.

The Microsoft team also filtered the training data from WMT2017. After cleaning up and filtering the training data, they were left with 18 million bilingual sentence pairs and around 7 million Chinese and English monolingual sentences.

Future Work

Microsoft made everything about this new research open source, citing external validation and future research as the reason.

As for when, if ever, Microsoft plans to transition their new systems into production, a company spokesperson told ZDNet: “We’re working to bring this to production as soon as possible, but we have nothing to announce at this time.”
