Facebook Wants to Understand the Uncertainty in Neural Machine Translation

March 2, 2018

by Gino Diño

The Facebook AI Research (FAIR) team published a new paper on neural machine translation (NMT) to arXiv.org on February 28. FAIR team members conducted what they claim is a first-of-its-kind analysis of the inner workings of NMT models.

Specifically, they wanted to understand “uncertainty” in NMT and to propose ways to reduce its negative impact on output.

NMT research is picking up again on arXiv, Cornell University’s automated online distribution system for research, after a temporary slump during December 2017 and January 2018. From November 1, 2017 to February 14, 2018, 46 research papers on NMT were published. Four of those also focused on the inner workings of NMT.

“There is still a lack of understanding of these models,” FAIR’s paper abstract reads. “Our study relates some of these issues to the inherent uncertainty of the task.”

In particular, FAIR researchers point out that NMT contends with two kinds of uncertainty when performing translations: intrinsic and extrinsic uncertainty.

First, the act of translation itself has intrinsic uncertainty in that a source sentence can be translated into a variety of different target sentences, all of which would mean the same thing while being equally adequate and fluent.

Another intrinsic uncertainty in translation arises from lack of context, be it grammatical or cultural. “Without additional context, it is often impossible to predict the missing gender, tense, or number, and therefore, there are multiple plausible translations of the same source sentence,” the research paper reads.

Aside from these intrinsic uncertainties, NMT also needs to deal with extrinsic uncertainties often pertaining to noise in the training data.

In their research paper, the FAIR team pointed out a few sources of extrinsic uncertainty:

  • Augmenting high-quality, human-translated corpora with “lower quality web crawled data”
  • Partial translations in the corpora, and
  • Target sentences that are exact copies of their source sentences instead of actual translations, which the researchers called source copying. They observed this at least in the datasets they used (the English-to-German and English-to-French datasets from the 2014 WMT machine translation shared task).

According to the FAIR team, source copying, in particular, was “interesting since we show that, even in small quantities, it can significantly affect the model output.”

The researchers put forward two methods to mitigate these extrinsic uncertainties:

  • Remove low-scoring sentence pairs, as scored by a model trained on relevant corpora. The FAIR team used the English-to-German news-commentary portion of the Conference on Machine Translation (WMT) 2017 data.
  • Eliminate the small but high-impact occurrences of source copying. The FAIR team used an automated algorithm that prunes sentence pairs whose source and target overlap by 50% or more (indicating a high likelihood of a partial or full copy).

The FAIR team also noted that “performance degradation is greatly mitigated” by using both at the same time.
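
To make the two filtering steps concrete, here is a minimal Python sketch of how they might be chained. It is an illustration under stated assumptions, not the FAIR team's released code: the function names are hypothetical, overlap is measured as the Jaccard similarity of whitespace tokens (the article only says "50% overlap"), and a simple length-ratio score stands in for the trained model that the researchers used to score sentence pairs.

```python
from typing import Callable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def token_overlap(source: str, target: str) -> float:
    """Jaccard similarity of the lowercased whitespace tokens (an assumed measure)."""
    src = set(source.lower().split())
    tgt = set(target.lower().split())
    if not src or not tgt:
        return 0.0
    return len(src & tgt) / len(src | tgt)


def prune_source_copies(
    pairs: List[SentencePair], max_overlap: float = 0.5
) -> List[SentencePair]:
    """Drop pairs whose target overlaps the source by max_overlap or more."""
    return [(s, t) for s, t in pairs if token_overlap(s, t) < max_overlap]


def filter_low_scoring(
    pairs: List[SentencePair],
    score_fn: Callable[[str, str], float],
    min_score: float,
) -> List[SentencePair]:
    """Keep only pairs that score_fn rates at min_score or higher."""
    return [(s, t) for s, t in pairs if score_fn(s, t) >= min_score]


def length_ratio_score(source: str, target: str) -> float:
    """Hypothetical stand-in for a trained scoring model: shorter/longer token count."""
    n_src, n_tgt = len(source.split()), len(target.split())
    if n_src == 0 or n_tgt == 0:
        return 0.0
    return min(n_src, n_tgt) / max(n_src, n_tgt)


pairs = [
    ("The cat sleeps .", "Die Katze schläft ."),                    # plausible translation
    ("Press the red button .", "Press the red button ."),           # source copy
    ("We will meet again tomorrow at the station .", "Morgen ."),   # partial translation
]

# Apply both filters together, as the article describes.
cleaned = prune_source_copies(
    filter_low_scoring(pairs, length_ratio_score, min_score=0.5)
)
print(cleaned)
```

On this toy data only the first pair survives: the second is pruned as a copy and the third is rejected by the stand-in score. In practice, the score function would come from a translation model trained on a trusted corpus, and the thresholds would need tuning for the dataset at hand.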

The researchers have open sourced the code to reproduce their analysis, and also released the data collected from their evaluation.

It remains to be seen how Facebook will directly benefit from this research. There has been no follow-up research on, or known application of, the FAIR team’s previous research on post-editing for NMT using “very simple interactions.”
