Focusing on BLEU Can Bias Machine Translation Output

January 14, 2021

by Seyma Albarino

A recent paper by top machine translation (MT) researchers concluded that beam search, a very effective way to maximize BLEU scores, can lead to a high rate of misgendered pronouns.

The November 2020 paper, Decoding and Diversity in Machine Translation, is a collaboration between Graham Neubig, Nicholas Roberts, and Zachary C. Lipton at Carnegie Mellon University and Amazon machine learning scientist Davis Liang.

The authors opened by describing the two basic stages of the MT process. In the first, the “modeling” stage, researchers train a conditional language model using neural networks; in the second, the “search” stage, the model searches for the “best” translation using either “greedy decoding” or a beam search to produce predictions.
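
The difference between the two search strategies can be sketched with a toy example. The snippet below is only an illustration under stated assumptions: `TOY_MODEL`, its probabilities, and the helper names are invented here and do not come from the paper. Greedy decoding commits to the single most probable token at each step, while beam search keeps several partial translations alive and can therefore find a sequence with a higher overall probability.

```python
import math

EOS = "</s>"

# Hypothetical toy "model": maps a target-side prefix to a distribution
# over the next token. All probabilities are invented for illustration.
TOY_MODEL = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"doctor": 0.55, "nurse": 0.45},
    ("a",): {"doctor": 0.9, "nurse": 0.1},
}

def next_token_probs(prefix):
    # In this toy setup every translation ends after two tokens.
    if len(prefix) == 2:
        return {EOS: 1.0}
    return TOY_MODEL[prefix]

def greedy_decode(max_len=5):
    """Pick the single most probable token at every step."""
    prefix, logp = (), 0.0
    while len(prefix) < max_len:
        probs = next_token_probs(prefix)
        token = max(probs, key=probs.get)
        logp += math.log(probs[token])
        if token == EOS:
            break
        prefix += (token,)
    return prefix, logp

def beam_search(beam_size=2, max_len=5):
    """Keep the beam_size best partial translations at every step."""
    beams, finished = [((), 0.0)], []
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            for token, p in next_token_probs(prefix).items():
                scored = logp + math.log(p)
                if token == EOS:
                    finished.append((prefix, scored))
                else:
                    candidates.append((prefix + (token,), scored))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return max(finished, key=lambda c: c[1])

greedy_tokens, greedy_logp = greedy_decode()
beam_tokens, beam_logp = beam_search(beam_size=2)
print(greedy_tokens, round(math.exp(greedy_logp), 2))
print(beam_tokens, round(math.exp(beam_logp), 2))
```

In this toy setup, greedy decoding settles on "the doctor" (probability 0.33), while a beam of size 2 recovers "a doctor" (probability 0.36). Both strategies are deterministic, so the same input always yields the same output.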

Beam search, in particular, is very effective at maximizing BLEU scores, “but there is a significant cost to be paid in naturalness and diversity,” the researchers wrote. In practice, this means that MT models typically offer no variability in translations, leading to less engaging output. The researchers also suggested that readers who encounter a given language primarily through these more monotonous translations “might develop a warped exposure to that language.”

Gendered pronouns were just one of several diversity diagnostics the team introduced in their experiments. The researchers found that even when translating between two gendered languages, search disproportionately chose the more frequent gender, based on the input.

For English-to-German translation, the researchers noted that because the German word “sie” can translate as “she,” “they,” or “you” in English, the result was a bias toward the more common gender pronoun, “sie.” By contrast, when translating from French or German into English, male pronouns were more represented in the training set, and the bias skewed male accordingly.

A possible alternative to search might be sampling, which has lower rates of replacing “she” and “her” with male pronouns compared to search. However, the authors warned, the field might not be ready to shift away from search just yet, since sampling does not yield the same consistently high BLEU scores that search does.
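
One way to see why sampling can preserve minority pronouns is a small simulation. This is a hedged sketch, not the paper's experiment: the 0.7 / 0.3 split in `PRONOUN_PROBS` and all function names are assumptions made up for illustration. Search always returns the most probable token, so a modest majority in the model's distribution becomes a 100% majority in the output, whereas sampling draws tokens in proportion to their probability.

```python
import random
from collections import Counter

# Hypothetical model distribution for an ambiguous source pronoun.
# The 0.7 / 0.3 split is invented for illustration, not taken from the paper.
PRONOUN_PROBS = {"he": 0.7, "she": 0.3}

def decode_by_search(probs):
    """Search-style decoding: always return the most probable token."""
    return max(probs, key=probs.get)

def decode_by_sampling(probs, rng):
    """Sampling: draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def pronoun_rates(decoder, n=10_000):
    """Share of each pronoun over n independent decodes."""
    counts = Counter(decoder() for _ in range(n))
    return {tok: counts[tok] / n for tok in PRONOUN_PROBS}

rng = random.Random(0)  # fixed seed so the run is repeatable
search_rates = pronoun_rates(lambda: decode_by_search(PRONOUN_PROBS))
sample_rates = pronoun_rates(lambda: decode_by_sampling(PRONOUN_PROBS, rng))
print("search:  ", search_rates)
print("sampling:", sample_rates)
```

Under these assumptions, search never outputs "she," while sampling outputs it roughly 30% of the time, matching the model's own distribution. The trade-off described in the article still applies: a randomly drawn output is not guaranteed to be the highest-scoring one.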

“The singular focus on improving BLEU leaves no incentive to address issues of diversity,” they wrote. The researchers’ own future work will explore techniques that can achieve high BLEU scores while producing natural-sounding translations.

By Seyma Albarino

Staff Writer at Slator. Linguist, music blogger and reader of all things dystopian. Based in Chicago after adventures on three continents.
