Cambridge Researchers Tackle Neural Machine Translation’s Gender Bias

by Esther Bond

On April 16, 2020

Neural machine translation (NMT) output is only as good as the quality of its training data (i.e., garbage in, garbage out). And it is not only tangible errors in training data that create problems. Social biases contained in training data can also seep into machine translation output. (Sadly, the notion of great data in, great results out does not always hold true, but it certainly does help.)

A new research paper published in April 2020 entitled “Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem,” co-authored by Danielle Saunders and Bill Byrne, seeks to minimize the impact of this specific social bias.

According to her LinkedIn profile, Saunders is a PhD student focusing on statistical and neural machine translation using the TensorFlow framework. She also works part-time as a Research Scientist at language service provider SDL. Byrne is Professor of Information Engineering at Cambridge and Course Director for the University’s MPhil in Machine Learning, Speech, and Language Technology. He was Director of SDL’s UK R&D office until December 2019 and now works part-time for Amazon on Alexa Search.

The natural language processing (NLP) events calendar that normally spurs a flood of new MT research papers is carrying on despite the lockdown. Research publication activity is now high in the run-up to the 2020 Annual Conference of the Association for Computational Linguistics (ACL), which will take place online in July 2020.

Gender bias is an evolving area of MT research, and one that is being explored across many disciplines of NLP.

The Gender Bias Problem

As the two researchers from the University of Cambridge, UK, point out, training data tends to contain fewer sentences that refer to women than to men. In short, it is gender-biased. This is problematic in NMT because “gender bias has been shown to reduce translation quality, particularly when the target language has grammatical gender.” In fact, NMT may even amplify such biases.

Not only that, but in gender-inflected languages, gender-biased training data can even lead to “translations with identifiable errors,” the paper reads. For example, they say, mentions of male doctors are more reliably translated than those of male nurses.

Google Translate ran into a similar problem back in 2018. Phrases that included words such as “strong” or “doctor” would generally contain masculine pronouns when translated, while instances of “beautiful” and “nurse” would result in translated phrases with feminine pronouns.

The issue prompted Google to update its translation framework so that translations into gender-inflected languages (e.g., French and Spanish) would contain both masculine and feminine variations of the phrase.

For longer phrases, it was more complicated to resolve, and Google made significant changes to its framework. Although Google then claimed that its new system could “reliably produce feminine and masculine translations 99% of the time,” researchers continued to identify a number of shortcomings.

Fine-Tuning Rather Than Training

One way to reduce gender bias in NMT output is to cut the problem off at the root: remove gender bias from the training data. However, this is too big an undertaking in many instances.

Instead, Saunders and Byrne approached gender debiasing as a domain adaptation problem; that is, they attempted to reduce gender bias in the output by fine-tuning an existing model rather than training one from scratch.

Their intention was to use a form of transfer learning: fine-tuning an existing model on a small dataset that contained only unbiased sentences. They believed they would see “strong and consistent improvements in gender debiasing with much less computational cost than training from scratch.” In addition, this approach allows data privacy to be preserved because the original training data does not need to be accessed.

Using their debiasing model, the researchers also hoped to demonstrate that it was possible to remove gender bias in the output of a number of commercial MT systems: Google, Amazon, Microsoft, and SYSTRAN.

They first created a “tiny, handcrafted profession-based dataset” that would be used for fine-tuning. This dataset contained gender-balanced English sentences, which were later translated into three target languages: German, Spanish, and Hebrew.

Each English sentence contained professions sourced from US labor statistics and was structured as follows: The [PROFESSION] finished [his|her] work.

There were 194 professions and 388 English sentences in total. 
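The template above can be sketched in a few lines of Python. This is only an illustration of how two sentences per profession yield a gender-balanced set (194 professions give 388 sentences); the three sample professions and the function name are assumptions, not the researchers' actual list or code.

```python
# Hypothetical sample; the article says the full list has 194 professions.
PROFESSIONS = ["doctor", "nurse", "engineer"]
PRONOUNS = ("his", "her")


def build_balanced_sentences(professions):
    """Return one 'his' and one 'her' sentence for every profession."""
    return [
        f"The {profession} finished {pronoun} work."
        for profession in professions
        for pronoun in PRONOUNS
    ]


sentences = build_balanced_sentences(PROFESSIONS)
for sentence in sentences:
    print(sentence)
```

Because every profession appears exactly once with each pronoun, neither gender is over-represented for any profession.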

As a point of comparison, the researchers also created an approximated counterfactual dataset in which, for every sentence containing a gendered term, a bias-reversed equivalent was added.
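A rough sketch of the counterfactual idea on the English side might look like the following. The swap table and function names are assumptions made for illustration, and the table is deliberately tiny; real data would need far more terms and matching changes on the target-language side.

```python
import re

# Hypothetical, deliberately tiny swap table. Note that "her" is ambiguous
# ("his" or "him"); this sketch always maps it to "his".
SWAPS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}

_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def reverse_gender(sentence):
    """Swap every gendered term in the sentence for its counterpart."""
    def swap(match):
        word = match.group(0)
        replacement = SWAPS[word.lower()]
        # Preserve sentence-initial capitalization.
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(swap, sentence)


def augment(corpus):
    """Keep each sentence and add a bias-reversed copy when it is gendered."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        reversed_sentence = reverse_gender(sentence)
        if reversed_sentence != sentence:
            augmented.append(reversed_sentence)
    return augmented
```

Sentences without a gendered term pass through unchanged, so only the gendered portion of the corpus grows.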

They planned to use the tiny unbiased datasets to remove bias in NMT output through transfer learning. Transfer learning, however, is prone to the phenomenon of “catastrophic forgetting,” which negatively affects translation quality.

To minimize the effects of catastrophic forgetting while preserving gender balance, Saunders and Byrne turned to two further techniques: a regularized training procedure known as ‘Elastic Weight Consolidation’ (EWC), and a two-step lattice rescoring procedure.
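To make the EWC idea concrete, here is a minimal sketch of one common form of the regularized loss: the fine-tuning loss plus a penalty on each parameter's squared distance from its original value, weighted by an importance estimate (typically a diagonal Fisher information estimate). The function and argument names are assumptions, and some formulations scale the penalty by lam / 2 rather than lam.

```python
def ewc_loss(task_loss, params, original_params, fisher, lam):
    """Fine-tuning loss plus a penalty for drifting from the original model.

    loss = task_loss + lam * sum_j fisher[j] * (params[j] - original_params[j]) ** 2

    fisher[j] estimates how important parameter j was to the original model,
    so important parameters are held closer to their original values.
    """
    penalty = sum(
        f * (p - p0) ** 2
        for f, p, p0 in zip(fisher, params, original_params)
    )
    return task_loss + lam * penalty
```

With lam set to 0 this reduces to plain fine-tuning; larger values trade debiasing strength for retention of general translation quality.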

The researchers used different training data for their experiments for each of the three language pairs: English-German, English-Spanish, and English-Hebrew. However, “all three datasets have about the same proportion of gendered sentences: 11–12% of the overall set,” they said.

With fine-tuning, the researchers showed that both EWC and lattice rescoring “allow debiasing while maintaining general translation performance.” Lattice rescoring, they said, “although a two-step procedure, allows far more debiasing and potentially no degradation, without requiring access to the original model.”

Saunders and Byrne also showed that lattice rescoring “can be applied to remove gender bias in the output of ‘blackbox’ online commercial MT systems.”
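The two-step idea can be illustrated with a toy sketch: first expand a first-pass translation into a lattice of gender-inflected alternatives, then let a second scorer pick the best path. Everything here is an assumption for illustration: the German alternatives table, the function names, and the `score` callback, which stands in for a debiased model. The sketch enumerates every path by brute force, which only works for tiny lattices; a real system would search the lattice efficiently.

```python
from itertools import product

# Hypothetical gender-inflection alternatives for a few German tokens.
ALTERNATIVES = {
    "Der": ["Der", "Die"],
    "Die": ["Die", "Der"],
    "Arzt": ["Arzt", "Ärztin"],
    "Ärztin": ["Ärztin", "Arzt"],
    "seine": ["seine", "ihre"],
    "ihre": ["ihre", "seine"],
}


def build_lattice(tokens):
    """Step 1: list the allowed alternatives at each token position."""
    return [ALTERNATIVES.get(token, [token]) for token in tokens]


def rescore(tokens, score):
    """Step 2: return the highest-scoring path through the lattice."""
    lattice = build_lattice(tokens)
    return list(max(product(*lattice), key=score))
```

Because only the first-pass output is needed to build the lattice, this kind of rescoring does not require access to the system that produced the translation.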

The researchers do not claim to have found the fix for gender bias in NMT and point out that the paper only explores the issue at the sentence level. They do, however, suggest that this small-domain adaptation is “a more effective and efficient approach to debiasing machine translation than counterfactual data augmentation.”

By Esther Bond

Research Director at Slator. Localization enthusiast, linguist and inquisitor. London native.

© 2021 Slator. All rights reserved.
