How to Improve Automatic Machine Translation Evaluation? Add Humans, Scientists Say

Machine Translation ·

by Seyma Albarino

On February 4, 2021

A group of researchers has developed a leaderboard to automate the quality evaluation of natural language processing (NLP) programs, including machine translation (MT). The leaderboard, known as GENIE, was discussed in a January 17, 2021 paper on preprint server arXiv.org.

A leaderboard records automatically computed evaluation metrics of NLP programs. Over time, a leaderboard can help researchers compare apples to apples by standardizing comparisons of newer NLP programs with previous state-of-the-art approaches.

Automatic evaluation of MT is notoriously challenging due to the wide range of possible correct translations. Existing metrics for measuring MT quality, in particular BLEU and ROUGE, fall short by diverging significantly from human evaluations; tuning MT models to maximize BLEU scores has even been linked to biased translations.
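To see why n-gram metrics diverge from human judgment, consider a minimal, self-contained BLEU-style scorer (a simplified single-reference version with add-one smoothing, not the official sacreBLEU implementation): an adequate paraphrase that shares few n-grams with the reference scores poorly.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty, with add-one smoothing so short
    sentences do not zero out."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_counts & r_counts).values())   # clipped matches
        total = max(sum(c_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * geo_mean

ref = "the patient was given the medication yesterday"
exact = "the patient was given the medication yesterday"
paraphrase = "they administered the drug to the patient a day ago"

print(round(bleu(exact, ref), 3))       # 1.0: identical wording
print(round(bleu(paraphrase, ref), 3))  # ~0.18: adequate meaning, low score
```

Real BLEU is computed at corpus level, usually against multiple references; this sentence-level toy only illustrates the failure mode the researchers are reacting to, namely that surface n-gram overlap is a weak proxy for adequacy.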


More generally, as MT quality has improved and differences in output have become more nuanced, automatic metrics have struggled to keep pace with more sophisticated MT models (SlatorPro).

It follows, then, that academics and tech companies alike will search for a more efficient, standardized method of human evaluation. (For example, Facebook patented a method for gathering user engagement data to rate MT in 2019.) 


The researchers behind GENIE believe they are on the right path. The group comprises Daniel Khashabi, Jonathan Bragg, and Nicholas Lourie of the Allen Institute for AI (AI2); Gabriel Stanovsky from the Hebrew University of Jerusalem; Jungo Kasai from the University of Washington; and Yejin Choi, Noah A. Smith, and Daniel S. Weld, who are affiliated with both AI2 and the University of Washington.

“We must actively rethink the evaluation of AI systems and move the goalposts according to the latest developments,” Khashabi wrote on his personal website, explaining that GENIE was built to present “more comprehensive challenges for our latest technology.”

Dynamic Crowdsourcing

GENIE is billed as offering “human-in-the-loop” evaluation, which it provides via crowdsourcing. The process begins when a researcher makes a leaderboard submission to GENIE, which then automatically crowdsources human evaluation from Amazon Mechanical Turk.

Once human evaluation is complete, GENIE ranks the model relative to previous submissions. Users can view and compare models’ performance either in a task-specific leaderboard or in a meta leaderboard that summarizes statistics from individual leaderboards.
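The submit-crowdsource-rank loop described above can be sketched in a few lines. Everything in this snippet is invented for illustration (GENIE's real infrastructure collects judgments through Amazon Mechanical Turk and exposes its own submission interface); it only shows the shape of the human-in-the-loop flow.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical sketch of GENIE's human-in-the-loop flow. All names here
# are invented; the real service crowdsources judgments via Mechanical Turk.

@dataclass
class Submission:
    model_name: str
    outputs: list                                    # predictions on the test set
    human_scores: list = field(default_factory=list)

class Leaderboard:
    def __init__(self, task):
        self.task = task
        self.submissions = []

    def submit(self, submission, collect_judgment):
        # Step 1: crowdsource one human score per output (stand-in for
        # the Mechanical Turk step described in the article).
        submission.human_scores = [collect_judgment(o) for o in submission.outputs]
        # Step 2: rank the new model against all previous submissions.
        self.submissions.append(submission)
        self.submissions.sort(key=lambda s: mean(s.human_scores), reverse=True)
        return self.submissions.index(submission) + 1

board = Leaderboard("machine_translation")
board.submit(Submission("baseline", ["out_1", "out_2"]), lambda o: 0.6)
rank = board.submit(Submission("improved", ["out_1", "out_2"]), lambda o: 0.8)
print(rank)  # 1: the higher-rated model tops the task leaderboard
```

A meta leaderboard, as the article describes, would then aggregate statistics across several such task-specific `Leaderboard` instances.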

In addition to MT, there are currently three other task-specific leaderboards: question answering, commonsense reasoning, and summarization.


The authors encourage researchers and developers to submit new text generation models for evaluation. According to a VentureBeat article on GENIE, the plan is to cap submission fees at USD 100, with initial submissions paid by academic groups. After that, other options may come into play, such as a sliding scale whereby payments from tech companies help subsidize the cost for smaller organizations.

“Even upon any potential updates to the cost model, our effort will be to keep the entry barrier as minimal as possible, particularly to those submissions coming from academia,” the authors wrote.

Reporting a Gold Standard

Of course, GENIE has a ways to go before becoming ubiquitous in NLP. The authors acknowledge that their system will require “substantial effort in training annotators and designing crowdsourcing interfaces,” not to mention the costs associated with each.

Procedures for quality assurance of human evaluation have also yet to be finalized. In particular, the researchers note that human evaluations are “inevitably noisy,” so studying the variability in human evaluations is a must.


Another concern is the reproducibility of human annotations over time and across individuals. The authors suggest estimating annotator variance and spreading annotations over several days to make human annotations more reproducible.
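The kind of variance estimate the authors suggest can be approximated with basic descriptive statistics. This toy example (invented scores, not GENIE's actual QA procedure) splits rating variance into a between-item component, reflecting real quality differences, and a within-item component, reflecting annotator disagreement:

```python
from statistics import mean, pvariance

# Illustrative only: separate genuine quality differences between items
# from annotator disagreement within each item.

ratings = {                  # item -> scores from three annotators (1-5)
    "sent_1": [5, 4, 5],
    "sent_2": [2, 3, 2],
    "sent_3": [4, 2, 5],     # annotators disagree sharply here
}

between_item = pvariance([mean(s) for s in ratings.values()])
within_item = mean(pvariance(s) for s in ratings.values())

print(f"between-item variance: {between_item:.2f}")   # 0.91
print(f"within-item variance:  {within_item:.2f}")    # 0.67, driven by sent_3
```

When the within-item component is large relative to the between-item component, rankings based on few judgments per item become unreliable, which is one reason to collect more annotations and spread them over time.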

Besides standardizing high-quality human evaluation of NLP systems, GENIE aims to free up model developers’ time; instead of designing and running evaluation programs, they can focus on what they do best. As a “central, updating hub,” GENIE is meant to facilitate an easy submission process with the ultimate goal of encouraging researchers to report their findings. 

TAGS

Allen Institute for AI, BLEU, GENIE, Hebrew University of Jerusalem, natural language processing, NLP, ROUGE, University of Washington
By Seyma Albarino

Staff Writer at Slator. Linguist, music blogger and reader of all things dystopian. Based in Chicago after adventures on three continents.
