NVIDIA Engineer on Machine Translation Research’s State of the Art and Future Direction

September 24, 2019

Features ·

by Seyma Albarino

Academic research has been foundational to progress in neural machine translation, according to Senior Deep Learning Engineer Chip Huyen. But, as she explained to the audience at SlatorCon San Francisco 2019, there are still discrepancies between research environments and industry realities.

Huyen’s own background is a mix of academia and industry: her résumé includes time at Netflix and experience teaching TensorFlow for Deep Learning Research at Stanford, her alma mater. In 2018, she joined NVIDIA, which builds the hardware used to bring AI into production. NVIDIA invented the GPU (graphics processing unit), which supplies the raw computational power behind AI.

Huyen opened her presentation by revisiting the transition from statistical, phrase-based machine translation to today’s neural machine translation (NMT) models and frameworks. She explained that while the output of older models might have been more predictable, it lacked the natural fluency of today’s neural models.

NMT, however, continues to have limitations. It typically requires massive amounts of data, and translation quality tends to degrade as sentences get longer.

Huyen said that one main goal of current research is to decrease the reliance on data. “In research, we work with datasets that are millions of sentence pairs,” she said. “And when we work with our clients, we ask them how much data they have and they say, ‘A lot, like, 10,000 pairs.’ We say, ‘That’s not enough.’”

One method, which Huyen explored in a recent study, is to support training with monolingual data rather than relying only on parallel (source-language-to-target-language) data.

Another option is to leverage similar languages that share common subwords by pairing a low-resource language with a high-resource one. In 2016, Google Translate did just that, pairing Azerbaijani, a low-resource language, with Turkish. This pairing improved Google Translate’s output from Azerbaijani into English. More broadly, the approach demonstrated the system’s ability to translate between language pairs it had not encountered in training.
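Google’s published description of its 2016 multilingual system explains that one shared model is told which language to produce through an artificial token prepended to each source sentence, so pairs such as Turkish–English and Azerbaijani–English can be trained together. The minimal sketch below assembles such mixed training data; the `<2xx>` token spelling, the helper names `tag_source` and `build_multilingual_corpus`, and the placeholder sentences are illustrative assumptions, not Google’s code.

```python
from typing import List, Tuple

# (source_lang, target_lang, source_text, target_text); placeholders, not real data
Pair = Tuple[str, str, str, str]


def tag_source(src_text: str, tgt_lang: str) -> str:
    """Prepend an artificial token telling one shared model which language to produce."""
    return f"<2{tgt_lang}> {src_text}"


def build_multilingual_corpus(pairs: List[Pair]) -> List[Tuple[str, str]]:
    """Merge every language pair into a single training set for one shared model."""
    return [(tag_source(src, tgt_lang), tgt) for _, tgt_lang, src, tgt in pairs]


pairs = [
    ("tr", "en", "<turkish sentence>", "<english sentence>"),      # high-resource pair
    ("az", "en", "<azerbaijani sentence>", "<english sentence>"),  # low-resource pair
    ("en", "tr", "<english sentence>", "<turkish sentence>"),
]

corpus = build_multilingual_corpus(pairs)
for src, tgt in corpus:
    print(src, "->", tgt)

# Zero-shot request: az->tr never appears in the training pairs above,
# but the tag still asks the shared model for Turkish output.
print(tag_source("<azerbaijani sentence>", "tr"))
```

Because all pairs share one model and, typically, one subword vocabulary, patterns learned from the high-resource Turkish data can carry over to the closely related, low-resource Azerbaijani data.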

Building on this success, the Google AI team published a research paper in July 2019 describing efforts to “[build] a universal neural machine translation (NMT) system capable of translating between any language pair.” The system was trained using over 25 billion examples and is capable of handling 103 languages.

In Reality, No Sentence Is Too Long

Increasing the memory of a neural system can enable it to handle longer sequences, which is important when NMT is used outside of research. “In industry, you can’t say, ‘This sentence is too long and we’re not going to translate it,’” Huyen said.

Huyen cited Transformer-XL as one architecture that handles long inputs by breaking them into shorter segments. As the system processes a text, Transformer-XL reuses the hidden states computed for the previous segment to help with the current one. Using representations of the surrounding context, rather than of the sentence alone, as an additional input can also help improve memory, she added.
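Transformer-XL’s segment-level recurrence can be sketched in a few lines. This toy version assumes a single attention layer with no learned projections and 2-dimensional vectors standing in for token embeddings; the function names are hypothetical. It splits a sequence into fixed-length segments and lets each position attend over a cache of up to `mem_len` earlier states plus the positions so far in its own segment. It shows the data flow only: the real model stacks many layers, caches each layer’s hidden states, stops gradients through the cache, and uses relative positional encodings.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention; keys double as values (no learned projections)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * key[j] for w, key in zip(weights, keys)) / total for j in range(d)]


def process_with_memory(embeddings: List[Vector], segment_len: int, mem_len: int) -> List[Vector]:
    """Process a long sequence segment by segment, reusing states cached from earlier segments."""
    memory: List[Vector] = []
    outputs: List[Vector] = []
    for start in range(0, len(embeddings), segment_len):
        segment = embeddings[start:start + segment_len]
        for i, x in enumerate(segment):
            # Context = cached states from earlier segments + causal prefix of this segment.
            outputs.append(attend(x, memory + segment[: i + 1]))
        # With one layer, the cached states are that layer's inputs.
        # Keep only the most recent mem_len of them for the next segment.
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
with_memory = process_with_memory(tokens, segment_len=2, mem_len=2)
no_memory = process_with_memory(tokens, segment_len=2, mem_len=0)
print(with_memory[2])  # first token of segment 2 also sees segment 1 through the cache
print(no_memory[2])    # [1.0, 1.0]: without memory it can only attend to itself
```

The cache is what lets information cross segment boundaries without re-encoding the earlier text, which is the property that matters when an input is too long to process in one pass.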

Feeling Bleu

As NMT becomes more refined, the need for effective quality evaluation techniques becomes more obvious.

ROUGE and BLEU, the latter perhaps the most familiar method, measure n-gram overlap: how much the translated output and a reference text have in common. (Another well-known metric, NIST, is a variant of BLEU that gives more weight to rarer, more informative n-grams.)
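To make n-gram overlap concrete, here is a minimal sketch of sentence-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty that scales down outputs shorter than the reference. It assumes whitespace tokenization, a single reference, and no smoothing; established toolkits add standardized tokenization, multiple references, and corpus-level aggregation.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp: List[str], ref: List[str], n: int) -> float:
    """Share of hypothesis n-grams found in the reference, each credited at most as often as it occurs there."""
    hyp_counts = ngram_counts(hyp, n)
    ref_counts = ngram_counts(ref, n)
    matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matches / max(sum(hyp_counts.values()), 1)


def sentence_bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    precisions = [clipped_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any n-gram order has no match
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: outputs no longer than the reference get exp(1 - r/c) <= 1.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_mean)


reference = "the cat is on the mat".split()
hypothesis = "the cat sat on the mat".split()
print(sentence_bleu(reference, reference))           # 1.0
print(clipped_precision(hypothesis, reference, 1))   # 5 of 6 unigrams match
print(sentence_bleu(hypothesis, reference))          # 0.0
```

The last line shows a known weakness of unsmoothed sentence-level BLEU: a single substituted word leaves no matching 4-gram, so the score collapses to zero even though most of the sentence matches the reference.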

Although BLEU is still widely used in academia, Huyen pointed out that its reliance on reference text makes it impractical for industry use.

“You need to enumerate all of the possible translations, which is near-impossible,” she said. In addition to requiring reference texts, BLEU does not take semantics into account and does not correlate well with human judgment.

Given the shortcomings of quality evaluation techniques like BLEU, quality estimation aims to predict the quality of machine translation output without using reference texts (e.g., to make it possible to estimate post-editing time). Huyen described quality estimation as “very under-explored in research,” noting that it seems to be “mostly driven from industry and not from academia.”

There are other hurdles as well. It is difficult to convince people of the merits of a new metric, and, at the moment, there is no real way of replicating human judgment, she explained.

Huyen’s own research has included developing a metric to evaluate machine translation output without a reference text. The project, MT Evaluation Without Reference (MEWR), sought to evaluate translations by comparing their style and content to those of the source sentences. The resulting fidelity score correlated strongly with the corresponding BLEU score, and more weakly with fluency and human judgment.
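The article does not say how MEWR’s correlations were measured. As a hedged illustration only, Pearson’s r is one common way to quantify how closely two sets of per-segment scores track each other, such as a reference-free metric against BLEU or against human ratings; whether MEWR used this particular coefficient is an assumption, and the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance of the two score lists over the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0: the scores rise together perfectly
print(pearson_r([1, 2, 3], [3, 2, 1]))  # -1.0: one rises as the other falls
```

A value near 1 means the two scorers rank and space segments similarly; a value near 0 means one tells you little about the other.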

Not Yet

Much of the current research on NMT is interrelated. One priority is to improve translations of entire documents. Focusing on the document as a whole may promote the use of new evaluation or estimation techniques, because BLEU tends to assess quality sentence by sentence.

This could also allow systems to adapt to multiple domains within one dataset. “In research, all of our datasets are really well-defined, so you have datasets on news or sciences or movie dialogues,” Huyen said, “but in real life people have conversations about a variety of topics.”

Lastly, Huyen predicts the future may bring more opportunities for what she calls “hybrid human-machine translation.”

“Wherever I go, people keep asking me if AI is going to replace translators,” Huyen said. Based on the challenges MT still faces, she said, “I guess the answer is not yet.”

By Seyma Albarino

Staff Writer at Slator. Linguist, music blogger and reader of all things dystopian. Based in Chicago after adventures on three continents.

© 2021 Slator. All rights reserved.
