Here Are the Top 10 Most Influential Research Papers on Neural Machine Translation

Machine Translation · by Seyma Albarino · December 8, 2020

Roughly half a decade after neural machine translation (NMT) was first deployed by trailblazing language service providers (LSPs) and buyer organizations, it is time to look back at the most innovative research and shed light on how the industry established a new normal.

Slator ranked the most influential research dealing with NMT based on the number of times each paper was cited since publication, averaging citation counts as reported by Semantic Scholar and Google Scholar.

This list focuses exclusively on papers dealing with NMT. So, for example, although the June 2014 paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation was cited over 10,000 times, it is not included here because its subject is statistical, not neural, machine translation.

Note that the latest publication date on the list is 2017, which makes sense considering that, as time passes and the field evolves, it becomes more difficult for research to truly break new ground.

This list is also Slator’s way of bidding adieu to the “neural” in neural machine translation, as most of the industry now refers to NMT as simply MT.

#1 Neural Machine Translation by Jointly Learning to Align and Translate
Citations: ≈14,400
Date Published: September 2014
Authors: Dzmitry Bahdanau (Jacobs University Bremen, Germany), Kyunghyun Cho, Yoshua Bengio (Université de Montréal)

The first NMT models typically encoded a source sentence into a fixed-length vector, from which a decoder generated a translation. Bahdanau, Cho, and Bengio identified the fixed-length vector as a glass ceiling for translation quality, particularly for long sentences. The architecture of their proposed model, RNNsearch, focused “only on information relevant to the generation of the next target word.” The authors described the model’s performance as striking, “considering that the proposed architecture, or the whole family of neural machine translation, has only been proposed as recently as this year.”
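
A minimal NumPy sketch of one additive attention step may help make this concrete: alignment scores between the previous decoder state and each encoder annotation are softmaxed into weights, and the weighted sum of annotations becomes the context for the next target word. All parameter names and sizes below are illustrative, not taken from the paper.

```python
import numpy as np

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """One additive (Bahdanau-style) attention step.

    s_prev : previous decoder state, shape (d_dec,)
    H      : encoder annotations h_1..h_T, shape (T, d_enc)
    W_a, U_a, v_a : learned projections (here: random toy values)
    """
    # Alignment scores: e_j = v_a . tanh(W_a s_prev + U_a h_j)
    e = np.tanh(s_prev @ W_a + H @ U_a) @ v_a   # shape (T,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                        # softmax over source positions
    return alpha @ H, alpha                     # context vector, attention weights

rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 8, 6
c, alpha = additive_attention(
    rng.normal(size=d_dec),
    rng.normal(size=(T, d_enc)),
    rng.normal(size=(d_dec, d_att)),
    rng.normal(size=(d_enc, d_att)),
    rng.normal(size=d_att),
)
print(alpha.round(3), c.shape)   # weights sum to 1; c has shape (d_enc,)
```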

#2 Effective Approaches to Attention-based Neural Machine Translation
Citations: ≈4,490
Date Published: September 2015
Authors: Minh-Thang Luong, Hieu Pham, Christopher D. Manning (Stanford University)

Inspired by the integration of attentional mechanisms into NMT, allowing models to focus on select parts of source sentences during translation, Luong, Pham, and Manning explored two potentially useful architectures for attention-based NMT: a global approach, which looked at all source words, and a local approach, which looked at a subset of source words each time. When applied to WMT translation tasks between English and German, both setups were shown to improve translation quality, with the local attention yielding significant gains (as measured by BLEU), and the ensemble model establishing new state-of-the-art results for WMT14 and 15.
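
A rough sketch of the two setups, assuming the paper's simplest "dot" score and a given (rather than predicted) alignment position p_t for the local variant; the Gaussian window with sigma = D/2 follows the paper, but everything else here is illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(h_t, H_src):
    """Global attention with the 'dot' score: look at every source state."""
    return softmax(H_src @ h_t) @ H_src

def local_attention(h_t, H_src, p_t, D=2):
    """Local attention: only a window [p_t - D, p_t + D] of source states,
    with a Gaussian favoring positions near p_t (sigma = D/2)."""
    lo, hi = max(0, p_t - D), min(len(H_src), p_t + D + 1)
    window = H_src[lo:hi]
    a = softmax(window @ h_t)
    a *= np.exp(-((np.arange(lo, hi) - p_t) ** 2) / (2 * (D / 2) ** 2))
    return a @ window

rng = np.random.default_rng(0)
H, h = rng.normal(size=(10, 8)), rng.normal(size=8)
print(global_attention(h, H).shape, local_attention(h, H, p_t=4).shape)
```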

#3 Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Citations: ≈3,250
Date Published: September 2016
Lead Researchers: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi (Google)

Google’s Neural Machine Translation System (GNMT), touted as producing translations “nearly indistinguishable” from human translations, was designed to scale NMT for work in the real world by decreasing training time, accelerating final translation speed, and improving the handling of rare words. A beam search technique promoted output sentences more likely to cover all the words in the source sentence, and “wordpiece” modeling accounted for morphologically rich languages. Human side-by-side evaluation of simple sentences showed a 60% reduction in translation errors compared to Google’s previous phrase-based production system. The authors concluded that details such as length normalization and coverage penalties “are essential to making NMT systems work well on real data.”
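
The length normalization and coverage penalty come from the paper's beam-search rescoring formula, s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y). A small NumPy sketch (the default alpha and beta values are illustrative, and the epsilon only keeps the toy example finite):

```python
import numpy as np

def gnmt_score(log_prob, attn, alpha=0.2, beta=0.2):
    """GNMT beam-search rescoring: s(Y,X) = log P(Y|X) / lp(Y) + cp(X;Y).

    log_prob : total log-probability of the candidate translation Y
    attn     : attention weights, shape (target steps, source words)
    alpha    : length-normalization strength; beta: coverage-penalty strength
    """
    # Length normalization: lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha
    lp = (5.0 + attn.shape[0]) ** alpha / (5.0 + 1.0) ** alpha
    # Coverage penalty rewards hypotheses that attend to every source word
    coverage = np.minimum(attn.sum(axis=0), 1.0)
    cp = beta * np.log(np.maximum(coverage, 1e-9)).sum()  # eps guards log(0)
    return log_prob / lp + cp

attn = np.full((7, 5), 1.0 / 5)   # toy: uniform attention, 7 target steps
print(gnmt_score(-9.3, attn))     # length-normalized score, zero coverage penalty
```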

#4 On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Citations: ≈3,035
Date Published: September 2014
Authors: Kyunghyun Cho, Bart van Merrienboer, Yoshua Bengio (Université de Montréal), Dzmitry Bahdanau (Jacobs University Bremen, Germany)

Researchers compared two NMT models with different kinds of encoders: one, an RNN with gated hidden units, and the other, a gated recursive convolutional neural network (grConv). Although both models were able to produce correct translations of short sentences without unknown words, the quality suffered as sentences grew longer and as more unknown words were included. “It is important to find a way to scale up training a neural network both in terms of computation and memory so that much larger vocabularies for both source and target languages can be used,” the authors wrote, adding that a “radically different approach” might be required for languages with rich morphology.
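
The "gated hidden units" in question are what is now commonly called the GRU. A minimal sketch of one encoder step, with illustrative parameter names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, P):
    """One step of a gated hidden unit (GRU): gates decide how much of
    the old state to keep and how much of the candidate state to adopt."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)             # update gate
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)             # reset gate
    h_cand = np.tanh(P["W"] @ x + P["U"] @ (r * h))    # candidate state
    return (1.0 - z) * h + z * h_cand

rng = np.random.default_rng(0)
d = 4
P = {k: rng.normal(size=(d, d)) for k in ("Wz", "Uz", "Wr", "Ur", "W", "U")}
h = np.zeros(d)
for x in rng.normal(size=(3, d)):   # run the encoder over three toy inputs
    h = gru_step(x, h, P)
print(h.round(3))
```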

#5 Neural Machine Translation of Rare Words With Subword Units
Citations: ≈2,960
Date Published: August 2015
Authors: Rico Sennrich, Barry Haddow, Alexandra Birch (University of Edinburgh)

Back in 2015, NMT models would “back off” to a dictionary upon encountering rare or unknown words. Sennrich, Haddow, and Birch, however, believed there was a way that NMT systems could handle translation as an “open-vocabulary problem.” If various word classes, such as names, cognates, and loan words, were “translatable via smaller units than words,” then encoding such rare and unknown words as “sequences of subword units” could help an NMT system handle them. The researchers looked at several word segmentation techniques, and their subword models showed improvement over a “back-off dictionary baseline” for the WMT 15 English-German and English-Russian translation tasks.
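
The merge loop at the heart of byte-pair encoding is compact enough to sketch (the paper itself includes a similar minimal Python version); the toy vocabulary below mirrors the paper's low/lower/newest/widest example, with "</w>" marking word endings:

```python
import re
from collections import Counter

def learn_bpe(vocab, num_merges):
    """Learn BPE merges from a {word: frequency} vocabulary, where each
    word is a space-separated symbol sequence ending in '</w>'."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter()                    # count adjacent symbol pairs
        for word, freq in vocab.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)     # most frequent pair wins
        merges.append(best)
        # Merge that pair into a single symbol everywhere it occurs
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub("".join(best), w): f for w, f in vocab.items()}
    return merges

corpus = {"l o w </w>": 5, "l o w e r </w>": 2,
          "n e w e s t </w>": 6, "w i d e s t </w>": 3}
print(learn_bpe(corpus, 4))   # [('e','s'), ('es','t'), ('est','</w>'), ('l','o')]
```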

#6 OpenNMT: Open-Source Toolkit for Neural Machine Translation
Citations: ≈1,050
Date Published: January 2017
Authors: Guillaume Klein, Jean Senellart (Systran), Yoon Kim, Yuntian Deng, Alexander M. Rush (Harvard University)

What makes a helpful, open-source toolkit for NMT? For the Systran and Harvard University researchers behind OpenNMT, it was “modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques.” OpenNMT was designed to prioritize efficiency and modularity, with the goal of supporting NMT research into model architectures, feature representations, and source modalities, and providing a stable framework for production use. At the same time, OpenNMT was also meant to maintain competitive performance and reasonable training requirements. 

#7 Improving Neural Machine Translation Models With Monolingual Data
Citations: ≈1,015
Date Published: June 2016
Authors: Rico Sennrich, Barry Haddow, Alexandra Birch (University of Edinburgh)

Target-side monolingual data was already known to help boost the fluency of phrase-based statistical MT, but this paper demonstrated that it could also be an asset to NMT. Researchers trained NMT models by pairing monolingual training data with automatic back-translations, and then treated this synthetic data as additional training data. Since the monolingual training data could be integrated without changing the neural network architecture, the authors believed their approach held promise for different types of NMT systems, but acknowledged that ultimately its effectiveness would depend on the quality of the NMT system used for back-translation and on the amounts of available parallel and monolingual data.
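
The recipe reduces to a few lines of data plumbing. In this sketch, reverse_model is a hypothetical stand-in for any target-to-source translation system, not an interface from the paper:

```python
def backtranslate_augment(parallel, mono_target, reverse_model):
    """Back-translation recipe: translate target-language monolingual
    sentences back into the source language with a reverse (target-to-source)
    model, then mix the synthetic pairs into the real parallel data."""
    synthetic = [(reverse_model(t), t) for t in mono_target]
    return parallel + synthetic   # synthetic pairs are treated like real data

# Toy usage with a placeholder reverse "model"
data = backtranslate_augment(
    parallel=[("ein Haus", "a house")],
    mono_target=["a garden", "a tree"],
    reverse_model=lambda s: f"<synthetic-source for: {s}>",
)
print(data)
```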

#8 Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Citations: ≈950
Date Published: November 2016
Lead Researchers: Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat (Google)

Google’s “simple” solution to multilingual NMT quickly turned into something much bigger. Researchers enabled multilingual translation with a single NMT model by introducing an artificial token at the beginning of each input sentence to specify the required target language; the rest of the model was unchanged and shared across all languages. Their largest models covered up to 12 language pairs and improved translation for many individual pairs. What researchers did not expect, however, was for models to learn to bridge between pairs never seen explicitly during training, demonstrating, for “the first time to our knowledge,” that transfer learning and zero-shot translation were indeed possible for NMT.
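
The mechanism amounts to one line of preprocessing; the "<2xx>" token format below follows the examples in the paper:

```python
def tag_for_target(source_sentence, target_lang):
    """The entire multilingual interface: one artificial token at the start
    of the source sentence tells the shared model what to translate into."""
    return f"<2{target_lang}> {source_sentence}"

# One model, one set of parameters, every direction:
print(tag_for_target("How are you?", "es"))      # <2es> How are you?
print(tag_for_target("Wie geht es dir?", "ja"))  # zero-shot if this pair was unseen
```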

#9 On Using Very Large Target Vocabulary for Neural Machine Translation
Citations: ≈785
Date Published: December 2014
Authors: Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio (Université de Montréal)

For all of NMT’s gains over statistical MT, large vocabularies still posed a challenge: as the number of target words grew, so did the complexity of training and decoding. To make use of a very large target vocabulary without increasing training complexity, a team of Montreal-based researchers proposed a new method, based on importance sampling, in which decoding focused on only a small subset of the whole target vocabulary. Models trained this way matched, and sometimes outperformed, baseline models with a small vocabulary.
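
As an illustrative simplification (not the paper's exact importance-sampling estimator), the core idea is to compute the softmax normalizer over the target word plus a small sampled subset of the vocabulary rather than over all of it:

```python
import numpy as np

def subset_nll(logits, target_id, sample_ids):
    """Approximate negative log-likelihood: normalize over the target word
    plus a sampled subset instead of the full vocabulary. (The paper's
    estimator adds an importance-sampling correction to this idea.)"""
    subset = np.unique(np.append(sample_ids, target_id))
    sub = logits[subset]
    log_z = np.log(np.exp(sub - sub.max()).sum()) + sub.max()  # stable log-sum-exp
    return -(logits[target_id] - log_z)

rng = np.random.default_rng(0)
V = 500_000                                # a full softmax over V is the bottleneck
logits = rng.normal(size=V)
print(subset_nll(logits, target_id=42, sample_ids=rng.integers(0, V, size=512)))
```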

#10 Modeling Coverage for Neural Machine Translation
Citations: ≈520
Date Published: January 2016
Authors: Zhaopeng Tu, Zhengdong Lu, Xiaohua Liu, Hang Li (Huawei Technologies, Hong Kong), Yang Liu (Tsinghua University, Beijing)

The attention mechanism, credited with boosting state-of-the-art NMT by jointly learning to align and translate, is a bit of a double-edged sword, since it can also ignore past alignment, contributing to over- and under-translation. Feeding a coverage vector to the attention model to help it focus more on untranslated words can mitigate these issues. The two models proposed and explored in this paper — linguistic coverage (which leverages more linguistic information) and NN-based coverage (which resorts to the flexibility of neural network approximation) — both achieved “significant improvements in terms of translation quality and alignment quality over NMT without coverage.”
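
A toy sketch of the general idea, with a simple additive penalty standing in for the paper's linguistic and NN-based coverage models:

```python
import numpy as np

def attend_with_coverage(h_t, H_src, coverage, penalty=1.0):
    """Coverage-aware attention: a running coverage vector (here just the
    sum of past attention weights) pushes scores away from source positions
    that have already been attended to, discouraging over-translation."""
    scores = H_src @ h_t - penalty * coverage
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ H_src, coverage + a            # context, updated coverage

rng = np.random.default_rng(0)
H, cov = rng.normal(size=(6, 8)), np.zeros(6)
for _ in range(3):                            # three decoding steps
    ctx, cov = attend_with_coverage(rng.normal(size=8), H, cov)
print(cov.round(2), cov.sum())                # total coverage grows by 1 per step
```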

Finally, read this Twitter thread for a short discussion on the merits of using citation count as a measure of influence.

A bit disconcerted to see the "top influential NMT papers" by @slatornews. Citation count of a paper is not an accurate metric for this. How about asking MT researchers which works they think are influential? Would be nice as a contrast. https://t.co/YEkEf6FalH

— Mathias Müller (@bricksdont) December 9, 2020

By Seyma Albarino

Staff Writer at Slator. Linguist, music blogger and reader of all things dystopian. Based in Chicago after adventures on three continents.
