Google, Facebook, Amazon: Neural Machine Translation Just Had Its Busiest Month Ever

Technology · by Gino Diño · April 27, 2018

While the FANG stocks (Facebook, Amazon, Netflix, Google) are having a wild ride on Nasdaq, neural machine translation (NMT) research output from three of these four is hitting record highs.

Although NMT research output slowed in December 2017 and the first couple of months of 2018, it has been resurgent since February 2018.

Slator has recently been tracking a stream of new research papers on Cornell University’s automated online research distribution system, Arxiv.org.


We found that between November 1, 2017 and February 14, 2018, a 105-day stretch, there were at least 58 relevant papers, 12 of which were not specifically about NMT, leaving 46 NMT-specific research papers published during that period.

Between February 15, 2018 and April 26, 2018, however, a 70-day stretch, we found 76 relevant papers. Of those, 51 directly related to NMT. The papers not strictly about NMT typically used it as a testbed for broader machine learning or neural network research, reflecting what a hot research topic it is.
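Put on a per-day basis, those counts imply the NMT-specific publication rate rose by roughly two thirds between the two periods:

$$\frac{46\ \text{papers}}{105\ \text{days}} \approx 0.44\ \text{papers/day} \qquad \text{versus} \qquad \frac{51\ \text{papers}}{70\ \text{days}} \approx 0.73\ \text{papers/day}$$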

There is clearly a spike in publishing, and big players like IBM, Microsoft, Facebook, Amazon, and Google are all actively researching NMT. The most recent and significant research by the “Big Tech” companies includes:

Facebook on Low-Resource Languages

The Facebook AI Research (FAIR) team has been busy tackling the problem of low-resource languages—languages for which parallel corpora for training machine translation (MT) engines are scarce. In March 2018, Slator covered Facebook’s USD 40,000 grant for research into low-resource languages for NMT.

Their most recent paper, Phrase-Based & Neural Unsupervised Machine Translation, explains how researchers improved upon previous unsupervised MT approaches and developed a fully unsupervised system based on phrase-based statistical MT (PBSMT) and NMT. The researchers claim that their system outperforms current state-of-the-art unsupervised MT systems by over 11 BLEU (Bilingual Evaluation Understudy) points “without using a single parallel sentence.”

Supervised training of MT means having both source and target data (i.e. parallel corpora of two languages) with which to train an MT engine. Unsupervised training means having only monolingual data in each language, with no aligned sentence pairs, which is often the case for low-resource languages.

Low-resource languages are a practical problem for Facebook, which reached the two billion user mark in 2017 and requires 4.5 billion translations a day.

A portion of those billions of translations is for low-resource languages, such as Turkish, Vietnamese, and Tagalog, the principal language of the Philippines, the social media capital of the world with over 47 million Facebook users.

Amazon on Improving Process Efficiency

In Amazon Research’s most recent paper, “A neural interlingua for multilingual machine translation,” the authors claim that they developed a model that “learns a true interlingua by performing direct zero-shot translation.”

Using the interlingua in an NMT model, the researchers reported needing fewer parameters while still achieving output comparable to previous models.

Additionally, earlier in April, Amazon published research on “Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation.” Constrained decoding is a method that lets an NMT system consistently translate specific words or terminology.

The problem is that for every word that the NMT engine needs to remember to translate a specific way, the entire system slows down a bit.

This paper presents an implementation of constrained decoding in Amazon’s Sockeye framework that reduces the complexity of the method, thereby speeding it up.
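To make the mechanics concrete, here is a toy sketch of the dynamic beam allocation idea, loosely following the paper’s description. The function name and data layout are illustrative inventions, not Sockeye’s actual API.

```python
def dba_prune(candidates, beam_size, num_constraints):
    """Toy dynamic beam allocation: keep `beam_size` hypotheses while
    reserving slots across 'banks' indexed by how many constraint
    tokens each hypothesis has already covered, so hypotheses still
    working through constraints are not crowded out by unconstrained,
    higher-scoring ones.

    candidates: list of (score, constraints_met, hypothesis) tuples,
    where a higher score is better."""
    banks = {b: [] for b in range(num_constraints + 1)}
    for score, met, hyp in candidates:
        banks[met].append((score, met, hyp))   # group by coverage level
    for bank in banks.values():
        bank.sort(key=lambda c: c[0], reverse=True)  # best score first

    # Round-robin over banks, highest constraint coverage first; banks
    # with fewer candidates than their share donate slots to the rest.
    kept, remaining = [], beam_size
    while remaining > 0 and any(banks.values()):
        for b in sorted(banks, reverse=True):
            if banks[b] and remaining > 0:
                kept.append(banks[b].pop(0))
                remaining -= 1
    return kept

# Prune six candidates down to a beam of four: one slot goes to the best
# hypothesis at each coverage level before any level gets a second slot.
cands = [(-1.0, 0, "a"), (-1.2, 0, "b"), (-1.1, 1, "c"),
         (-2.0, 1, "d"), (-2.5, 2, "e"), (-0.9, 0, "f")]
print(dba_prune(cands, beam_size=4, num_constraints=2))
```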

Google on Improving NMT Output

Google has a finger in pretty much every pie, and Arxiv research papers from Google Brain researchers include co-authored publications with Microsoft on low-resource languages, machine reading and question-answering, and unsupervised learning.

The subject of their most recent paper is quite telling, however: “The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation.” As the title suggests, Google’s researchers built hybrid NMT engines by combining the best parts of recent NMT models: recurrent neural networks (RNN), convolutional neural networks (CNN), and their own self-attentional transformer model.

Google came up with a new and improved recurrent NMT architecture (which they call RNMT+) that borrows aspects of their in-production transformer model. In addition to RNMT+, they developed two more hybrid models that reportedly outperformed RNMT+ by a slight margin.

All three engines outperformed the state of the art, according to Google, including its own in-production Google Translate transformer.

There is no word yet on implementation schedules, if any. Unless Google is already rolling the models out continuously (similar to what it does with its search algorithm), implementation could happen soon, given the new engines and ongoing research into further improving the self-attention mechanism.

BLEU Takes More Hits

The limited but still widely used BLEU metric took another hit when Amazon’s Matt Post recently published a paper titled “A Call for Clarity in Reporting BLEU Scores.”

Slator recently published Prof. Andy Way’s take on the limitations of BLEU. Post’s point, however, was what he calls BLEU’s “curse”, i.e. “a lack of consensus in how to report scores.”

“Although people refer to ‘the’ BLEU score, BLEU scores can vary wildly with changes to its parameterization and, especially, reference processing schemes. Yet these details are absent from papers or hard to determine,” Post wrote in the paper’s abstract. “We quantify this variation, finding differences as high as 1.8 between commonly used configurations.”
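Post released the open-source sacreBLEU tool alongside the paper to standardize reporting. Below is a minimal sketch of the variation he describes; the sentences are invented, and the call assumes sacreBLEU’s module-level API, which may differ slightly between versions.

```python
import sacrebleu

# One invented hypothesis/reference pair, scored under three different
# tokenization schemes; the spread illustrates Post's point that "the"
# BLEU score depends heavily on parameterization.
hyps = ["The cat sat on the mat."]
refs = [["The cat is sitting on the mat."]]

for tok in ("13a", "intl", "none"):
    bleu = sacrebleu.corpus_bleu(hyps, refs, tokenize=tok)
    print(f"tokenize={tok!r}: BLEU = {bleu.score:.1f}")
```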

All of the subject matter experts Slator spoke to for our 2018 NMT report also noted BLEU’s limitations, especially in the neural era, and most agreed that the only real basis for assessing NMT output quality is human evaluation.

On that note, One Hour Translation recently announced that it is launching its first NMT quality index, compiled from millions of human evaluations of NMT output. It is among the first of its kind in the language services market, and it remains to be seen how it will impact the development and adoption of NMT across the industry.

Record April 2018

Using Arxiv’s advanced search function, Slator found that April 2018 already qualified as the busiest month ever for NMT research since the field began gaining steam in 2014.

In April, up to the 26th, 43 papers were published, shattering November 2017’s record of 31 for an entire month.
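As a rough sketch of how such a count can be reproduced against Arxiv’s public API (the query phrasing here is an assumption for illustration, not Slator’s actual methodology):

```python
import urllib.parse
import urllib.request

import feedparser  # pip install feedparser

# Count papers mentioning NMT submitted April 1-26, 2018. The arXiv API
# returns an Atom feed whose opensearch:totalResults field carries the
# total match count, so max_results=0 avoids downloading entries.
query = urllib.parse.quote(
    'all:"neural machine translation" AND '
    'submittedDate:[201804010000 TO 201804262359]'
)
url = f"http://export.arxiv.org/api/query?search_query={query}&max_results=0"
with urllib.request.urlopen(url) as resp:
    feed = feedparser.parse(resp.read())

print("matching papers:", feed.feed.opensearch_totalresults)
```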

One caveat: since NMT research began steadily increasing in 2017, more Arxiv abstracts mention NMT even when the paper’s subject matter is only related to, and not strictly about, NMT.

The research papers Slator reviewed from mid-February to late April differ from those covered in our earlier article, and some new trends in research direction are noticeable:

  1. Research in low-resource languages and unsupervised learning (in an effort to alleviate the low-resource problem) is spiking. At least 11 papers published in the 70 days to April 26 are about these topics.
  2. The race to better NMT is definitely on. Nearly 30 research papers published in the 70 days to April 26 described improving the translation process, improving the translation output itself, or developing new and improved NMT engines outright, from Google’s RNMT+ to Syntactically Guided Neural Machine Translation, which has reportedly been implemented by SDL.
  3. Automated, neural post-editing is upon us. Five papers described various ways of improving post-editing—three of them described neural post-editing and two were concerned with how NMT engines learn from human post-editing.

Meanwhile, there are conferences and meetups on NMT the world over. On April 18, 2018, Slator attended an NMT meetup in Zurich hosted by Textshuttle CTO Samuel Läubli.

The event featured presentations from Mirko Plitt, Head of Technology at Translators without Borders and Founder of Modulo Language Automation, and from NMT research pioneer Rico Sennrich, Assistant Professor at the University of Edinburgh.

Plitt highlighted what in his view was the real breakthrough, interactive MT, where “the user works differently with technology,” citing adaptive MT company Lilt.

Sennrich, meanwhile, asserted that when it comes to NMT, “we haven’t hit a ceiling yet.” As he also emphasized in Slator’s 2018 NMT report, NMT works very well when there is a lot of data, but for many languages there is very little.

In the business world, the impact of much higher quality machine translation is quickly being felt throughout the supply chain. A number of language service providers Slator spoke to over the past few months are actively lowering unit prices and/or reaping significant productivity gains.

Download the Slator 2019 Neural Machine Translation Report for the latest insights on the state of the art in neural machine translation and its deployment.


Editor’s note: this article was updated on April 30, 2018 to reflect One Hour Translation’s NMT Quality Index as it relates to NMT evaluation.
