In Research, Neural Steamrolls Statistical Machine Translation

December 15, 2016

Academia

by Marion Marking

Neural machine translation (NMT) is the language industry’s No. 1 buzzword of 2016. Some call it groundbreaking; others, overrated. But one thing is certain: research into NMT has taken off over the last 12 months.

Since the first paper with the words “neural machine translation” in the title appeared on the Cornell University site arXiv.org (pronounced “archive”) in 2014, a total of 81 have been published to date. Papers with “statistical machine translation” (SMT) in the title numbered a meager 33 over the same period.

More telling still, 63 of those NMT-titled papers were filed in 2016 alone, compared to just 11 for SMT. So, at least as far as research output is concerned, NMT has won the battle, if not the war.

The Cornell repository arXiv.org is an automated online distribution system for research papers (so-called e-prints). It offers an alternative to traditional peer-reviewed journals or online platforms like Frontiers by being a “pure dissemination system,” as Paul Ginsparg describes it. Ginsparg, the Harvard and Cornell physicist, is the creator of arXiv.

In 2015, arXiv received an average of 8,773 submissions per month, totalling 105,280 research papers at year-end. The site topped 10,000 submissions for the month of October 2016, a first since it launched in 1991.

According to peer-reviewed journal publisher Frontiers, 2.5 million studies are published across 30,000 scholarly journals each year. As microcosms go, arXiv.org, with over 1.2 million papers published, may be a better gauge than Frontiersin.org, which has published only 30,000 studies since it launched in 2007.

The most published NMT author on arXiv.org is trailblazer Kyunghyun Cho (14 citations for NMT-titled papers) followed by Yoshua Bengio (9), a Université de Montréal Professor who recently launched a deep learning incubator. We featured Cho, an Assistant Professor at New York University, in our story on simultaneous machine translation. At least one other scientist calls Cho an NMT pioneer and his 2014 paper a milestone in NMT research.

The most cited authors for SMT-titled papers are Krzysztof Wołk (4) and Krzysztof Marasek (3), who have often worked together. Their most recent joint study was published in Polish back in March 2016 and has to do with parallel data extraction from comparable corpora to enhance multi-domain machine translation on, say, Wikipedia and Euronews.com.

“arXiv provides instant pre-review dissemination…a breadth far beyond the capacity of any one journal” (Paul Ginsparg, Cornell University)

A recent submission on machine translation, published December 12, 2016, is interesting because it is, to borrow the words of the authors, “a novel scheme to combine neural machine translation (NMT) with traditional statistical machine translation (SMT).” The study is attributed to scientists from the University of Cambridge Department of Engineering and SDL Research.

Another paper, published in October 2016, was the Oxford master’s thesis of Pinterest Software Engineer Paul Baltescu. (According to his LinkedIn profile, Baltescu’s Master’s at Oxford coincided with his internships at Twitter and Quora.) In it, Baltescu investigates “alternatives for the two components which prevent standard translation systems from working on mobile devices due to high memory usage.”

Baltescu explains that when he replaced the components with proposed alternatives, he was able to come up with “a scalable translation system that can work on a device with limited memory.”

Yet another study, published in November 2016, bears the names of Huawei and Tsinghua University, Beijing. Its authors point out that NMT “suffers from a major drawback”: frequently inadequate translations. Their proposed framework “alleviates” NMT’s tendency to repeat the translation of some source words while wrongly ignoring others.

The authors say experiments show their approach “significantly improves the adequacy of NMT output and achieves superior translation result over state-of-the-art NMT and statistical MT systems.”

Authors Zhaopeng Tu and Lifeng Shang are researchers at Huawei’s Noah’s Ark Lab, and Yang Liu is a Supply Chain Planner at the Chinese tech giant. Microsoft veteran Xiaohua Liu is from Tsinghua, and fellow alumnus Hang Li has worked at Hulu, rising from intern to Senior Software Developer.

Among those who have authored papers published on the Cornell website are recognizable names from tech (such as Google’s Mike Schuster), the language industry (Systran CTO Jean Senellart), and members of academia whom Slator has featured before, such as Marcin Junczys-Dowmunt, Graham Neubig, Jason Lee, and Rico Sennrich (whose latest NMT paper was published on arXiv on December 14, 2016).
