The Race for Open Source Neural Machine Translation

May 28, 2018

Technology ·

by Gino Diño

Neural machine translation (NMT) often figures prominently during SlatorCon events, and SlatorCon London held at Nobu Hotel in London Shoreditch on May 17, 2018 was no exception. In his presentation for the event, Jean Senellart, Global CTO of event partner Systran, discussed an aspect of NMT that he found both exciting and scary at the same time: the race for open source.

Senellart briefly went through the history of 50-year-old machine translation company Systran, a company that experienced and was directly involved in production-level deployments of all MT technologies, from rules-based MT to statistical MT to NMT.

He also spoke about the success of OpenNMT, the open source NMT framework that Systran and Harvard University built hand in hand, and gave the audience an update on French company Ubiqus joining their venture.

Since its launch in early 2017, OpenNMT has developed into the second largest open source NMT project, with 18 major releases, 3,300 stars and 1,020 forks on GitHub, and 6 complete code refactorings.

And this is where Senellart touched upon the core of his presentation: “We are talking about five thousand lines of code. We are talking about something huge and something tiny at the same time.”

NMT Changed MT History

When Senellart said he was talking about something huge, he was generally referring to how NMT has radically changed MT history.

In his presentation, Senellart showed that rules-based MT went into production in 1968 and stayed dominant until 2007, when statistical MT became good enough for production. Then in 2016, NMT, at that point essentially a two-year-old technology, took over very quickly.

“SMT was created in the 90s by IBM. It took 15 years to come to industry-level production,” Senellart said. “NMT was introduced by the academia in 2014. It took two years to be adopted by the industry.”

Aside from the massive difference in pace of development and industry adoption, Senellart also noted how each technology differed in what was considered its main asset. In rules-based MT, the asset was the code and the linguistic resources accumulated. For statistical MT, the asset was the data.

“The more data you have the better criteria you had and the equation was very simple,” Senellart said. “Double the data, and you were getting one more BLEU [Bilingual Evaluation Understudy] point.” He also noted that the first attempts at systematizing MT evaluation began during the reign of statistical MT.
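As a rough illustration of the rule of thumb Senellart quoted, the Python sketch below treats it as a logarithmic relationship in which every doubling of the training data adds one BLEU point. The function name, the baseline score, and the corpus sizes are hypothetical and chosen only for illustration; they are not figures from the presentation.

```python
import math


def projected_bleu(baseline_bleu, baseline_pairs, new_pairs, points_per_doubling=1.0):
    """Project a BLEU score under the 'double the data, gain one point' rule of thumb.

    All inputs are illustrative assumptions, not measured values.
    """
    if baseline_pairs <= 0 or new_pairs <= 0:
        raise ValueError("corpus sizes must be positive")
    # Number of times the corpus has doubled relative to the baseline.
    doublings = math.log2(new_pairs / baseline_pairs)
    return baseline_bleu + points_per_doubling * doublings


# Hypothetical baseline: 30 BLEU with 1 million sentence pairs.
for pairs in (1_000_000, 2_000_000, 8_000_000):
    print(pairs, round(projected_bleu(30.0, 1_000_000, pairs), 2))
```

Under this reading, going from 1 million to 8 million sentence pairs is three doublings and therefore about three BLEU points, which helps explain why sheer volume of data was the main asset in the statistical MT era.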

Finally, NMT burst onto the scene, and with it came another asset shift: “We are not talking about big data anymore; we are talking about good data,” said Senellart.

The Good and Bad of Open Source

Aside from the change of mindset regarding data assets, Senellart emphasized that the open source aspect of NMT was also significant. “If you look at the last two years there has been, every month, about two new open source projects for NMT, so it’s incredible,” he said.

While that seems encouraging, Senellart noted that a lot of them are “dying,” i.e. new projects are not being maintained. Even Google would launch a new open source project only to abandon its maintenance in favor of a new technology or development, reflecting how fast NMT technologies evolve.

Senellart also called attention to the fact that while most open source projects come from academia, the ones with the most activity come from industry players. Google, for instance, runs the biggest open source project with the most activity, followed by Systran’s own OpenNMT. Third on the list is Facebook.

This is “odd,” Senellart noted, because prior to this, Big Tech players like Google, Amazon, and Salesforce did not have an active open source culture. He went on to say that developments in technology were usually followed by published papers, often found on the research repository arXiv.org.

“There are very few players that are not open; that are not open sourcing their projects,” Senellart said, naming DeepL, Omniscien, and Microsoft as some of them. They do release their “numbers,” however: like report cards, they publish how well their NMT engines perform using measurements like BLEU.

So this is part of the good side of open source: collaboration even among competitors.

According to Senellart’s numbers, in 2017 there were 250 publications regarding NMT. “No company in the world can reproduce 250 papers just to check if they’re right or wrong and it is one of the reasons of the necessity of open source today,” he said.

In fact, Senellart noted that NMT tech has evolved so fast that in 14 months, there have been three major paradigm shifts in terms of the technology used. First, researchers used recurrent neural networks (RNNs), then they flocked to Facebook-led convolutional neural networks (CNNs), and finally to Google’s self-attentional transformer models.

Senellart drew an interesting parallel between how the technology evolved and how humans process language and translation. RNNs process text sequentially, word by word. CNNs process it more generally, looking at sequences of words. Finally, the attention-based approach literally pays more attention to the parts of a text that may have a significant impact on understanding and translating it.
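To make the idea of “paying more attention” to certain words concrete, here is a minimal Python sketch of scaled dot-product attention over toy vectors. It is an assumption-laden illustration of the weighting idea only, not the transformer architecture itself and not code from OpenNMT or any other toolkit; the function names and the two-dimensional vectors are made up for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to each source position.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy example: three source "words" represented by hypothetical 2-d vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]

weights, context = attend(query, keys, values)
print([round(w, 3) for w in weights])
```

In this toy case the first and third source positions receive equal weight and the second receives less, because the query vector overlaps with the former and not with the latter. That is the sense in which the model attends to some parts of the input more than others.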

Then of course, with the good came the bad, and where the open source race helped speed up development, it also meant active players had to “fight for survival,” according to Senellart.

“An open source project is very fragile,” he said, explaining that Systran had to support OpenNMT’s users and community, share data and even failed experiments, fix issues, and keep everything stable and compatible, among other tasks.

“I remember one year ago, I received a call from Booking.com, who used OpenNMT,” Senellart told the audience. “They were just asking me, ‘Will OpenNMT be there in one year? Because we are launching production now, and can you guarantee that you’d still be there in one year?’”

For a copy of Senellart’s presentation, register free of charge for a Slator membership and download a copy here.

What’s the Finish Line for this Open Source Race?

Reflecting on the paradigm shifts that the open source race accelerated, Senellart said, “I’m talking about five thousand lines of code. It’s not as if we have made something huge. It is a small discovery and totally incompatible, which is the hint that we are still at the very beginning.”

“The big question I have is why are we all fighting for this?” Senellart asked about the open source race.

“I think it’s not NMT. It’s bigger than that. The real battle is behind the AI framework that you are using,” Senellart said. These frameworks include Microsoft’s CNTK, Google’s TensorFlow, Facebook’s PyTorch, and Apache MXNet, on which Amazon’s Sockeye NMT toolkit is built. Senellart argued that NMT is only the proxy battlefield where players are “fighting to have their framework become the first framework… Because I believe that NMT is the gateway for all the NLP technologies.”

Senellart said NMT is quickly becoming a commodity: “It’s like running water or electricity; it’s everywhere. We need NMT and one question is what will be the winning computing framework in that case?”

Senellart said the industry is going in a specific direction, pointing out the Open Neural Network Exchange or ONNX, a joint, industry-wide effort between Facebook, Amazon, and Microsoft. ONNX is basically a standardization project, according to Senellart, who noted that standardization arises when technologies reach sufficient maturity.

“ONNX will allow you to take systems trained with TensorFlow to run with Caffe, which is a Facebook platform that lets you run a neural network on your mobile. Or you can do that with Sockeye from Amazon and run that on whatever,” Senellart explained.

He said ONNX is probably key to developing industry-wide standardization, but past that is a more important realization about NMT: “we need to realize that we are still at the very beginning of the technology.”

Senellart briefly enumerated a number of impressive recent developments, including announcements regarding smarter virtual assistants and unsupervised machine learning for low-resource language translation, and touched upon adding document-level context to NMT so that it goes beyond sentence-by-sentence translation. He said Systran’s clients are now looking for domain specialization as well, beyond general-purpose NMT.

As for the future of NMT, Senellart said NMT might be able to eventually augment human capabilities not only through increasing productivity and speed, but also through augmenting the way humans learn.

During the panel discussion, Senellart fielded a few questions from the audience, among which was a question on technological maturity: will the industry be using the same level of NMT eventually?

Senellart said yes, eventually the technology will plateau at about the same level across all players, but by then the differentiator will be the training data. As for how language data will factor into systems like zero-shot translation, where essentially no bilingual data is required for machine learning, he said training data will still be used to build and model systems. “Unsupervised machine learning [such as what is used in zero-shot translation] will probably help to make new language pairs, [translate] small, low-resource languages and probably increase the quality of the existing big ones,” Senellart said, “but data will still be there.”

By Gino Diño

Content strategy expert and Online Editor for Slator; father, husband, gamer, writer―not necessarily in that order.
