Harvard Launches Open-source Neural Machine Translation System

On December 19, 2016, a Monday, at exactly half past nine, the Twitterverse was alerted to the existence of the OpenNMT project over at the Harvard natural language processing (NLP) group.

The Harvard NLP group comprises researchers who cover areas as varied as “computational models for human language,” machine learning, deep learning, artificial intelligence, and the “intersections between computer science and linguistics.”

The group’s OpenNMT tweet was followed the next day by a wink at Google, which read: “#Google, we promise we are not #taking you on. Please keep on putting out awesome research / feeding my grad students.”

OpenNMT developer Yoon Kim is a Computer Science PhD candidate and member of Harvard NLP. Kim previously earned a Master’s in Data Science from New York University, another Master’s in Statistics from Columbia University, and a bachelor’s degree in Math and Economics from Cornell.

Working on the project with Kim was his adviser, Alexander Rush, who runs the NLP group. Commercial machine translation provider Systran, which recently launched its own proprietary neural machine translation system, was also involved in the project.

What follows is Slator’s interview with Harvard NLP’s Alexander Rush and Systran CTO Jean Senellart on the OpenNMT project.

Slator: What motivated you to develop OpenNMT? How did this project come about?

Alexander Rush: The project is based on research software built by my graduate student Yoon Kim. We used the software in my lab to do research on improving translation systems and to teach graduate students. We happened to also put the software online for free, and Systran found it. It was useful for their products, and so they began to send us updates to the code. It is the kind of mutually beneficial relationship that open-source communities can produce.

Slator: What exactly is OpenNMT and what does it do?

Rush: Recently, there have been a series of advances in artificial intelligence (AI), leading to improvements in speech, image recognition, and game playing. In the area of natural language processing, these improvements have been most impactful in the area of translation, leading to models that significantly improve on the quality of machine translation.

OpenNMT is open-source software implementing this technology, roughly similar to Google’s proprietary system. It is software to learn models for machine translation. It takes in a corpus of aligned sentences from a source and target language, and learns a mathematical model—known as a neural network—to [perform] translation. That model can then be fed unseen source sentences and OpenNMT will translate them.
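The train-then-translate workflow Rush describes can be sketched in a few lines of Python. This is only a toy illustration: it is not OpenNMT's interface and it does not use a neural network. A simple word co-occurrence table stands in for the learned model, and the function names `train` and `translate` and the three-sentence German-English corpus are hypothetical, chosen only to show the shape of the data flow (aligned sentence pairs in, a model out, unseen sentences translated).

```python
from collections import Counter, defaultdict


def train(pairs):
    """Learn a word-for-word table from aligned (source, target) sentence pairs.

    Toy stand-in for a learned translation model: each source word is mapped
    to the target word it co-occurs with most strongly (Dice coefficient).
    """
    src_count = Counter()          # sentences containing each source word
    tgt_count = Counter()          # sentences containing each target word
    joint = defaultdict(Counter)   # co-occurrence counts per source word
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        for s in s_words:
            joint[s].update(t_words)

    table = {}
    for s, cands in joint.items():
        # Dice score: 2 * joint count / (source count + target count)
        table[s] = max(
            cands, key=lambda t: 2 * cands[t] / (src_count[s] + tgt_count[t])
        )
    return table


def translate(table, sentence):
    """Translate word by word; unknown words are passed through unchanged."""
    return " ".join(table.get(w, w) for w in sentence.lower().split())


# Hypothetical aligned corpus (source: German, target: English).
corpus = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]

model = train(corpus)
# "ein haus" never appears in the corpus, so it is an unseen source sentence.
print(translate(model, "ein haus"))
```

A real neural system replaces the lookup table with a network that reads the whole source sentence and generates the target sentence, but the inputs and outputs have the same shape as in this sketch.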

We do expect some competitors quickly building products based on this technology—Jean Senellart, Systran CTO

Slator: What makes it different from the commercial solution Systran offers?

Jean Senellart: The core technology we propose to our users will be exactly the same as the one we are contributing for the OpenNMT project. Our business model is to build tailored for our customers. [We] provide complete translation workflow; more features (e.g., document filtering, coupling with other technologies like language detection, entity extraction) than just the core translation.

Slator: Can you give us a simple first use case for OpenNMT?

Rush: We released several example translation models (e.g., German-English). Anyone can download and run the model to experiment with neural machine translation. We publicized the project because we thought it was quite stable; but also with the hope that more people in the translation community would contribute back to further improve it.

In theory, anybody could rent a server and train a model on available data, and we see some hobbyists doing just that—Alexander Rush, Assistant Professor, Harvard School of Engineering and Applied Sciences

Slator: What is your mid- to long-term goal for OpenNMT?

Rush: There are two main focuses. One, we want to keep the code up-to-date with all the new ideas published in the research community, such that the open-source software stays competitive with closed-source offerings (e.g., Google). For instance, my group recently developed a system for shrinking translation models so they can run much faster, and this was implemented in the software even before the paper was published.

Two, we want to try out more cutting-edge “translation” ideas. For example, we are implementing an extension to map from images to text using OpenNMT. This is a rather recent research idea that we hope to make more accessible.

Senellart: On Systran’s side, we want this project to contain all the best-of-breed features and ideas that are published by the research community, but also keep the code simple and fast, so it becomes a reference for anyone wanting to do more research or even create commercial applications.

Slator: Who do you see as early adopters of this technology?

Rush: Great question! In theory, anybody could rent a server and train a model on available data, and we see some hobbyists doing just that. In practice, we expect a mix of researchers studying how to improve translation and people in the industry looking to become familiar with new AI technology.

Senellart: We do expect some competitors quickly building products based on this technology—and this will, of course, be challenging for us. But at the same time, [it is] quite an achievement that will help develop the machine translation market and global awareness about the technology.