New Generation of Data Scientists Tackles Translation

Technology · by Florian Faes · October 25, 2016

A world of opportunity was open to Korean-born Jason Lee after graduation. He held a master's degree in Computer Science from Cambridge University and had completed internships at JP Morgan, Goldman Sachs, and Google.

Lee resisted the temptation to immediately monetize his résumé in big tech or high finance. Instead, he opted for a PhD in deep learning and natural language processing at the Data Analytics Lab of the Swiss Federal Institute of Technology (ETH) under the supervision of Thomas Hofmann, a former Director of Engineering and co-site lead at Google Zurich.

Lee could have set his sights on any one of the lab’s key research areas: machine learning, natural language processing and understanding, data mining, and information retrieval. But he chose a problem in machine translation (MT) for his first major PhD research project, a sign of how rapid progress in machine learning, along with access to massive computing power, is reshaping language technology.

The addition of neural networks to MT is drawing a new batch of high-caliber researchers into the field. Lee says he has always been interested in languages and naturally gravitated toward the natural language processing (NLP) side of machine learning. Had it not been for the recent advent of neural networks in language translation, however, we think Lee would likely have chosen a field other than MT for his research.

He says, “NLP is definitely the frontier of what the current artificial intelligence (AI) state of the art can do.” Machine translation is one of the harder problems in the field. Another is dialog systems (think customer service bots chatting with you).

Lowest Level Possible

The aim of Lee’s project was to take neural network modeling from the word or subword level down to the level of individual characters. To draw an analogy from image recognition, going down to character level is like going down to the individual pixel, the smallest possible unit. In NLP, the unit a model operates on is called a “token.”
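The difference in granularity can be sketched in a few lines of Python. This is an illustration only: the sample sentence and the function names are assumptions made for the example and do not come from Lee’s model.

```python
# Illustrative only: word-level versus character-level tokenization.
# The function names and the sample sentence are hypothetical.

def word_tokens(text):
    """Split on whitespace, so each word is one token."""
    return text.split()


def char_tokens(text):
    """Treat every character, including spaces, as its own token."""
    return list(text)


sentence = "machine translation"
print(word_tokens(sentence))       # two tokens
print(len(char_tokens(sentence)))  # one token per character
```

The character-level sequence is much longer than the word-level one, which is part of what makes modeling at this level demanding.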

Lee says that, to his knowledge, work in statistical machine translation (SMT) or previous research in NMT never fully went down to that level. Even Google’s latest Google Translate NMT model operates only at subword level.

This approach has a number of real, and perhaps surprising, advantages. For example, it tends not to be confused by typos (say, in user-generated content) and should have an edge in translating morphologically rich languages (think long words in Finnish, Turkish, and other agglutinating languages).
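One way to see why typos are less disruptive at character level is to compare vocabulary coverage. The sketch below uses a tiny made-up corpus and hypothetical helper names. It shows only that a misspelled word falls outside a word vocabulary while its characters remain known. It does not show how any particular model then translates the typo.

```python
# Illustrative only: vocabulary coverage for a misspelled input.
# The corpus and helper names are hypothetical.

corpus = ["the cat sat", "the dog sat"]

# Vocabularies seen during "training": whole words versus single characters.
word_vocab = {w for s in corpus for w in s.split()}
char_vocab = {c for s in corpus for c in s}


def unknown_words(text):
    """Words a word-level model would have to map to an unknown token."""
    return [w for w in text.split() if w not in word_vocab]


def unknown_chars(text):
    """Characters a character-level model has never seen."""
    return [c for c in text if c not in char_vocab]


typo = "teh cat sat"
print(unknown_words(typo))  # ['teh']
print(unknown_chars(typo))  # []
```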

According to Lee, their NMT model is agnostic to the language being translated. A further benefit is that the system performed well with so-called intra-sentence code-switching, that is, a change of language mid-sentence in the source.

The idea that an MT engine is language-agnostic may be difficult to absorb for many in an industry used to constant language and domain tweaking in machine translation.

Lee built his model on Theano, a Python-based deep-learning framework. Other frameworks used in NMT include Google’s TensorFlow, as well as Torch, on which Systran based its latest NMT release.

Completely Data Driven

For this project, Lee visited Kyunghyun Cho at New York University (NYU), working at the university’s new Center for Data Science. Cho is an Assistant Professor at NYU’s Department of Computer Science.

Lee calls Cho the pioneer of NMT and describes Cho’s 2014 paper, “Neural Machine Translation by Jointly Learning to Align and Translate,” as a milestone in NMT research. In an interview with the NYU blog on his appointment in 2015, Cho called machine translation “the next field/task revolutionized by deep learning.”

According to Lee, NMT is very different from previous approaches to machine translation. Among the most obvious differences, Lee says, is that the models require no linguistic or domain knowledge to run.

“The reason why NMT is so exciting is because it is purely data-driven. When you design a model, you don’t inject any knowledge about what it should do. You just give it examples of source and target text and the rest is just magic. It just fills the gaps and the rest in by itself. So this idea of machine translation being a proxy for the advancement of artificial intelligence is valid because it’s so not domain specific. We don’t give it any linguistic knowledge at all.”

Here, Lee echoes his mentor Cho, who told the NYU blog that “instead of relying heavily on domain/linguistic knowledge, with neural networks, we now have a fully data-driven way to understand natural languages.”
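What “purely data-driven” means for the input can be sketched as follows. The sentence pairs and helper names are assumptions made for illustration, not Lee’s actual data pipeline. The point is that the model’s training input consists of nothing but raw source and target strings mapped to integer IDs, with no grammar rules, dictionaries, or part-of-speech tags.

```python
# Illustrative only: parallel text turned into character IDs with no
# linguistic annotation. The pairs and helper names are hypothetical.

pairs = [
    ("guten morgen", "good morning"),
    ("danke", "thanks"),
]


def build_char_index(texts):
    """Map each distinct character to an integer ID (0 is kept free for padding)."""
    chars = sorted({c for t in texts for c in t})
    return {c: i + 1 for i, c in enumerate(chars)}


def encode(text, index):
    """Turn a string into a list of character IDs."""
    return [index[c] for c in text]


src_index = build_char_index(s for s, _ in pairs)
tgt_index = build_char_index(t for _, t in pairs)

# Each training example is just a (source IDs, target IDs) pair.
examples = [(encode(s, src_index), encode(t, tgt_index)) for s, t in pairs]
print(len(examples), len(examples[0][0]))
```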

Pace of Progress

We wanted to know whether the research community had reached anything like a consensus on a potential breakthrough in machine translation quality, and on when it might happen. Lee says that, in the short time he has been active in the field, he has observed impressive progress, with a lot of new research coming out every month. But he cautions that, despite recent advances, it is difficult to say at this point just how far NMT can go.

Asked to share his thoughts on AI’s progress in general, Lee points out that it is important to have an ethical discussion alongside the development of AI. He highlights the recent OpenAI initiative as an example of such a forum and notes that governments are taking notice, with the Obama administration releasing a report on the future of AI on October 12, 2016.

Neural networks are indeed having a profound impact on how machine translation is being approached. No matter where one places NMT on Gartner’s hype-cycle curve, there can be little doubt that the technology is giving a major boost to a field where quality improvements had become increasingly hard-won. As more researchers like Jason Lee take up NMT, this progress can only accelerate.
