How Netflix Researchers Simplify Subtitles for Translation

May 27, 2020

by Esther Bond

As original productions of media entertainment content have come to a halt amid coronavirus lockdowns, streaming services have turned their attention to localizing back-catalog content into more languages.

With high levels of localization demand, even in times of lockdown, streaming providers such as Amazon Prime Video are increasingly active participants in the machine translation (MT) research space.

Streaming giant Netflix confirmed back in April 2019 that they had not yet rolled out MT for their subtitle operations, but said they were investigating the use of the technology. Investigating they are: In May 2020, a paper published by a group of computer scientists at Netflix explored how to improve MT quality for low-resource languages, with the intended use likely to be in subtitles and meta-descriptions.

The paper, entitled “Simplify-then-Translate: Automatic Preprocessing for Black-Box Translation,” was published on pre-print platform arXiv on May 22, 2020. The study is a collaboration between former Netflix Research Intern Sneha Mehta, former Engineering Manager Ballav Bihani, and current Netflix employees Bahareh Azarnoush, Data Science Manager; Boris Chen, Machine Learning Engineer; Vinith Misra, Artwork and Video Data Science Manager; Avneesh Saluja, Research Scientist; and Ritwik Kumar, Machine Learning Director.

Kumar’s LinkedIn profile provides a glimpse into wider MT-related research areas at Netflix, and lists a number of the team’s projects: deep learning for high-quality machine translations, predicting per-title language demand, and deep learning for text understanding such as customer complaint mining.

Azarnoush’s LinkedIn profile also outlines her mandate to “partner with localization experts to unleash the power of data to transcend language barriers and ensure the best local user experience at scale.” Her focus includes, among other things, “experimentation and causal inference to support localization decisions.”

Simplify-Then-Translate

Netflix’s Simplify-Then-Translate paper brings together two natural language processing (NLP) disciplines: sentence simplification and machine translation.

Sentence simplification is nothing new. As the paper points out, sentence simplification was originally explored in the 1990s as a way to improve machine translation. The idea was that simpler source sentences lead to more fluent translations and “reduce technical post-editing effort.”

Netflix’s method relies on this premise and also leverages the notion that translated content is fundamentally simpler than original source content. By extension, they argued, back-translations are simpler than the original source sentences and can be used to build a simplification model. This is what is novel about Netflix’s approach.

First, Netflix took content previously translated by humans (reference translations) and used MT to back-translate it into the original source language, in this case English. From there, the researchers used the simpler, back-translated sentences to build a simplification model for English sentences.

The simplification model — called an automatic pre-processing model or APP — would then be applied to any English source content prior to the machine translation step, to improve the resulting output.
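The two steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the code from the paper: the function names, the `Translator` signature, and the toy lookup tables are all assumptions made for the example, and the trained APP is stood in for by a simple dictionary lookup.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def build_simplification_pairs(
    corpus: Iterable[Tuple[str, str]],
    ref_lang: str,
    translate: Translator,
    src_lang: str = "en",
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the back-translation of its human reference.

    The resulting (original, simpler) pairs are the kind of data a
    simplification model could be trained on.
    """
    return [
        (source, translate(reference, ref_lang, src_lang))
        for source, reference in corpus
    ]


def simplify_then_translate(
    sentence: str,
    target_lang: str,
    simplify: Callable[[str], str],
    translate: Translator,
    src_lang: str = "en",
) -> str:
    """Run the pre-processing step first, then call the black-box MT system."""
    return translate(simplify(sentence), src_lang, target_lang)


# Toy stand-ins (assumptions for illustration only, not real MT or a real APP).
_BACK_TRANSLATIONS: Dict[Tuple[str, str, str], str] = {
    ("fr", "en", "Il pleut très fort."): "It is raining heavily.",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Look up a canned translation, otherwise tag the text with the language pair.
    return _BACK_TRANSLATIONS.get((src, tgt, text), f"[{src}->{tgt}] {text}")


_SIMPLIFICATIONS = {"It's raining cats and dogs.": "It is raining heavily."}


def toy_simplify(text: str) -> str:
    # A trained model would generalize; this lookup only mimics the interface.
    return _SIMPLIFICATIONS.get(text, text)


pairs = build_simplification_pairs(
    [("It's raining cats and dogs.", "Il pleut très fort.")], "fr", toy_translate
)
output = simplify_then_translate(
    "It's raining cats and dogs.", "hu", toy_simplify, toy_translate
)
```

Keeping the translator as a plain callable mirrors the black-box setting: the pre-processing step needs no access to the MT system's internals, only its input and output.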

Netflix’s flagship APP for English, the figsAPP, is built specifically to tackle tricky content such as idioms by replacing such expressions with a simplified alternative. Given that they focus on “conversational language as used in dialogues of TV shows, [which] tends to be colloquial and idiomatic,” Netflix judged that it was important to use reference translations from this domain.

Accordingly, Netflix built the figsAPP from entertainment content in high-resource languages, employing French, Italian, German, and Spanish (FIGS) reference translations for a number of titles, including “How to Get Away with Murder,” “Star Trek: Deep Space Nine,” and “Fullmetal Alchemist.”

Testing Low-Resource Languages

To conduct their experiments, Netflix used a “black-box” machine translation system, Google Translate. To test the results of the figsAPP against an out-of-domain simplification dataset, Netflix machine-translated simplified content into seven low-resource languages: Hungarian, Ukrainian, Czech, Romanian, Bulgarian, Hindi, and Malay.

Source content that had been simplified with the figsAPP resulted in better quality translations in all seven languages, compared to translations resulting from non-simplified, original source content. Source content pre-processed with the out-of-domain APP performed significantly worse than the original, confirming Netflix’s hypothesis that using domain-specific content improves the performance of the APP.

Netflix also looked at the Translation Edit Rate (TER), and found that using figsAPP-treated source content improved edit distance by between 1.3% and 7.3% for the seven languages tested. This is “intuitive,” Netflix said, “because the APP simplification brings the sentences closer to their literal human translation.”
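For readers unfamiliar with edit-rate metrics, the idea can be illustrated with a simplified word-level calculation: count the insertions, deletions, and substitutions needed to turn the MT output into the human reference, then divide by the reference length. The sketch below is an assumption-laden approximation rather than the metric used in the paper. Full TER also treats a shift of a block of words as a single edit, which this version omits, so it can overestimate the true score. Lower values mean the output is closer to the reference.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distance from an empty hypothesis
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]  # distance to an empty reference
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a hypothesis word
                    curr[j - 1] + 1,     # insert a reference word
                    prev[j - 1] + cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def edit_rate(hypothesis: str, reference: str) -> float:
    """Edits needed to reach the reference, normalized by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis.split(), ref_words) / len(ref_words)


# One missing word against a six-word reference gives a rate of 1/6.
rate = edit_rate("the cat sat on mat", "the cat sat on the mat")
```

Comparing this rate for translations of the original source against translations of the simplified source, both measured against the same human reference, gives the kind of before-and-after difference the article reports.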

The researchers also used humans to evaluate the quality of a sample of the translations resulting from figsAPP-treated source content for five of the seven low-resource languages. Here, too, Netflix found that, at least for three languages, figsAPP-treatment resulted in improved translation output.

Although English source content is Netflix’s primary focus for the purposes of the research, APPs can also be built in any language for which enough corresponding reference translations exist.

By Esther Bond

Research Director at Slator. Localization enthusiast, linguist and inquisitor. London native.
