DeepMind Says Impact of New End-to-End Machine Dubbing Tech May Be Widespread

November 10, 2020

Machine Translation

by Esther Bond


As the world increasingly relies on video for entertainment, communication, and education, a research collaboration between Google and DeepMind has set its sights on making educational videos accessible to a wider audience.

Google and DeepMind’s system for dubbing educational videos was detailed in the research paper, “Large-scale multilingual audiovisual dubbing,” which was published on the preprint platform arXiv on November 6, 2020.

Co-authors are Yi Yang, Brendan Shillingford, Yannis Assael, Miaosen Wang, Wendi Liu, Yutian Chen, Eren Sezener, Luis C. Cobo, Misha Denil, Yusuf Aytar, and Nando de Freitas of AI company and research lab DeepMind — which was acquired by Google in 2014 — and Yu Zhang from Google.

The team’s track record spans institutions such as Oxford University, Cambridge University, Stanford University, and others, and its members’ previous employers include Baidu, Microsoft, KAYAK, Amazon, and Google, to name a few.

What’s novel about the DeepMind / Google project is that the researchers are concerned not only with translating audio content (speech) but also with adapting visual content. As explained in the paper, “We extend audio-only dubbing to include a visual dubbing component that translates the lip movements of speakers to match the phonemes of the translated audio.”

This involves modifying the speaker’s on-screen facial expressions (especially the lip movements) so that they match the target language, which the researchers said “creates a more natural viewing experience in the target language.”

This end-to-end workflow is complex and combines several subsystems:

  • Automatic speech recognition (ASR) – video transcription and sentence identification followed by manual correction;
  • Machine translation (MT) – followed by manual correction;
  • Speech synthesis with voice imitation – synthetic voicing of the translated text to sound like the speaker’s voice; and
  • Lip movement synthesis – alteration of on-screen images to match the translated audio. 
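To make the division of labor between these subsystems concrete, the four stages can be sketched as a simple Python pipeline. This is a hypothetical illustration, not DeepMind’s code: the article does not describe the system’s actual interfaces, so every name here (`dub_video`, `transcribe`, `translate`, `synthesize`, `lip_sync`, and the `review_asr` / `review_mt` hooks standing in for manual correction) is an assumption, and real stages would operate on audio and video rather than strings.

```python
from typing import Callable, List

# Hypothetical stage signatures; the paper's real interfaces are not described here.
Transcribe = Callable[[str], List[str]]    # video -> source-language sentences
Translate = Callable[[str], str]           # source sentence -> target sentence
Synthesize = Callable[[str, str], str]     # (target sentence, speaker id) -> audio clip
LipSync = Callable[[str, List[str]], str]  # (video, audio clips) -> dubbed video
Review = Callable[[str], str]              # human post-edit of a single sentence


def identity(text: str) -> str:
    """Default 'review' that leaves the text unchanged (i.e. no human edit)."""
    return text


def dub_video(
    video: str,
    speaker: str,
    transcribe: Transcribe,
    translate: Translate,
    synthesize: Synthesize,
    lip_sync: LipSync,
    review_asr: Review = identity,
    review_mt: Review = identity,
) -> str:
    # 1. ASR: transcribe and split into sentences, then apply manual correction.
    sentences = [review_asr(s) for s in transcribe(video)]
    # 2. MT: translate each corrected sentence, then apply manual correction.
    translations = [review_mt(translate(s)) for s in sentences]
    # 3. Speech synthesis that (hypothetically) imitates the given speaker's voice.
    clips = [synthesize(t, speaker) for t in translations]
    # 4. Lip movement synthesis driven by the translated audio.
    return lip_sync(video, clips)


# Toy stand-ins for each stage, just to show the data flow.
dubbed = dub_video(
    video="lecture.mp4",
    speaker="lecturer",
    transcribe=lambda v: ["hello world", "thanks"],
    translate=lambda s: s.upper(),
    synthesize=lambda t, spk: f"<{spk}:{t}>",
    lip_sync=lambda v, clips: f"{v}|" + "".join(clips),
)
print(dubbed)  # lecture.mp4|<lecturer:HELLO WORLD><lecturer:THANKS>
```

Passing the human correction steps in as ordinary functions mirrors the point that ASR and MT output are still manually corrected; a fully automated variant would simply leave them as the identity.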
The researchers used more than 3,700 hours of transcribed video in 20 languages to train the generic models, which were then “fine-tuned to a specific speaker before translation.”

Deep Fakes and Consent Issues

The stated goal of the research was to address the “imbalances in information access and online education” that exist because, according to the paper, nearly 60% of Internet content is published in English, while just a quarter of Internet users are native English speakers.

Crucially, the researchers said, “automatic translation of educational videos offers an important avenue for improving online education and diversity in many fields of technology.”

Yet a number of shortcomings still prevent the system from being fully automated and, for now, limit its application. The researchers identified several such challenges, including:

  • Idiomatic speech: MT cannot reliably handle idioms, so humans are required to edit the MT output (in addition to the ASR output);
  • Multiple speakers: the system does not perform well in situations where the person speaking changes frequently and voices may overlap (e.g., video interviews);
  • Sentence length: text can expand or contract when translated, and it needs to be modified by a human so that the length of the translated audio matches the original.
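The sentence-length problem comes down to simple arithmetic: if a source sentence occupies 2.0 seconds on screen and the synthesized translation runs for 2.6 seconds, the translated audio would have to be sped up by a factor of 2.6 / 2.0 = 1.3 to fit. The sketch below assumes per-sentence durations are already known and uses an arbitrary tolerance of ±15% to flag sentences whose translation a human should shorten or lengthen. The function names, the tolerance, and the sample durations are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def stretch_factor(source_seconds: float, target_seconds: float) -> float:
    """Playback-speed factor needed to fit translated audio into the source slot.

    A value above 1.0 means the translation is longer and must be sped up;
    below 1.0 means it is shorter and must be slowed down.
    """
    if source_seconds <= 0:
        raise ValueError("source duration must be positive")
    return target_seconds / source_seconds


def flag_for_editing(
    durations: List[Tuple[float, float]], tolerance: float = 0.15
) -> List[int]:
    """Return indices of sentences whose stretch factor lies outside 1 ± tolerance."""
    flagged = []
    for i, (src, tgt) in enumerate(durations):
        if abs(stretch_factor(src, tgt) - 1.0) > tolerance:
            flagged.append(i)
    return flagged


# (source seconds, synthesized translation seconds) per sentence; made-up values.
durations = [(2.0, 2.1), (2.0, 2.6), (3.0, 2.4)]
print([round(stretch_factor(s, t), 2) for s, t in durations])  # [1.05, 1.3, 0.8]
print(flag_for_editing(durations))  # [1, 2]
```

Flagging rather than automatically time-stretching reflects the limitation as described: beyond small adjustments, sped-up or slowed-down speech sounds unnatural, so the text itself has to be edited.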

Despite its current limitations, the work “could have a widespread impact across sectors, from education to entertainment and gaming,” the researchers said, pointing out that “the general nature of this technology means it could be applied in many different settings.”

The researchers are mindful of the potential dangers of such video-altering capabilities, which primarily relate to deep fakes and consent issues. They acknowledged that “improving the ability to lip sync means it could be possible to ‘puppet’ an individual’s face using a voice actor’s speech, or other speech not spoken by that person, to generate deep fake content.”

The paper also noted that “consent was retrieved from the source video owners of the translated videos shown with this work, and all video content generated via our system contains visible watermarks, so viewers are aware of any synthetic content displayed.”

Their demo videos are available to watch on YouTube.

By Esther Bond

Research Director at Slator. Localization enthusiast, linguist and inquisitor. London native.
