Apple Scientists Go on a ‘Quest Toward Overcoming’ Speech Translation

April 20, 2020

Machine Translation · by Marion Marking

After Google and Amazon, Apple has now turned its attention to the broad issue of speech translation. Or at least, to borrow the words of the authors of a recent paper, it has taken stock of where we are.

Published on the popular preprint server arXiv on April 17, 2020, “Speech Translation and the End-to-End Promise: Taking Stock of Where We Are” was authored by two research scientists from Apple.

Matthias Sperber is a Siri Machine Translation R&D Scientist based in the German spa city of Aachen. Matthias Paulik is a Senior Manager out of Cupertino HQ.

The two Matthiases got their PhDs from Karlsruhe Institute of Technology (KIT) a decade apart. Both served on the KIT research staff: Paulik focused on automatic speech recognition (ASR), machine translation, and neural networks, while Sperber focused on linguistic annotation, ASR, and speech-to-text.

In their recent paper, Sperber and Paulik surveyed three decades’ worth of research into speech translation, defining its challenges, techniques, and requirements to “encourage meaningful and generalizable comparisons on our quest toward overcoming the long-standing issues found in ST models.” As the authors put it, “Given the abundance of prior work, a clear picture on where we currently stand is needed.”

As defined by the duo, speech translation (ST) is “the task of translating acoustic speech signals into text in a foreign language.” Put simply, ST means generating accurate text output from speech input. But, as the authors point out, getting there is complex and multifaceted because ST builds on previous work in ASR and machine translation (MT).

Seen in the context of Google’s and Amazon’s prior work (as well as Microsoft’s 2019 hologram demo), the brass ring in all this is, of course, accurate speech-to-speech translation.

Crucially, the authors point out that, until recently, the only feasible approach was “the cascaded approach that applies an ASR to the speech inputs, and then passes the results on to an MT system.”

They note that there has since been progress in ST on two fronts: “general improvements in ASR and MT models, and moving from the loosely-coupled cascade in its most basic form toward a tighter coupling” (more under Chronological Survey below).

Sperber and Paulik qualify that “a large share of the progress has arguably been owed simply to general ASR and MT improvements,” but that “recently, new modeling techniques and in particular end-to-end trainable encoder-decoder models have fueled hope for addressing challenges of ST in a more principled manner.”

They go on to say, however, that “despite these hopes, the empirical evidence indicates that the success of such efforts has so far been mixed.” Their study is an attempt to uncover the potential reasons behind this.

In essence, Sperber and Paulik’s paper does three things. First, it analyzes the historical development of speech translation. Next, it sets out the challenges related to ST, pointing out that research has so far not analyzed these challenges sufficiently. Finally, it highlights open research questions that future studies can hopefully address.

Chronological Survey

The paper begins with a chronological survey of more than 30 years’ worth of ST research, introducing key concepts along the way. For instance, it cites two early papers from 1988 and 1991 to define “the loosely coupled cascade,” in which researchers built ASR and MT systems separately and then used “the best hypothesis of the former […] as input to the latter.”

According to the authors, such early systems were prone to errors “propagated from the ASR, given the widespread use of interlingua-based MT which relied on parsers unable to handle mal-formed inputs.”
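To make the idea concrete, here is a minimal Python sketch of a loosely coupled cascade. The ASR and MT components (`toy_asr`, `toy_mt`) are hypothetical stubs with made-up scores and a tiny word lexicon, not anything taken from the paper. The point is only that the MT step sees nothing but the ASR's single best hypothesis, so an ASR error ("wreck a nice beach" instead of "recognize speech") passes straight through into the translation.

```python
from typing import Callable, List, Tuple

# A hypothesis is (text, log-probability); all scores here are made-up toy values.
Hypothesis = Tuple[str, float]


def toy_asr(utterance_id: str) -> List[Hypothesis]:
    """Hypothetical ASR stub: returns an n-best list for a known utterance."""
    nbest = {
        # The acoustically similar wrong reading happens to score slightly higher.
        "utt1": [("wreck a nice beach", -1.1), ("recognize speech", -1.3)],
    }
    return nbest[utterance_id]


# Hypothetical English-to-German word lexicon for the toy MT stub.
TOY_LEXICON = {
    "wreck": "zerstöre", "a": "einen", "nice": "schönen", "beach": "Strand",
    "recognize": "erkenne", "speech": "Sprache",
}


def toy_mt(text: str) -> str:
    """Hypothetical word-by-word MT stub; unknown words pass through unchanged."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def loosely_coupled_cascade(
    utterance_id: str,
    asr: Callable[[str], List[Hypothesis]],
    mt: Callable[[str], str],
) -> str:
    """Run ASR, keep only its best hypothesis, and translate that string alone."""
    best_text, _score = max(asr(utterance_id), key=lambda hyp: hyp[1])
    return mt(best_text)


print(loosely_coupled_cascade("utt1", toy_asr, toy_mt))
# zerstöre einen schönen Strand
```

Because the correct hypothesis is discarded before translation, the MT stage has no chance to recover it; passing richer ASR output downstream is one motivation for the tighter coupling the authors describe.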

They added that subsequent systems, which relied on data-driven, statistical MT, “somewhat alleviated the issue, and also in part opened the path towards tighter integration.”

Also noteworthy: Sperber and Paulik point out that “the possibility of speech-to-speech translation, which extends the cascade by appending a text-to-speech component, was also considered early on (Waibel et al., 1991).”

Challenges

The paper then defines “the central challenges, techniques, and requirements, motivated by the observation that recent work does not sufficiently analyze these challenges.”

Some of these central challenges arise from the aforementioned loosely-coupled cascade (e.g., error propagation, mismatched source-language, information loss). Sperber and Paulik then list typical countermeasures for each challenge.

Take mismatched source-language, for example. It is caused by (a) modeling assumptions, such as ASR modeling only unpunctuated transcripts, and (b) mismatched training data, which leads to “stylistic and topical divergence.” Typical countermeasures based on previous studies include “domain adaptation techniques, disfluency removal, text normalization, and segmentation/punctuation insertion.”
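As a rough illustration of two of those countermeasures, the sketch below applies heuristic disfluency removal and naive punctuation insertion to spoken-style ASR output before it would be handed to an MT system trained on written text. The function name `clean_asr_output`, the filler list, and the rules are all assumptions chosen for illustration; real systems would be considerably more sophisticated.

```python
from typing import List

# Hypothetical filler list; an assumption for illustration only.
FILLERS = {"uh", "um", "er", "erm"}


def clean_asr_output(transcript: str) -> str:
    """Crude heuristic bridge from spoken-style ASR output to written-style MT input."""
    # Disfluency removal, step 1: drop filler words.
    words: List[str] = [w for w in transcript.lower().split() if w not in FILLERS]
    # Disfluency removal, step 2: drop immediate repetitions such as "the the".
    deduped = [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]
    if not deduped:
        return ""
    sentence = " ".join(deduped)
    # Naive punctuation insertion: capitalize the first letter and close with a period.
    return sentence[0].upper() + sentence[1:] + "."


print(clean_asr_output("um the the results are uh promising"))
# The results are promising.
```

Note that the repetition rule would also wrongly collapse legitimate repeats such as "had had", one reason simple hand-written rules fall short.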

Open Research Questions

In conclusion, Sperber and Paulik suggest possible starting points for future research.

They note, for instance, that “while early decisions and data efficiency have been recognized as central issues, empirical insights are still limited and further analysis is needed. Mismatched source-language and information loss are often not explicitly analyzed.”

Moreover, the authors wrote, “We conjecture that the apparent trade-off between data efficiency and modeling power may explain the mixed success in outperforming the loosely coupled cascade. In order to make progress in this regard, the involved issues (early decisions, mismatched source-language, information loss, data efficiency) need to be precisely analyzed, and more model variants should be explored.”

As for traditional models, they suggest extending rather than altering them by, for example, “applying end-to-end training as a fine-tuning step, employing a direct model for rescoring, or adding a triangle connection to a loosely coupled cascade.”
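One of those extensions, employing a direct model for rescoring, can be sketched as n-best reranking: the cascade proposes candidate translations with its own scores, and a direct (end-to-end) model's score is mixed in before a winner is picked. The function name, the linear interpolation, the `weight` parameter, and the toy scores below are illustrative assumptions, not the paper's formulation.

```python
from typing import Callable, Dict, List, Tuple

# (candidate translation, cascade log-score); toy numbers for illustration only.
Candidate = Tuple[str, float]


def rescore_with_direct_model(
    candidates: List[Candidate],
    direct_log_prob: Callable[[str], float],
    weight: float = 0.5,
) -> str:
    """Pick the candidate maximizing a weighted sum of cascade and direct-model scores.

    weight=0.0 reproduces the plain cascade choice; weight=1.0 trusts only the direct model.
    """
    def combined(candidate: Candidate) -> float:
        text, cascade_score = candidate
        return (1.0 - weight) * cascade_score + weight * direct_log_prob(text)

    return max(candidates, key=combined)[0]


# Hypothetical direct-model scores, i.e. log P(translation | audio) for one utterance.
TOY_DIRECT_SCORES: Dict[str, float] = {
    "zerstöre einen schönen Strand": -3.0,
    "erkenne Sprache": -0.5,
}

candidates = [("zerstöre einen schönen Strand", -1.1), ("erkenne Sprache", -1.3)]
print(rescore_with_direct_model(candidates, TOY_DIRECT_SCORES.__getitem__, weight=0.0))
# zerstöre einen schönen Strand
print(rescore_with_direct_model(candidates, TOY_DIRECT_SCORES.__getitem__, weight=0.5))
# erkenne Sprache
```

The appeal of this design, in the spirit of the authors' suggestion, is that the cascade is left intact: the direct model only reorders what the cascade already proposes.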
