Now on GitHub: New Speech-to-speech Translation Toolkit

by Esther Bond

April 30, 2020

Machine Translation

As speech-to-speech translation (ST) has become a prominent area of interest in recent machine translation (MT) research, a group of researcher-developers have made their end-to-end speech processing toolkit publicly available to other developers.

In a paper published on pre-print server arXiv on April 21, 2020, Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, and Shinji Watanabe presented their open-source toolkit, called ESPnet-ST.

The researcher-developers hail from four academic institutions (Kyoto University, Johns Hopkins University, Waseda University and Nagoya University) as well as a number of Japan-based organizations: research lab NTT Communication Science Laboratories; software development startup Human Dataware Lab. Co., Ltd., which focuses on machine learning software; and RIKEN AIP, an AI R&D center that aims to “achieve scientific breakthrough and to contribute to the welfare of society and humanity through developing innovative technologies.”

Kyoto University’s Hirofumi Inaguma told Slator that the team’s main motivation for developing the toolkit was to break down language-related communication barriers. By making the toolkit open-source, he said, he hopes to help researchers “move forward to the next breakthrough.”

ESPnet-ST was designed for “the quick development of speech-to-speech translation systems in a single framework.” Rather than a traditional “cascaded model,” ESPnet-ST is an end-to-end model, which maps speech in a source language to its translation in the target language.
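To make the contrast concrete, here is a minimal Python sketch of the two approaches. It is not ESPnet-ST code: the function names (`cascade`, `end_to_end`, `toy_asr`, `toy_mt`, `toy_st`) are hypothetical, the "models" are toy lookups rather than neural networks, and the translation is represented as text purely for simplicity.

```python
from typing import Callable, Sequence

Audio = Sequence[float]  # toy stand-in for a waveform (assumption for illustration)


def cascade(asr: Callable[[Audio], str],
            mt: Callable[[str], str]) -> Callable[[Audio], str]:
    """Cascaded approach: transcribe the speech, then translate the transcript."""
    def translate(audio: Audio) -> str:
        transcript = asr(audio)   # intermediate source-language text
        return mt(transcript)     # any transcription error is passed on to this step
    return translate


def end_to_end(st: Callable[[Audio], str]) -> Callable[[Audio], str]:
    """End-to-end approach: a single model maps source speech to its translation."""
    return st


# Hypothetical toy "models"; a real system would load trained networks instead.
def toy_asr(audio: Audio) -> str:
    return "hello" if sum(audio) > 0 else "goodbye"


def toy_mt(text: str) -> str:
    return {"hello": "hola", "goodbye": "adiós"}[text]


def toy_st(audio: Audio) -> str:
    return "hola" if sum(audio) > 0 else "adiós"


cascaded_system = cascade(toy_asr, toy_mt)
direct_system = end_to_end(toy_st)

print(cascaded_system([0.1, 0.2]))  # two models, one intermediate transcript
print(direct_system([0.1, 0.2]))    # one model, no intermediate transcript
```

The point of the sketch is structural: the cascaded system depends on an intermediate transcript, while the end-to-end system has no such hand-off between separately trained components.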

ESPnet, which has more than 7,500 commits on GitHub, originally focused on automatic speech recognition (ASR) and text-to-speech (TTS) code. It has recently been updated to include code for building machine translation systems, and is now billed as an “all-in-one toolkit that should make it easier for both ASR and MT researchers to get started in ST research.”

Microsoft-owned GitHub is a popular code-hosting platform that lets developers share code in order to build software. Contributors can create software libraries (or “toolkits”), such as ESPnet, that contain code other developers can reuse in their own projects. Code-sharing speeds up development because, rather than writing code from scratch (known as “rolling your own”), developers can reuse existing code for a common task such as login, to borrow a generic example, or feature extraction, to use an example from machine translation.

According to the paper, ESPnet-ST also curates code into “recipes” that cover the common stages of building a speech-to-speech translation system, such as data pre-processing, training and decoding. The researchers claim that their results are “reproducible” and can “match or even outperform the current state-of-the-art performances.” Their pre-trained models are available to download from GitHub.

The paper compared ESPnet-ST with nine other speech-to-speech translation toolkits including Facebook’s Fairseq, NVIDIA’s OpenSeq2Seq, and OpenNMT by SYSTRAN and Ubiqus.

The researchers believe that ESPnet-ST is the first toolkit “to include ASR, MT, TTS, and ST recipes and models in the same codebase.” It is also “very easy to customize training data and models,” they said.

As for future work, the ESPnet-ST developers said that they plan to “support more corpora and implement novel techniques to bridge the gap between end-to-end and cascaded approaches.” Inaguma added that his current research focuses on multilingual and streaming models, which he called “the next essential technique.”

Tags: automatic speech recognition, machine translation, speech-to-speech translation, text-to-speech

By Esther Bond

Research Director at Slator. Localization enthusiast, linguist and inquisitor. London native.