Microsoft Releases Corpus So Clients Can Help Improve Skype Translator

February 6, 2017

Technology ·

by Gerard Castañeda

Despite the headlines, the universal translator for conversational speech has not arrived just yet. Microsoft, whose machine translation technology powers Skype Translator — which probably comes closest — admits as much.

In a paper, Microsoft said it observed the “clear negative impact” of inserting Skype Translator into a conversation. It found that people spoke more slowly, used “restricted vocabulary,” and would often “need to ask clarification questions when results are not understandable.”

That is still a far cry from Microsoft’s goal, which is to translate a natural conversation between speakers of two different languages so that “one would not be able to tell the difference between conversations held in one language and those held in two.”

No Free Lunch

In a bid to enlist clients and partners in accelerating progress, Microsoft released a 2GB Speech Language Translation Corpus, giving users of the Microsoft Translator Speech API a baseline “to evaluate end-to-end conversational speech translation quality.”

Applications for the API range from making large repositories of audio files searchable by transcribing them into text, to real-time subtitling and machine translation of those subtitles, to one-to-one live translation, in person or remote (coming full circle to the universal translator).

Users of the technology include Lionbridge (automatic subtitling), telecom provider Tele2 (live translation of phone conversations), and ProDeaf (multilingual support of speech-to-sign scenarios). Microsoft wants the corpus to become the “gold standard…for speech language translation.”

Microsoft is not providing the corpus purely for the sake of the greater good, of course. Using the Speech Translation API to transcribe and translate 1,000 hours of audio costs USD 7,000 per month, and 10,000 hours leaves you with a USD 35,000 bill.
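To put those two price points side by side, the short Python sketch below works out the effective cost per audio hour at each volume. It uses only the figures quoted above; the names `TIERS` and `effective_hourly_rate` are made up for illustration and are not part of any Microsoft API or price list.

```python
# Monthly volume (hours of audio) -> monthly cost in USD, as quoted in the article.
# Only these two price points are given; nothing in between is assumed.
TIERS = {1_000: 7_000, 10_000: 35_000}


def effective_hourly_rate(hours: int, monthly_cost_usd: int) -> float:
    """Return the average cost per audio hour for one quoted price point."""
    return monthly_cost_usd / hours


rates = {hours: effective_hourly_rate(hours, cost) for hours, cost in TIERS.items()}

for hours, rate in rates.items():
    print(f"{hours:,} hours/month: USD {rate:.2f} per hour")
```

On these numbers, the per-hour cost halves at the higher volume, from USD 7.00 to USD 3.50.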

Habeas Corpus

The corpus was created from actual conversations over Skype to “capture the typical side-effects of Skype’s transport layer.” It contains around 3,000 end-to-end speech translation sets for English, and 2,100 for French and German.

Each set consists of an audio file, a verbatim transcription, a cleaned-up transcription, and a translation based on the cleaned-up transcription. The average length of the audio sequence is 4.7 seconds in English, 5.4 seconds in French, and 6.7 seconds in German.
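As a rough illustration of that four-part structure, one set could be modeled in Python as follows. This is a sketch only: the article does not describe the corpus's actual file layout or naming, so the class name, field names, file path, and sample strings below are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SpeechTranslationSet:
    """One corpus entry with the four parts the article lists (names are hypothetical)."""

    audio_path: Path          # the recorded audio file
    verbatim_transcript: str  # word-for-word transcription of the audio
    cleaned_transcript: str   # cleaned-up version of the transcription
    translation: str          # translation based on the cleaned-up transcription


def is_complete(entry: SpeechTranslationSet) -> bool:
    """Return True when all three text fields contain non-whitespace content."""
    texts = (entry.verbatim_transcript, entry.cleaned_transcript, entry.translation)
    return all(text.strip() for text in texts)


# Placeholder values, invented for this example; they are not taken from the corpus.
example = SpeechTranslationSet(
    audio_path=Path("example_0001.wav"),
    verbatim_transcript="um hello hello there",
    cleaned_transcript="Hello there.",
    translation="Hallo.",
)

print(is_complete(example))
```

Keeping the verbatim and cleaned-up transcripts as separate fields reflects the article's point that the translation is based on the cleaned-up text rather than on the raw transcription.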

The nature of the content is conversational (e.g., “And I mean on WeChat you always have updates of new emoticons that you can download”). The audio was transcribed and translated by human linguists. Microsoft recorded 100 speakers for each language with 50-plus pairings.

To simulate the eventual use case (i.e., two people speaking over Skype in two different languages), Microsoft asked bilingual participants to hold a 30-minute conversation in which one spoke either German or French and the other responded in English.

In its blog post, Microsoft said it plans to release an updated version of Skype Translator in 2017 and expand language coverage.

Tags: machine translation, Microsoft, speech-to-text

By Gerard Castañeda

Research Associate for Slator.com. Runs, bikes, and climbs mountains for fun.

© 2021 Slator. All rights reserved.
