Facebook AI Open Sources New Multilingual Automatic Speech Recognition Data Set

January 29, 2021

by Seyma Albarino

On January 22, 2021, Facebook AI released Multilingual LibriSpeech (MLS), a new large-scale, open source dataset to advance research in automatic speech recognition (ASR).

According to an official blog post introducing MLS, its English-language dataset is about 47 times larger than the original LibriSpeech, a corpus that contains 1,000 hours of read English speech.

If one early assessment is any indication, MLS already offers an improvement: When Facebook AI researchers trained a model on an MLS English subset, they produced a “20 percent improvement in word error rate compared with the same model trained using LibriSpeech data.”
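For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not Facebook AI's evaluation code. The function names and the sample numbers are hypothetical, and the article does not say whether the quoted 20 percent is a relative or an absolute reduction; the helper below assumes relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fractional WER reduction relative to the baseline (assumed meaning)."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Hypothetical numbers: a WER falling from 0.10 to 0.08 would be a
# 20 percent relative improvement under this reading.
gain = relative_improvement(0.10, 0.08)
```

Under the relative reading, the absolute drop depends on the baseline: the same 20 percent corresponds to a smaller absolute change when the starting WER is already low.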

Like LibriSpeech, MLS content comes from public domain audiobooks from the LibriVox project, which provides a wide range of speakers and allows Facebook AI to release the data with a non-restrictive license. (MLS can be downloaded from OpenSLR, and pretrained models and recipes for training and evaluating models are available on GitHub.)

As its name implies, MLS builds on LibriSpeech by expanding to include seven new languages (Dutch, French, German, Italian, Polish, Portuguese, and Spanish) in addition to English. Altogether, MLS offers more than 50,000 hours of audio across all eight languages.

The Facebook AI blog post pointed out that while datasets and benchmarks for non-English languages exist, “they are often relatively small or scattered around different places and rarely available under an open, permissive license.”

And the social media giant believes that “MLS will promote open and collaborative research in multilingual ASR and improve speech recognition systems in more languages around the world.”

By Seyma Albarino

Staff Writer at Slator. Linguist, music blogger and reader of all things dystopian. Based in Chicago after adventures on three continents.

© 2021 Slator. All rights reserved.
