Speech Wikimedia: A 200GB Audio Dataset for Training ASR and Speech Translation
Researchers from NVIDIA, Factored.ai, Talon Voice, and other organizations open-source a properly licensed dataset of 1,780 hours of speech in 77 languages, with transcriptions.