Meta Releases Large Dataset for Multilingual Speech-to-Speech Translation
Meta has released SpeechMatrix, a large-scale mined corpus of multilingual speech-to-speech translations.
Researchers from Meta suggest training-data curation as a means to reduce added toxicity in machine translation. Toxicity categories include profanity, insults, hate speech, bullying, etc.
In just three months, universities and tech companies released over two dozen papers on speech-to-speech translation (S2ST), highlighting continuing trends and new directions in research.
Open-sourcing large language models is catching on among tech giants, such as Google and Meta. Now comes BLOOM, whose 1,000 contributors designed it with multilingualism in mind.
The new model, praised by Meta founder and CEO Mark Zuckerberg, is already being used to improve more than 25 billion translations daily on Facebook, Instagram, and other apps.
Meta eliminates text generation in speech-to-speech translation; shares code and research with the public.
Large pretrained language models (PLMs) memorize lots of personal data. Researchers examine data privacy risks associated with PLMs, propose solutions.
Social media giant Meta proposes new metric to enable more consistency in human evaluation of machine translation; addresses big challenge in translation QA for low-resource language pairs.
Among the many things cooking at Meta: downloadable language packs for Android and iOS users to enjoy download-on-demand translation.
Meta AI wants to make large language models more accessible to researchers with the release of OPT-175B; advocates R&D transparency and a “research mindset.”
Meta’s VP of Internationalization, Iris Orriss, joins SlatorPod to talk about language operations at Meta, pioneering new localization approaches, and the future of the Metaverse.
While translators have long been able to work outside the traditional office, what do language service providers think about trading their remote work setups for virtual workplaces?
Part of the billions of dollars going into the Metaverse will undoubtedly go into these two areas of machine translation. Meta AI reveals its MT roadmap and the people working on it.
Key highlights from Slator’s 2022 Language Service Provider Index, NeuralSpace’s USD 1.7m seed round, and Meta’s language and MT plans.
100,000 Spanish-speaking workers in IT, design, data entry, and more on waitlist to use Viva Translate machine translation to communicate with potential US employers.
In virtual reality, physical borders may be less of an issue than language barriers. As Meta hypes its Metaverse, language managers work behind the scenes to localize the space.
Meta shoots for completing world’s fastest supercomputer by mid-2022. Here’s what it can do in terms of natural language processing and translation — and what the facility looks like today.
After an extraordinary year, our look back on 2021 — from Big Tech to big buys, machine translation to translators’ job prospects — shows the language industry is thriving.
Curating datasets, reviewing user-generated content, and liaising with locals — all in a day’s work for a linguist hired by big tech. (No computer science degree or published research required.)
To be fair, this is the first-ever multilingual model to win the international machine translation contest. But major tech companies have been exploring multilingual models for years.