Spotify Launches Voice Translation in Podcasting Push

Podcasting remains one of the fastest-growing areas in media and a battleground for the likes of Amazon Music, Apple Podcasts, and Spotify.

Now Spotify, the digital music service whose podcast offerings reportedly draw more than 100m listeners, has likely invested a considerable sum in bringing podcasts beyond their native-language audiences.

Announcing the pilot on September 25, 2023, Spotify CEO Daniel Ek tweeted, “It’s called Voice Translation and using AI, translates podcasts episodes into alternate languages, all in the podcaster’s voice.”

Spotify’s initial release consists of three episodes of different podcasts — including the Lex Fridman Podcast and The Diary of a CEO with Steven Bartlett — translated into Spanish audio. 

Elon Musk, a coveted guest on the latter podcast, per fans’ comments, declared the resulting product a “[d]eep real, instead of deep fake” on X. 

“It’s pretty insane to hear,” Ek added on LinkedIn. “More languages and more podcasts to come soon!”

According to a press release, French and German episodes will be ready in a matter of days and weeks. The “voice-translated” episodes will be available to Premium and Free users worldwide, and while the company “aim[s] to expand access for more creators and languages,” it offers few details on timelines or scale.

Spotify developed the Voice Translation tool using OpenAI’s “newly released voice generation technology,” among other recent “innovations.”  

Coverage by The Verge suggests, more specifically, that OpenAI’s multilingual transcription and translation tool Whisper serves as the “backbone” of Spotify’s new feature.  

Demand for Authentication

Responses on social media ranged from skeptical (“Does spotify REALLY need new features? Fix your mobile app first.”) to tempered (“One of the few cases where AI advancement isn’t instantly giving me dreadful thoughts about its future applications”) to over-the-top (including the term “game-changer” and many, many fire and mind-blown emojis). 

“Linguists like myself would love to be flies on the walls of this project!” Aleksandra Pimenides gushed on LinkedIn, launching into theoretical questions about human involvement in quality control and decisions related to accents.

On X, another localization-minded commenter predicted, “The bilingual job market will explode for demand of authentication.”

Both comments point to an unacknowledged fact: the success of the technology will likely vary by podcast, depending on the specificity of the subject matter and the fluency of its hosts and guests. A jargon-filled, expert-targeted podcast like SlatorPod, for example, would be a challenge for any AI, let alone an off-the-shelf, non-fine-tuned one.

As a demand driver, of course, the impact of the Voice Translation feature remains to be seen, but the use cases extend beyond Spotify’s likely intended goal of reaching new markets of listeners.

Language learners, for instance, can benefit from listening to more audio in their target language, and the feature could also give Anglophones access to podcasts originally recorded in languages other than English.

Ek has presented the in-voice translation feature as a draw for podcasters who wish to maintain a consistent brand across markets. But that can have its downsides, too.

“This sucks,” Petr Dvorak retorted jokingly on LinkedIn. “Why would I want my own shitty voice when the same tech allows me to explain stuff as Morgan Freeman?”