Showcasing AI-assisted music, South Korean entertainment company Hybe unveiled its latest project on May 15, 2023. With the help of voice AI and YouTube’s early-access multi-track audio feature, K-Pop singer Lee Hyun released the song “Masquerade” in six languages simultaneously under the name of his alter ego, Midnatt.
Originally recorded in Korean, the song was then engineered into English, Japanese, Chinese, Spanish, and Vietnamese. Language AI startup Supertone’s voice technology was also used to create the female voice featured in the song.
Lee Hyun (Midnatt) sang all six language versions of the song, and AI was employed for pronunciation correction using voice data from native speakers. As a result, all versions sound natural in terms of accent and pronunciation while retaining Hyun’s style and musical expression.
It is currently unclear what other services or human experts are needed, e.g., for translation, localization, and song/lyric adaptation.
Multilingual Trailblazer MrBeast
YouTube began testing multi-track audio with a small group of content creators back in 2021. The feature allows several audio tracks to be added to a single video, enabling foreign-language dubbing, voice-over, and audio description for the blind and partially sighted. To clarify, the feature does not auto-generate foreign-language content; creators must produce this themselves.
Popular YouTuber Jimmy Donaldson (aka MrBeast) was one of the early adopters of multi-track audio as part of the test phase. He emphasized the impact for content creators, saying “if you [dub] into the top 15 languages, you can basically reach 90% of the world.”
More Languages, More Views
Until now, offering multilingual audio content has required making, editing, and uploading multiple videos. As a result, content creators often manage several separate channels in different languages.
YouTube’s multi-track audio streamlines content creation to a single video and centralizes engagement in one place, which increases views and watch time. A single video is easier for creators to manage, and it also makes channel performance simpler to track.
From an industry perspective, this paradigm shift could open up new markets for language service providers (LSPs) and voice AI companies. Andrea Ballista, CEO of Voiseed, highlighted some of the potential applications of creating emotional virtual voices in multiple languages, including in media, entertainment, game localization, advertising, and eLearning.
Adapting music into multiple language versions may not appeal to every musician, since lyrics are typically a core element and translation could harm the intended effect or musical style. However, globalizing music could be another avenue for LSPs, machine dubbing providers, and voice AI startups.