Synthesia Raises USD 3.1m for Lip-Sync Technology That Could Change Dubbing

According to a recent poll, only 15% of Slator readers prefer to watch foreign-language content dubbed rather than subtitled. It is worth noting, however, that the Slator audience is perhaps not the best gauge of dubbing's broader appeal. Our readers are more than proficient in English and have a natural affinity for foreign languages.

While the dubbing market thrives in many territories, as this Goethe-Institut article on dubbing in Germany highlights, lip-sync dubbing is a difficult science. The goal is for foreign speech to perfectly match the movement of a speaker’s lips, and the result can be jarring when the visuals do not match the audio.

One company striving to improve the audience experience of lip-sync dubbing and raise engagement is AI video production startup Synthesia. The UK-based company has developed technology that automatically synchronizes an actor’s lip movements to speech in a different language.

Founded in 2018 by a group of researchers and entrepreneurs, Synthesia has just secured USD 3.1m in funding. The founders hail from University College London, Stanford, Technical University of Munich, and Foundry, and include Prof. Matthias Niessner, whose research has focused on Deep Video Portraits and Face2Face.

The investment was led by LDV Capital and early investor Mark Cuban, owner of the NBA’s Dallas Mavericks and of Shark Tank fame. New investors include MMC Ventures, Seedcamp, Martin Varsavsky’s VAS Ventures, TransferWise Co-founder Taavet Hinrikus, Tiny VC, and advertising executive Nigel Morris.

Slator spoke to Synthesia COO, CFO, and Co-founder Steffen Tjerrild to find out more about the company’s technology and plans.

Democratizing Visual Effects

According to Tjerrild, the decision to start Synthesia was fueled by the lofty vision “to democratize visual effects, allowing content creators to talk to a global audience in their native language.”

The founders were spurred on by “recent advancements in computer vision, computer graphics, and deep learning,” which Tjerrild said “are finally allowing us to bridge the uncanny valley.”

He described the AI-driven technology as reanimating the entire face of a target actor: the face in the original video is replaced with a synthetically generated face mask, which can then be animated to match the new audio.

Because of this, Tjerrild explained, Synthesia’s technology is “not subject to the traditional limitations of lip-sync phoneme-adjusted scripting” — meaning the movement of a speaker’s lips no longer needs to be factored into the translation. Instead, Tjerrild said, they “allow the dubbing actor to come up with the best translation without thinking of lip-sync.”
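Synthesia has not published implementation details, but the description above suggests a conceptual pipeline: track the actor’s face in each frame, re-drive only the expression (mouth and facial motion) from the new-language audio, and composite the rendered face mask back over the original footage. The Python sketch below illustrates that idea under stated assumptions only; the FaceTracker, AudioToExpression, and FaceRenderer interfaces and the dub_video function are hypothetical stand-ins, not Synthesia’s actual API.

```python
from dataclasses import dataclass
from typing import List, Protocol

import numpy as np


@dataclass
class FaceParams:
    """Per-frame face state: identity and pose stay fixed, expression is re-driven."""
    identity: np.ndarray    # who the actor is (constant across frames)
    pose: np.ndarray        # head position/rotation taken from the original footage
    expression: np.ndarray  # mouth and facial motion, replaced during dubbing


class FaceTracker(Protocol):
    """Hypothetical tracker: recovers face parameters from a video frame."""
    def track(self, frame: np.ndarray) -> FaceParams: ...


class AudioToExpression(Protocol):
    """Hypothetical model: maps target-language audio features to expressions."""
    def predict(self, audio_features: np.ndarray) -> np.ndarray: ...


class FaceRenderer(Protocol):
    """Hypothetical renderer: turns face parameters into a photoreal face mask
    and composites it over the original frame."""
    def render(self, params: FaceParams, background: np.ndarray) -> np.ndarray: ...


def dub_video(
    frames: List[np.ndarray],
    audio_features: List[np.ndarray],  # one feature vector per frame of new audio
    tracker: FaceTracker,
    audio_model: AudioToExpression,
    renderer: FaceRenderer,
) -> List[np.ndarray]:
    """Reanimate the actor's face so it matches the dubbed audio track.

    Because the face is re-rendered rather than left as shot, the translated
    script never has to be adjusted to fit the original lip movements.
    """
    output = []
    for frame, audio in zip(frames, audio_features):
        params = tracker.track(frame)                   # original identity + pose
        params.expression = audio_model.predict(audio)  # new-language lip motion
        output.append(renderer.render(params, frame))   # composite the face mask
    return output
```

The design point mirrored here is the one Tjerrild describes: identity and head pose come from the original footage, while only the expression track is driven by the new audio.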

But dubbing artists can rest easy. It’s not a case of replacing the voice artist, Tjerrild pointed out, since they would “still need a voice artist to speak the language you want to dub the video into.”

“We are not replacing the voice artist; we still need a voice artist to speak the language you want to dub the video into” — Steffen Tjerrild, COO/CFO & Co-founder, Synthesia

The company website shows they have worked with the BBC, BuzzFeed, Accenture and, of course, the Dallas Mavericks. At the moment, Tjerrild said, their work centers on three main use cases: corporate communications, digital video marketing, and advertisement transcreation and localization. Synthesia is also “in dialog with several dubbing studios and media localizers that are looking for new ways to create an engaging localization experience,” he said.

Although the technology is not currently being applied to full-length feature films, Synthesia is “working on expanding the capabilities,” Tjerrild said, and hopes to dub its first feature film within the next couple of years.

“We think this will ultimately grow the demand for dubbing as we can help deliver a better consumer experience” — Steffen Tjerrild, COO/CFO & Co-founder, Synthesia

He added, “We think this will ultimately grow the demand for dubbing as we can help deliver a better consumer experience.”

Since the idea behind this technology is essentially manipulating a video of someone to look like they are saying something else, what does Synthesia think about the danger of “deep-fakes”?

All tools “can be used for good or bad,” Tjerrild said, noting that “soon, these and other sophisticated technologies will be widespread and it is important that the risk of misuse is both acknowledged and alleviated as much as possible. In our view, the best approach is to create public awareness and develop technological security mechanisms to ensure all content created is consensual.”

Tjerrild declined to discuss Synthesia’s valuation, but said they plan to use the money to “expand the research capabilities and take our first product Synthesia ENACT to market.”

Interested parties can view a raft of videos demoing the tech, including a TechCrunch-covered David Beckham video, on Synthesia’s website and YouTube.