On April 20, 2021, AI video generation provider Synthesia announced they had raised USD 12.5m in Series A funding. The funds will be used to focus on enterprise user growth and product development, Synthesia Co-founder and CEO, Victor Riparbelli, told Slator.
Riparbelli declined to share the company’s valuation, but said their SaaS product, Synthesia STUDIO, has received an enthusiastic reception since it launched six months ago.
According to Riparbelli, the UK-based startup currently has “thousands of customers in 40 countries, both S&P 500 and individual creators” and has generated more than a million videos for clients since the business started in 2018.
The Series A round was led by New York-based FirstMark Capital, an early-stage VC firm with investments in companies such as Riot Games, Airbnb, and Shopify. Synthesia said their USD 12.5m round, which included all existing investors and two new angel investors, is the largest investment in the AI video space to date.
Synthesia previously raised USD 3.1m in a 2019 seed round led by LDV Capital and entrepreneur Mark Cuban.
Avatars Saying Anything in Any Language
FirstMark Managing Director Matt Turck blogged about the investment on his website and detailed Synthesia’s approach to video generation, explaining that Synthesia greatly simplifies the creation of business videos and offers “a compelling text to video experience.”
Synthesia uses AI to create and customize avatars from a library of (real, human) actors as well as synthetic characters. Because the avatars are lines of code, they can be told to “say anything, in any language, opening the door to mass customization of video at scale,” Turck wrote in his blog post. The actors also receive payment when their likeness is used by a customer.
Asked about the main use cases for Synthesia’s offering, Riparbelli said that corporate communications, digital video marketing, and advertising localization are the main areas of focus, adding that they “also see great opportunities to partner with S&P 500 companies on their training [e.g., e-learning] needs.”
Another emerging trend is that of personalization, he said, pointing to an online video campaign Synthesia worked on for Lay’s crisps, entitled Messi Messages. The campaign, which features an avatar of footballer Lionel Messi, lets users select from different message options to have Messi’s avatar deliver a personalized invitation to watch a game.
“Since they are just code, the avatars can say anything, in any language, opening the door to mass customization of video at scale” — Matt Turck, Managing Director, FirstMark Capital
Riparbelli said that for the Messi project, “all we needed was five minutes of training footage of him speaking to the camera.” Because Synthesia’s algorithms learn from existing footage of a person, the same technique can be applied to, for example, a company executive for a corporate communications video.
And where does the localization and multilingual element come in? According to Riparbelli, “it is absolutely key.” Synthesia’s clients use their multilingual capabilities every day, he said, and the “feedback from clients is that being able to communicate in video and in 40 languages has been a game changer.”
From a language technology perspective, Synthesia does not appear to have developed any specific capabilities internally (e.g., machine translation, speech recognition, or synthetic voices).
“We are focusing on improving the experience of synthetic video for now, with a specific focus on how to create personalized videos at scale,” Riparbelli told Slator.
The Real Competition is “Boring PDFs”
Asked about the company’s relationship with dubbing studios and media localization providers, Riparbelli said Synthesia sees them as partners rather than competitors, and they “have many as customers.”
He added, “They are building services using Synthesia. And we also use translation partners to create scripts in 40 languages for our corporate clients.” As for Synthesia’s true rivals, “our real competition is boring PDFs that nobody reads,” Riparbelli quipped.
The CEO also provided an update on Synthesia’s 2019 goal of dubbing their first feature film within a couple of years, saying, “We still absolutely believe this to be true. In 10 years, anybody will be able to create a Hollywood-grade movie from their laptop. Cameras will be replaced by code.”
“Our real competition is boring PDFs that nobody reads” — Victor Riparbelli, Co-founder and CEO, Synthesia
He further stated: “Our mission is to reduce the entire video production process of film crews, studios, actors and cameras to a single API call. As the platform advances, our long-term vision is to make it possible for anyone to create a completely synthetic Hollywood film from their bedroom, without the need for anything else than a laptop.”
In the meantime, Synthesia is focusing on what they identify as an explosion in video adoption, which accelerated during Covid-19. According to Riparbelli, “current methods of production don’t scale.” Therefore, rather than attempting to disrupt the existing premium niche of media and entertainment dubbing (at least in the immediate future), Synthesia sees its sweet spot as catering at scale to the expanding video production market.
This is a thesis echoed by speech translation startup Papercup, which has raised USD 14m to date. Joining as a guest on SlatorPod, Papercup CEO Jesse Shemen told Slator: “We are not in this game to try and replace the dubbing industry. I am fine with it existing, by all means, but there are literally billions of hours of content that are untouched because they cannot necessarily afford the traditional method of localizing.”
For more on multilingual video production and synthetic voices, check out Papercup’s Jesse Shemen and Simon King discussing their AI dubbing and synthetic voices venture on SlatorPod.