San Francisco-headquartered startup Descript, the company behind the audio and video editing platform, raised USD 30m in January 2021.
Nabeel Hyatt at Spark Capital led the Series B funding round, and Descript’s existing investors, Andreessen Horowitz and Redpoint Ventures (also an investor in tech-enabled LSP Lilt), both participated. Descript has raised a total of USD 50m from these VCs to date.
Redpoint partner Satish Dharmaraj will oversee the latest investment, while Travis Bryant, also of Redpoint, continues to advise Descript’s business development team.
Descript said it is building “a new class of AI-driven tools that push the limits of how we create media.” In practice, Descript’s flagship software, which uses both Google’s and Rev’s automatic speech recognition engines to power automated transcription, aims to simplify the process of creating and sharing audio and video content.
Although Descript offers same-language captions, the company does not yet have a specific timeline for integrating language translation, which Descript’s Head of Business Development Jay LeBoeuf identified as “one of our most requested features.”
One of Descript’s recently added linguistic features is a transcription glossary that enterprise clients can use to ensure their preferred terminology is reflected across all transcripts, including live transcripts. Another is filler word removal. Descript detects filler words and users can delete them with a few clicks.
Language service providers (LSPs) may consider using Descript’s platform and technology to expand their client offerings. The transcription glossary and filler word removal could boost consistency within and across client documents, leaving less room for human error.
Another interesting feature that may open up new business opportunities is Overdub Voice, which offers “ultra realistic text to speech voice cloning.” Basically, the tool converts typed text into synthetic speech based on a person’s actual voice (Ref: lip-sync dubbing, synthetic dubbing).
While it’s likely too early for the feature to be used in entertainment media, for corporate media content it could be an interesting add-on to an LSP’s pitch.
As a safeguard against potential misuse, Descript users must confirm their identity and express consent in order to record Overdub Voice training data. The creator then “owns” the Overdub Voice, and their voice is used to confirm consent for future voice synthesis (although an owner can grant other users access to their Overdub Voice).
Descript’s Roots, Branching Out
Before its December 2017 launch as a native Mac app, Descript was an internal tool at another startup, Detour. Descript’s founders then took a detour of their own: deciding that Descript was the better idea, they built a new company around it instead.
Three years later, Descript has graduated from single-track audio and transcription editing tool to podcast-creating platform — and joined forces with voice-cloning tech startup Lyrebird, based in Montreal. This partnership laid the foundation for the July 2020 release of Descript’s Overdub feature. (Lyrebird now functions as Descript’s internal AI research division.)
Some of Descript’s best-known clients come from the world of journalism: The New York Times, Al Jazeera, NPR, and iHeartMedia, among others. (Full disclosure: Slator uses Descript to polish its weekly podcast.) Outside that field, LeBoeuf said, other use cases include marketing and sales, customer support, user research, and online learning.
According to LeBoeuf, “Descript is well established as a tool for creatives — applications including amateur and professional podcasting, vlogging, and content production. On the business side, we’re enabling everyone to be a storyteller.”