On April 21, 2023, workflow automation tool Zapier announced that it had enabled a new OpenAI integration: With the Whisper API, users can now create transcriptions and translate spoken languages into English text.
Zapier allows users to connect data from separate apps via triggers and actions. The Whisper integration has been a long time coming, at least by the standards of the fast-paced world of AI: Zapier thanked Techstars veteran Yohei Nakajima back in December 2022 for building an unofficial integration. “This was pre-ChatGPT!” Nakajima tweeted for context.
For its part, OpenAI introduced Whisper in September 2022. The automatic speech recognition (ASR) system was trained for transcription and into-English translation on 680,000 hours of multilingual supervised data from the web.
In March 2023, OpenAI made ChatGPT and Whisper models available on its API, “giving developers access to cutting-edge language (not just chat!) and speech-to-text capabilities.”
According to Whisper’s GitHub page, “Whisper’s performance varies widely depending on the language.” (Interestingly, the lowest word error rates, or WERs, were for Spanish and Italian, followed by English. Nepali, on the other hand, had the highest WER, followed by Belarusian and Armenian.) At the time of writing, Whisper supports 98 languages.
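For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition; it is not the evaluation code behind the figures on Whisper's GitHub page. The function name `word_error_rate` is hypothetical, and the sketch assumes simple whitespace tokenization with no text normalization (casing, punctuation), which published benchmarks typically apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly,
    with no normalization of case or punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when a transcript is much longer than its reference.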
In a tutorial, Tyler Bryden, co-founder of speech-to-text company Speak Ai, called the integration “disruptive.”
“People who had relied previously on some of the companies dedicated to this all of the sudden can chain things together,” he said. “Sometimes [it] can be a little bit cost-prohibitive, to flow a bunch of Zaps in if you have say hundreds, if not thousands, of files coming on a daily or weekly [basis…]. But even if we look at the price and cost structure of Whisper, it could be significantly lower than the price structure of some speech recognition systems.”
Bryden praised Whisper’s ability to produce accurate transcriptions from noisy audio, as well as the quick turnaround: “Previously it was generally a 1:1 basis. You take an hour to upload, and it’s gonna come back in an hour.” For now, one limitation is file size: the integration can handle no more than 25 MB per file.
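In practice, the 25 MB cap means a workflow that handles many recordings may want to screen files before sending them on. The Python sketch below shows one way such a pre-check could look. The names `MAX_UPLOAD_BYTES`, `within_limit`, and `partition_by_size` are hypothetical, and the sketch assumes that “25 MB” means 25 × 1024 × 1024 bytes; the exact byte threshold the integration enforces is not stated here.

```python
import os

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes. The integration's
# precise threshold may differ.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def within_limit(size_bytes: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a file of this size fits under the per-file cap."""
    return 0 <= size_bytes <= limit_bytes


def partition_by_size(paths, limit_bytes: int = MAX_UPLOAD_BYTES):
    """Split file paths into (accepted, oversized) based on size on disk."""
    accepted, oversized = [], []
    for path in paths:
        if within_limit(os.path.getsize(path), limit_bytes):
            accepted.append(path)
        else:
            oversized.append(path)
    return accepted, oversized
```

Files that land in the oversized list would need to be compressed or split into shorter segments before transcription.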
Users can currently access core functions free of charge, and Zapier offers a 14-day trial for premium features. Back on GitHub, developers are already experimenting with new uses for Whisper, with and without the integration, such as extracting lyrics from songs and subtitling video files.