A few weeks ago, I watched a video of a presentation by a technology speaker on YouTube. He had a very heavy accent and made grammatical and pronunciation mistakes in almost every sentence, so I turned on YouTube’s automatic closed captioning to see whether it would pick up what he was saying.
The results were amazing. Despite the accent and the mistakes, YouTube’s neural transcription engine got most things right, and where it did not, the speech was simply hard to understand, even for a human.
YouTube’s closed captions are now powered by neural transcription engines, and YouTube is not the only one using them. VerbIT, a hybrid transcription services provider run by Tom Livne, is a good example of a company founded on the very same technology. The model is straightforward: neural transcription engines produce an initial transcription, which human editors then post-edit. Finally, quality control experts review the project before it is delivered to the customer.
Just two or three years ago, voice recognition engines performed so poorly that VerbIT’s model would have remained theoretical rather than practical, let alone potentially profitable. Today, neural transcription engines produce transcriptions that can be up to 95% accurate, i.e. a human post-editor needs to fix only about 5% of the text. Naturally, better sound quality can produce even better results.
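To make a figure like "95%" concrete, here is a minimal Python sketch of word error rate (WER), one common way to score a transcript against a human reference. It is an assumption that accuracy is measured this way; the engines mentioned above may well use a different metric, and the function name and sample sentences are purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Counts substitutions, insertions, and deletions needed to turn the
    machine output (hypothesis) into the human reference.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences only, not real engine output.
    reference = "the speaker explained how the new engine handles heavy accents"
    hypothesis = "the speaker explained how the new engine handles heavy accent"
    wer = word_error_rate(reference, hypothesis)
    print(f"WER: {wer:.2%}, accuracy: {1 - wer:.2%}")
```

Under this measure, a transcript that is "95% accurate" is one where roughly one word in twenty needs a correction from the post-editor.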
The post-editors on Tom’s team do not even need to be professional transcribers, and yet customers get perfect results faster than they would from fully human transcription, and at lower cost. Furthermore, VerbIT as a company has much higher throughput than traditional transcription companies.
A Revolution Before Our Eyes
Consider the magnitude of the revolution that is happening before our eyes: two to three years ago, high-quality voice recognition was limited to very expensive systems, and even those did not work very well. Today, inexpensive, commercially available engines handle complex transcriptions with increasing accuracy. The revolution is happening faster than traditional transcription companies care to realize.
There are, however, some caveats to consider: most machine transcription engines are trained for English only, for instance, and factors such as voice quality and the number of speakers also limit output accuracy. Yet this too is changing quickly, as the success of automatic English transcription drives faster adoption for other languages.
It is reasonable to assume that within one to three years, automatic transcription engines will successfully handle most common languages and types of material.
This technological achievement will have a huge business impact, resulting in a totally different market layout. New players that leverage the hybrid transcription approach, which combines neural machine transcription with human post-editing, are expected to dominate the market.
This massive change is just a matter of time.
Significant Competitive Advantage
Hybrid companies have another major business advantage: margins. Because the initial output is highly accurate, it requires minimal error correction. As a result, gross margins can be substantially higher than at traditional transcription companies, where intensive human effort means higher labor costs. With better margins, VerbIT has a significant competitive advantage over incumbent transcription agencies.
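The margin argument comes down to simple arithmetic, sketched below in Python. Every number here is a hypothetical placeholder chosen only to show the calculation; none of it is VerbIT’s or any other company’s actual pricing or cost data, and the function name is illustrative.

```python
def gross_margin(price_per_minute: float, cost_per_minute: float) -> float:
    """Gross margin as a fraction of revenue: (price - cost) / price."""
    if price_per_minute <= 0:
        raise ValueError("price_per_minute must be positive")
    return (price_per_minute - cost_per_minute) / price_per_minute


if __name__ == "__main__":
    # Hypothetical figures, per audio minute, for illustration only.
    price = 1.00

    # Traditional agency: a transcriber types everything from scratch.
    traditional_cost = 0.60

    # Hybrid provider: engine fee plus a lighter human post-editing pass.
    engine_cost = 0.05
    post_editing_cost = 0.25
    hybrid_cost = engine_cost + post_editing_cost

    print(f"traditional margin: {gross_margin(price, traditional_cost):.0%}")
    print(f"hybrid margin:      {gross_margin(price, hybrid_cost):.0%}")
```

The same function also shows the other option open to a hybrid provider: lowering the price while keeping a margin that a traditional agency could not match at that price.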
On top of that, the hybrid approach also drives market expansion, as customers who have not used transcription before because of high costs and slow turnaround times can now reconsider it. Given the specific budget and speed requirements of these customers, their first choice is all but certain to be a hybrid company, i.e. their business is a net addition to the current market size.
Clearly, the hybrid transcription approach is far better than the traditional approach. It follows that most traditional transcription agencies that do not change in time will simply cease to exist.
Now replace “Transcription” with “Translation,” move the clock forward one to one and a half years, and it is plain to see the significance of this story.