Limecraft CEO Maarten Verwaest on AI-enabled Subtitling and Audiovisual Translation

Maarten Verwaest joins SlatorPod to discuss Audiovisual Translation

In this week’s SlatorPod, Limecraft CEO and Founder Maarten Verwaest joins us to talk about digital workflow management in the media sector and AI-enabled subtitling. Maarten shares his entrepreneurial journey, tells us why he doesn’t plan to bring on financial investors, and explains why AI should not be left unattended. 

First up, Florian and Esther discuss the week’s language industry news. We revisit the majority sale of UK-based translation management system (TMS) provider XTM. After a competitive M&A process looking for a potential buyer, XTM opted for US-based growth equity firm K1 Investment, which specialises in enterprise software and SaaS investing.

The two discuss an MT-related court case in Poland. Having taken their client to court over non-payment for a book translation, a Polish LSP soon found their practices under scrutiny by the court and expert witnesses.

The court concluded that the book had not been translated by a “professional translator” (as originally advertised and sold to the client), and had instead been 92% translated using Google Translate — the 2013, pre-NMT version. Not the outcome the LSP was looking for. 

Esther talks about a major deal win for LSP Semantix and friends worth around USD 40m over four years. Semantix, Summa Linguae, and ESTeam — the software company headed by Trados creator Jochen Hummel — partnered up to bid for an EU translation contract for the Commission’s Directorate-General, and succeeded in ousting incumbent provider AMPLEXOR, which will wave goodbye to the contract (and USD 10m a year) at the end of February 2021.

Sticking with Europe’s public sector, the two talk about the European Parliament’s new remote interpretation platform. The roll-out of the selected platform, Interactio, involved a huge deployment effort and saw them tackle myriad challenges such as scale, funding, and firewalls. A project that would have taken two years was fast-tracked as a result of Covid-19 to be up and running in a matter of months.

As remote interpreting went fully mainstream in 2020, UK-based LSP thebigword secured another three years of interpreting work from the Ministry of Justice (MoJ). The major contract they were originally awarded in 2016 is being extended, and now involves some 24 million minutes of distance interpreting per month, in addition to significant volumes of socially-distanced onsite interpreting. 

Subscribe to SlatorPod on YouTube, Apple Podcasts, Spotify, and Google Podcasts.

Stream Slator webinars, workshops, and conferences on the Slator Video-on-Demand channel.

Transcript powered by Limecraft

Florian: Tell us what Limecraft is in a nutshell? Give us the elevator pitch.

Maarten: Limecraft in general is on a mission to become the largest backbone for professional media production, professional video production in particular. That includes television and film but applications also include eLearning, marketing and communications content for large enterprises. For those customers, we manage video assets and we take care of their production process. That includes, by the way, subtitling and localization, which is the direct cause of this conversation.

Florian: What is your background? How did you get started in this and tell us a bit more about the localization piece? Generally about your career, professional journey, how did you become an entrepreneur? 

Maarten: I have been an entrepreneur probably since I was born, without the work. It was a winding road. I have a master’s in physics because I wanted to understand how the world works. After leaving university, I started working for a very traditional metallurgic company, but all we did was develop nanofibres and develop a database, so I learned IT on the job. Then I decided to go into information technology for the rest of my career and to knock on the door of a company where information technology is the blood and vessels. 

I went to the public service broadcaster in Belgium. I said you must do a lot of IT in here and that was a brilliant move. Being responsible for technology at a broadcaster is like you are allowed to drive the Rolls-Royce. It was a fantastic time. VRT together with NRK in Norway, SVT in Sweden, the BBC in London, we pioneered and we were among the first four broadcasters having digitised the entire newsroom operations. Imagine from tape-based cameras to file-based production, we literally built the technology ourselves to make that possible. It was at that time that we discovered there was a huge opportunity as well for entertainment production so documentary and scripts. In the research and development department, we started working on the prototypes of what now is Limecraft Flow. 

Then in 2010, we, my co-founder and I resigned from VRT. We hired the lead developers that were working for us before at VRT so we jump-started the company just to find out that the media sector is horribly conservative. We were quite a bit ahead of the rest of the pack so it took us a couple of years to get the prototypes, to get that product-market fit right for the mainstream majority of media producers. 

I explained we manage video files, a lot of them, but subtitling actually happened as an accident as part of managing video. Video is not self-descriptive. Typically, you need an archivist to describe in words what is in the images and you can index those pieces of text and consequently retrieve your fragments. Quite early in 2012, we started working on automatic speech recognition for the purpose of indexing video and the first use cases were for documentary makers and journalists for transcribing interview material. 

Then in 2015, one of our customers approached us and said, you have built a beautiful transcription product, you should consider cutting those transcripts into subtitles. Which we did and, looking back, it probably saved our lives because, in 2015, 2016, the European Commission published legislation that made it mandatory to put closed captions on videos, the European Accessibility Act. Whereas traditional subtitling companies were not yet into artificial intelligence, we did not even think about it but we were quite good at handling speech to text and for us, it was a natural step. It was initially deployed at the BBC, and now 10 broadcasters all over Europe run their subtitling processes as one of the steps in the overall production process using Limecraft Flow. It was an interesting journey, not straightforward, but that is how it goes from the perspective of an entrepreneur. 

Esther: You have mentioned a bit about the origin story there and how it came to be, are there any other milestones that you want to share with us?

Maarten: We started in 2010 with the crazy idea of putting a software-as-a-service offering into a notably conservative market, asking producers to outsource the core of their operations, worth 100,000 euro per shooting day, to a developing company somewhere in Belgium. Obviously, that was a hard sell. Just about at the time when we decided to pull the plug or to pivot towards traditional software, a few well-known names like France Television, BBC and Warner Brothers came to us and asked, can you deliver us a cloud-based product? Yes, that is what we are. These companies typically operate multiple sites all over the place and it does not make sense to host point solutions for managing video at all these locations. What we anticipated in 2010, and what was a bit early at that time, all of a sudden became reality. Once the BBC signs up and backs your company as a reference, then it goes much faster. We are serving 80-plus customers, the majority in Europe, and we are currently looking to expand in the US and APAC as well. 

Esther: Just so we understand a bit more about the subtitling components and the language elements, I know you mentioned ASR, but could you walk us through how subtitles get produced from start to finish in your system? 

Maarten: In our system, it is a bit different compared to how subtitles are traditionally produced by language professionals, and that in turn means the technological alternative is sometimes hard for language professionals to accept. When we are relying on automatic speech recognition, it is important to first have a QC pass before you start cutting such transcripts into subtitles. Broadcasters set the bar very high on subtitling. It is not just best practice, it is mandatory to put the line breaks after the commas and the periods. The accuracy of the punctuation marks is critical for the look and feel and the acceptance of your subtitles. If that is not properly done, you get the typical computer-produced subtitles that we all know from YouTube. Nobody accepts them, as this should not go on television. This is the expectation level of language professionals when they think of artificial intelligence. 
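To make the line-break rule concrete, here is a minimal Python sketch of a greedy line breaker that prefers to break right after a comma or period. It is an illustration only and not Limecraft's implementation: the function name `break_lines` and the 37-character limit are assumptions chosen for the example, and real broadcast style guides add further rules (reading speed, two-line limits, shot changes).

```python
PUNCTUATION = ",.;:?!"
MAX_CHARS = 37  # assumed per-line limit; each broadcaster defines its own


def break_lines(text, max_chars=MAX_CHARS):
    """Split text into lines, preferring breaks right after punctuation."""
    words = text.split()
    lines = []
    while words:
        # Count how many leading words fit within the character limit.
        fit, length = 0, 0
        for word in words:
            extra = len(word) + (1 if fit else 0)  # +1 for the space
            if length + extra > max_chars:
                break
            length += extra
            fit += 1
        fit = max(fit, 1)  # an over-long word still gets its own line

        cut = fit
        if fit < len(words):
            # More text follows: back up to the last word ending in
            # punctuation, if there is one inside the fitting span.
            for i in range(fit, 0, -1):
                if words[i - 1][-1] in PUNCTUATION:
                    cut = i
                    break
        lines.append(" ".join(words[:cut]))
        words = words[cut:]
    return lines


if __name__ == "__main__":
    sample = "When the rain stopped, we went outside. The river was high."
    for line in break_lines(sample):
        print(line)
```

On the sample sentence, the first line ends at the comma even though two more words would have fitted, which is the behaviour the rule asks for.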

First of all, we have worked a lot on improving the accuracy of the speech transcription, to put a number on it, to a word error rate of two percent or less. Then it becomes acceptable for professionals. Then the speaker segmentation has to be really good. The punctuation has to be really good and once that is done, cutting a one-hour movie into subtitles takes twenty seconds and it will be as accurate as if done by a human. That is quite shocking. The difficult part is the transcript, removing the non-meaningful words, etc. The easy part for the machine is cutting it up into subtitles that are properly timed and in fact, a machine is better at counting video frames and more consistent compared to the same job done by a human being. 
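For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the machine output, divided by the number of reference words. The sketch below is a generic Python illustration of that definition, not Limecraft's evaluation code; the function name and the simple lowercase, whitespace-based tokenisation are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (word missing in output)
                curr[j - 1] + 1,     # insertion (extra word in output)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over the lazy dog"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

By this definition, a two percent word error rate corresponds to roughly one wrong, missing or extra word in every fifty words of reference transcript.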

Then the other part of the equation is where subtitles need to be translated. In fact, there are two different use cases: same-language subtitling for accessibility purposes and interlingual subtitling for localization purposes. They are quite different but we offer both. In the beginning, language professionals usually absolutely hated the idea of having to rely on artificial intelligence but frankly speaking, once they get used to it, they will not be willing to give it back. Once they are comfortable and they have been able to fit it into their workflow, they will not step back anymore from that comfort zone.

Florian: Are the subtitling and the caption just an add on or is it part of the core? Who are you talking to when you are talking to a new client, who is signing the checks, who is the person with the wallet there that you are targeting and how does the language component play into this? Is it an add on? Is it something more interesting? 

Maarten: First and foremost, I agree with many language professionals out there that we should consider subtitles as part of the core of the product. There is video, there is audio, and there should be decent subtitles and it is not just an add on or a gimmick. Subtitles should be of a high standard. That is what differentiates great European content from some non-domestic products out there. 

The answer to your second question, who we are selling this to, is more subtle. Traditionally, local producers that were doing content for regional broadcasters did not include subtitles. It was just the local language. Now there is legislation making it mandatory to put captions. More importantly, video-on-demand platforms are rapidly replacing local broadcast. Overnight, producers are now creating content that is intended for worldwide distribution in the first place. Redistributing content via film festivals to non-domestic broadcasters is not just an add-on giving some extra margin to the producer, it is the design principle in the first place. 

Traditionally, subtitling or localization was outsourced to specialized language service providers, subtitling companies. We know them, SDI Media before they became Iyuno, etc. Those big and small LSPs did the bulk of the subtitling at a cost of, say, in continental Europe, eight to 10 euro per minute. Now, if you are a producer and you want to create several versions for worldwide distribution, this becomes a serious part of your budget. You want to be able to create additional versions at marginal cost. It is Limecraft’s working assumption that the more intelligent artificial intelligence becomes, the easier it becomes for producers to access those technologies themselves, directly, without intermediary language service providers. Artificial intelligence will take care of the bulk of the work, 70, 80 percent can be automated, and they will do the finishing themselves. 

Now, this is exactly our pitch. When we approach a broadcaster we are looking for the department where they make short-form content as a first stepping stone in the company, because we know there is a very short turnaround cycle. The time between production and putting that content online is usually minutes, rather than the days that would be the cycle time when you rely on an LSP. If the BBC puts a five-minute piece of content on Limecraft Flow, it is transcribed and subtitled in two or three minutes. It is polished two or three minutes later and they are good to go. 

The starting point is often short-form video for do-it-yourself operators. We have just signed up the Associated Press for processing 20,000 hours of content per year with the same convincing argument, which is a much shorter turnaround cycle between the availability of video content and having it transcribed and subtitles ready for distribution. I see it becoming state of the art, accepted as a best practice. Companies like the BBC and ITV are all insourcing subtitling again. Hence they are looking for solutions that can automate the bulk of the process and gearing up an internal subtitling team that can then take care of the polishing and the finishing.

Esther: What is your experience of scaling that or scaling the solution to longer-form content generally? Then also thinking about how does one scale as a company, how does one scale within this industry of media localization and media provision? 

Maarten: We still have to see how it evolves. What originally happened as an accident was developing this first prototype. No more than four-plus years later, we find ourselves in a very explosive market because video, which was once restricted to television and film, has in general become ubiquitous and best practice all over the place. Add the requirement to have subtitles under it as an essential component, and it is hard to judge what the total addressable market of such technology is. 

I can tell you what we are not going to do. We do not plan to, like some of our competitors, put 20, 40, 60 million in venture capital in the company and create an incredibly accelerated growth. Frankly speaking, the level of intelligence of what we call artificial intelligence, quote-unquote, is just becoming good enough to be useful in some areas of the overall localization landscape. It is not yet good enough for high-quality fiction production where there are a lot of specific languages, dialects, etc. It is going to be a waste of time. When we are dealing with eLearning content where there is no soundtrack, or documentary-style content where Sir David Attenborough does the voiceover at a very low reading speed, it is almost a 100 percent match. We are at the tipping point you see, for some use cases it is useful, for some not. 

As a technology vendor, we have to take care to manage expectations and to take them step by step, and we are going to evolve. The interesting position for us is that we are not just there for subtitling. Usually, it is part of a much wider agreement where we are managing some of the steps in the video editing process and we will see how it evolves. This is, by the way, how we usually grow the footprint in such a company. Sometimes it starts with the subtitling and then we go into the production area where they need speech transcription in the first place. Sometimes the other way around. 

What we do stress and emphasize with our customers is that they should only do the transcription once. Transcription has so many beautiful applications along the production process, it can be transcribing raw material at the very beginning. It is helping to create shortlists and rough cuts in the edit suite and it can help to do the subtitles. We try to convince the technology managers at bigger companies to create a certain awareness that they should do it at the source not when the edit is locked and they are about to do the subtitling. Let us start at the source and drag along those transcripts with the video as the process of reusing earlier results. 

Florian: The VCs, maybe growth equity or what have you, are probably knocking at your door at the moment. You tick a lot of the right boxes: AI, cloud, media. Are you saying you are taking a very deliberate approach to growth right now rather than taking on a bunch of outside funding and trying to grow super fast? What is your approach to getting outside funding? 

Maarten: In 2015, when we were in need of capital, I think we would have gladly accepted external capital. Obviously, we struggled our way to the point of break-even. Now we are growing the company at 10, 12 percent per month. There is no direct need, no urgency at all to grow the company faster. Let us not forget, it is a developing market and you can hurt yourself by trying to run faster than the market can handle. We would rather take it step by step. We are watching the situation closely. When it comes to partnerships I think that is a different angle to the same subject. 

As an entrepreneur, I am convinced we should not try to compete with American companies using an American style. American tech companies have access to 10 times more capital than European companies. Now, there are two things we could do. Either we move our headquarters to New York or to the West Coast and we become an American company or we try to do it differently, with a more creative approach or a more European style. We chose the latter. Rather than screwing with companies like ZOO subtitles, LinkSoft, Haymillian and automating their process, we say, here is our platform, it is white label, you can cherry-pick and check what you can do with it to optimize your margin and to overall bring a better solution to the market. We do not want to release AI unattended to a market because it will always need manual review and QC. Consequently, we do not want to screw your business so let us combine the best of both, not in a single company, but we will each stick to our own strengths. That is more of an alliance-based partnership, allowing enormous flexibility, scalability, and agility. If a crisis like Covid-19 pops up, a loosely coupled network of affiliated companies can manage such an impact better than a VC-backed company. I am not sure if I am right. When we look back a few years from now, we will be able to draw the right conclusions. That is how we have been growing the company to date and in a way it is proving to be successful.

Florian: Basically, you are going directly to the production people at the big media firms. On the other hand, you are also using the LSPs as client/channel partners in a sense. Just to follow up, so there is never a competitive situation where you are potentially going to the LSP’s end client? There would not be a sensitive issue at some point?

Maarten: There is this sensitivity. There is a potential conflict of interest, like in many other areas or situations you manage. If you look at our website, yes, we can host some producers directly, but we are open about it and we prefer working through trusted service providers rather than directly. We take this from an opportunistic point of view for sure. Working with the service providers gives us much larger exposure in a shorter time frame than what would happen if we were to enter the competition head to head. As a consequence, we make agreements whereby we restrain ourselves or decide not to operate directly, otherwise, we would screw the partnership. It is a balance. We have decided to seek those partnerships as much as possible and to go indirectly because it creates a much wider scale. It has more potential in the end. It may be a bit slower, but we will see how it scales in the long run.

Esther: You mentioned there not wanting to set AI loose and unattended on the market. What do you think about some of the cutting-edge research and development that is happening in various areas of speech translation, like voice-to-text and direct voice-to-voice translation?

Maarten: Frankly, I sometimes get a very creepy feeling. We have been labeling speech to text technology, which is 30-year-old technology and I even believe it was developed somewhere in Flanders, here in Belgium. Look at what happens today with visual transcription, so not just audio transcription, but also a computer watching images and deducing the full semantic meaning of what is happening over there, not just facial recognition, not just voice recognition, but recognizing a human being in their capacity of what they are doing. That is amazing, with a speed and an accuracy that outperforms what any human could do. If that technology evolves a little bit further, I think we are going to see automated video editing processes. What used to be controversial two years ago in the area of subtitling, that the machine cannot do this part of the creative process, is now getting accepted. 

I think the same is going to happen in the area of video editing. As an engineer, it is absolutely interesting but I think we should not leave AI unattended. It needs to be handled with the greatest possible care, meaning in turn, that the job of the video editor and the job of a subtitler will change. Coming from a point where the bulk, if not all of the work is done manually, it will become a high-level job, they will become a machine operator. Some will like it. Some of the tech-savvy operators that like to drive cars and motorcycles will love it. Some will not like it and I think the very nature of their jobs will change. 

Florian: Hiring around AI, machine learning, cloud, tech is hard. There is a lot of competition from big tech companies. How do you get new staff excited to join Limecraft, now that you are in growth mode, double-digit growth every month? 

Maarten: Hiring in data science and artificial intelligence is hard. The demand is much larger than the supply, so it is crazy. Fortunately for us, we have a huge competitive advantage, and that is that Limecraft’s core, its DNA, is in the media sector. The media sector is always more attractive than Fintech or other industries, so if you look at our job openings on LinkedIn or in newspapers, we try to use that as a convincing argument. We work for the best producers out there, why not join this fantastic team and have a look at what we do. That is our secret weapon.