VSI CEO Mark Howorth on AI in Media Localization and Adding More Dubbing Capacity

SlatorPod #156 - VSI CEO Mark Howorth

Mark Howorth, CEO of VSI, joins SlatorPod to talk about his plans for leading and scaling the media localization provider.

Mark discusses his route into the media and entertainment (M&E) and language space as well as his path to joining VSI. Mark took up his role at the leading media localization provider in January 2023 after spending five years as the CEO of SDI Media including overseeing the company’s sale to Iyuno in 2021. He outlines how the media localization industry has evolved with the streaming boom impacting turnaround times. 

Mark also shares some of the challenges of hiring and retaining media localization talent internally and externally post-Covid. He offers his thoughts on the use of machine translation (MT) and automatic speech recognition (ASR) as productivity enhancers, rather than a replacement for human subtitling.

Mark talks about VSI’s historically self-funded growth and highlights the company’s plans to scale further in 2023 and beyond. He also shares some of the growth drivers behind VSI on the back of more than 20% growth in 2022.

The CEO reveals key initiatives for VSI in 2023, such as building dubbing capacity, widening its customer base, and geographical expansion. The pod rounds off with Mark’s 2-3 year outlook on the media localization market in terms of customer demands. 

Transcript

Esther: Why don’t we start off with you telling us a bit about your own professional background and how you got into the media, entertainment, and localization industry?

Mark: I’ve been in media and entertainment now for well over 30 years, really, from the services side. I started out actually as a consultant. I was a partner with Bain & Company, working in many industries. But in the latter part of my career I focused on media and entertainment and then went through a succession of roles, let’s just say. I started off in live TV sports production, so I was CEO of a mid-sized company there. Once again very interesting because I would work with all the different providers of sport, so I really got an insight into the different types of customers who are doing largely the same things. Then I went on to Panavision. I was the COO of Panavision. Once again providing cameras and lenses to every single person who’s making movies and television. And then I ultimately ended up in localization when, in 2016, I joined as CEO of SDI Media. We’d just been purchased by two Japanese companies and had a really good run there, a really exciting run, and we ultimately sold the company to Iyuno. I stayed with Iyuno for a period of time and then left and ended up here at VSI.

Esther: Tell us a little bit more about VSI then in terms of services and client segments.

Mark: VSI is, as you said earlier, one of the leading localization companies out there. We’re pretty much a pure-play localization company at this point, where it’s dubbing and subtitling and of course all the things that go along with each one of those, so it could be lip-sync dubbing or VO or audio description. Obviously anything with timed text, but notably subtitling. We have a large global footprint, which is centered of course in Europe, but we have the largest presence of localization in Latin America as well, and as of today, nothing in Asia, but let’s just see what happens in the future. We’re different from a few of the big players out there in that we have a relatively small media services operation, but once again, that’s an area that we’ll be looking at as our customer needs evolve.

Esther: One thing I’m interested in hearing is your thoughts on how some of the major themes have evolved since you joined localization. You touched on English dubbing. I’m sure there’s other evolutionary trends in subtitling, dubbing, technology as well.

Mark: It’s now going on seven years since I came into the industry, and it is amazing how much it’s changed. I don’t know where to start. Well, let’s just start here: when I first came into the industry, people were still using words like theatrical, home entertainment, and broadcast to talk about the quality levels or the different types of product that we worked with in localization. That’s gone away. The onrush of streaming has changed that, so there’s theatrical and then there’s everything else. Which is not to say that there aren’t quality tiers inside of everything else, but it’s really theatrical, and then episodic with various different quality levels, as I said. Other themes, of course, are the bringing in of technology. In 2016, when people were talking about machine translation or synthetic voices, it was perhaps sometime in the far future. Now it’s just a question of how much and when. Certainly machine translation has become an important tool in the subtitling world and there’s so much going on with voice synthesis and artificial voices that I think we’ll see those here shortly. And of course, the other big evolution, I think, is what I’m going to call speed. It’s particularly driven by the streamers. It’s easy when you’re using the internet to go day and date and to launch your content in multiple geographies at the same time. And that means that you have to localize into all those different languages before you hit the Go button, and so the turnaround times that we’ve had in our industry have really shrunk, and that puts a lot of stress on the whole workflow for all vendors.

Esther: Let’s talk a bit more about that. How does VSI go about managing those challenges when you have to do it quickly, but obviously want a certain level of quality as well as the speed factor?

Mark: I think that each vendor has their own approach to it. One thing that I’ve always been impressed with at VSI is that we’re regarded as a little bit of a higher-end localization company when it comes to quality. I saw that from the outside when I competed with them and now, being on the inside, I certainly understand it. I mean, there’s a couple of ways you deal with the speed. There are certain things you just can’t speed up. However, what you need to make sure is that you have enough people who are organized around it. So, for instance, when we do rapid-turnaround subtitling, I’m really impressed by the way they basically create a subteam here that gets entirely focused on it. They just dive in, and the technology doesn’t really help you there. It’s just a bunch of people who are really good at what they do and making sure you have enough people to do that. With dubbing, it’s always a little bit more challenging. But once again, I think that technology can be helpful, but here it’s about having really highly trained people who are dedicated to a fast turnaround. And then I guess the last thing is making sure you don’t overpromise, and that’s one thing that I think is really important about VSI. It’s almost like a religion here about not missing a deadline, right, a deliverable deadline, and we just don’t take on work if we don’t think we’re going to do a high-quality job at it. This is not the kind of place where you just let a machine go and do it and say, okay, we got it out fast. The quality really does come number one here and as a result, we’ve got a lot of people. I’ll also suggest that our people feel a little more experienced maybe than some of the other places out there, and that really helps you because when you get into a tough situation, you want people who have been there before and done that and can help you through it.

Esther: One of the dynamics, I think, that we observe as well is you’ve got, obviously, the increasing volumes of content, but you’ve also got potentially a broader range of languages and interesting combinations. So I imagine hiring the right talent when it comes to dubbing, VO, everybody who’s qualified or people that you need on the project is challenging, and also retaining those people. How do you go about building and retaining that talent base?

Mark: Especially over the last couple of years coming out of COVID, you’ve hit on one of the hardest things, and I think you can break it into maybe two big pieces. One is the freelancers and actors and directors, let’s call it the talent. Let’s deal with that second, because first, I think, is actually the people who work inside our company and the people who work at our customers. Okay? When you went through COVID and what they call the Great Resignation, with all these people changing jobs, what you saw is the relative experience level come down at both the customer, the people who are ordering localization, and the vendors, the people who are providing the localization. Most of these people are in their first or second job, they’re 25 to 32 years old, they’re the ones on the ground each day doing that work. Those are also the type of people who tend to change jobs frequently, and so we’ve done a couple of things here at VSI that I’m pretty impressed with. One is, I will suggest that the morale here is good and the people that they hire tend to stay a little bit longer than most. So I know when I got here, people were talking about this, but it was not nearly at the same level that I may have seen in other places in the past. So it’s good to have people who like working here and staying here because ultimately they’re the ones who are teaching the customer, who might be a brand new title manager at a company, and we’re helping them figure out how to get their project done. But that said, there has been turnover and we continue to hire and focus on our induction scheme, making sure people are being properly trained and mentored and making sure that they recognize that this should be a good and fun place to work, so I think that’s the first challenge. But there’s no doubt about it that the average tenure at both the customer and the vendors has come down over time, and that means we have to be a little more forgiving.
On the talent side, I really haven’t seen that much of a… I know there’s a lot more content out there, but frankly, we can always generally find actors and directors. Each market has its own interesting dynamics, as you may guess. Certain markets are sometimes more open to letting new talent come into the industry and other markets maybe a little bit less so as they want to make sure their jobs are there. So I think on the acting and director side, we’ve done okay. On the translator side, there have been points when we’ve had a little bit of a squeeze, but that’s always a dynamic market. There are lots of people who can do translation at some level, and obviously it’s a little bit more like a free market, and there have been times when things have gotten out of whack. We saw that there was a squeeze a year or so ago, particularly in the Nordics, and that changed the pricing and also meant everybody had to go out and push harder to bring in some new translators. I think one of the areas that’s been hard for us as an industry is adapters. Adapters are really the people in the dubbing world who take a translated script, but it’s not just a translation; they’ve got to make it work in their context. They also have to pay attention to lip sync, and that’s a real skill, sometimes even genre-specific. Adapters are probably one of the most precious commodities we have out there, so the talent squeeze is real. And then, just because you raised it again, English dubbing: it has been interesting, as the growth of English as a target language, not a source language, has just exploded. It is fun to see that coming into those markets. Even in a place like Los Angeles, where I live, or here in the UK, where we are today, there are not a lot of directors and actors who have dubbing experience. They certainly have acting experience because of just the nature of those markets, but it’s almost like starting from square one, because how you direct dubbing is different from how you direct something that you’re using a camera on or how you direct a play. And so we’ve spent a lot of time creating new talent in those markets, and it’s been a lot of fun because everybody’s excited to have a new craft emerge, especially in a place like LA.

Esther: You touched a little bit on technology, and something we have been asking our guests more recently is their thoughts on the hype around language AI. Especially in a subtitling or dubbing setting, how do you view open-source technology? The kind of automatic speech recognition, like maybe Whisper from OpenAI. Do you see that having any impact on the subtitling workflows, for example?

Mark: I do, and I think there’s a difference between subtitling and dubbing. I certainly have a fair bit of experience using machine translation or ASR for different parts of the subtitling workflow, and I think what we find is that they’re a productivity enhancer, but they’re not a replacement. I mean, the challenge is that even if you said something is 98% right, 2% off in our industry would be a disaster, right? It would require so much going back and doing it again and by the way, I don’t think there are any technologies out there that are even close to 98%. It feels to me that particularly in translation, I’m not talking about ASR right now, but just in pure translation, we’re kind of getting to the point where we’re at 75%. So that means that you still have to have a human go through it, probably two humans. One to do the translation and fix the mistakes the machine made, and then another to still do a QC. You do save some time, but I can’t see a world where you’re going to see humans get replaced in the translation task. Now, doing the original transcript, things like what Whisper does, I think those are getting much better. It works great in a perfect environment. But it’s surprising what happens when you start to do some of that with lots of content where it’s very noisy, a war movie as an example, with lots of effects, and it’s interesting to see how sometimes these technologies are challenged. And remember, we have to be extraordinarily accurate. Not just what the words are, but the exact timecode when the word starts, right, it’s all very precise. So I think they’re helpful in the timed text world, the subtitling world. On the dubbing side, I think it’s hard to say right now. I mean, certainly I have been surprised. I was a little more pessimistic. If we had been talking five years ago, I would have been much more pessimistic about how fast voice synthesis would come on, and I think it’s coming on very strong right now. These voices are getting much better.
The ability to manipulate them is getting much better. And when I say better, it’s not just the quality of it but remember, for us it has to sound good but it also has to be something that we can do fast and efficiently. When you have thousands of files or lots of program hours, each one can’t be a special little project, right? It’s got to be something that is part of the normal workflow and I think we’re getting there. I think obviously you’ll start to see voice synthesis used in audio description for the vision impaired community, which I think is terrific because I think it’s going to create a massive opening up of content to a segment of our population globally that has not been able to experience it, so I think that’s going to be wonderful. When you go beyond that, it becomes a little more complicated because the human brain is very sensitive to real and not real. And, of course, what matters when we’re creating localization we want people… Our ultimate goal is for them to think that it was originally shot in that language, and much in the same way that even when you use a sugar substitute, even you can just tell, right? No. Even Coke Zero is pretty close, but you still go, there’s not real sugar. There’s something about our wonderful set of senses that we have that can tell us what’s real and what’s not real, and once something’s not real you’re not quite as engaged. And so I think that it’ll be interesting over the next couple of years to see how we bring those technologies in. Clearly, documentary… Sort of the science documentaries feel like a kind of place where that would work. I think there will be a place for it in live where people would be willing to be a little bit less engaged because there is a live component to it. I think we’re pretty far away before you’re thinking about voice synthesis for live action, comedy, romantic drama kind of thing. I think that’s probably a little further away. 
And of course the technology I think will have to also move part and parcel with the ethical and legal challenges that go in there about what you’re doing with voices and so it’s going to be interesting but I think it’s happening. The tide is coming in and I think the vendors will have to give some sort of response.

Esther: Crazy different topic, but let’s talk a bit about external funding. Historically, as far as I’m aware, VSI has been self-funded. What are your thoughts on raising money or taking on external funding in the context of a lot of competition out there?

Mark: I have to be careful about that, so let me just say I have a ton of respect for Norman Dawood, who built this company from scratch really without taking on, as you just noted, any external funding, and he has built a very large and successful company with that. However, I also understand that as our customers consolidate and become bigger and the localization market gets larger, our customers are asking to work with large multilanguage suppliers, preferably ones that have owned-and-operated studios. And so as I come into the organization, certainly my remit is to take us through the next five years and I would certainly be very surprised if we’re not larger in the future. Does that mean that… Today we have no debt, we really have no external debt, which is quite nice, because you’re not beholden to a set of banks and you’re not beholden to a set of rules or worried about the interest that you have to cover. We don’t have that at VSI today. However, if you think about where things are going, I could see us growing a little bit faster and probably taking on some of that. I think that the appetite… You started out by asking about my history. When I was at the TV company and when I was at Panavision, both of those companies had taken on an obnoxious amount of debt before I got there, and largely that had brought those two companies down, and I was there to help clean that up. So I certainly know what too much debt looks like. As far as another investor or other deals, I certainly think we’ll be buying. We will be buying probably smaller studios. I don’t know whether it’s 18 months ago, maybe two years ago, Norman was very successful at buying the best dubbing operation in Brazil, Vox Mundi, which is now VSI Brazil Vox Mundi. They are terrific and I see us doing more deals like that. As I noted earlier, we’re not in Asia right now, and you have to decide, do you build from scratch or do you do a deal, and I think we’ll look at some of those.
So I could see us partnering with people. I could see us taking on some debt. Right now I’m only two months into this, so I’m still just trying to understand who everybody is and get to know as many as I can of the 700 employees we have here today.

Esther: For our listeners, we at Slator have just released our annual Language Service Provider Index. VSI features on it, in the Leader category, coming out of a year of, I think, more than 20% growth in 2022. Thinking about growth, where do you see that coming from moving forward? In services, or, as you touched on, maybe regionally?

Mark: Let’s talk about the easy growth, right? By the way, our growth projection for 2023 is quite strong still. Obviously we’re only in March here, so we’ll see how it goes. But as we look out at the year right now, we feel like 2023 will show significant growth over 2022. Let’s call it… The obvious growth is that our share of wallet at our major customers right now is still very, very small. So obviously there’s a few other companies out there that are larger than us and provide more services, but I am surprised, even in the last two months, that when I talk to key individuals at the top 10 content producers, they want to do more work with us. The reality is that we haven’t had enough capacity to supply them all. So right now we’re adding capacity in France, we’re adding capacity in the US, we’re adding capacity in Germany, we’ll add capacity in Spain. I mentioned some of the new markets that we may be looking at. So these are material increases where we almost know before we build them that our customers want to come. So I think our customers want to see a better spread of their localization dollar or pound or euro, whatever currency they’re using, and I think that that will work in our favor for the next couple of years. We offer a high-quality product, and while every piece of content is important to our customers, I get it, some is more important than others. And I think that there’s a slice of product that is very much made for VSI to come in and work on, so I feel pretty good about the future here.

Esther: 2023, can you identify any major priorities on the roadmap?

Mark: I already touched on capacity, right, and the problem with capacity is that it’s not like subtitling, where you can ramp capacity up or down very quickly. I’m talking about dubbing capacity here, and unfortunately, this looks like a normal wall, but it’s actually soundproof, and there’s a lot of money in this wall, let’s put it that way, and it doesn’t happen by itself. So there is a little bit of a lead time. So our first priority is getting some of this capacity built out this year, which we will do. I think the next one, for me, and this is maybe more of an internal look, is bringing VSI together as a group. We have offices all over the world, and I want to make it easier for our customers to give us one piece of content and get 20 languages. We do that today but I think there’s an ability for us to expand. And then, as I said, there are a couple of markets. It might not happen, but I would be very surprised if we’re not in Asia by the end of the year. It’s got to be done the right way. I know that one thing I’ve always respected about Norman, who’s still involved in the business by the way and talks to me almost daily, is that we want to be thoughtful. Better to get the right answer than the quick answer, and so we will be thoughtful about that. But yeah, it’s grow that capacity, widen that customer base a little bit, get us all acting together as one team. If we just do that this year, life will be pretty good.

Esther: A question to round off our discussion today. I’m sure we’ve covered this, but could you summarize just a few thoughts on what your vision would be for the industry in two to three years’ time?

Mark: I’ve said this in other contexts before, but I really believe that our customers are going to demand that in the future they have three or four big, let’s not even say localization, let’s say media services vendors. It’s hard to overestimate how sensitive our customers are, because a lot of the content we’re dealing with is prerelease or, more importantly, or additionally, precommitted, so if they say they’re launching a series on this day, it’s getting launched on that day, and that creates so much stress in the customer organization that then gets transferred onto the vendor organization. Our customers want to know that their media services vendors are solid, financially healthy, and capable of deploying all the right people and all the right technology. And I hate to say it, but the day of the little boutique localization vendors in little countries is ending. They can probably do work cheaper than we can, by the way, because they’re not making the same investments in technology and they’re not making the same investments in project management teams, but even though they might be able to do it cheaper if you go one by one, that world is not going to exist any longer. Our customers are getting leaner in their operations. That means they’re pushing more and more activity onto the vendors, and so there are going to be only three or four survivors in this, and of course I strongly believe that VSI will be one of those. But even then we have to widen our offering, as we said, with media services, so I think it’s going to be a different world. Certainly price will be important. Our customers would like prices to come down. On the other hand, their demands are going up and so we have to find that right balance. And I think especially if you look… I’m going to take a step back.
If you look at the historical failures of some of the media services companies: Deluxe went bankrupt, obviously it’s not bankrupt now, but Deluxe had challenges, and Technicolor had challenges. Well, there’s even people in the localization world who have had challenges, let’s be honest. And so I think that our customers recognize that and they know that if they squeeze too hard, that’s bad. The vendors themselves have to be smart about not overcommitting in terms of debt, but we’ve got to find a world where we can be successful and the content company can be successful, and I think they know that, so I think there’ll end up being three or four players. I think it’ll be a world where there’s lots of technology, right, like I said, MT for subtitling, voice synthesis in its own place. More importantly, the way we handle and store files, asset management, is really critical, and those are big investments. Storing petabytes of data, what you put in the cloud and don’t put in the cloud, so you can get things quickly. These are big things that only the big companies can do, and so that’s how I think the world changes.