Building a SaaS Product for Subtitling, Dubbing with Checksub Founder Florian Stègre

SlatorPod #177 - Checksub Founder Florian Stègre on automated subtitling and AI dubbing

In this week’s SlatorPod, we are joined by Florian Stègre, Founder of Checksub, to discuss how the SaaS platform powers subtitling and dubbing for SMEs and enterprise clients.

Florian’s entrepreneurial journey began after graduating from business school, when he founded the video production company ROOM Agency, which laid the foundation for Checksub.

Checksub primarily serves SMEs and enterprises, which have the option of working with external providers directly in the platform’s video editor.

Florian expands on the platform’s unique features, including animated subtitles and voice cloning, which analyzes the emotion and type of voice and provides a better match for the original speaker.

Subscribe on YouTube, Apple Podcasts, Spotify, Google Podcasts, and elsewhere

Florian shares why he took a bootstrapping approach to company building, focusing on achieving product market fit with an eye on long-term sustainability rather than rapid scaling.

When it comes to building tech in the fast-evolving age of AI, Florian keeps up to date with recent research, attends conferences and workshops, and collaborates with industry professionals.

Checksub ensures they remain ahead in the competitive AI market by focusing on skills development in their team. On the roadmap, Florian plans to enhance user experience, improve the translation editor, and add more capabilities to their AI dubbing platform.

Transcript

Florian F: Florian Stègre is joining today, so Florian is the Innovation Director and founder at Checksub, an automated subtitling and AI dubbing platform. So it’s a SaaS platform with monthly and annual subscription options, which is really cool, kind of in the broader language AI space where there are these platforms emerging. So hi, Florian, and thanks for joining. So also, let’s start with a long, long time ago, your professional background. How did you start, and what’s your journey to founding Checksub? What were some of the light bulb moments that made you start this particular company?

Florian S: Yeah, that was a long time ago. So after graduating from business school, I found myself employed at various American IT companies, HP, Dell. The experience was priceless, but I had a growing desire to set up my own business, so I decided to move back to France with the intention of starting a company. I just wasn’t quite sure what that company would be yet. I immersed myself in research, exploring hundreds of potential ideas, and then I started working on promotional videos for startups because some friends asked me. They knew I was able to record video and write scripts, and I created a first company, ROOM Agency, and it turned out to be the catalyst for what Checksub is today, because everything started one night. I had to create, like, an English subtitle for a French video. It was the first time I was creating subtitles, and it was a nightmare. I had to finish for the following morning because the video had to be published, and I spent the entire night working on it and at the end, I said, never again. I will never do subtitles again in my life. But more and more people were contacting me, with YouTube and Kickstarter at the time, and one day I decided to return to school to study computer science, to imagine a platform that could translate video automatically. So that’s where, after months of work, Checksub was born.

Florian F: You’re the sole founder of Checksub or?

Florian S: I’m a solo founder. At the beginning, I was alone, and step by step, I bring more people into it.

Florian F: Nice. So it’s the classic pain point, personal pain point experience, origin story. Like, I did it, it was so horrible, and then can we not build a product that addresses this? Were you doing it in some kind of video software, like Final Cut Pro or something or how did you?

Florian S: At the time I used, like, Aegisub, which is a famous subtitle software, but you have to do everything manually and I was thinking, wow, that’s crazy. YouTube was offering auto subtitles, but when I saw the output they gave me, I said, okay, I prefer to start from scratch.

Florian F: Back then it was probably terrible. Even now it’s not great.

Florian S: They don’t know what punctuation is, so that’s a problem.

Florian F: There you go. So you kind of gave us the, I don’t know, light elevator pitch, but tell us more about Checksub, like key features, offerings, products, and also the pricing options, which I find fascinating. So, so few so far kind of language startups, language AI startups have managed to really put together a compelling subscription offering. So tell us more about just what Checksub offers and then how did you kind of decide on the pricing structure as well?

Florian S: Yeah, so Checksub is a technology-driven platform offering auto subtitles, machine translation and AI dubbing. And our primary focus is serving small and medium enterprise and large enterprise with a need for efficient and high-quality video localization platform. So we didn’t have this target at the beginning, but yeah, you adjust when you search for product-market fit, you adjust the target and we found that was the client for which we can bring the more value with the platform. So about the features, recently we deployed AI dubbing features, it was more than a year ago, so I said recent, but at that time it was one of the first platforms offering that. And about the pricing model, that’s a big challenge. It was a big challenge because we are focusing on enterprise, but you still have, like, a small creator who want to use the platform and because we start working with them, at the beginning, we didn’t want to close the door. And we still want to give the ability for NGO, small creator to be able to use the platform to translate their video if they want, so we have this subscription model for them. But when you are a larger company, SME and large enterprise, we have special enterprise pricing where you can manage more volume.

Florian F: For the enterprise, does this include any service as well, like you would be taking care of any part of it? Or is it purely just access to the technology that you’re giving even under the enterprise package?

Florian S: We took a decision a few years ago because when we start Checksub, we had the services like human translation and human transcription services. But at the end we decided it wasn’t the same business and if you want to be good, you have to be focused. So we decided to turn off the human services to focus on the platform and how clients can work with partner agencies or they can do it internally with their own team. And that worked great because that’s the best way to train your own translator to your industry, to your enterprise and your jargon and everything.

Florian F: Interesting, so when you were saying product-market fit, so you had probably some requests for services, you maybe initially took them on to please those kind of early clients. But then you decided, well, that is a very different line of business, let me focus on the platform.

Florian S: I wasn’t satisfied enough with our ability to provide accurate results at scale, because that’s a very different business. You have to train the people, you have to onboard the people, you have to manage the payment of the people. And it’s difficult to say, okay, when you are a client, you come and I always give you the same person to work on your project. That’s nearly impossible because maybe the person will not be available to work on your project and you can’t wait one week, and because of that it’s very complex to provide good quality for our clients. And so we decided to focus on the product because that’s where the value was.

Florian F: It’s just hard to scale and what you described with the availability of kind of the same translator, that’s like one of the key reasons why kind of these large language service providers exist because they can solve this at scale, but it’s just a very different line of business. So now you have subtitle, machine translation, AI dubbing, kind of walk us through this and which parts of your client base would use which component, which solution. I mean, I guess you mentioned you have some individual creators, some NGOs, SMEs, and then what will be the enterprise? What would they use most? Or is it kind of the integrated package?

Florian S: Basically we have different kind of use case. We have the one where they want a simple solution to be able to translate like external communication or learning content. That’s a very important use case for us. So if they just want an easy solution, they will start with the subtitles part, okay? And if they want a more advanced quality, they are going to use AI dubbing because you should know that not like other platform, other solution in the localization video industry, we develop our own video editor. So it means since the beginning we spent a lot of time developing this product because we decided to develop a video editor. And thanks to that, you can very easily manage your subtitles project in the same tools than your dubbing project. So when you have the subtitles, you just have one button where you click and we are going to be able to generate the automatic dubbing synchronized with your video in one click.

Florian F: Lip synced, I guess in the traditional media localization sense would not be actually changing the mouth like AI style, but it would just be lip synced, right? Now we’re seeing all these… Basically the AIs actually changing the lip movements to align with the text, but what you’re offering is kind of a very closely timed voiceover or how would you describe that?

Florian S: Yeah, it depends on who you are talking to. You may have a different definition but voiceover is basically when you add a voice on top of the original voice, like you have a reality TV show, they add the voiceover on top but you keep the original voice. But if you do dubbing, you are going to replace the original voice with another voice. So basically we do both because we allow you to adjust the audio level so you can keep the original voice or you can just remove it.
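Editor’s sketch: the voiceover versus dubbing distinction described above can be expressed as a single gain applied to the original track before it is mixed with the generated voice. This is a minimal illustration, not Checksub’s implementation; the function name `mix_tracks`, the `original_gain` parameter, and the assumption that audio is a list of float samples in the range -1.0 to 1.0 are all hypothetical.

```python
from typing import List


def mix_tracks(original: List[float], generated: List[float],
               original_gain: float) -> List[float]:
    """Mix a generated voice track over the original audio.

    original_gain = 0.0 removes the original voice (dubbing);
    0.0 < original_gain <= 1.0 keeps it underneath (voiceover).
    Samples are assumed to be floats in [-1.0, 1.0] (hypothetical format).
    """
    if not 0.0 <= original_gain <= 1.0:
        raise ValueError("original_gain must be between 0.0 and 1.0")

    length = max(len(original), len(generated))
    mixed = []
    for i in range(length):
        # Treat the shorter track as silence once it runs out.
        o = original[i] if i < len(original) else 0.0
        g = generated[i] if i < len(generated) else 0.0
        sample = original_gain * o + g
        # Clip so the mixed sample stays in the valid range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


# Dubbing: original voice fully removed.
dubbed = mix_tracks([0.5, 0.5], [0.25, 0.25], original_gain=0.0)
# Voiceover: original voice kept at half level underneath.
voiceover = mix_tracks([0.5, 0.5], [0.25, 0.25], original_gain=0.5)
```

A real editor would work on decoded audio buffers rather than Python lists, but the same gain control covers both cases.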

Florian F: Got it, and so as a user, I’m uploading like what, an MP4 or any kind of file and I’m editing it in the cloud on Checksub?

Florian S: It can be a video or even an audio file. We support both type of file and then we process everything in the cloud. So it can be in the EU or it can be in the US because we are working with large companies, so the data privacy is something important to us. And after that we are going to generate the subtitles, the translation and the dubbing if it’s desired. We are also providing voice cloning. That’s a unique feature we developed a few months ago and we deploy it to some of our users and it brings a new step to the dubbing because before we had like voice libraries where you can pick the voice you wanted, but now you’ve got a voice similar to the original one. So it means we are not exactly cloning your voice because you are going to speak another language, but we are going to analyze the emotion, the type of voice and it’s a better match because when you watch a video and you have like a large voice with a small guy, it doesn’t match, so it gives a better quality at the end.

Florian F: It’s funny how like what you just described in voice cloning, you’re adding it in a sense as an incremental feature on top of what you already have, right, and it’s great for clients that they can use it. But there’s like probably a cohort of startups whose kind of sole USP right now is having voice cloning, right, but you’re adding it on top of something that you already have. What’s your take here on… Where does the real value lie for your customers? Like they’re probably not coming for the voice cloning but they love having the voice cloning there, so why would they come to you and kind of get started in the first place? And then another quick follow up question. How do you find them? Do they find you? Do you do sales? Do you have salespeople, business developers or how do they come in?

Florian S: I’m going to reply to the first question about the voice cloning. So you’ve got some solutions out there that do provide voice cloning, but the use case is you want to create content with your voice from scratch. So it means you have just a text and you want to clone your voice and generate audio with the text you give. And it’s similar in terms of technology, but we cover another use case where you have a video and you just want to get the transcript, get the translation and generate a voiceover with the voice cloning. So even if the technology is similar, the use case is way different because if you do it manually, you will have to synchronize every sentence with your video, you are going to generate the transcript, you are going to generate the translation, et cetera, et cetera. Checksub does everything for you. Okay, so that’s the big difference with some voice cloning startups. The second question about the sales… So because Checksub is six years old now, we have a very good ranking on Google for subtitles and stuff like that. So some people just come on the website and send a sales request and we also do some sales on LinkedIn and stuff like that, but we don’t spend so much time on sales actually. But that’s one of the focuses for this year because, yeah, we spent a lot of time developing the product, and now we are very proud of the product and we can scale the sales.

Florian F: The founders, sometimes if they’re technical, they tend to be a little shy, not shy in kind of a psychological sense, they want to build the most perfect product before they’re even there to go and speak to a client. And obviously it’s great when you have inbound from Google, right, but I think at some point generally in the localization translation industry, there’s just a lot of sales required. I mean, some of the successful companies there, the large ones have like hundreds of salespeople and it’s a tough one to crack.

Florian S: We are bootstrapped since the beginning, that’s what I like with bootstrap because it’s like life: it’s not a sprint, it’s a marathon. So when you have external funding you have more pressure to go out there and maybe it’s good, but I like to build something I enjoy and very powerful. So now I’m very happy when I see many clients using the platform and say, yeah, Checksub is amazing, it’s magic, and saves me hundreds of hours.

Florian F: I was going to ask you about the bootstrapping versus external funding, so you have absolutely no intention of raising money at this point? You’re very happy being in full control?

Florian S: I started the company in 2017 and at that time the market was very different. Everyone was raising money, but I didn’t feel it was right because raising money if you don’t have a product-market fit looks like a fake success. So at the beginning, because I was solo founder, I decided to find the product-market fit before raising any external funding. Then I found it and we began to grow and develop Checksub and we were profitable. So it became clear that our product resonated with a real need in the market. But I didn’t raise money because I didn’t need it. So it doesn’t mean I’m not going to raise money in the future, I don’t know, but it’s not something I’m looking at immediately.

Florian F: It’s a very modest/almost kind of a little bit European old school answer. Like, if I don’t have product-market fit, why would I go out and raise kind of and have other people pay my finding product-market fit? And then once you have it, you’re like, well, why would I need somebody else? Unless you have like, yeah, you really feel that there’s this huge, massive opportunity that couldn’t be captured otherwise. But yeah, I understand that bootstrapping is a great way to just retain full control and kind of go incremental, so why not?

Florian S: It’s a lifestyle in term of philosophy. It means, okay, you can raise money. Then when you raise money, the goal is to sell the company and when you sell the company after, you have to find a new company to found and then you raise money. And I’ve got some friends who raise money at that time and they sell the company and after they call me and say, Florian, I like the way you are doing business, can you give me some advice because I want to do something small, I don’t want to lose too much control and go in the wrong direction because I lost the direction. It’s just maybe it’s clearly it’s slower, but I’ve got a life, so you don’t need multiple success to be happy.

Florian F: Let’s quickly talk about machine translation and the translation component of your business. I’m assuming you haven’t developed this component, but you’re kind of plugging into some third parties? And then if you can disclose, what are those third parties? Or how do you provide the translation bit? And is it fully machine translation or could the client also have linguist in the loop? Just walk us through the translation component.

Florian S: Yeah, so our approach since the beginning is to provide state-of-the-art translation, involve a blend of proprietary technology and external resource. It means if something exists, we are going to use it because it doesn’t make sense to rebuild it. But you have some gap sometimes where you have to develop your own technologies. So we develop our own AI solutions sometimes, but we can use such solutions such as DeepL and other kind of provider because they give very good results, but at the top of that we may add some internal technology to improve the result.

Florian F: Are clients able to, like if they wanted to have a linguist, kind of another layer on top of that, could that be done inside the platform or?

Florian S: Yeah, exactly. You can work with external provider on your project. So that’s very useful because you can have like a link where you are going to send the link and say, okay, can you review the project, can you edit it? And the human professional is going to have the full project ready with a video with the editor and it’s quite easy to use the editor. It means what most of the clients like is that you don’t need to work with professional subtitlers. Even a translator can work on the interface because we manage the synchronization of the script with the video while you are doing the editing, so that’s important.

Florian F: How are you handling all these heavy video files? Like if I open an account and I drop you like a half gig, I don’t know, whatever file, how do you handle that? Is that something that can be controlled quite easily on some AWS or how would that work?

Florian S: Yeah, so that’s something we didn’t think it was going to be so difficult to manage at the time. Now some technology emerged and it’s quite easier. But yeah, we are using technology similar to AWS S3 to be able to provide a large storage solution and be able to stream the video when you edit the project. So, yeah, now you can send very large file and we are able to manage it quite perfectly in the editor.

Florian F: I think that’s one of the areas where in theory the idea works and you got some kind of beta product, but then, yeah, people start using it and they drop all kinds of terabytes onto your server and then maybe the math doesn’t even work anymore in terms of the pricing structure and all of that.

Florian S: We are focused on that, but that’s the main difference between mature solution and new solution because when you are a mature solution, you had the chance to master a lot of bugs at the beginning, to master a lot of exception and stuff like that and control the cost. So when you are a mature solution, you have to find solution and fix that.

Florian F: What are your thoughts about entertainment grade subtitling, for lack of a better term? I mean, just today, for example, I came across an article that says that Netflix now with all the Hollywood studios shut down, for example. Apparently they have a lot more subtitled content that’s being consumed, especially in the US, because a lot of people would be watching or slowly transitioning to shows that are developed outside of kind of the Hollywood production studios, et cetera, and then obviously subtitled or dubbed. Okay, so it’s kind of a twofold question, so have you even tried to enter this kind of entertainment grade subtitling market or dubbing market and what’s your thoughts on kind of the potential for this for you in the long run?

Florian S: I didn’t have that information, so that’s a great thing about Slator, you are always learning new stuff, so thank you. No, that’s great news because I think it’s good to spread some other culture to the world. So when you are watching a non-English show, it’s great. Our focus at Checksub is predominantly on serving enterprise and SME, but we see a big potential in the entertainment sector, and the demand for localized content is growing, as you can see. So given our technology capabilities and the quality of our services, we believe we are well equipped to serve this market, so we are going to move toward this market, but until now it wasn’t the focus we had.

Florian F: You say SMEs and enterprise, but within those, would it be the marketing function, or what type of function or person would be a key client?

Florian S: Yeah, so we have different use cases, so marketing, communication, external communication or internal communication, and we have also the learning department, so all of these departments create video. Video takes time and they don’t have the time to spread this content to every language. But when you are a worldwide company, you have to spread the knowledge, when you are the learning department, to everyone. But at the moment, not everyone is speaking English, for example, so only 20% or between 20% and 40% of the employees are going to have access to your content if you are a worldwide company, but if you translate in two, three, four languages you are going to be able to spread the knowledge. So it’s very important and especially at this time where the training is key, everything is moving so fast you need to be able to bring your employees to the new destination and learning is key and internal communication is key. And the other side we are covering is the external communication, people who want to create like short video with the rise of TikTok, of short content, and we develop unique features for that. So you can generate subtitles but we are also providing customization features. You can use your own font, you can add animation and not only static style, so you have like the karaoke style and stuff like that, so it’s very popular on social media platforms and you can master it very quickly.

Florian F: I love the karaoke, like I do like a weekly update to the team and I actually also use that karaoke function in a different tool. I might consider Checksub and I love it because it kind of focuses the mind. I’m still torn about the one-word subs or like the two-word subs which is super fast and just like for a fraction of second you see the word as the person says it. Do you think that’s kind of a fad or is that here to stay? What do you think about that?

Florian S: I think it depends on the audience. If you are talking to young people, you need to have a fast pace. But if your audience is older, if it’s too fast, it’s going to be very difficult to follow for them because they are just going to be focused on the subtitles. So I’m not a big fan of too fancy subtitles because it doesn’t have to replace the content itself. So what I recommend is to focus on the content and then add a subtitle style adapted to your target, so it really depends on your target. If you are going on TikTok with a 30-second video, it’s easier to have very quick subtitles everywhere, that’s good. But sometimes even myself, when I look at the subtitles, I don’t even have the time to read them because it’s so fast.

Florian F: Probably 30 seconds is like the max and it’s just pure raw entertainment and you’re not really focusing and probably not retaining much from these videos either, right?

Florian S: What we do to solve that is that you can apply a style only on specific subtitles and you can say, okay, I want the style for all subtitles. You can use like more chill style, let’s say, a slower style. But sometimes when you want to focus on the specific subtitles, you are going to use a special effect.

Florian F: Let’s take a bit of a tour to AI, like the dubbing that you mentioned and there’s been this huge shift since you started, especially like in 2017. And obviously now in the past, I guess now it’s nine months with Whisper and with all of these… That’s of course more on the subtitle side, but generally also with voice, huge AI shift and ChatGPT, et cetera. So I guess my question is when you’re building something that’s involving AI dubbing, how has the past year kind of influenced the way you’re building? And did you have to adjust a lot or you kind of just kept incrementally adding some of these cool new technologies to your product or, yeah, just walk me through that?

Florian S: The field of AI indeed evolved significantly since we started Checksub. However, we believe the fundamental concepts remain the same. Leveraging machine intelligence to create solutions that solve real-world problems. That’s the idea. So the name changed from machine learning to AI, but basically the technology changed a little bit, but that’s the same principle. So in term of the way we follow the innovation, so basically we stay updated with recent research. That’s very important. We participate in relevant conference and workshop and we also regularly collaborate with other professionals in the industry to share some insight. Since the beginning, we decided to build the video editor. We can easily bring capabilities on top of the platform we have, so even if you… Let’s say we have a new technology available for machine translation. We are going to just be able to build an external services that we are going to connect to the platform we have and that’s very powerful and that makes a difference. We spent a lot of time developing that, but now it’s quite easy. We just add something and for example, just to give you an idea, what we did at the beginning for transcription. When we look at transcription, we see that we have many external provider, but not all of them give the best result for every languages. And because you have regular updates, it’s difficult to say, one time I’m going to use this one, and one time I’m going to use this one, because you have to integrate them. So we develop our own meta API, that’s what we call a meta API and we connect the API services to every provider available and like this, we can easily use the best one at the time.
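Editor’s sketch: the "meta API" idea described above, one internal interface that routes each request to whichever external provider currently performs best for a given language, might look roughly like the following. This is an assumption-laden illustration rather than Checksub’s code: the class names, the per-language `scores` table, and the two fake providers are all hypothetical, and no real vendor API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Provider:
    """A hypothetical wrapper around one external transcription service."""
    name: str
    transcribe: Callable[[bytes, str], str]  # (audio, language) -> text
    scores: Dict[str, float]                 # language -> quality score, higher is better


class MetaTranscriptionAPI:
    """Single entry point that picks the best-scoring provider per language."""

    def __init__(self) -> None:
        self._providers: List[Provider] = []

    def register(self, provider: Provider) -> None:
        self._providers.append(provider)

    def best_provider(self, language: str) -> Provider:
        candidates = [p for p in self._providers if language in p.scores]
        if not candidates:
            raise LookupError(f"no provider supports language {language!r}")
        return max(candidates, key=lambda p: p.scores[language])

    def transcribe(self, audio: bytes, language: str) -> str:
        return self.best_provider(language).transcribe(audio, language)


# Usage with two fake providers (stand-ins for real services).
api = MetaTranscriptionAPI()
api.register(Provider("provider_a", lambda audio, lang: f"provider_a:{lang}",
                      {"fr": 0.9, "de": 0.7}))
api.register(Provider("provider_b", lambda audio, lang: f"provider_b:{lang}",
                      {"fr": 0.8, "de": 0.95}))

french_choice = api.best_provider("fr").name
german_result = api.transcribe(b"", "de")
```

Because callers only see `transcribe`, updating the scores table after a provider release changes the routing without touching the rest of the platform.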

Florian F: Does your client care, know, are they aware, like, which one you’re choosing? Or is it kind of a dynamic process behind the scenes?

Florian S: It’s a dynamic process. We thought about the possibility to give the possibility to the client to select the providers they want to use. That’s something we could do, but actually at the time it wasn’t relevant because you want to do subtitles or translation, then you trust Checksub to pick the best services to provide that.

Florian F: Got it, because then you’re saying like, okay, they come to us, they need a solution for real life problem and I don’t want to overburden them with having to select 50 different options here. All right.

Florian S: Because you don’t know which one to use, but what we do for large enterprise, because it can save a lot of time, we can do a benchmark to see which one gives the best result for their content and then we adjust the technology we are going to use.
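Editor’s sketch: one common way to run such a benchmark is to transcribe a sample of the client’s content with each provider and compare the outputs against a human reference using word error rate (WER). The interview does not say which metric Checksub uses, so WER is an assumption here, and the function names and sample strings are hypothetical.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def pick_best_provider(reference: str, outputs: Dict[str, str]) -> str:
    """Return the provider name whose output has the lowest WER."""
    return min(outputs, key=lambda name: word_error_rate(reference, outputs[name]))


# Usage with made-up outputs from two hypothetical providers.
best = pick_best_provider(
    "hello world again",
    {"provider_a": "hello word again", "provider_b": "hello world again"},
)
```

In practice the reference transcripts would come from a small, manually corrected sample of the client’s own videos, so the result reflects their jargon and audio conditions.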

Florian F: Speaking about transcription, how did you integrate Whisper at the time? Or was that a big deal or?

Florian S: Whisper is open source and that changed a lot in term of open source, but you still have a gap in term of quality from our point of view. When you have team in other companies who work on that, they improve the output. So if you want to just deploy Whisper because some clients maybe decide, okay, let’s deploy Whisper like this, we are going to be able to provide a transcription to everyone with a cheap solution. The problem with Whisper is that you have like a wrong time code per word. The time synchronization is quite bad from scratch, so you still have to spend time to train the model and stuff like that. So we deploy Whisper, but the output at the moment isn’t that great and because our focus is to provide the best translation, best transcription, best dubbing, at that time it’s not so useful. But we know that we could spend more time to train the model with data and improve the output. But the problem is we have other priorities right now and we think we have more value to improve the dubbing, for example, than improving the transcription, because transcription is quite great at the time.

Florian F: Now, when you say we, that’s the team, how do you hire anybody that’s even remotely capable in AI today or retain them because they got to be getting offers left, right and center?

Florian S: Being remote from day one has given us a distinct advantage compared to the market. So we are not geographically limited in our hiring. We can tap into a global pool of talent, so that’s a big advantage and because everyone is in remote, it’s not like a fix in a part of the company. Everyone work the same way. And beyond hiring, we are working hard on developing the skills of our team. AI is a fast developing market, but the number of experts is quite low at the time, so if you want to pick the expert, it’s going to be a war to get him. Then you have to match the person in the team. You don’t know… It’s not because he is a great expert that he is going to be a great employee in the team. So we are choosing to recruit people and help them build up their skills on these areas, like LLM for example, so we help the team to learn new skills internally at Checksub.

Florian F: Anything on the roadmap, new features 2023, 24 you can disclose?

Florian S: For this year, it will be the year of maturity. On the product side, we are going to work on the user experience. We want to go a step further in term of translation editor. We spend a lot of time on the transcription editor. We think it’s one of the best one on the market and we want to provide something similar for the translation part. And on the other side we want to focus on the dubbing. Dubbing is a key features for us. We’ve got an advantage and now we want to bring a feature on top of that. Recently we deploy a solution able to extract the music background and the voice, like this you can translate and replace only the original voice with a generated voice. Because when you have a video like this, you don’t have sometimes the music file, so we are able to get the music, get the voice, and you can replace only the voice and keep the music and that improves the quality of the final exportation.