Dubverse Co-founder Anuja Dhawan joins SlatorPod to talk about launching the machine dubbing startup and powering video content dubbing using the latest AI language models and tech.
Anuja starts with her journey to co-founding Dubverse with Varshul Gupta, prompted by the rise of English-language video content during the pandemic. She goes through the company’s core team structure, where the focus is on scaling through AI and technology.
She discusses how they tailor their product to language demand in India as roughly only 10% of the population understands or speaks English fluently. She talks about their approach to tackling technical challenges when producing highly natural, emotional AI voices.
Anuja notes the importance of dubbing short-form content from large social media and video networks like YouTube, TikTok and, specific to India, ShareChat. She unpacks a human-in-the-loop model where they onboarded a translator to improve linguistic nuances in their automated workflow.
She also shares the story behind connecting with Dubverse’s first investor, Kalaari Capital, and the lessons learned from the interview process. The pod rounds off with the startup’s future plans to make dubbing published content even simpler for creators, so they can break the language barrier.
Transcript
Florian: Tell us a bit more about your background. How did you get started with this AI dubbing idea?
Anuja: India is a land of languages. Every state, every city for that matter, has a different language or dialect that we speak within different regions. The idea got started when my co-founder and longtime friend Varshul was working on AI. He has been into AI for about seven years. He has done a bunch of projects there and went to Delhi to do a research paper in computer vision. He came up with this idea when the lockdown happened and we could see everybody going back and getting locked in their houses. The internet was the only way to interact with the world, so that is when he thought that there is enough educational material available for all the people who understand English, but what about the people who do not have access or who do not understand this language equally? How do they upskill themselves? That is where it germinated from. Then he spent quite a bit of time understanding how this problem could be solved using technology. That is how it got started. From then to now, it has been about two years, I believe, and we tested the waters and did a few things here and there for education, which I am sure we will talk about in detail, but that is when we realized that it is not only education. It is way beyond that because the amount of content that is being generated right now is immense and it is everywhere, specifically video content. That is where it got started and now we have raised our first round of funding. We sought validation from the market. Those are the building blocks for the journey that we have taken.
Florian: Tell me a bit more about the Co-founders and current leadership team. How many people are on the team? What are their functions? What are their roles? What are their backgrounds?
Anuja: We are a small team and that is how we plan to stay because we are a deep tech company. We are trying to do things at scale using technology, so we are a team of seven people as of now, primarily heavier on the engineering side. Varshul, as I mentioned, is the AI engine or the AI brain behind the whole project, and now it is taking shape as a company. He looks at the product and the technology side, where we come up with new, magical ways of doing things using technology in the language space. We have an AI engineer working on our voices, our translations, and how to make them more contextual, so a lot of work is being done on the AI side. Very soon you should be seeing a lot of technology coming out of Dubverse because there are a lot of cutting-edge things that we are doing and experimenting with. Again, the product and engineering side is where we have more people. We understand that we do not come from a linguistic background, so we have recently onboarded someone from that space who has been a translator himself for the last four years and has worked with the likes of Google and Netflix on different localization projects. We believe somebody from the industry is required to get the nuance right when you move between different languages, so that is what we are trying to do. We are trying to relate ourselves more to the language world. We also have multiple consultants working with us who are pioneers in AI, NLP, and voice recognition specifically. The core team is seven people, but we work with a bunch of people to make sure that we are headed in the right direction.
Florian: Now tell us a bit more about what languages you currently cover. What is the expansion plan, Indian languages versus European languages, et cetera, versus other Asian languages?
Anuja: Currently, we cover 30-plus languages, a mixed bag of Indian languages, the regionally spoken ones, and a few international languages, the majorly spoken ones. The idea from day one was not to limit ourselves to India. What we have seen, and why all of this started, was that not everybody understands English. Only 10% of India speaks or fluently understands English, 90% do not, and if you look at the content available on the internet, 75% of it is still in English. That is the gap that we saw, and we jumped at a great opportunity there. We believe that in Europe there is a very similar problem from a language standpoint. There are smaller regions that speak a specific language, come onto the internet, and want to consume more content in their own native language and not just English. We do not see this as just an Indian problem, but a global problem. Interestingly, there is a lot of inflow from the US as well, from an English-to-Spanish standpoint. In India we see English to Hindi as a very strong use case. From a problem statement standpoint, we are definitely a global product from day one, and hence I want to have conversations to understand what the market looks like on that side of the world, and build a product that can be globally acceptable and seamlessly used. We are supporting 30 languages, and very soon we plan to add more. We will have coverage of 100-plus languages very soon.
Florian: You consider Dubverse a deep tech company and there is this huge universe of components around processing voice, machine translation, speech-to-text, et cetera. How do you evaluate what you are building with your own engineering and machine learning team and what components of the stack are you buying or licensing or subscribing to?
Anuja: We are a deep tech company because we believe we are trying to solve a problem that has not been solved yet in the world. A platform or a service or a technology like this, which can give you contextual translation into a different language altogether, does not exist. Hence, we are doing something new. We call ourselves deep tech, but we are very user-focused. We are 100% reliant on the market to tell us what is required, through what we call internally super users. We have 20 of these super users who we chat with daily, and these are all bulk video creators. For whatever purposes, they are creating a lot of videos, either in an individual capacity or for the company that they are working at, but the idea is to understand what their workflow looks like. The idea is not to bring everybody onto our platform, but for us to go to them and become a part of their workflow. If we talk about speech-to-text, text-to-speech, or text-to-text translation, the combination that we do specifically at Dubverse, these have been in existence for quite some time, but in silos. You have software where you can go and do text-to-speech, speech-to-text, and translation. What Dubverse is trying to do is bring all of these together in a product-defined manner where any person with minimal coding ability can come onto the platform and use it. The idea here is not to reinvent the wheel. Translation has existed for a very long time and multiple bigger players are doing it across languages, so that definitely acts as a base for us to get started, at least. The problem that we see now is that those things are not contextual. We have broken this problem down at a category level. For example, sports is one of the biggest categories that we are working with at the moment. We do enough work within the sports category that if we receive a sports video, we can translate it without any human involvement.
When we do a translation, there is quality analysis required to make sure that whatever the machine has generated is of a certain quality, can be understood, is contextual, and is conversational rather than a literal translation. That is the approach we are taking: we pick a category, we solve it, and then we replicate it across categories.
Florian: Sport is full of emotion, so what is the primary technical hurdle when you are looking at producing emotional AI voices or robot voices or whatever the appropriate term is? Synthetic voice?
Anuja: When I say sports, we are primarily looking at sports training, so there are different types of courses out there. There are different types of training videos out there from a sports standpoint, whether physical sports, like swimming or football or anything along those lines, or soft skills, like chess. These are training videos where somebody is trying to learn a specific sport: how to play, how to learn the techniques, and so forth. Those are the videos that we are doing. Emotion is the biggest hurdle and the most challenging aspect right now when it comes to AI translation, so we have certain solutions there. At Dubverse we have been able to build an engine where, with minimal input, we can create an AI speaker offering an actual human voice. For example, we can create an AI speaker that imitates my voice and the way I speak. Any creator can use my voice on top of their content or video and dub it in English. I am not a professional, so we are now going to professional voiceover artists and recording there. Only one hour of data is what we need as input, and as an output we can create their AI speaker, which will be a 90, 95% match. Whatever they say, however they speak, however they convey a message, that is replicated in our AI system. Our target is to take this to 100%, and then the possibilities are endless. You record a person’s voice once and they can create any content from there on.
Florian: How important are social media channels and video networks for you, like YouTube and TikTok? Are you seeing some shifts in the past two years since you started in terms of the adoption of these networks?
Anuja: YouTube and TikTok have played the biggest role in terms of how content is consumed. Initially, it was more about music. It was only about those things, but then these content creators have come on board in the last two to three years. If you are stuck anywhere in your life, you just go to YouTube, search for a video, and you have a solution for it. Interesting story: I was traveling once and I had this scooter and I was unable to unlock the boot because it was a little complicated. What I did was Google a review, and in that review, this person explained how to open it and I was able to do it. I am saying that it is not only entertainment; it has become a very critical part of our lives now. Anything you want to know, you go to YouTube and you have an answer. Likewise for TikTok, TikTok made it so simple for anybody to be a creator. It is a click away and we have seen what TikTok has been able to do. Similarly, what we see happening in India is that multiple other apps have come out after TikTok was banned in India, such as ShareChat, Moj, and so forth. They are very similar to TikTok in that you can create bite-size content which is very quick and easy to adopt. Internet adoption because of these applications has gone up immensely, and that is the reason people have come onto the internet and are now using it for different purposes as well. All of these apps have accelerated multifold, and a very good shift that we also see along with this is the increased demand from a vernacular standpoint, from a standpoint of wanting to consume content in your own language, not only English. That is the reason we see the supply also increasing, where individual YouTubers and individual people are creating multilingual content and not just sticking to English on the internet.
Florian: What are some of the apps that have emerged that are taking TikTok’s role and is it easy for you to cover them or does it take additional work? How does that work?
Anuja: ShareChat, I believe, is the biggest now. I am not sure of the exact stats, but ShareChat is a similar application where you can go and create easy bite-size content. Multiple such apps have come forward and the adoption is insane because the population in India is on the higher side, so if you launch an application which is more consumer-friendly, you get those returns. If they are getting that engagement, more and more people will be more than willing to join, so we see that happening. For us at Dubverse, that is step two. In step one, we are trying to solve this contextual translation within different fields, make our system more robust, and make our AI more contextual, and once that is done, we would love to integrate with these applications. Imagine if you, as a TikTok user, upload a ten-second video that is only in English, but it can be published in five different languages on the go. That is the future that we imagine with all of these different applications. You do it once, but it gets published in multiple languages, so that your addressable market is tenfold, twentyfold, depending on the number of languages you publish it in.
Florian: When you are looking to expand into the B2B space, business to business in Europe and the United States, who would be an early best-fit client for you at the moment?
Anuja: To start with, we are going after the smaller setups. SMBs, small and medium businesses, are what we primarily look at because we believe that those are easier to work with. It is a faster turnaround time. We pitch a product, pitch a use case, and within the coming week we can start working on it and start delivering. We see two specific use cases. One is those that are creating a lot of videos as their main product. You are creating a course, from martial arts training to how to use Excel to anything along those lines. If you are creating a lot of content there, video commerce is one of the very interesting things which has come out again from a video standpoint. Those can be directly addressed. The other is if you are creating videos to support your main product or service, so product experience videos, how-to videos, feature adoption videos, and so on. All of those can also be very easily translated because what we see in India, and I am sure it is similar in Europe as well, is that when you are creating a product, it is not only consumed by a one-language-speaking audience. Maybe the owner speaks one language and the audience speaks a different language. If you are restricting yourself to one language, adoption becomes a barrier, so that is the barrier that we are here to break.
Florian: Talk to us a little bit about the role of human linguists on the linguistic side in your automated workflow. What are you working with human linguists for and what components are you having them review or update or tweak?
Anuja: There is a human-in-the-loop required because it is video-in, video-out, so we primarily focus only on video content as of now. The video can be on any of these platforms, YouTube, Vimeo, or you can upload it locally. We ingest that video, we do a speech-to-text, then we do a text-to-text translation into whatever language you want to move into, and then we do a text-to-speech based on that specific language. This is happening almost in real time, so for a five-minute video it will take about 30 seconds to get the output in the second language. It is real-time in that we can produce an output, but it is machine-generated. Certain contextual things require a person to pull back and relook at it, so it has to be a linguist who understands the domain and understands the audience. Those are the nuances that still need to be looked at, so a five-minute video takes about a 15-minute review, and then it is good to go from a publishing standpoint. We usually try to get a professional linguist in place so that the nuances can be taken care of. We understand that people do not have all language capabilities, so we provide this as an additional service on top of our product, where you can come and use the product along with the service so that the actual output you get is a 100% ready-to-publish video which can be distributed as required.
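The pipeline Anuja describes (ingest → speech-to-text → text-to-text translation → text-to-speech, with an optional linguist review step before synthesis) can be sketched as follows. This is a minimal illustrative sketch, not Dubverse's actual implementation: every function name, the `Segment` structure, and the stubbed model calls are hypothetical placeholders for real ASR, MT, and TTS components.

```python
# Hypothetical sketch of a machine-dubbing pipeline with a human-in-the-loop
# review hook. The model calls below are stubs; a real system would plug in
# actual ASR, MT, and TTS engines at each step.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    start: float               # seconds into the video
    end: float
    source_text: str           # speech-to-text output
    translated_text: str = ""  # filled in by the translation step


def speech_to_text(video_path: str) -> List[Segment]:
    # Stub: a real system would extract the audio track and run an ASR model.
    return [Segment(0.0, 4.2, "Welcome to the swimming basics course.")]


def translate(segments: List[Segment], target_lang: str) -> List[Segment]:
    # Stub: a real system would call an MT model, ideally with category-level
    # context (e.g. a sports glossary) so the output is contextual, not literal.
    for seg in segments:
        seg.translated_text = f"[{target_lang}] {seg.source_text}"
    return segments


def review(segments: List[Segment],
           reviewer: Callable[[Segment], Segment]) -> List[Segment]:
    # Human-in-the-loop step: a linguist corrects nuances before synthesis.
    return [reviewer(seg) for seg in segments]


def text_to_speech(segments: List[Segment], voice_id: str) -> bytes:
    # Stub: a real system would synthesize audio in the chosen (possibly
    # cloned) voice and time-align it to each segment's start/end.
    return " ".join(s.translated_text for s in segments).encode()


def dub_video(video_path: str, target_lang: str, voice_id: str,
              reviewer: Callable[[Segment], Segment] = lambda s: s) -> bytes:
    segments = speech_to_text(video_path)
    segments = translate(segments, target_lang)
    segments = review(segments, reviewer)
    return text_to_speech(segments, voice_id)
```

Because the `reviewer` callback sits between translation and synthesis, the pipeline can run fully automatically (the default identity reviewer) or pause for the roughly 15-minute linguist pass Anuja mentions before the final audio is generated.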
Florian: This is probably more early stage. You do not want to become an agency or get too deep into that component, correct?
Anuja: We definitely do not want to, but what we see in usual SaaS as well is that it comes along with a service, because SaaS definitely is the place to be. Multiple SaaS organizations are coming out, but for actual adoption, there is that service arm which we see is usually required. We still have to make sure that our products are being used to their best capacity. What happens is that if you want to go out and launch your podcast in Hindi, even if a system can do it, you still would not understand Hindi. You will need somebody there who gives you that confidence, that credibility, that the right thing has come out. It will take some time for Dubverse to attain the credibility that whatever our user is getting is 100% correct and nobody needs to check, so until then, we see that we need to give you that credibility through a human: somebody who understands Hindi has looked through the video, made corrections, and now it is good to go. It is more from an adoption standpoint, so they can use our product when they have capability gaps in terms of languages. That is how we look at it.
Florian: I keep asking a lot of these startups that we have on the pod: it must be very hard to hire machine learning talent in this super competitive market. What do you think about that? How do you look for people? How do you get them excited about Dubverse and the vision?
Anuja: That is one of the biggest challenges, machine learning or otherwise. There is so much more that you want from a founding team at such an early stage than just good talent: a risk-taking appetite, a connection to the problem, and so on. What has happened is that the problem we are working on, from an Indian context, is very interesting, so when we put this out to anybody, from an employee to a user to an investor, it sparks interest in the first conversation. Wow, how is this possible? That has given us a head start, to be honest, in the different conversations that we have. AI and ML people are usually very interested in solving very different problems. If you give them a scale problem, they are like, okay, this is just another problem where I have to scale an application. That is not a very interesting problem to look at, but this problem is interesting, it is very difficult, and it comes with a lot of Indian context as well. Fortunately, one of the AI engineers we have in place was working with Varshul earlier, so he has been around since before me, to be honest. I jumped into this a little later, when it was taking shape. That gives us a very strong foundation from an AI standpoint. Going out, hiring a new person, and starting a conversation works brilliantly for us because Varshul has been in the startup world for about eight years, so we have that extended network where we are connected to the right set of people. The problem is very interesting, so we are still hiring. It is a very difficult problem that we are still getting our heads around.
Florian: You have been in the startup world for eight years, did that help connect with Kalaari Capital, the VCs that funded your seed round? Or how did you come across them? How was the process there?
Anuja: Investment was an interesting process. We spoke to a lot of people and that made us better at what we are doing, to be honest. A little backstory: when I was interviewing, switching jobs, moving from one to another, I literally treated the interview process like a therapy session because there is somebody very smart sitting across from you, asking very relevant questions. That happened in the investment process as well, because there are very smart people sitting across the table asking very genuine questions, so you have to be prepared and think hard about that. We can talk to users. People are ready to pay for this. We can onboard clients. All of that cycle was running, but then the hard questions hit you when you go out for investment, so that made us pretty strong from that standpoint. The investment cycle was brilliant. Kalaari has been great. Kalaari, very interestingly, reached out to us. Obviously, people know when you are out there, so it worked out and it went very smoothly. We had an introduction with Vani Kola, and they are also very bullish on the creator economy. This fits directly into their thesis and we were able to move very quickly after that. A lot of lessons were learned, to be honest. It is a very different conversation when you are out there vulnerable, putting yourselves open in front of everybody, so very interesting lessons came out of that.
Florian: Everybody is talking about this whole metaverse, VR, is this on your mind at all? Or is it not an issue at all?
Anuja: When we get into that world of the metaverse, how we look at it is that it is real decentralization; there are no boundaries. Where you are from, what you do, what your background is: nobody knows and nobody is interested. They are there in that certain environment that is being created and they just want to be there, and we see that language, or communication, is still going to be the only barrier that people face. From that standpoint, being in the language space and doing it through technology, all of this is going to come together very beautifully. Interestingly, Varshul and I were discussing this when we got started and this Meta thing was happening. By the time adoption starts and people have started living more in the metaverse, we will have contextual data and AI ready to be deployed in real time, so if I come and speak whatever language, you will only hear what you understand. It will be on the go, so to make that adoption wider, the language barrier needs to be broken in the metaverse as well.
Florian: Language is the last, and probably the most difficult, barrier to break because we can all meet in the metaverse, but if you do not speak the same language, it is going to get tough. I keep arguing that it might be beyond language and, of course, there are a lot of cultural aspects, et cetera, to this as well, but interesting.
Anuja: People are more open towards it. If you go back and look at the globalized world as well, the amount of movement that we see is the reason why this translation industry exists at the pace that it does. For example, to give you an understanding of what is happening, at least in India, Bollywood is pretty big here and that is where we take inspiration from in the country. In Bollywood, if there was a high-budget movie coming, it was only in Hindi, but in the last two years, what we have seen happening is that there is no high-budget movie just in Hindi. It launches in at least four different languages because people are becoming more aware of what it is that they want and everybody has their spending capacity. All of those things are happening, so people are coming out of their shells, out of their own culture. They are willing to learn more about what is happening in parallel cultures as well, so we see that happening. I am sure that is happening globally as well, and that is why I say that after globalization, language is the only barrier that we see in the world.
Florian: Tell us a bit more about your roadmap as far as you can disclose for 2022 and a little bit beyond.
Anuja: Now what we are trying to do is primarily solve for pre-recorded videos, wherein if you already have a set of videos in one language and you have a distribution channel, either YouTube or some other platform, or something more internal (we see a lot of videos being created internally as well), we are trying to take that multilingual. Our mission at Dubverse is that every video will go multilingual because that is the capacity we see in this problem that we have identified and the solution that we are building. What we want is that on the creator side, when you go and publish any content, there is just a button that says Dubverse; you click it, you get all the options in different languages, and you can publish it right there on whatever platform it might be. It is about making it super simple for creators to dub videos and use them. Interestingly, we see a lot of use cases, maybe further ahead in the future, coming from the consumer side as well. A lot of people reach out to us saying there is a very interesting TikTok that they want to show to their parents, but they do not understand the language it is in, so they want to use our platform to translate it just for their own consumption and not for global consumption. Let us see when we get to that, but definitely, the idea is to break that language barrier. There are multiple ways we can do it. We have started with a certain scenario to test the waters and make our system robust, and once we have that system running there is a lot more that we can do.