Reverso CEO Theo Hoffenberg on Building Language Technology Used by Millions

SlatorPod #123 - Reverso CEO Theo Hoffenberg on MT Used by Millions

Theo Hoffenberg, Founder and CEO of Reverso, joins SlatorPod to talk about specializing in AI-based language tools and translation aids used by millions online.

Theo begins with his background in the software industry, where he helped launch tech companies in Europe, and explains how he got into the language technology industry with Reverso. He discusses the evolution of the company, from downloadable software to an online translation platform with millions of users, including language professionals and enterprises.

The CEO talks us through the release of some of their AI-based language tools like Reverso Context, a bilingual dictionary tool based on big data and machine learning algorithms. He shares how they maintained and grew their user base by staying on top of SEO and leveraging the App Store.

Theo talks about acqui-hires and recruiting remote engineering and tech talent. He discusses how they stay ahead in a highly competitive space by deploying locally and online, offering secure SaaS for those in sensitive industries, and translating documents and websites while preserving their layout.

The pod rounds off with Reverso’s roadmap for 2022, including an all-in-one desktop app and an updated mobile app that will have even more learning capabilities. Theo also weighs in on the value of large language models when it comes to memory usage because, as volumes grow, it becomes more critical to achieve high efficiency.

Transcript

Florian: Tell us about your professional background. How does one start a company like Reverso? Give us the extended elevator pitch for Reverso for those few of us who do not know the company.

Theo: My background is in engineering. I studied at École Polytechnique in Paris and also did some management courses at Stanford. I have always been fascinated by language in general, so this is something that brought me to launch Reverso. My professional experience has always been in software, so I only had a couple of experiences before launching Reverso, but one of them was with Lotus, the American company that I helped launch in Europe.

Florian: What year did you start Reverso? 

Theo: 1997.

Florian: 1997, that is when the internet exploded. Tell us a bit more about the origin story, so how did you start it? What was the original thinking behind it?

Theo: I was a professional in the software industry and I thought that the next step with software would be NLP. At the time nobody used this term, at least nobody in the professional world that was not in research. I thought that it is the next generation that takes care of the content and that is how I decided to go into this field. I knew how to design good quality translation software and I found a very interesting team in Russia with whom we started and that was how all this began. It is thanks to the quality of the team that we had and the design and the know-how in software publishing.

Florian: Was machine translation the first application you put online?

Theo: Yes, it was. First, we did software that you would install on your PC and that could also be customized and integrated into office applications at the time. Then a little bit afterward, we started our first online translation platform, which is now reverso.net with 80 million users worldwide. At the time there were probably not even 80 million users on the web. We started to publish on our site, but also in partnership with a lot of big portals like Lotus or Orange in France, for example. Then we also pioneered the translation tool inside companies used by everyone because before that, it was more a tool for translators or translation departments. We created the new business line of having a tool that is used by everyone in the company. The person in charge of I.T. or internal communication can roll it out across the whole company and have users communicate between themselves in different languages or communicate with their counterparts in other places, but with some control by the company. That means they were not looking into the content, but more into the usage, who is using what, et cetera. It still exists and it is still a flagship product.

Florian: Is that what you would call on-premise, so it is not in the cloud?

Theo: It is both. It is on-premise and SaaS. We always had the two components, on-premise and SaaS. At the time it was more on-premise and now it is more SaaS, but we have the two capabilities. The advantage of SaaS is that it is much easier to deploy more languages and new features, while on-premise guarantees you more control. You can even have an air-gapped system if you are in the defense industry.

Florian: Tell us about some of the features that you have and some of the milestones and challenges getting these features live. 

Theo: One of the big milestones is the launch of Reverso Context 10 years ago. It is the first dictionary that uses big data to create dictionaries only from corpus, plus a lot of alignment algorithms. Today it is the most comprehensive context-based dictionary, where we have more than 100 language combinations, about 100 million segments on average in each language combination, and we compute the dictionaries based on different criteria but mostly based on the corpus itself. Then we have a lot of cleaning. We eliminate the segments which are too long or too short, the ones that contain special characters that can ruin the alignment, and the ones that contain too many figures and things of this kind. Then we apply an alignment algorithm that is a mixture of AI and algorithms to make sure that we match word by word, but also phrase by phrase. For example, that is the only tool in which you can say something like “je m’en vais” in French and it will tell you: I am off, I am going, I am leaving, I am on my way. All those translations are aligned, and there is a lot of processing afterward because sometimes the alignment is not precise. It goes one word too far or one word is missing, and we correct those through a lot of different cycles.
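
The cleaning step Theo describes (dropping segments that are too long or too short, that contain alignment-breaking special characters, or that contain too many figures) can be sketched roughly as follows. This is a minimal illustration, not Reverso's implementation: the thresholds, the set of special characters, and the function names `keep_segment` and `clean_pairs` are all assumptions made for the example.

```python
import re

# All values below are illustrative assumptions, not Reverso's settings.
MIN_TOKENS = 3          # segments shorter than this are dropped
MAX_TOKENS = 60         # segments longer than this are dropped
MAX_DIGIT_RATIO = 0.3   # maximum share of digits among non-space characters
# Hypothetical set of characters assumed to disturb alignment (markup, etc.).
SPECIAL_CHARS = re.compile(r"[<>{}\[\]|\\^~]")


def keep_segment(text: str) -> bool:
    """Return True if a single segment passes all three cleaning rules."""
    tokens = text.split()
    # Rule 1: too short or too long.
    if not MIN_TOKENS <= len(tokens) <= MAX_TOKENS:
        return False
    # Rule 2: contains special characters.
    if SPECIAL_CHARS.search(text):
        return False
    # Rule 3: too many figures. non_space is never empty here because
    # the segment has at least MIN_TOKENS tokens.
    non_space = [c for c in text if not c.isspace()]
    digits = sum(c.isdigit() for c in non_space)
    return digits / len(non_space) <= MAX_DIGIT_RATIO


def clean_pairs(pairs):
    """Keep only bilingual pairs where both sides pass the rules."""
    return [(src, tgt) for src, tgt in pairs
            if keep_segment(src) and keep_segment(tgt)]
```

Filtering on both sides of a pair is a design choice in this sketch: a noisy segment in either language would presumably hurt the alignment that follows.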

Florian: Is there a human-in-the-loop or expert-in-the-loop from your side or is it more iterative? 

Theo: There is an expert-in-the-loop to analyze the results and improve the cleaning algorithms and alignment algorithms, but there is no human-in-the-loop segment by segment or word by word. For example, we use human-in-the-loop to filter out the rude words, so we have a list of rude words which we define manually, but it is never completely accurate. We try to filter it so that people do not get content with rude words if they search for something completely innocent. When we say rude words, it can be words that simply are indications that content can be arguable.
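
One way to read the rude-word filter described here is as a manually maintained blocklist applied to example sentences, where a flagged example is hidden unless the user's own query contains the flagged word. The sketch below assumes that reading; the blocklist entries are harmless placeholders, and `filter_examples` is a hypothetical name, not a Reverso API.

```python
import re

# Hypothetical, manually curated blocklist; placeholder entries only.
BLOCKLIST = {"darn", "heck"}
# Matches runs of letters (no digits, no underscores), Unicode-aware.
WORD = re.compile(r"[^\W\d_]+")


def words(text: str) -> set:
    """Lowercased set of the words in a text."""
    return {w.lower() for w in WORD.findall(text)}


def filter_examples(query: str, examples: list) -> list:
    """Hide examples containing blocklisted words the user did not search for."""
    query_words = words(query)
    allowed = []
    for example in examples:
        flagged = words(example) & BLOCKLIST
        # Keep the example if it has no flagged words, or if every
        # flagged word it contains was part of the query itself.
        if flagged <= query_words:
            allowed.append(example)
    return allowed
```

As Theo notes, a list like this is never completely accurate: an exact word match misses inflections and misspellings, and it can flag innocent uses.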

Florian: The use case here is mostly for people going to the website or is this something you do custom as well for enterprises?

Theo: This is mostly for the wide public, so it is on the website, on mobile apps. It can be integrated into other apps, but it is mostly standard. We have a customization capability, but it is not so much used because it is so wide already that it fits technical usage. The target use is from school students to translation professionals because a school student can look for something like a table or window and will find translation examples, and how to use it. The translation professional will find very precise idiomatic phrases and will have a lot of content to choose from.

Florian: You added a lot of other features as well. You have context, grammar check, and synonyms, so what are some of the highly in-demand features that you have?

Theo: Reverso Context is our flagship because it is the entry point for many searches. From Reverso Context, you can go to conjugation, you can go to synonyms. Also, if you enter a longer text, you have the result produced by machine translation and Reverso Context is also the place where you create your own phrasebook. People that want to improve their language skills can gather words and expressions when they search for a term on their mobile device. We have a Chrome extension and we have desktop apps so that you can bookmark when you are there. The entry point is Reverso Context for historical users and we also have the entry point from pure machine translation, but we include results from context when it is a very short search. When you search for only a word, it will also show you the results from Reverso Context and we also include synonyms and phrases inside translation, so that when you have a translation, you say, maybe here I can have a different nuance or shade of meaning. You can click on words and get synonyms or you can click on a sentence and get a phrase, so you have alternatives for the translations.

Florian: A lot of people know you and they are returning users, but in terms of acquiring new users, it must be super competitive in terms of SEO and traffic acquisition. How do you do that? Is it because you have been there for a while?

Theo: We are trying to give good answers and be on top of technical SEO which means having our pages respond very fast, having content that is original for each search or query, having a good interlinking between pages, and for the moment it has been quite successful. We want to keep our users and we try to get them to download the apps, whether it is a mobile or desktop app because then it is much easier for people to use and also people are much more engaged. It is clearly the best way to engage people and the user experience for them is much better because once you are in a desktop app, for example, you can interact with any application. There is no advertising and very fast response. It is the same with the mobile app, you can use dictation, and you can save some of your searches offline, so it is a much nicer user experience.

Florian: You have the Mac app and then the app store. Is it a big multiplier for you being present there and being ranked quite high?

Theo: Yes, so we have about 25 million downloads already. We have a rating of 4.7 in both stores. People like it a lot. They are very engaged. People that are using our apps use it on a regular basis, twice a week on average and again, they use it both for keeping track of their searches and learning new words. That is something which is unique that you cannot do on the web at the moment. We keep it only for mobile users and also the desktop app. People are much more engaged because they have a shortcut to use in any app. For example, if you cannot write it in a different way, you will open the app and it will give you rephrase alternatives and this is super useful. 

Florian: It must be very complicated to balance the needs of your customer’s enterprise deployment with building an app that people use regularly. How do you manage this? 

Theo: The technologies are the same and it is about the same as what people do with Zoom and so on. When you use Zoom as a professional, the product has different features, for example a virtual classroom where you can split into groups, et cetera. This is not something that you would use one on one, but the main features are the same. People look for the translation of a page, the translation of a word, a sentence, or a document, and this is something that you can use as a student when you write your essay. This is also something that you can use as a professional marketer when you have done your PowerPoint that you want to spread out. We believe that it is very complementary because it allows us to invest more in the technology, so the technology is mostly the same and we do not have to create two technologies. We create one technology and then we create products that fit specifically. For example, mobile products are not developing much because it is not the habit of companies to manage the generic mobile apps of their users, but the usage as a web app, on the contrary, is super useful for everyone.

Florian: It seems that you are personally very much involved in the development of the product. How do you find it currently with all these highly funded startups and big tech companies hiring? How do you find the hiring and retaining of engineering and tech talent? 

Theo: Hiring is not easy, so one of the things that we manage to do is to keep people where they are. We find people that are in locations where you do not have those highly funded startups or GAFAMs and we keep them where they are. Also, the other thing that we can guarantee to new hires is that if they want to do something and they have the capabilities, they do not have to deal with big hierarchies. Even in highly funded startups, there is red tape and we have less red tape. People that know how to make products in our company have a lot of room to express themselves. I want to take advantage of being here to say, for people that have those capabilities and do not want to go into GAFAMs or into other companies, because even in startups you sometimes have a lot of internal politics and so on: we do not have that, and that is one of the benefits. The drawback is that we have fewer people, or less of the idea of the ping pong table in the middle. We have a lot of other benefits. We have a very international team with a lot of different locations in Spain, Canada, Russia, Ukraine, Romania, and that is a very positive aspect. We do some gatherings from time to time. People are very happy to meet each other.

Florian: You are a proven business as opposed to a startup that may or may not be able to raise the next funding round. Speaking of which, has this ever been a topic? Have you ever done any acquisitions or acquihires?

Theo: We have done acquisitions and acquihires. Not big ones, but we had three acquisitions that were acquihire and that allowed us also to increase our technical range. It was more for the engineers that we acquired and we are in the process of a new one in this field. We are interested in acquihires. For example, one of the acquihires that we have done is a company that was making a tool to use Netflix and others to learn languages and we took this capability and we put that in our Chrome extension. Now we can use our Chrome extension to watch Netflix and click on subtitles with Reverso translation on top of it and this is something that we got from this company.

Florian: That is a great feature but was it ever a big consideration to take on outside funding? It has been bootstrapped and grown quite well over the past two decades. 

Theo: First of all, this is my way to try to develop a business, to make a nice product and sell it, and luckily, we managed to sell it. We managed to get it out on the internet. We were very early, so we did not have customer acquisition costs, so it was word of mouth and we managed to get to the right level in the company, so we acquired a lot of big corporate customers without having a high cost of acquisition. We find people that take us to the next level because we do not want to lose our independence which brings a lot of swiftness, agility, and so on. We do not want to lose that if it is not for something that helps us grow to the next level in order of magnitude of 10 times bigger in a couple of years. Otherwise, if it is just funding and so on, this is not so appealing to us.

Florian: For machine translation, it is extremely competitive. You got everything from Google, DeepL, Amazon, and all these guys coming in, how do you see that space developing the raw MT or slightly custom MT space going forward?

Theo: We are lucky to have a lot of users, both on our website and on the corporate side, and the quality that we provide them is on par with all the other players, and we have additional features that others do not have. For example, the capacity to deploy locally or online. We also have secure SaaS for people who are in defense industries or sensitive industries. We also have a training tool that works very well. If you have your own corpus, we know how to train fast and accurately. We also have a tool that is available to the public, but also for companies, which is called Reverso Documents, in which you can translate documents and make the revision online. This is something quite unique because it is not like SDL Trados, which is dedicated to translators. It can be for anyone that wants to translate a document, and it is easy enough that anyone can do so. It is post-editing online and you can use the translation memory inside this tool.

Florian: Large language models are now all the rage. I want to hear your thoughts on this. Is this another neural machine translation-sized breakthrough for the translation and localization community or do you feel it is more theoretical? Where do you see this at the moment? 

Theo: There are breakthroughs all the time in our industry, for example, there are models that allow you to grow much faster. For us today, one of the key points on which we focus is to keep the quality and make it faster and more efficient in terms of memory usage because as volumes grow it is important to have high efficiency. We believe the quality currently is good enough for general usage. If you combine all the customization and so on, we have never had complaints about quality. We are always looking, but we do not want to change all the underlying technologies if there is no breakthrough. For example, when there were transformer models, we changed radically and we switched to transformer models three or four years ago, and currently, it is the topic for grammar checkers. For grammar checkers, the usage of AI techniques is the hot topic and it is the combination also of parsing plus this. Even parsing is now using machine learning and AI techniques. 

Florian: In terms of the roadmap, is there anything you can tease here? Anything that you can share with us this year, next year, or anything you are launching?

Theo: First of all, we have a desktop app which is going to be one of our new flagship products. The desktop app will have everything, so it is one product that will give you translation for single words, phrases, and sentences integrated into all your applications, synonyms, phrasebook, plus correction, so everything in one and very nicely integrated. This is something which we invest a lot in. The other thing that we are going to do is also a complete overhaul of our mobile apps to have even more learning capabilities. It is already used a lot for searching and some people use it for learning capabilities, but we believe that anyone wants to grow their vocabulary if it is nice and fun. We make it even more fun and easier to discover new words and memorize them because oftentimes you think you searched for this term but a week after you do not remember it. You find it in Reverso Context and then you want to learn it. It is something that you use, because how do you remember it and how do you make sure that you will use it when you want to be more accurate next time you are going to make a speech?