Translated CEO Marco Trombetti on Time-to-Edit as Proxy for AI Singularity

SlatorPod #143: Translated CEO Marco Trombetti on the company's Singularity in AI research

In the final SlatorPod episode of 2022, we are joined by Marco Trombetti, cofounder and CEO of Italy-based LSP Translated. Marco joins us two years on from our episode with Translated’s other cofounder, Isabelle Andrieu.

Marco breaks down the LSP’s Singularity in AI research project and defines what singularity means in translation. He walks us through the process and the vast data collection effort behind the research, with “time to edit” emerging as the key performance indicator for machine translation quality.

He talks about whether Translated has reached a plateau of productivity in post-editing and the challenges of building a user interface for interacting with natural language. He also gives his take on the importance of owning an adaptive machine translation solution like ModernMT.

Marco shares Translated’s approach to working with some of the world’s biggest companies such as Airbnb. He explains how they see financing as a requirement to reach their goals — which eventually fuelled their decision to secure USD 25m in growth equity.

The CEO shares his thoughts on ChatGPT. He thinks big language models will be the future of search and natural language will become the primary way of interacting.

Subscribe on YouTube, Apple Podcasts, Spotify, Google Podcasts, and elsewhere.

Initiatives such as participating in the Ocean Globe Race 2023, the Imminent Research Center, and Pi Campus are all central to spreading the company's values and creating an innovative community, Marco concludes.

Transcript

Florian: Two years ago we had your Co-founder, Isabelle Andrieu on the pod. Tell us a bit about what’s happened since, and just kind of lay the land a bit about Translated, the story and just kind of update people on where you stand at the moment. 

Marco: Translated continued to grow. We were having a lot of fun with different projects, and more and more we are realizing that symbiosis between human and machine is the way to go for the future. It is what we like to do most and where we are putting all our efforts for the future.

Florian: So this pod we arranged because there's a super interesting piece of research that you guys published very recently, and so, if I may, let me start with a quote from the article that you uploaded to the website. It says, language translation was one of the first challenges taken on by researchers in the domain of artificial intelligence. It remains one of the most complex and difficult problems for a machine to perform at the level of a human. Now, if I understand this correctly, this is part of a research project you named Singularity in AI, and it was part of your recent keynote at, I think, the AMTA conference in Orlando, right? Where you quantified progress towards singularity in AI through data that highlighted quality improvements in MT. So Marco, tell us more about this singularity research project, its scope, key parameters, assumptions, et cetera.

Marco: Yeah. First, everyone today is talking about general artificial intelligence, and every single week you find news about something new that is making progress in AI. But for many years we've been working on that, and nobody really knows when this is coming. Okay, so we took some of the data that we have, which I'll tell you about in a second, to try to predict the speed at which we're approaching singularity in general, not just in language, but trying to predict it thanks to the data coming from our industry. That is one of the industries that adopted AI before anyone else. Most of the AI models used today were invented for machine translation, so because we are pioneers as an industry, we use this data to try to predict how quickly we're getting to the singularity. And so this is a study about the speed of that approach, trying to predict the progress, because every single day we know that MT is getting a little better, AI is getting a little better, but how much, and how far are we from reaching the goal of having something that is as good as a human?

Florian: Singularity is kind of a tricky concept, right? I mean, I know it in the realm of like Ray Kurzweil and like basically just the machines completely taking over and the world basically being run by AI. How would you define a bit more narrowly singularity in translation? 

Marco: Yeah, so in translation we define singularity as the point when a professional human translator will take less time to edit a machine translation than to edit a translation done by another professional translator. When we reach the point where it is more convenient for me to review the output of the machine rather than the output of a colleague in order to create a perfect final translation, then that is where we say singularity. Then there is the more complex formulation that we use internally at Translated to be able to measure the progress and not trick ourselves into thinking we have a solution. We say it is a translation that produces fewer than five minor errors per thousand words, which is the equivalent of a translation done by a pro and a reviewer, that is delivered in less than 500 milliseconds, and that costs 1,000 times less than human translation. So you really need to achieve a quality level, measured in minor errors per thousand words, a standard metric. It needs to be fast: 500 milliseconds. It needs to cost 1,000 times less. Why? Even today we could create a much, much larger model than what we use, but it would cost 10 times more than human translation. It would be very good, and in some use cases it would be better than a human, but it is inapplicable. Who will pay for a machine that costs 10 times a human? So the easy formulation, as I said, is that it would be more convenient for me to edit a machine translation than a human one, simple. And then the complex one is fewer than 5 minor errors per thousand words, 500 milliseconds, and a cost 1,000 times less than human.
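
To make the three thresholds concrete, here is a minimal sketch of the criteria as Marco states them; the function and parameter names are illustrative, not Translated's actual tooling.

```python
# Hedged sketch of the "singularity" criteria described above: quality
# comparable to a professional translator plus reviewer, fast enough to
# use interactively, and cheap enough to apply at scale.

def reaches_singularity(minor_errors_per_1k_words: float,
                        latency_ms: float,
                        mt_cost_per_word: float,
                        human_cost_per_word: float) -> bool:
    """Thresholds as stated in the interview: fewer than 5 minor errors
    per 1,000 words, under 500 ms, and at least 1,000x cheaper than
    human translation."""
    quality_ok = minor_errors_per_1k_words < 5
    speed_ok = latency_ms < 500
    cost_ok = mt_cost_per_word <= human_cost_per_word / 1000
    return quality_ok and speed_ok and cost_ok

# Illustrative numbers: 3 minor errors per 1,000 words, 350 ms latency,
# $0.0001 per MT word vs $0.15 per human word.
print(reaches_singularity(3, 350, 0.0001, 0.15))  # True
```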

Florian: Would the 1,000 times less apply pretty much to all machine translation? I don't have the figures in my head. If it's like a custom model that somebody had to build a bit, would that still fall below that or?

Marco: Today, when you have machine translation at tens of dollars per megabyte and human translation at 10, 20 cents per word, if you do the math, that is about a thousand to ten thousand times. So keeping that ratio, if you have that ratio, really nobody will care about the cost. People will be able to translate a thousand times more given the budget that they have.
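
A rough back-of-the-envelope check on that ratio; the characters-per-megabyte and characters-per-word figures below are assumptions for illustration, not numbers from the interview.

```python
# Assumptions (not from the interview): ~1,000,000 characters of plain
# text per megabyte and ~6 characters per word on average.
mt_cost_per_mb = 20.0              # "tens of dollars per megabyte"
words_per_mb = 1_000_000 / 6       # ~167,000 words per megabyte
mt_cost_per_word = mt_cost_per_mb / words_per_mb    # ~$0.00012 per word
human_cost_per_word = 0.15                          # 10-20 cents per word

print(round(human_cost_per_word / mt_cost_per_word))  # ~1250, i.e. on the order of 1,000x
```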

Florian: Now, of course, you guys have run this piece of research on an enormous amount of data and experience. Can you just tell us more about like what… obviously, with Translated and MateCAT, you have so much data and so much kind of live data, real live data, not theoretical academic data, right?

Marco: Yeah, so just to connect the dots. We started doing machine translation post-editing in 2002, with a rule-based system, as a project sponsored by the EU with SYSTRAN. Then we switched to statistical machine translation, did further research, and released it in 2010. We then started a project called MateCAT. That was the first web-based CAT tool to integrate adaptive machine translation, statistical at the time, not neural, but it was learning from user corrections. By building MateCAT, for 12, 13 years now, we have been collecting what we were suggesting to the human in terms of machine translation, what the final translator was really delivering, and the amount of time it took the translator to do that. Not just the first pass: with every single edit, the time was increased by the amount of time spent on the segment. So we collected what is called Time to Edit, and we have about a hundred thousand translators who participated over 12 years. Really, 90% of the data is generated by 10,000 people, but that is still a very large number of people, very diverse. They represent many different humans, many different contexts, many different domains, so we think this is the most representative sample of the industry ever created, because it is very diverse and it is not only customers of Translated. We have provided MateCAT for free to many professional translators who use it every day, and they basically share the corrections with us: we give them the technology, they tell us where the machine was wrong, and we improve the adaptive model. That's the principle. By doing that, we collected 12 years of Time to Edit data, so that we could predict every single day the amount of editing a translator needs. If you look at the research, basically we went from 4.55 seconds per word in 2007 and it keeps going down, down, down. Now we are at about two seconds per word, so just in the last seven, eight years people have already doubled the number of words they are delivering per hour.
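
An illustration of how a per-word Time to Edit figure falls out of segment-level editing logs; the record structure below is hypothetical, not MateCAT's actual schema.

```python
# Toy Time to Edit (TTE) computation: each segment accumulates the
# seconds spent across all editing passes, and TTE is total editing
# time divided by the source word count.
from dataclasses import dataclass

@dataclass
class SegmentLog:
    source_words: int
    edit_seconds: list[float]   # time spent on each editing pass

def time_to_edit_per_word(segments: list[SegmentLog]) -> float:
    total_time = sum(sum(s.edit_seconds) for s in segments)
    total_words = sum(s.source_words for s in segments)
    return total_time / total_words

logs = [
    SegmentLog(source_words=12, edit_seconds=[18.0, 7.5]),  # two passes
    SegmentLog(source_words=8,  edit_seconds=[14.5]),
]
print(time_to_edit_per_word(logs))  # 2.0 seconds per word
```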

Florian: People need to look at the chart. I’m just looking at it here. Yeah. You see how it’s going down like from three to two, and then obviously you’re kind of pulling it further down towards the one, right. Very interesting chart. A little bit of ups and downs, but the down trend is definitely perceptible. 

Marco: In fact, that is the feeling we have every single day. The change is so small that day to day maybe you don't perceive it, but when you look at it over time, then you really see this small progress, and when you sum up the progress over 10 years, it is impressive. And I think this is the first time ever in the field of artificial intelligence that someone made a prediction of the speed to singularity. Ray Kurzweil, whom you mentioned, used a very different approach. He was saying, look, in order to achieve general artificial intelligence, you will need this level of computational capacity, this level of data, this level of learning outcome. What we're saying here is, okay, we don't know what is needed; this is what is happening. This is the speed at which we're approaching singularity.

Florian: It does speak to the fact that we, as you said, we’re as an industry a pioneer in adopting this technology. I mean, it kind of became only very obvious at least to me over the past couple of years as a lot of other use cases and applications kind of start to come online. It’s like, well, we’ve kind of been doing this for a long time already, right? Yeah. You mentioned Time to Edit as kind of the key metric here, and can you just unpack that a bit more and also maybe distinguish it from other things like edit distance or even the BLEU metric that some people hate to love, love to hate.

Marco: BLEU score is an edit distance. What happened at the beginning of machine translation is that you were not able to measure the time people were spending; you were doing research in labs. So what people initially did was say, okay, for this content, this sentence, I know what the correct translation is. Let's call it a reference. I ask the machine translation to translate it, and then I measure the difference in terms of characters against the reference. The more different the translation looks in terms of characters, the worse the translation is; the closer it is, the better it is. Edit distance worked great at the beginning, when translations were so bad that the output was so different you could really measure the difference by character or word differences. What's happening now is that translations are so good that the BLEU score no longer has enough resolution to distinguish the further improvements we make. If we build a new model and try to predict whether it is better by BLEU, BLEU will sometimes say it is worse. The only things we can use today are either Time to Edit, where you really measure whether this is helping translators, the cognitive effort to get to perfect, if you want, or something called an A/B test, where you provide two translations, the old one and the new one, and you ask people to say which one is better. So BLEU is slowly becoming a poor metric, and A/B tests with human evaluations, even complex human evaluations, are becoming better. And for use cases where you are really in a production scenario, Time to Edit is the final metric, because it is the real measure of the cognitive effort. To give an example, we may have two sentences that do not require any kind of edit. They are both acceptable translations, but one will require a lot of the translator's time to be approved because it is awkward in the way it is written; it is a good translation, but it is not fluid. The other one is fluid, easy to understand, easy to read. Time to Edit is able to measure the difference; no other metric ever will. Also, sometimes you need to change just one character, and it takes a lot of cognitive effort to really understand that sentence and make the change. The size of the change is not a measure of how wrong the translation is. It was just an approximation that we used in the early times. Our industry is based on a cost per word, but if you think about it, at the end what you want to minimize is the time it takes to do the translation.
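
A toy contrast between an edit-based view and Time to Edit, following the example above: two MT outputs the translator approves without changes score identically on any post-edit distance, yet the awkward one costs far more approval time. The sentences and timings are invented for illustration.

```python
# Character-level edit distance between the MT suggestion and the
# delivered translation is zero for both segments (the translator
# accepted both unchanged), so an edit-based metric cannot tell them
# apart; the logged approval time still can.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

mt_fluent  = "Restart the device before installing the update."
mt_awkward = "Before the installing of the update, a restart of the device is to be done."

# Both accepted unchanged -> post-edit distance 0 for each:
print(levenshtein(mt_fluent, mt_fluent), levenshtein(mt_awkward, mt_awkward))  # 0 0

# Hypothetical per-segment approval times (seconds) a CAT tool would log:
print({"fluent": 5.0, "awkward": 18.0})
```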

Florian: That makes a lot of sense, and, of course, the framework you're building here will eventually help move towards an hourly-based or just a different type of pricing model for the entire industry.

Marco: Potentially. At the end, we still price at the word level; it's just that we discount words in different ways based on the effective time, something translators have always done. For example, if a translator spends 50% less time than they would with no MT, we don't pay them 50% less per word. Maybe we discount only 25%, so they really earn about 25% more per hour, which is a good way of splitting the benefit and making sure that the best translators will join us to do the work.
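
A minimal sketch of the split Marco describes: only part of the time saving is passed on as a per-word discount, so the translator's effective hourly earnings go up rather than down. The formula and function name are illustrative, and the exact uplift depends on how the time saving and the discount are defined, which the interview does not spell out.

```python
# Illustrative only: if a job takes (1 - time_saved) of the original time
# and the per-word rate is discounted by word_discount, hourly earnings
# scale by (1 - word_discount) / (1 - time_saved). The point of the split
# is that this multiplier stays above 1: the translator earns more per
# hour even though the per-word rate is lower.

def effective_hourly_multiplier(time_saved: float, word_discount: float) -> float:
    return (1 - word_discount) / (1 - time_saved)

# With the figures mentioned in the interview (50% time saved, 25% word
# discount), the translator comes out ahead per hour:
print(effective_hourly_multiplier(0.50, 0.25))  # 1.5
```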

Florian: Now, would you agree that we are kind of starting to reach what I guess Gartner would call the plateau of productivity in post-editing now? Or how far do you… I mean, in your chart there's still a bit of a way to go down, right? But do you feel that we're at that plateau, and are there any easy, quick wins left in the supply chain for post-editing, or not really, because we are now six years into this?

Marco: Honestly, I have seen many people talking about the plateau, but we don't see it. We see that we still have room to double productivity in the next five years or so. We are at two seconds per word, and we can go to one, where we think the singularity will come. Why one? Because if I take a perfect translation that does not require any kind of edit and I ask a translator to approve it, they take about one second per word, 10 seconds for a sentence. So when we reach that point, we reach what we defined before as the singularity. But still, we can double productivity in the coming years, and every year we still see an improvement. So I think there is a lot of work to do, and then we reach the singularity and everything changes, because a lot more content will be translated automatically. I think we will go from human-first, and then eventually MT, to MT for everything, with translators then asynchronously helping the machine to improve. Probably a bigger industry, more work translated, but a different approach to the work.
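
For context, the per-word speeds mentioned across the interview translate into rough throughput figures like this; simple arithmetic, not data from the study itself.

```python
# Rough throughput at the editing speeds mentioned in the interview:
# ~4.5 s/word (the earliest figure cited), ~2 s/word (today), and
# ~1 s/word (the approval floor where the singularity is defined).
# Halving seconds per word doubles the words delivered per hour.
for sec_per_word in (4.5, 2.0, 1.0):
    print(f"{sec_per_word} s/word -> {3600 / sec_per_word:.0f} words/hour")
```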

Florian: It will be so fascinating to see once, yeah, the cost comes down to a level where it becomes like the natural thing to do. Of course, you have this piece of content in 30 languages, why not? Right? Why wouldn’t you? And then you’ve got to differentiate yourself with some added layer of language quality or whatever it is on top of it, right? One problem I guess that some people have, many people have is still kind of the user interface and the kind of the interactivity problem when it comes to interacting with machine translation. What are your thoughts on this and what are you guys building? What do you see? 

Marco: It's a very good question. It's the hardest problem we have. Building user experience and interaction design in many other fields is relatively easy, because it's new stuff and you basically have to train users to interact with something. Think about a car: we learn how to use the wheel, the brake, the accelerator, like pilots or a piano player, and the same for user interfaces like search or other tools you use on your computer, the mouse, the keyboard. These are mechanical things outside us that we need to interact with. But here we're discussing the user interface for interacting with natural language, the most human thing we have. There has been a lot of progress, and there have been different approaches. The company Lilt created a very nice interactive way to propose machine translation to translators: they get suggestions, and as they write, it predicts and they basically complete or accept. A lot of interruptions for the human, but you also feel very in control, so it was easy to adopt. We used other approaches, like reusing the interaction people already had with translation memories, so the machine translation comes as a TM suggestion and you just need to edit what is wrong. But that was relatively easy, and it was done over the last 10 years. Now think about this: two seconds per word means 20 seconds per sentence. We now need to allow people to change the style of a sentence and do many other, more complex things, and we have to interrupt them to provide suggestions and take their input. But we only have 20 seconds, and in those 20 seconds they have to understand the source sentence, read the suggestion, approve the suggestion, and then we need to provide an extra, incremental suggestion. It's very difficult, because every single thing we put there risks slowing people down more than it helps. So we are at a point where improving the user interaction in machine translation is very complex, but there is so much potential now in using very large language models to help people work more fluidly and better, and we need to invent solutions. It's hard, but we kind of need to.

Florian: The thing that powers it is, of course, machine translation, and Translated now owns ModernMT. So why do you think it's important for a large language service and tech provider like yourself to own that part of the stack?

Marco: To be precise, we were the founders of ModernMT anyway, so we always owned a big stake in ModernMT, and our research partners were also in the company. Then we acquired a hundred percent, but we always considered ModernMT our baby. It was always an important asset, and it's becoming more important every day, because we think we can only reach the singularity through this symbiosis between humans and machines, and standard machine translation solutions are not designed for professional use. They're not designed for our industry. Think of Google Translate: a beautiful translator, but designed for the general public. You need a technology that you can control and change. We needed context adaptation. We needed user adaptation, so that as the translator fixes errors, the model improves in real time. You don't want the translator to correct the same errors again and again in the document. That needs to happen now; you cannot wait six months for re-training. You also need to create much larger models in some use cases, because you can pay more: while you are translating professionally, we can use very large models to improve that part a little, because the cost of MT during professional translation is irrelevant. And we also needed very low-cost machine translation models that the market does not offer, because when Airbnb wants to translate 700 million reviews, you cannot use what is on the market. So you really need to own it if you are serious about translation and you really want to reach the singularity. On the human and machine side, we can no longer improve the quality of MT using data that is on the web; we've reached the plateau there. So we need to own it, get the user feedback, improve the models, and work with the best translators in the world to improve the MT. This is something we couldn't find on the market. I mean, there were people with great models, but they were interested in other problems.
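
A minimal sketch of the adaptive loop described here: translate, collect the translator's correction, and fold it back in immediately so the same error is not repeated later in the document. The class is a toy stand-in that "adapts" by reusing exact-match corrections; the names are hypothetical and this is not ModernMT's actual API.

```python
# Toy adaptive-MT loop: a real engine would update model parameters
# online; this stand-in only reuses exact-match corrections to show
# where the feedback plugs in.

class AdaptiveMTClient:
    def __init__(self):
        self.corrections: dict[str, str] = {}   # source -> approved translation

    def translate(self, source: str) -> str:
        if source in self.corrections:           # adaptation: reuse the fix
            return self.corrections[source]
        return f"<baseline MT for: {source}>"    # placeholder baseline output

    def learn(self, source: str, post_edit: str) -> None:
        """Feed the human correction back immediately (online adaptation)
        instead of waiting months for a full re-training cycle."""
        self.corrections[source] = post_edit

client = AdaptiveMTClient()
suggestion = client.translate("Annulla l'ordine")     # baseline suggestion
client.learn("Annulla l'ordine", "Cancel the order")  # translator's fix
print(client.translate("Annulla l'ordine"))           # now returns the correction
```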

Florian: Let’s talk a bit about the business side of Translated. So just tell us some, like what are some of the key client segments? And I understand you’re working with some of the world’s biggest companies like Airbnb, Google, Expedia, like how do you serve these big enterprise accounts? And just give us a bit more color to that.

Marco: What Translated does is always this symbiosis, and I think the best example… Some of the largest tech companies work with us on these things, and some of the top AI companies in the world that are doing great things these days are also our customers. The approach is that we come with a solution where we provide human translation with the best translators in the world, and while they're working, we are really training an AI model for the customer. So think about Airbnb. In the first year we take their production and switch it completely to this new model, a single-vendor model: we provide a platform, they connect the content, and the human translators translate. While they're translating, they're getting suggestions from ModernMT and fixing the errors, and at the end of year one Airbnb had a model that was outperforming Google Translate nine times out of 10. It was perfectly trained on their own domain without any extra cost, because they were simply doing the translation. And because it was so good, they did something that I think was a first in the industry: they made translation the default. They pre-translated every single piece of content they had: all the reviews, 700 million in 62 languages, all the listings, 7 million listings in 62 languages, all the chat between users, and the global customer support. A massive amount of content that they indexed and made the default. If you go on Airbnb, you no longer see a button saying translate this page; you see the translation by default and you click a button to see the original. At this scale, I think it was the first time ever, and it was possible because of the symbiosis between human and machine. Humans are now translating 1% of Airbnb's content, which is actually a big volume, one of the largest contracts in the industry, probably the largest. But with that they were able to translate something that would have been impossible to translate with humans alone; it would have cost billions, 300 man-years we estimated. And now they're pre-translating everything. So this is the way we get customers: this symbiosis of humans and machines, where humans train and adapt a model that is then used to translate the…

Florian: Now, about a year ago, I think Translated secured a major growth equity investment from an investment firm called Ardian. What did they see in the industry? What attracted them to invest in Translated, but also, of course, in the industry at large? They must see this as a positive case.

Marco: I think they saw what we saw first. We don't celebrate financing rounds, because we consider a financing round the compromise that is required to achieve a goal. It's not a measure of success. The measure of success is that Ardian, which is the fourth largest fund in the world and number one in Europe, is joining us on the mission to allow everyone to understand. We love the idea that there is someone else who shares the vision, and really the vision is that translation will play a much more important role in the future. As we approach singularity, all of us would like the content we write on social media to be available to the whole world, and that will happen because my content will be translated and understood by everyone. Messaging, communication, everything will make much more profound use of translation, so we think this industry is going to grow significantly, there will be a lot of changes, and it's a big opportunity. And even recently, one of our competitors, have you seen DeepL? They made a round of investment, and they are in a market worth 250 million, and I think they got something around a billion-dollar valuation. So a single company got a valuation which is four times the size of the machine translation market. Either all the investors are wrong, or there is a beautiful future for the translation industry.

Florian: Now over the past week we all, of course, watched ChatGPT come online and played around with it. What are your thoughts on this so far: playful, or a massive change?

Marco: Every single month, it gets more useful. From playful to useful, every single month. The GPT-3 model has been there for a while, and it's always improving. This is really what is called the instruct model, which has been retrained with 33,000 sentences of requests and generations done by humans, and it has been fine-tuned on chat to answer. In fact, if you use it, you see certain patterns: okay, I'm not sure I can understand this, I cannot answer this perfectly. I think it is learning that kind of structure from those examples. What I think is wonderful: I think big language models will be the future of search, so I think Google should wake up, because in the future big language models will replace what we have in search. I don't know when, but this is going to happen. I also think we should be extremely proud in our industry of that result, because GPT stands for Generative Pre-trained Transformer, and the transformer was invented for machine translation. It's the technology behind ModernMT. ChatGPT is a 1.3 billion parameter model, a reduced, distilled version; ModernMT is 16 billion parameters. So we're using exactly the same technology, and what we have been using for many years, not just us, I mean the whole industry, is becoming the state of the art of general artificial intelligence. That is great news, and again it shows that translation is a pioneer in AI. The second piece of good news is that natural language is becoming the interface, so if companies like OpenAI succeed in replacing Google for search and natural language becomes the primary way of interacting, well, I think our industry has a lot of opportunities in the future. So both from an opportunity standpoint and out of pride, ego, I think ChatGPT is a great thing.

Florian: I’m looking forward to seeing what kind of businesses people are building on top of this now. I think it’s still a little early, and I’m sure they’re going to release it in different iterations, but I’m seeing like some very initial things on Twitter that people are building, like little apps and stuff on top that could be useful.

Marco: Think about the trends. Advertising is going down, subscriptions are going up. So if you want to guess, people will pay five euros a month for a new tool that answers questions for them. It's relatively easy to predict that it will not be advertising-based, it will be subscription-based. And second, the people who do advertising today will want to drive their business in other ways, so if advertising is less effective and more crowded, why not build an ecosystem of APIs around very large language models? So next time you say, hey, I would like to buy this iPhone and I would like it delivered to my house, maybe Apple will not only create a beautiful website but will also create a set of APIs for language models to interact with. So big producers, and whoever builds a website now, will basically create a new layer of interaction with those models. I've also seen preliminary work on interacting with websites directly, so even if you don't build an API for your website, you can use a generative model to navigate it. The first ones will create an API; for the others, maybe the model will be able to browse and click and buy for you. Translated also has a venture fund called Pi Campus, and one of the companies we invested in, Wanderio, was doing this four years ago. It was really buying flight tickets: you basically told it what you wanted, like Uber, you clicked to buy, and then there was a bot that went to the Lufthansa website, bought the ticket for you, and sent you the boarding pass. All these things were possible four years ago, and I think now it's much easier. So I do see how this replaces search and the new ecosystem that is coming. It's just a guess, an educated guess based on what's happening, but I think it's plausible.

Florian: For search it’s interesting. I mean, maybe not for like deep search or whatever, when you really need an authoritative kind of source, but like, for just general quick, easy questions, like, why not? I mean, like, I guess 80% of my Google searches would be, kind of could be addressable by something like ChatGPT. 

Marco: Yeah, you mentioned something: source of truth. You need a reference. But you know what, the biggest difference between the standard GPT and ChatGPT is exactly that they added confidence into the model. Now, when it's not sure, the model is able to say, look, I cannot answer this. It's a very light form of confidence, it's not validated, but I don't think it is hard for the model to be able to explain how it came to a certain conclusion, and I think a summarization of the references may be more efficient than me reading 10 papers, 10 references, and coming to the conclusion myself. Maybe GPT will be able to summarize the why of a certain thing in a way that is faster for me to understand, so I think trust and references are something that can be solved soon.

Florian: Fascinating. I mean, we could just go down the rabbit hole here, but let me ask you another question. You mentioned that the company that’s buying tickets online was part of your Pi Campus, so you run this one, you run Translated, you do the Ocean Race 2023 and the Imminent Research Center. Tell us more about some of these initiatives and how do you find time to do all of them? 

Marco: Well, how I find the time, I don't know. I have to think about that one. But I think all these things have something in common. They look like very, very different things, but they all share one vision, which is that even though we do technology, we do AI, we strongly believe in humans. So really we do all these activities because of that. The boat, for example, is an around-the-world regatta to share our values around the world, the 50th anniversary of one of the most adventurous regattas. I had never been on a sailboat before, so sailing is not my passion. But when we do innovation, we need to get into contact with customers, partners, employees, people who are brave. If you want to do innovation, you want a certain kind of people. The boat was the biggest and most successful recruiting platform we ever had. In San Francisco, we started saying, okay, we're going around the world, we need 20 non-professionals to join, and we advertised it in our industry. 400 people in our industry responded, 400. 270 people, many of whom I think are listening to us, said, hey, and these brave people joined and started the training for the Ocean Globe Race. Some of them came once and then said, okay, this is too much. Some people came again, and I know for sure, and I will not tell the names until the end, that we have some people in the industry who are going on a leg of the race. Someone else decided he's organizing the parties between the legs of the regatta. I think it was a great way to find great people, people who believe in the future, who are willing to overcome problems for a better future, and those are the kind of people you need in order to plan and create a wonderful future. The investment fund is the same thing: 60 investments. We invest in people with great ideas. We took 5 million from Translated, created the fund, and then it became evergreen. Some of the companies passed a billion-dollar valuation, some exited, and that created something where there are now about 6,000 people working in companies we have funded. So it's a great way to spread our values and create a community around those values.

Florian: Yeah, and you're still at the helm of the language services and technology provider despite all these different initiatives. That says a lot about how fascinating and interesting this piece of business still remains generally, and I guess to you personally.

Marco: My passion is language, so all these other things are instruments to achieve the goal: looking for the singularity in language translation, allowing everyone to understand and be understood in their own language. I don't think there is anything more impactful that I could work on, and sometimes we joke here in the company that Elon Musk is working on the wrong problem, because he's trying to go to Mars thinking that making life multiplanetary is the most important problem. No, I think what we're working on in this industry is the most important problem. If we allow people to understand each other, they can cooperate and then they can design that future. We are the tool for that cooperation. No solving climate change if we don't understand each other; no interplanetary humanity if we don't understand each other. So I think this is a wonderful place to work. I love what I do and I don't get distracted by the other things.