Where DeepL Beats ChatGPT in Machine Translation with Graham Neubig

SlatorPod #175 - Graham Neubig on LLMs and Machine Translation

In this week’s SlatorPod, we are joined by Graham Neubig, Associate Professor of Computer Science at Carnegie Mellon University, to discuss his research on multilingual natural language processing (NLP) and machine translation (MT).

Graham discusses the research at NeuLab, where they focus on various areas of NLP, including incorporating broad knowledge bases into NLP models and code generation.

Graham expands on his Zeno GPT-MT Report comparing large language models (LLMs) with special-purpose machine translation systems like Google Translate, Microsoft Translator, and DeepL. He revealed that GPT-4 was highly competitive translating into English and into high-resource languages like French and German, but struggled with very long sentences.

When it comes to cost comparison, Graham highlights that GPT-3.5 Turbo (the model behind the free version of ChatGPT) is significantly cheaper than Google Translate and Microsoft Translator, but GPT-4 (available via OpenAI’s subscription) is more expensive.

Subscribe on YouTube, Apple Podcasts, Spotify, Google Podcasts, and elsewhere

Graham predicts that companies will likely move towards using general-purpose LLMs and fine-tuning them for specific tasks like translation. The discussion also covers the recent flurry of speech-to-speech machine translation system releases.

Graham talks about his startup, Inspired Cognition, which aims to provide tools for building and improving AI systems, particularly in text and code generation. Graham concludes the pod with advice for new graduates in the NLP field and his plans for Zeno and the Zeno report.


Florian: The Language Technologies Institute at Carnegie Mellon University is one of the leading global hubs for MT research, so what’s the backstory there? Why this particular university?

Graham: Actually, there’s a much longer backstory from before I was affiliated there. The Language Technologies Institute actually started as the Center for Machine Translation back in, I think, the 1980s, and since then there’s always been a strong tradition of research on machine translation. It then branched out into various areas of language technology and rebranded as the Language Technologies Institute. So now there are a few faculty members working on machine translation, and of course, quite a few students as well.

Florian: I think there are also a few spin-offs and hires into the industry, of course, on the machine translation side. Now, what brought you to the language technology/machine translation side of things? I saw you spent a lot of time, or some time, in Japan. Did that trigger your interest in machine translation?

Graham: Yeah, so my interest in natural language processing actually came from when I studied abroad in Japan during my undergraduate days. And before that, I wanted to do music processing, but I started studying a language and became fascinated with it. And of course, because I was studying another language, translation seemed like an obvious direction to be working in.

Florian: Very interesting. Music processing, like, just a quick detour. What’s music processing like in the tech concept?

Graham: All kinds of different things. I liked electronic music, so ways to generate electronic music or process kind of existing instruments into different interesting sounds and things like that. But I diverged from that a long time ago, so please don’t ask me about the technical details of how that works.

Florian: I was going to ask you, though, where we stand with AI-generated music because I haven’t heard much about that recently. I’m sure it’s there, but it doesn’t have that same hype as we have with ChatGPT.

Graham: Yeah, I think it might be for a similar reason that we don’t hear a lot about AI-generated novels or other things like this. I think in the creative space, you definitely can make creative things, but there’s also kind of a human touch and also having a human personality behind things. So people might still be favoring that, but I don’t know. It’s a good question.

Florian: I mean, the videos are atrocious still. I’m playing around with some of these AI platforms and trying to do, like, text to video, and it’s like there’s no point at this point. I mean, I think all the stuff I see on Twitter is probably they prompted it for a couple of days to get something useful. Anyway, let’s get back to machine translation or NLP. What are some of the main themes that you guys at NeuLab have explored over the past maybe one or two years? What would you consider some of the more interesting papers published?

Graham: I’m working on a lot of different things. We’re working on machine translation, of course. We’re also working a lot on models to incorporate broad knowledge bases into natural language processing, usually through retrieving knowledge from various sources. I’m also working a lot on code generation. These are from kind of the actual methodological perspectives. But also in the past two to three years, I’ve really focused a lot on evaluation methods, and the reason why is because while I’m a system builder at heart, I kind of feel like it used to be the case where it was very clear what the problems with our systems were. But now it’s becoming less and less clear what the problems with our systems are, where they’re failing, where we can expect them to fail. So because of that, I feel like evaluation is almost becoming as difficult as building an initial system yourself, which is why I’ve been researching it recently.

Florian: Now, how big of a deal was the launch of ChatGPT? I think it was October, November, and now there are other LLMs. I just got access to Bard today because I’m in Europe, so it took them a few weeks to get around the risk. So how big of a deal was that to the established NLP world? Do you think there was an actual disruption to some subfields of NLP, or, as an expert, would you have seen this coming and consider it more of an incremental step?

Graham: I think there was definitely a reckoning with respect to what we should be working on. In fact, we had a summit on large language models at Carnegie Mellon earlier this year, I think in March or April, and part of the reason we did that was to ask: okay, what are the things we should be working on? In my own lab, I took a poll about which things are more important to be working on now and which are less important. I like to be positive and not doom and gloom; there’s still plenty of work to be done. But another thing that is hugely different: I think a lot of the NLP experts saw the large language models getting better and better, but none of us expected the huge response after ChatGPT happened. So now I talk to people all the time from everywhere. My taxi driver in Rwanda, when I went there for ICLR, had used ChatGPT. That is not something I would ever have expected a year ago, I guess.

Florian: I went back as well; we reported on, I think, GPT-2 in 2020, and it barely registered with me. I had an OpenAI account, but I didn’t really use it. It really needed that chat interface, apparently, to make people appreciate the thing behind it. And it’s useful. I’m using it daily, not every hour, but probably every day now for certain tasks. Now, you mentioned knowledge systems and context. I want to transition to machine translation and context. In human translation, I guess we bring almost everything we’ve ever learned in life to the task of translation, as kind of the broadest possible context. So how does machine translation handle this, and how do LLMs maybe change it? Just a quick aside: you co-authored a paper titled “When Does Translation Require Context? A Data-driven, Multilingual Exploration”, which won best resource paper at ACL 2023, so I’m sure you know a thing or two about that particular topic. So yeah: context, LLMs, and machine translation.

Graham: You talked about the broader context of everything we’ve learned in life, which is interesting. That particular paper was focused more on the immediate preceding context from a document. I think one of the interesting difficulties in handling context is that, actually, to translate most individual words, you don’t need context beyond the current sentence, and for many words you don’t even need context beyond the current word. It’s an unambiguous translation; the name of a common noun might only have one reasonable translation. So there’s a difficulty in actually going in and evaluating machine translation models, because let’s say you come up with a model that’s much, much better at using long-term context: that might only benefit 1% of the words in the output. So if we’re using automatic metrics, or even human metrics, you might barely notice the difference with a model that’s much, much better at using context. What that paper attempted to do is categorize the places where context is necessary. These include things like formality, getting the formality or register of the output correct, because you can rely on the register of the rest of the document. Other things include pronouns: the way we use pronouns differs between languages. For example, we might use I, you, or it in English, and that needs to be gendered in the other language. Another thing is ellipsis, where people drop something in English, but it needs to be recovered in the other language to be grammatical, or vice versa. The paper has more categories like this, but basically, we tried to automatically uncover these categories, to give us as researchers better ideas of where context is actually useful. And there were some surprising findings in there as well.


Florian: Is context, even now with the special-purpose models, still limited to mostly the document at hand, or can it be expanded? Because I remember two, three, four years ago, document-level machine translation was a big thing, and I don’t know where we stand with that. Is it mostly the document? And do LLMs maybe change this because they can take in much, much bigger context?

Graham: This is a really interesting question, and I can answer it in a couple of parts. The Google Translate online interface and API, at least when we used it, was actually not even using context beyond the sentence. I was very surprised by this, but we verified it by translating sentences as a whole document and then sending them again as individual sentences, and the result didn’t change at all. This was very surprising to me because we all know context is important, so why wouldn’t they be using it? I can’t guarantee that this will happen every single time you use Google in every way, but that’s at least what we found. In contrast, something like DeepL was using context well, and I think context actually gave it a major boost, where we didn’t see a major boost when using context with Google. There definitely are some engines that also allow you to specify additional things, like, for example, the expected formality of the output. I don’t know a lot of them off the top of my head; I could go back and look around for them. But I think one of the exciting things about large language models is, if something is important to you, you can actually just tell the model what you want, right? You can say, please translate this formally, please translate this informally, please make sure that you translate this term into this term, and it should do it for you. Or theoretically it could do it for you, and who knows if it will or not; that’s a separate problem. But you could theoretically give it directions about what kinds of translations you want, just in text, and it should follow them.
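Graham’s point about steering LLM translations “just in text” can be sketched as a simple prompt builder. This is a hypothetical illustration assuming an instruction-following model; the function name and exact prompt wording are invented for the example, not taken from any product.

```python
def build_translation_prompt(text, src, tgt, formality=None, glossary=None):
    """Assemble a plain-text instruction prompt for an LLM (illustrative)."""
    lines = [f"Translate the following text from {src} to {tgt}."]
    if formality:
        # Formality is requested in natural language, not via a model parameter.
        lines.append(f"Use a {formality} register.")
    for term, translation in (glossary or {}).items():
        # Terminology constraints are likewise just stated in text.
        lines.append(f'Always translate "{term}" as "{translation}".')
    lines.append(f"Text: {text}")
    return "\n".join(lines)

prompt = build_translation_prompt(
    "Please review the attached contract.", "English", "German",
    formality="formal", glossary={"contract": "Vertrag"},
)
print(prompt)
```

Whether the model actually honors such instructions is, as Graham notes, a separate problem.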

Florian: I personally did a probably slightly ridiculous experiment where I kept saying, translate this like I’m an eleven-year-old, and things like that. And it was okay. It simplified it down to a five-year-old level; it became really simple. So, yeah, you’re right. Basically, the prompt would be setting the context. You recently released, as I said at the beginning, a really super interesting report called the Zeno GPT-MT Report. I think it’s available on GitHub. And the key question you were trying to answer was: can we now use the GPT models to handle our translation tasks, or should we still be using special-purpose translation models? Now again, this is the billion-dollar question in the 25+ billion-dollar language services industry. So how did you evaluate the performance, and what were some of the key findings?

Graham: The way we evaluated the performance was with a tool called Zeno, which I’m building, or helping build. It’s an open-source tool that allows you to very easily visualize and explore results, subsegment them along different axes, and uncover interesting trends in the accuracy of models. There are a bunch of details to how this works, but basically, we take state-of-the-art machine translation evaluation metrics and then try to find on what types of data one model is better and on what types of data one model is worse. What is the overall performance? Is there a more nuanced story there? So I went through and uploaded outputs from various GPT models and also outputs from a few special-purpose translation models, namely Microsoft Translator and Google Translate. I also have some outputs from DeepL, but DeepL didn’t support all of the languages that we had in the analysis, so I just haven’t looked at it very carefully. After I did that, I subsegmented the data along different axes and found some findings. I could go through them one by one, but to give the gist: for translation into English, GPT-4 was very competitive with the special-purpose models; it was better in many cases according to the metrics that we measured. For translation out of English into other languages, GPT-4 was a bit worse in many cases, but for the very high-resource languages like French or German, it was pretty competitive as well. It also seems that the GPT models are relatively good at a number of things that require context, like the things I talked about before, such as resolving pronouns. They’re relatively bad at, for example, handling very long sentences. And when I looked into that a little more carefully, they would do things like drop a little bit of content here and there, where the special-purpose MT models seemed to at least be reasonably faithful.
And then non-literal translations. The GPT models were better. And specifically when I say the GPT models, GPT-4 is much better than all of the other GPT models.

Florian: This is super interesting. I mean, when you say that with a long sentence it starts dropping a little bit here and there, in the industry people would be like, no, you can’t drop anything. This is critical.

Graham: I think the hard part is that it drops content in a very natural way. Even in post-editing, you’re in danger of missing the fact that something wasn’t in there. So I think that’s something you should think about carefully, and maybe we need tools to help post-editors with this. I honestly don’t know the state of the art in the translation industry for how people are handling this, but I think that’s definitely a danger. Some other things I noticed are that the GPT models tend to be a little bit less robust and tend to do weird things every once in a while. It’s not very often, but every once in a while. Just to give an example, this was a very interesting one I saw: in translation from English to Japanese, once every thousand outputs or so, it would output not only the Japanese but also transliterated Japanese after the Japanese. And the reason it did this is that GPT is relying on all of this data that appeared on the Internet to be, roughly, natural translation pairs. Of course, some of those pairs are from language-learning textbooks where English speakers are trying to learn Japanese, and there it’s very common to have English, Japanese, and then romaji, the transliterated Japanese. So I saw this a few times in the dataset.

Florian: It’s like in Chinese it’s Pinyin, right?

Graham: The implication of this is that if you’re post-editing, it’s probably not a huge problem. You just have the translator lop off the crazy stuff or go in and fix things. But if you’re actually deploying it without any post-edit checks, you should think a little bit seriously about how you would catch those potentially very bad outputs. So I think that’s another thing you should be thinking about as well.

Florian: Is anybody thinking about putting a layer on top of these? I mean, if they’re kind of outperforming, especially translating into English, but you have these occasionally really weird things creeping in, maybe you need to put a layer on top if you’re going to deploy them.

Graham: That’s something that I’ve definitely thought about. Some of the evaluation metrics that we’re developing, and used in this analysis, are quality estimation metrics, which basically estimate how good the output is, and I think in this particular case they would judge that output as being very poor.

Florian: Now, one thing you also measured was cost, and that kind of blew my mind. I’m not sure if you’ve revisited this, but basically, according to the report, the latest I checked today, you said that GPT-3.5 Turbo was like twelve times cheaper than Google Translate and like 25 times cheaper than DeepL. How is this possible?

Graham: There’s a caveat, which is that GPT-4 is actually much more expensive than Google Translate and Microsoft Translator, so I want to tell both sides of the story there. There are a couple of reasons. GPT-3.5 Turbo is the model behind ChatGPT, and ChatGPT, as you know, has taken over the world; chat.openai.com is now one of the top 20 websites in the world, and they’re serving lots and lots of chat. So there’s basically an economy of scale where they’re serving so much throughput that they can afford to do a variety of things. This gets a little bit technical, but for example, if you have lots of inputs coming in at the same time, you can process them all simultaneously, and that’s much faster than processing one at a time. So it’s possible that they have the infrastructure behind that to make things faster. Another thing is that I don’t know the size or complexity of the models that people are using, but for ChatGPT they’re probably using a smaller, more efficient model so they can serve it cheaply. GPT-4, on the other hand, is a huge model, certainly bigger than the models people are using in Google Translate or Microsoft Translator, and because of that it’s more expensive. And there are different ways that you can provide prompts to these models and tell them what to do. One way is you just say, please translate this from English to Japanese, and you don’t provide any examples. That’s called zero-shot translation. For zero-shot translation, GPT-3.5 Turbo, the model behind ChatGPT, is cheaper, and GPT-4 is more expensive, four times more expensive than Google Translate. But another way you can do it is to provide examples. So you can provide one example, one Japanese sentence and one English sentence, and that’s called one-shot translation.
Or you could provide five sentence pairs, and that’s called five-shot translation. If you get up to five-shot translation, then GPT-3.5 Turbo will be about the same price as Google Translate, and GPT-4 will be 20 times more expensive. I think it’s also important to point out that GPT-4 is very competitive with the special-purpose translation models, but I found GPT-3.5 Turbo is not quite as competitive. So I think overall the takeaway is that the GPT models are probably still more expensive than the special-purpose models.
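The zero-shot versus five-shot pricing difference comes down to simple token arithmetic: every in-context example is re-billed as input tokens on every request, whereas traditional MT APIs typically bill per character. The prices and token counts below are illustrative placeholders passed in as parameters, not the actual rates used in the report.

```python
def llm_cost(input_tokens, output_tokens, in_price_per_1k, out_price_per_1k):
    """Token-based billing: input and output tokens priced separately."""
    return (input_tokens / 1000 * in_price_per_1k
            + output_tokens / 1000 * out_price_per_1k)

def mt_cost(characters, price_per_million_chars):
    """Character-based billing typical of dedicated MT APIs."""
    return characters / 1_000_000 * price_per_million_chars

# Illustrative request: ~60 prompt tokens, ~50 output tokens, with each
# in-context example adding ~80 input tokens that are billed every time.
zero_shot = llm_cost(input_tokens=60, output_tokens=50,
                     in_price_per_1k=0.0015, out_price_per_1k=0.002)
five_shot = llm_cost(input_tokens=60 + 5 * 80, output_tokens=50,
                     in_price_per_1k=0.0015, out_price_per_1k=0.002)
print(zero_shot, five_shot)
print(five_shot > zero_shot)  # True: examples are re-billed per request
```

This is why few-shot prompting erodes the cost advantage Graham describes: the prompt overhead scales with the number of examples, not with the sentence being translated.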


Florian: Now, when you say GPT-3.5 Turbo is the one on ChatGPT, that’s on the free version, right? 

Graham: Yeah, the 3.5 Turbo is on the free version, and then if you subscribe, you get GPT-4.

Florian: Very interesting. And then of course, all the integrations and things like that, and maybe if you start hammering it with millions of words, it’ll crash out, and there are all kinds of other difficulties. But yeah, interesting that it’s in the same ballpark-ish, right, if you take all of the different parameters overall. Now, a bit of a crystal ball question, but as these general-purpose LLMs continue to evolve, maybe in the next two years, will they outcompete special-purpose models? Or will there always be a way to make a special-purpose model better if you tweak it and train it on data, or are these general models at some point just so big and powerful?

Graham: I think another question is how the special-purpose models will evolve going into the future. Will they continue to use the traditional technique of mainly training on translation pairs? Or will we really move into the paradigm where we train a language model first and then fine-tune it to do translation? I expect it will be the latter. Companies that are serious about building translation systems will probably move in the direction of taking a general-purpose language model and then fine-tuning it to be a really good translation model. In that way, if that happens, you could say that the language models have won, but we’d still have special-purpose models. There are also huge benefits in controllability, like we talked about at the very beginning: please translate this formally. Previously, I would have to train individual models, one formal model and one informal model, or at least something specific in that way, but you get that for free with a large language model. So I would take that any day over having to do something specific.

Florian: Now, you did mention that, with this proliferation of LLMs, maybe we’re going to build the special-purpose models on top of the general-purpose ones. There’s also a lot going on in the speech-to-speech machine translation space. Just in the last four weeks, we saw VoiceBox by Meta, I got to read about AudioPaLM by Google, Polyvoice by ByteDance, and then something I can’t even pronounce, like Mu2SLAM, also by Google. What’s driving this? Are we going to continue to see these types of releases at this pace, or is everybody just coming out of the gate right now and maybe it’ll slow down a bit? What’s going on there?

Graham: I think I’ve seen a very strong focus on speech translation over the past, I don’t know, two to three years. And I feel like the flurry of recent releases has been both a continuation of that and also all of the speech people looking at what happened with ChatGPT and the language models and saying, we want a part of this too, or, to put it in a more positive light, thinking about the possibilities this has unlocked. And I like the ideas behind AudioPaLM, for example, because they’re basically taking a purely text-trained language model and applying it, in a rather clever way, so it can also do speech recognition. I think that’s going to be really important just because there’s so much more text data than there is speech-to-speech translation data. It’s very hard to come up with speech-to-speech translation data, so clever ways to utilize all the resources that we have available are definitely very welcome.

Florian: What do you think they’re trying to achieve because it’s open source? Most of it, right? I mean, maybe some parts of it are not, but most of it is. Is this a bit of a… They’re trying to capture a certain part of the potential user base and then everybody will be developing on AudioPaLM. Is there a corporate interest on top of this or why are they open-sourcing?

Graham: Yeah, so this is a good question. A year ago, you probably wouldn’t have asked it, because researchers in industry were much more open to open-sourcing things. I think back then, the justification for open sourcing was essentially that researchers want their research to be publicly available. They want it to be impactful, and it will be more impactful with the general public if you make it open source. So in order to have researchers be happy and stay with the company, companies were relatively permissive with respect to open-sourcing things. I think that changed, gradually over the last few years but then very rapidly in the past six months, because people are looking at OpenAI and saying, look at all of the amazing things that OpenAI is achieving, and they never tell anybody anything. We’re publishing our research; they’re taking it and using it, but not even acknowledging it or telling anybody. So for Google, if you put them in the perspective of competing with OpenAI, it’s like, why should we be telling our competitors what we’re doing if they’re not telling us what they’re doing? It’s become a more competitive environment. But I think some vestiges of the previous open-source movement are still around, and actually, OpenAI did release their Whisper model, which was a speech model. So maybe people are not quite as worried about releasing the speech components, for whatever reason. I’m not sure of the justification.

Florian: Whisper is going to be transcribing this particular podcast because it’s part of the transcription solution HappyScribe that we’re using, and they integrated it very soon after it launched. So happy transcribing, Whisper. There’s also this big push recently, I think mostly from China, to get LLMs to do machine translation for low-resource languages; at least we’ve picked up on a few of those. Have you observed this too? And have you looked into how LLMs generally perform for low-resource languages?

Graham: I have looked into this a bit, and it’s also included in the Zeno Report to some extent. We have languages like Icelandic and Hausa there, and my general impression is that it’s significantly worse for low-resource languages. It’s dismal translating into low-resource languages; out of low-resource languages it’s competitive, but worse than the commercial solutions. That being said, the commercial solutions are pretty good because they have a very good data pipeline, a data flywheel. If you took a regular open-source model that was just released by anybody, and I don’t have that evaluation in the Zeno Report, maybe we should, but if we took something like NLLB, which is one of the best open-source models, I think it would probably be similar to or worse than the GPT-4 models.

Florian: I would understand why GPT would struggle with low-resource languages because, well, they’re very low-resource and GPT is based mostly on the Internet. Somebody would have to manually ingest a lot of that very hard-to-get data, I guess, right?

Graham: That is part of the work that the engineers building these multilingual machine translation models do, right? They go out and find every data source that is possibly available.


Florian: What about synthetically generating data for low-resource languages? I just read something literally today; a couple of students published a paper where they try to artificially generate that data. Is that a promising direction, or not really, or does it start to break down?

Graham: I think it will get you somewhere, especially if you’re creative in your use of other resources. Like, for example, we thought a little bit about generating synthetic data using dictionaries because dictionaries have much better vocabulary coverage than just any data that you can find on the internet, but it’ll only get you so far. It won’t get you natural, well-structured outputs in the target language that go beyond what the model is already able to do, which is what you really want when you’re doing data augmentation. So yeah, I think it will get you something, but it won’t get you all the way there.
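The dictionary idea Graham mentions can be illustrated with a toy template-based generator. This is purely an illustration with invented templates; notably, the hardcoded German article is already wrong for "Hund" and "Haus", which is exactly the kind of unnaturalness that means this approach "will only get you so far".

```python
import itertools

# Toy bilingual dictionary and sentence templates (illustrative assumptions).
dictionary = {"cat": "Katze", "dog": "Hund", "house": "Haus"}
templates = [
    ("I see the {en}.", "Ich sehe die {de}."),
    ("The {en} is small.", "Die {de} ist klein."),
]

# Cross every dictionary entry with every template to get synthetic pairs.
synthetic_pairs = [
    (en_t.format(en=en), de_t.format(de=de))
    for (en, de), (en_t, de_t) in itertools.product(dictionary.items(), templates)
]
for src, tgt in synthetic_pairs:
    print(src, "->", tgt)
# Note: "Ich sehe die Hund." is ungrammatical German; naive templates
# ignore gender agreement, so the synthetic data is broad but not natural.
```

The broad vocabulary coverage comes from the dictionary; the unnatural outputs show why this data augments, but cannot replace, real parallel text.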

Florian: In 2021, you started Inspired Cognition, a startup that helps development teams prototype, evaluate, understand, and improve the output of text generation: question answering, summarization, code generation, and chat. Can you tell us a bit more about it and where you stand with that?

Graham: The basic idea is that we want to provide tooling to make it easier to build AI systems. The work that I’m doing with Zeno actually builds upon a platform that we developed there; the platform calculates evaluation metrics for various tasks, and Zeno is the front-end. It’s an open-source project that I’m working on as part of my job at CMU, but also supporting through the company. Basically, the idea is that we want to make it as easy as possible to find and fix errors in AI systems, and Zeno is the front-end interface that people can use to do that. The way it works is basically by doing exactly what we did in the Zeno report: identify subsegments of the data where your models are failing and go in and fix them, which is where I stopped for the Zeno report. But let’s say we figured out that we have a particular problem, say the model is outputting transliterated text. If we’re using a GPT model, we could add something to the prompt like, make sure you output only the translation and nothing else, and fix the problem that way. Or we can do post hoc detection of the model not working very well: one of the other problems we identified is that outputs tend to be too short or too long. So if we get a very short or very long output with respect to the input, we could go in and resample from the model to fix that problem. I’m very excited about the development of the Zeno platform as a way to do this, so if anybody listening to the podcast would like to try it out, I’d be happy to support that as well.
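The too-short/too-long check Graham describes can be sketched as a character-length ratio heuristic. This is an assumption-laden illustration, not Zeno’s actual implementation; the acceptable band would need tuning per language pair, since translation lengths naturally differ between languages.

```python
def length_ratio_flag(source, translation, low=0.5, high=2.0):
    """Return True if the output looks suspiciously short or long
    relative to the source (thresholds are illustrative assumptions)."""
    if not source:
        # Empty source with non-empty output is itself suspicious.
        return bool(translation)
    ratio = len(translation) / len(source)
    return ratio < low or ratio > high

src = "The quick brown fox jumps over the lazy dog."
print(length_ratio_flag(src, "Fuchs."))  # True: output far too short
print(length_ratio_flag(
    src, "Der schnelle braune Fuchs springt über den faulen Hund."))  # False
```

A flagged output would then trigger a resample or human review, as described above.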

Florian: Who are the target users at this stage of development?

Graham: Yeah, so the target users for Zeno are any developers using AI in a system. They don’t necessarily need to be machine learning experts, but they need to care about the quality of their systems’ output. It can actually be used for a pretty wide variety of tasks, but specifically we’re looking at text generation and code generation tasks. Machine translation is obviously a text generation task, so that’s very much in the target of the people we’d like to support and help with this tool.

Florian: Yeah, a few of those machine translation people should be listening to this podcast, so get in touch with Graham. One quick question I have: how do new graduate students think about machine translation today? When they first get into NLP, do they see it as a solved problem, a niche of little things, or still a big blue ocean? How do they think about it?

Graham: Yeah, for machine translation, I think it’s obviously not solved for very low-resource languages, and there are very difficult topics for higher-resource languages as well. One of the things I’ve worked on recently is translation of figurative language, so metaphors and other things like this. At least at the very beginning of a PhD, I wouldn’t recommend that somebody tries to beat ChatGPT at translation right away, because you need to have a really good sense of the landscape. Actually, when I was a PhD student, not to talk about ancient history, I really appreciated my advisor telling me, right at the very beginning: why don’t you start with something that’s a little bit more niche, to build up your skills and understand the space better, rather than move straight into attacking the main castle. I think that’s similar to what we would do today. If people are interested in translation, they should definitely look at the existing problems, maybe do an analysis like I did with the Zeno report, understand exactly where GPT is failing, cut off a slice of that, and then later cut off a bigger slice.

Florian: Now, what are some of the things you’re working on heading into the second half of 2023 and into 2024? Are you going to continue the Zeno report, or is that a one-off?

Graham: Yeah, we’re definitely going to continue this. My interest with the Zeno reports is not just focusing on translation; I want to focus on the newest, most interesting developments in the field of NLP, or generative AI in general. The speech translation you were just mentioning sounded pretty interesting to me, so maybe that’s one of the things we should think about for the next one. I’m going to be thinking about the most interesting applications to be working on in general.

Florian: To me, that’s still the Holy Grail because it’s got so many problems. I mean, you’ve got the translation problem, the voice problem, the latency problem, and almost the philosophical impossibility of getting it right unless you wait until the sentence is fully formed, right? Especially in German: we all know sometimes you don’t know whether the person has done or has not done the particular thing until you wait for the verb at the end. So, yeah, there are a few PhDs still to be done in that area as well.