New Trends in Machine Translation with Large Language Models by Longyue Wang

SlatorPod #179 - Tencent AI Lab's Longyue Wang on LLMs and MT

Joining SlatorPod this week is Longyue Wang, a Research Scientist at Tencent AI Lab, where he is involved in the research and practical applications of machine translation (MT) and natural language processing (NLP). 

Longyue expands on Tencent’s approach to language technology, where they integrate MT with Tencent Translate (TranSmart). He highlights how Chinese-to-English MT has made significant advancements, thanks to improvements in technology and data size. However, translating Chinese to non-English languages has been more challenging.

Recent research by Longyue explores large language models’ (LLMs) impact on MT, demonstrating their superiority in tasks like document-level translation. He emphasizes that GPT-4 outperformed traditional MT engines in translating literary texts like web novels.

Longyue discusses various promising research directions for MT using LLMs, including stylized MT, interactive MT, translation memory-based MT, and a new evaluation paradigm. His research suggests LLMs can enhance personalized MT, adapting translations to users’ preferences.

Subscribe on YouTube, Apple Podcasts, Spotify, Google Podcasts, and elsewhere

Longyue also sheds light on how Chinese researchers are focusing on building Chinese-centric MT engines, directly translating from Chinese to other languages. There’s an effort to reduce reliance on English as a pivot language.

Looking ahead, Longyue’s research will address challenges related to LLMs, including handling hallucination and outdated, time-sensitive information.


Florian: Longyue is a research scientist at Tencent AI Lab, the research division of Tencent, one of China’s largest, if not the largest tech company. Hi Longyue, and thanks so much for joining. So before we start and we’re going to talk about ChatGPT, machine translation, LLMs and how they perform in machine translation, but tell us a bit more about Tencent. Some of the people, some of the listeners in the US and Europe may not be familiar with the company. They may not know that, for example, WeChat is like the major app in China and it’s basically a Tencent product. Just introduce us a bit more to Tencent.

Longyue: Tencent is one of China’s largest tech companies, known for its diverse range of services, from social media to gaming. One of the most popular, as you mentioned, is WeChat. It is a multipurpose messaging, social media, and mobile payment app. But in China today, WeChat is not just an app; it is an integral part of daily life for many people. For example, we use it for everything from chatting with friends to paying bills. People use it to share their daily stories in the Moments feature, and at the company we are even working on an enterprise version of WeChat, where we discuss ideas, email, and schedule our work plans. So it’s really multifaceted. This is the general story about Tencent and WeChat.

Florian: Tencent has been around for quite some time. I remember even back in 2010-ish when I was based in Hong Kong, people were talking about Tencent like it’s a fast rising company, so it’s been around for 15 plus years now already.

Longyue: I think it has been around for about 25 years. I have worked here for five years.

Florian: Yeah, it was one of the bigger moments when I switched iPhone here when I moved back from mainland China and I wasn’t able to reactivate my mainland WeChat version anymore, so it was one of the bigger moments. Obviously, when you’re in mainland, yeah, a lot of stuff runs through WeChat. But let’s go on to the NLP, MT side, so tell us more a bit about your professional background and your specific interest in NLP and machine translation.

Longyue: Let me briefly introduce myself. I earned my PhD from Dublin City University in Ireland in 2018. My supervisors were Professor Qun Liu and Professor Andy Way. My thesis topic was Discourse-Aware Neural Machine Translation, because during my PhD neural networks were just getting hot. I found that most neural MT models only dealt with sentence-level input, so I proposed some novel architectures for document-level neural machine translation. This work was honored with the best thesis award by the European Association for Machine Translation. After graduation, I went directly to Shenzhen and joined Tencent, where I have been deeply involved in both research and practical applications, mostly in machine translation and NLP. I have been here, as I mentioned, for five years, and in total I have more than ten years of experience in machine translation and NLP. I am now a Senior Research Fellow at our lab. Recently, as everyone knows, with the rise of large language models, I have also broadened my research areas. Apart from translation, I also do other research work like long-text modeling, AI-generated content detection, multimodality, and even some AI-for-science work. This is the general introduction about myself.

Florian: Dublin City University, obviously one of the hotspots globally for anything related to MT and research. So now, why would Tencent be interested in language technology in general and machine translation in particular? Are there a couple of bullet points there that you can share?

Longyue: Yeah, good question. Tencent is also a global company, so language technology is crucial for Tencent because it allows us to serve our diverse user base better. We do it from two sides. First, MT has been integrated into many Tencent applications. Take WeChat, for example, because it is very famous: people with different native languages can auto-translate their messages to each other, so we can communicate without language boundaries. As for Tencent games, MT plays a significant role in our overseas expansion, because it is not only Chinese people who play Tencent games. Even in Ireland I have a lot of colleagues who say this Tencent game is very good, so MT can ensure that gamers worldwide can enjoy our content in their native languages. But we do not just stop at embedding MT into our apps; we are also in the business of providing translation services. Over the past five years I have been working on building our first interactive MT system, which we call TranSmart. You can also call it Tencent Translate. We already have a large number of B2B and B2C MT users.

Florian: It’s kind of like, I mean, just comparing it to maybe US products would be like Amazon Translate, Google Translate, so like a cloud native product that big enterprise companies can kind of connect to via API.

Longyue: Our translation product has different versions. We have a cloud API, a web page you can try, and a client you can install on your Windows or Mac system. So it’s really friendly to professional translators, not only a web page for trying some text translation.

Florian: Can we just briefly touch on the perception of language technology in China generally? I haven’t been back since all the COVID restrictions lifted, but before that, I was in mainland quite often, and I had the impression that people are more open to using language tech in daily life. Also, in business, users tended to be more open to using raw MT for more mission-critical use cases. And consumers, for example, were happy to consume AI-dubbed or subtitled content. Is that an accurate perception? And how do you feel, how pervasive is machine translation use and acceptance in China?

Longyue: That’s a great question. I can use our translation product TranSmart as a lens to view this landscape. I think there are two parts: one is B2C, individual users, and the other is B2B, because we have a business side. On the B2B front, we have witnessed significant traction. Many organizations like the United Nations and companies such as Memsource and China Literature have already integrated our translation solutions. For example, China Literature can directly use my domain-specific system to translate their web novels. Some people know Chinese web novels; it’s really distinctive content. They tell me that after using our domain-specific system, they can directly publish the text for readers with very little human post-editing. I think this story indicates a growing trust in machine translation for mission-critical tasks and professional scenarios. That’s B2B. On the B2C side, I think the landscape is incredibly varied. We see applications ranging from formal to informal translations. For example, I found that some students like to translate their publications or papers, while other users may just translate casual chats during a meeting, so the needs are different. As for domains, it’s quite expansive, spanning general areas — for example, in news reporting we translate news from different languages into Chinese — and we now also have specific domains like financial and medical translation. It just goes to show how deep and wide the appetite for machine translation is among the Chinese public.

SlatorPod – News, Analysis, Guests

The weekly language industry podcast. On Youtube, Apple Podcasts, Spotify, Google Podcasts, and all other major platforms.


Florian: Very much right now. How good is it at the moment — state-of-the-art Chinese-English, English-Chinese, and then maybe some of the other less frequent language pairs? I guess not so much how good it is, but what are some of the main challenges that may actually be left, because I know it’s quite good already?

Longyue: If we talk about Chinese-to-English machine translation, I think it has progressed significantly, because apart from the technology improving every year, consider just the data size. In the past 10 years, the amount of Chinese-English data has increased quite a lot. As some may know from academic shared tasks like WMT, the Chinese-to-English direction is considered a high-resource MT task, so this is the first part. The other part is that many researchers like me, and developers, also like to work on Chinese-specific challenges. There are differences from English; for example, Chinese often omits pronouns in a sentence. We don’t frequently use pronouns because humans can infer what was omitted. We call this phenomenon pro-drop. English, however, typically requires all the pronouns, so it poses difficulties when we translate from a pro-drop language like Chinese to a non-pro-drop language like English. There are a lot of difficulties, but in the past five years we have proposed many methods to solve such detailed challenges. Today, when you try advanced MT products, you can see this problem is well handled by Google Translate, Tencent Translate, Baidu Translate, and the like. So for Chinese-to-English translation, the quality is really comparable with human translators in some scenarios. However, you asked what challenges are left. I will talk about the two aspects I mainly focus on. The first is Chinese to non-English languages — for example, Chinese to Portuguese, where we want to translate directly from Chinese to Portuguese. It hasn’t grown at the same pace as Chinese-to-English, because, for one thing, we haven’t seen a big jump in data size for this language pair in the past ten years, so there is still some work to be done there. The second part is the more challenging domains.
Most times when we talk about translation, we want to translate news or other common text. But consider something challenging like web novels, which I have mentioned I have worked on for three years. The genre has various subdomains like fantasy, wuxia, and urban tales. These stories are full of cultural references and often have long plots. So when we translate, it is not just about getting the words right; it is about capturing the cultural background, the feeling, and the depth of these tales. I think we need to work in these two directions, whether through data construction, method innovation, evaluation methods, or even thinking about how to use large language models to help.

Florian: Yeah, the web novels you’re describing, it’s probably the final frontier for machine translation. I mean, I can only imagine for a human translator getting all of the cultural references, all of kind of the inside Chinese kind of very compact expressions, right, maybe some of them are barely translatable in the first place. So yeah, maybe with the LLMs it’s going to help a bit, give more context. Speaking of LLMs, on a high level, what’s your current view of kind of the state-of-the-art of machine translation using kind of ChatGPT-like models?

Longyue: Actually, in one of my recent papers, titled Document-Level Machine Translation with Large Language Models, we provided a systematic evaluation of this. We found that ChatGPT-style models show great promise in generating human-like translations, especially in three main areas. The first is challenging domains, as I mentioned before. I tried ChatGPT on literary texts and dialogues, because when we talk about challenging domains, these two immediately come to mind. As for GPT-4, it has been a game changer, I think. We found it outperformed top MT systems in these two domains, even under professional human evaluation. The second is document-level translation. As you mentioned, one of the standout features of GPT-4 is its ability to grasp the broader context. When we are translating entire documents, like a whole chapter of a novel or an entire financial document, we need flow and consistency from start to finish. GPT-4 can handle this well and keeps good fluency and coherence; we have experiments on this in the paper, and you can look at the details. The third part is beyond translation, and I think this is the most amazing part. GPT-4 isn’t just translating; it is more like thinking, because it can not only tell which translation is better, especially when discourse knowledge is needed, but can also provide detailed explanations for the users. We checked every explanation — we asked university professors to verify them — and about 90% of the explanations were correct, so that’s the amazing part. In general, I want to say that traditional MT models have trouble with language domains that have little data, whereas large language models have vast pre-trained knowledge and generalization ability. They can effectively bridge that gap, which is why they can provide quality translations even when the data is sparse.
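As a concrete illustration of the document-level setup Longyue describes — carrying context across a whole document rather than translating each sentence in isolation — here is a minimal sketch of how a chat-style prompt might be assembled. The OpenAI-style message schema and the function name are illustrative assumptions, not from the paper, and the actual model call is omitted.

```python
# Hypothetical sketch: build a chat history for document-level translation,
# so each segment is translated with the document-so-far as context.
# The {"role": ..., "content": ...} schema is an assumed chat format.

def build_doc_translation_messages(segments, src="Chinese", tgt="English"):
    """Return chat messages that present a document segment by segment,
    asking for terminology and style consistent with earlier segments."""
    messages = [{
        "role": "system",
        "content": (f"You are a professional {src}-to-{tgt} translator. "
                    "Translate each segment consistently with the "
                    "terminology and style of the earlier segments."),
    }]
    for seg in segments:
        messages.append({"role": "user",
                         "content": f"Translate this segment:\n{seg}"})
        # In a real loop, the model's reply would be appended here as an
        # {"role": "assistant"} turn before sending the next segment.
    return messages

msgs = build_doc_translation_messages(["第一章 他走了进来。", "第二章 她没有说话。"])
print(len(msgs))  # system prompt + one user turn per segment -> 3
```

In practice the loop would alternate user and assistant turns, so consistency (names, pronouns, style) is grounded in the model's own earlier output.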

Florian: You were playing around with some of this in another paper, New Trends in Machine Translation Using Large Language Models: Case Examples with ChatGPT, and in there you said that you were brainstorming interesting directions for MT using LLMs, including stylized MT, interactive MT, translation memory-based MT, as well as a new evaluation paradigm using LLMs. That’s a lot. Give us a short summary of the main findings, because these four points sound incredibly interesting and I want to talk about them in more detail. But first, maybe just the main framework there.

Longyue: Thanks for your interest in this paper. It is a position paper in which we list several promising directions, as you mentioned, for machine translation tasks using large language models. While traditional MT systems have done a commendable job, the capabilities of large language models open up a whole new world of possibilities. We tested examples using GPT-3.5 and GPT-4 for different tasks, but most tasks go beyond basic translation. Take stylized MT: it requires not only translating the original text into the target language, but doing so in a desired style. This task is difficult for traditional MT systems, since they are trained only for faithful translation of the original text. We demonstrate that large language models can perform this task well, I think because their capabilities for translation and stylized generation are intertwined. We also talk about interactive MT. The conversational ability of large language models lets us interact with them in a dialogue: for example, we go back and forth with them, guiding the translation based on the ongoing conversation, and finally you get a perfect translation. Although we found that ChatGPT sometimes cannot strictly follow or understand your instructions, we still see potential in these directions. That’s the general idea of our paper.

Florian: What about translation memory-based MT? What does that mean? You’re using translation memory to train in advance? Are you using it as a prompt or?

Longyue: Large language models can memorize long context. So we have some memories, like a bilingual vocabulary or example sentence pairs, and we want the large language model to always remember this key information every time it translates a sentence, until it has finished all the sentences in the document. It is something like a CAT system: in a CAT system we also have memories with this kind of key information, provided by users or others. We just try to get the large language model to remember it.
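To make the idea concrete, here is a small sketch of how translation-memory pairs and a glossary might be pinned into a prompt, roughly the way a CAT tool surfaces fuzzy matches and termbase entries. The function name and prompt wording are hypothetical, not Tencent's actual setup, and the model call itself is left out.

```python
# Hypothetical sketch of translation-memory-based prompting: glossary
# entries and TM example pairs are injected as persistent key information
# that the model should honor while translating each new sentence.

def build_tm_prompt(sentence, tm_pairs, glossary):
    """Compose a translation prompt carrying glossary and TM context."""
    lines = ["Translate the Chinese sentence into English.",
             "Use these glossary entries exactly:"]
    for zh, en in glossary.items():
        lines.append(f"- {zh} -> {en}")
    lines.append("Similar past translations for reference:")
    for src, tgt in tm_pairs:
        lines.append(f"ZH: {src}")
        lines.append(f"EN: {tgt}")
    lines.append(f"Now translate:\nZH: {sentence}\nEN:")
    return "\n".join(lines)

prompt = build_tm_prompt(
    "腾讯是一家科技公司。",
    tm_pairs=[("这是一家公司。", "This is a company.")],
    glossary={"腾讯": "Tencent"},
)
print("Tencent" in prompt)  # True
```

In a document-level loop, the same glossary and TM block would be repeated (or kept in the chat history) for every sentence, so terminology stays consistent across the whole document.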

Florian: Now, you also hit on another note in that paper in interactive MT that you’re saying one challenge is how to design user interfaces that are intuitive and user-friendly yet also informative and flexible. Is that something you just put out there as a challenge? Or have you thought a bit more deeply about this and can you share anything there? Because that is a big problem, how do people actually work with these like translators? How do they work with these in real life, right? I mean, is that something you thought about more deeply?

Longyue: Yes. When you talk about interactive MT, the user interface is front and center, because it is the bridge between users and the technology. First, it should be easy to use: a clean design, straightforward instructions, and quick feedback. On the other hand, the user interface should also be informative and flexible. It should provide users with enough information, because some users are professional translators who will try to understand the translation process or the choices the system is making. This could involve showing alternative translations, providing explanations, or giving users the ability to intervene in the translation process. Designing such a user interface with a large language model is therefore a challenge, and that’s why I think it’s an interesting direction for both research and development.

SlatorCon Remote June 2024 | $ 180

A rich online conference which brings together our research and network of industry leaders.

Florian: Now, in another post — I’m not sure if it was on GitHub or somewhere else — you made a new evaluation paradigm for MT using LLMs: you prompted ChatGPT in natural language to evaluate translations. What’s a potential use case for this? Was it more just playing around with it, or is there something deeper there?

Longyue: Yes, we tried this both in this paper and in the other paper I just mentioned, Document-Level MT with Large Language Models. It’s interesting. Traditionally we rely on professional translators or automatic metrics to measure translation quality, but with large language models like ChatGPT we have a new tool in the field. Imagine a case where we need a quick evaluation of a machine-translated business document. Instead of waiting for a human translator’s feedback, ChatGPT can swiftly assess whether the translation is up to the mark or needs a human touch. Moreover, as I mentioned before, LLMs can offer a multifaceted evaluation. They do not just give a binary judgment or a one-to-five score; they can also provide nuanced feedback, even explaining their judgments in natural language. That’s why I talk about a new evaluation method with LLMs.
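A minimal sketch of that evaluation paradigm might look like the following: ask the model for a score plus an explanation, then parse the score out of the free-text reply. The prompt template and the "Score: N" reply convention are assumptions made for illustration, not the papers' exact setup.

```python
# Hypothetical sketch of LLM-based translation evaluation: one function
# builds the grading prompt, the other extracts the numeric score from
# the model's natural-language reply.
import re

def build_eval_prompt(source, translation):
    """Ask an LLM to grade a translation and explain its judgment."""
    return ("You are a translation quality evaluator.\n"
            f"Source (Chinese): {source}\n"
            f"Translation (English): {translation}\n"
            "Rate the translation from 1 (poor) to 5 (perfect) and explain "
            "your judgment. Reply in the form 'Score: N. Explanation: ...'")

def parse_score(reply):
    """Extract the 1-5 score from a reply, or None if absent."""
    m = re.search(r"Score:\s*([1-5])", reply)
    return int(m.group(1)) if m else None

print(parse_score("Score: 4. Explanation: one terminology slip."))  # 4
```

The explanation text is the part a binary metric cannot give: it can be surfaced to a post-editor directly, which is the "nuanced feedback" advantage described above.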

Florian: Machine translation quality estimation is obviously one of the fields that really needs to integrate these new technologies quite quickly. I think the large language models are making quite a sizable impact. Now, moving to one last topic there, personalized machine translation and multi-modal machine translation you also touched upon in your article I think. Actually, I did pull it up now, I think it is on GitHub, right? It’s on a GitHub post you made, so tell us more about personalized MT and multi-modal MT.

Longyue: On the first one, our research indicates that LLMs have the potential to make machine translation more personalized. This means the translation could be adapted to the specific needs or preferences of users: for example, the model can take into account their preferred style, vocabulary, or level of formality. This could enhance the user experience and make translations more useful and relevant, which is why I talk about this part. As for multimodal MT, I think it gets even more exciting. Traditionally, machine translation has been all about text. In the future, with multimodal large language models, it might be possible to translate not only text but also other types of data, like audio and images. As you just mentioned, another of our research directions is in this area: we built a multimodal large language model we call Macaw-LLM. This work tries to integrate text, image, audio, and video with an LLM, but it is a first try. For future work, we first want to explore the image translation task based on this model, so maybe in a month we can update our GitHub and papers about this.

Florian: Macaw is M-A-C-A-W, right?

Longyue: Yes, it’s the bird, because I think this bird is very colorful and, like a parrot, it can speak different languages. That’s why I used this name.

Florian: Let’s take a kind of 30,000 foot view on language tech and machine translation like from a regional point of view. So do you see any differences in what’s being pursued in China versus the US versus the European Union? Or is everything kind of moving at the same, maybe not speed, but are research interests somewhat different in China or is it very similar in MT and NLP at the moment?

Longyue: Of course. For example, take myself: as a researcher whose native language is Chinese, every time I think about methods or applications, the first question is how it performs in my mother tongue, Chinese. So there is a strong focus on Chinese-specific challenges and Chinese-centric translation. Like I mentioned, we don’t want to use English as a bridge language; we want to translate directly from Chinese to other languages to avoid error propagation, and there are related works I have mentioned, such as zero-pronoun translation. I have been here for five years. Recently I also quickly adapted Llama 2 to a Chinese version of Llama, to enhance understanding and generation in the Chinese language, and now we are also building our own Chinese-centric large language models. I think most Chinese researchers, at companies and universities, are now working on building Chinese-centric large language models.

Florian: One of the key challenges, or what you’re trying to do, is not going via the English pivot but having enough data to go from Chinese directly to other languages, right? Is it still going via English quite often, or is Chinese now largely independent from English and able to go directly to many of the other higher-resource languages?

Longyue: This is mainly about the technical and development aspects. If I want to build a translation system based in China, Chinese users will mostly use the Chinese-to-other-languages direction. If we don’t have enough data to build direct translation, we need a pipeline of subsystems: we first translate from Chinese to English, then from English to Spanish. That’s not a good approach technically, I think. The system becomes very large, and once we get negative feedback about translation quality, we may need to fix the bugs stage by stage. So that consideration is mainly from the academic and development side.
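The error-propagation point can be made with a toy calculation: even when each stage of a pivot pipeline is individually as accurate as a direct system, the stages' errors compound multiplicatively. The accuracy numbers below are made up purely for illustration.

```python
# Toy illustration (made-up numbers): why pivoting through English can
# hurt quality -- errors at each stage of the cascade compound.

def pipeline_accuracy(stage_accuracies):
    """Expected accuracy of a cascade of MT stages: the overall accuracy
    is the product of the per-stage accuracies."""
    acc = 1.0
    for a in stage_accuracies:
        acc *= a
    return acc

direct = pipeline_accuracy([0.90])         # one direct zh->pt model
pivoted = pipeline_accuracy([0.90, 0.90])  # zh->en, then en->pt
print(round(direct, 2), round(pivoted, 2))  # 0.9 0.81
```

This is of course a simplification (errors are not independent, and a high-resource zh-en stage may be better than a low-resource direct model), but it captures why direct Chinese-centric systems are attractive when enough data exists.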

Florian: What are some of the cool things you’re working on in the second half of 2023 and into 2024? Are you staying with GPT, LLMs, and machine translation? Or are you starting some other initiatives?

Longyue: Next year is hard to predict, but for the next half year I have two directions, I think. The first concerns the major problems with large language models. If you try ChatGPT, it is good, but it also has problems. The first is hallucination; that is maybe my next step, especially working on consistency with facts. The second is the problem of time-sensitive information, because today the US president may be A, but tomorrow it’s not A, so it changes. And the last, very interesting thing is that we can build innovative applications on the ChatGPT API. Even for MT problems, we can use a large language model as a key component of our MT system, which can solve things like inconsistency in long-text translation. That’s the first part. The second part is MT tasks with large language models. The first, which I have mentioned, is super-long document translation. In my previous work I didn’t try much longer documents, but in the future I want to try translating, for example, 20 chapters of a book. How does a large language model handle such extensive documents without harming coherence and content? Traditional models often lose the context in a long text, so I want to solve this problem. The second, which I have mentioned but will continue to pursue, is literary translation. At WMT this year I have organized a shared task on this, but because of data copyrights we only have Chinese to English this year; next year we will also have more languages, like Chinese to German and Chinese to Portuguese. This is another fascinating area for large language models. I want to do more challenging things, and I think it’s cool, so that’s my future work.