The State-of-the-Art in Machine Translation with Language Weaver’s Bart Maczynski

SlatorPod #199 - MT at Language Weaver with Bart Maczynski

In this week’s SlatorPod, we are joined by Bart Maczynski, the VP of Machine Learning at Language Weaver, the translation tech brand of Super Agency RWS, to talk about the challenges and advancements in enterprise-grade machine translation (MT).

The discussion delves into the distinctions between enterprise and consumer-grade MT, with challenges including data security, scalability, adaptability, user experience, and risk mitigation.

Bart touches on the impact of large language models (LLMs) on the landscape, noting potential risks, such as deceptive fluency, and the need for control in enterprise settings.

The VP discusses the recent launch of Evolve, an automated post-editing solution that combines auto-adaptive neural MT, machine translation quality estimation, and a secure, private LLM.

Subscribe on YouTube, Apple Podcasts, Spotify, Google Podcasts, and elsewhere

Bart talks about the evolving landscape of language AI and the integration of MT into broader workflows, driven by innovations in orchestration and automation platforms.

Bart shares insights into the future plans of Language Weaver, with a primary focus on bringing Evolve to the market and broadening its applications, supporting more languages, and exploring improvements and adaptations in various components.

Transcript

Florian: Today we welcome Bart Maczynski. Bart’s the VP of Machine Learning at Language Weaver, the translation tech brand of Super Agency RWS. Tell us a bit more about your career background. You’ve been in the machine translation, language AI space for quite a bit, leading up to your current role at RWS. Tell us more.

Bart: Yes, that’s right. Hard to believe, but I started in the year 2000. I remember my first interview at Trados. I think Clinton was still president. My first job was with Trados headquarters in Alexandria, Virginia, and I am still in the DC area. I held various positions at Trados, SDL, and then RWS, and also had exposure to other types of corporate cultures through Language Weaver and, before that, through Idiom, another acquisition. My work mostly focused on enterprise and government customers. I started with translation with CAT tools and translation management, then moved to terminology management, and finally machine translation. The job I have now, I’ve held for about four years.

Florian: Okay, government makes sense. Washington, DC. Close to the clients, always the best. Tell us a bit more about the story behind Language Weaver. It’s a brand that kind of got set up, then there was the SDL acquisition, and then there was some rebranding. But then, I think in 2021, RWS brought back the Language Weaver brand, so tell us a bit more about that.

Bart: Language Weaver was founded in 2002 by two NLP and machine translation researchers from the University of Southern California. I actually had the pleasure of working with one of them, Daniel Marcu, on some government work. Some of the early work on machine translation at Language Weaver was done on behalf of the US government through a grant from DARPA, the Defense Advanced Research Projects Agency. Then SDL acquired Language Weaver, I think it was 2010, commercialized the operation further, and integrated the solution into its own translation management systems. Along that path, there was a bit of rebranding. We were called SDL MT. We were called BeGlobal. But one thing SDL had the foresight to do was to keep developing both the cloud and the on-premise version of the product, and maybe we can talk about that a little later. It started as a statistical solution, and then came the neural MT revolution and we moved all of our models to that technology. Then we moved again when we adopted transformer-based architectures. We did a lot of work to optimize the solutions for CPUs, for example, to lower the cost of ownership. We developed adaptation mechanisms even for on-premise customers. Then RWS acquired SDL and merged its MT team with that of Iconic, which it had acquired a little earlier. We thought it was a great time to bring back the original brand, especially since it still had a lot of currency within the government space, so that’s where we are right now. And the most recent change is that the Trados brand and Language Weaver, as well as internal tech services within RWS, were formed into a new group that we call RWS Linguistic AI.

Florian: I want to talk about that in a second. Before that though, can I just zoom in on one random detail? You mentioned CPU. Is that still an issue, or something that anybody cares about, that models run on CPUs?

Bart: Yes. When you talk about scaling, you always think about scaling up, right, and of course we support that. We have customers that can translate with a throughput of 500,000 words per minute in some cases, but we also have customers that need to scale down, that may operate machine translation models on field devices for use cases like digital forensics. And then it becomes important to be able to run on inexpensive, off-the-shelf hardware.

Florian: Actually, can you give us some kind of an idea of what type of interesting client projects, research projects you’re working on at the moment? Obviously don’t break any NDAs, but just to give us a bit of a flavor.

Bart: Sure, so one of the customers that I really enjoyed working with, and it took a long time, so I took a lot of pleasure from that engagement, was US Forces Korea. We deployed Language Weaver Edge, the on-premise, private-cloud solution, on a military network to support the US mission in its communication with its Korean allies, and up until then, there were no solutions really. It was mostly human translators. And Edge is currently used on an ongoing basis by a pretty substantial user community on the peninsula, especially during exercises, because they have a special sort of urgent communication regime during those exercises, with briefings, documents, a lot of presentations, and so on. That’s where the scaling came in as well, because we are able to support thousands of users in the community, both on the US and the Korean side. Additionally, our tech services team has built an integration with the chat platform from Cisco called Jabber that they use for real-time communication. So this was enjoyable also for non-technical reasons. I learned a lot throughout the engagement and came away very impressed with the methodical approach used by the US Armed Forces and the kind of insight into the mission there.

Florian: Wow, so that is government-grade machine translation. Now, I was going to ask you about enterprise-grade machine translation, but I didn’t really have government-grade machine translation on my radar. So can you unpack that, are there major differences there?

Bart: This is a good point. There isn’t a huge difference there, maybe other than that government is still a little less likely to jump on the cloud. But this has been something that’s very dear to my heart, what enterprise-grade machine translation is, and my background is in translation. I came from the linguistic work side, and throughout my career I learned a lot about what makes a technology solution successful. I would offer a definition that enterprise machine translation is optimized not only for a specific linguistic outcome, but for a specific business outcome. And there are a lot of things to consider when you look at it that way. Enterprise MT has to survive the scrutiny of a sophisticated enterprise or government buyer. That means multiple aspects like data security, scalability, ease of integration, adaptability, user experience, and things like risk mitigation, or even acceptable licensing models. Now the trick here is that the solution has to tick all of the boxes all the time, not some of the boxes some of the time. So, for example, if you’re, I don’t know, a Japanese technology company with a US presence and you need to translate PDF files, it’s not enough to answer the question, can your system translate from Japanese to English? And it’s not enough to answer, can your system translate PDFs? You have to answer, can it take a scanned Japanese PDF and translate that scanned Japanese PDF into English? Because you’d find that some solutions cannot do that. And having this approach, that the sum of capabilities forms the solution, that’s really what makes the distinction here.

Florian: How big is the technological, I don’t know, Grand Canyon between enterprise MT and all of the consumer-grade MT, the single-license SaaS subscription stuff we’re now seeing pop up more and more, right? I mean, it was there four or five years ago, but now there’s a wave.

Bart: Well, the interesting thing about it is that enterprises are formed of user communities, and these user communities become familiar with non-enterprise tools and expect the same kind of experience or ease of use and so on, without perhaps being fully aware of all those boxes that need to be checked that we talked about. I think the main challenge here is that consumer-grade MT, by definition, cannot be controlled by the enterprise customer to the same degree. So with free online translation tools, at times you may take the risk of exposing your data, or even worse, customers’ data, to the outside world. In many industries, this carries a very high regulatory risk, not to mention a potential loss of trust. There’s also often no easy way to manage consumer-grade translations at scale. There may be no way to train the models to make them more relevant for your business. And last but not least, I think it’s also about the kind of provider you’re dealing with, the entity you do business with. Enterprise solutions are more than just technology. For example, we have dedicated teams of what we call TAMs, Technical Account Managers, and they support enterprise customers not just through the implementation, but every single year that they use the solution, throughout the whole duration of the license. So I think that’s a very important aspect as well.

Florian: Also, just another technical question: you mentioned cloud, Edge, and on-prem, and I think a lot of people may not be fully aware of what that actually means in MT, and of being able to offer all three of these solutions, right? A lot of people would just build on a cloud today, but just tell us who needs what.

Bart: There is also a hybrid solution. Yes, I will mention that too. So, Language Weaver is fundamentally available for two types of deployments, with some variations within each of them. There’s the SaaS, subscription-based, cloud-based system that we host out of AWS in Oregon here in the US and, for GDPR compliance, in Frankfurt, Germany. Typically, the licensing follows the cloud model, where you get a subscription at an annual consumption-based price, a certain number of billions of characters every year, access to all the generic models, and all that. Now, this is already a very secure solution by design. For example, we don’t keep any material that gets translated; it gets purged automatically. But some customers, often by policy, are not allowed to send their content outside of their IT infrastructure. For those we have the on-premise deployment, which we call Language Weaver Edge to differentiate it from the cloud solution. And while it can be deployed on private cloud or on physical hardware, it also supports Kubernetes, so it runs in a dockerized form. It uses the same linguistic underpinnings, the same type of translation models, the same capabilities. If you want to train the model, you don’t have to send your data to us. We will never know what you are doing with your deployment. And recently we also introduced a hybrid model that runs the business logic, the file processing, and the API on-premise or in a private cloud, while we host the actual models in our cloud so the customers don’t have to host all of those models. Because if you want all languages, you would have to host, I don’t know, 154 models, right? So we break the documents down, we encrypt the sentences, send them out for translation, and you automatically retrieve them in your Edge application.
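
To make the hybrid flow more concrete, here is a minimal client-side sketch of the pattern Bart describes: split a document into sentences, encrypt them before they leave the local deployment, and reassemble the decrypted translations locally. The function names, the naive segmentation, and the use of Fernet are illustrative assumptions, not the actual Language Weaver Edge implementation.

```python
# A minimal sketch of the hybrid pattern described above, NOT the actual
# Language Weaver Edge implementation. Segmentation, key handling, and the
# cloud round trip are all simplified stand-ins.
from cryptography.fernet import Fernet

def translate_hybrid(document_text: str, key: bytes, send_to_cloud) -> str:
    """Encrypt sentences locally, send them out for translation, and
    reassemble the decrypted results on the Edge side."""
    cipher = Fernet(key)
    # Naive sentence segmentation, just for illustration
    sentences = document_text.split(". ")
    translated = []
    for sentence in sentences:
        token = cipher.encrypt(sentence.encode("utf-8"))
        # send_to_cloud stands in for the round trip to the hosted models,
        # which return an encrypted translation
        encrypted_result = send_to_cloud(token)
        translated.append(cipher.decrypt(encrypted_result).decode("utf-8"))
    return ". ".join(translated)
```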

Florian: Wow. Yeah, and that’s what you need the TAMs, the Technical Account Managers, for in some of these implementations.

Bart: Yeah, there are options, and increasingly people ask for dockerized implementations because this is kind of a very convenient way of doing things at scale.

Florian: In November 2022, a lot of the discussion around language and language AI changed when ChatGPT, and later GPT-4, was thrown at us. Now it’s been, like, I don’t know, 15 months. What were the top three things for you, maybe, or top five, that have changed in machine translation since that big moment?

Bart: It’s an excellent question, and I think we’ve all lived through this, right? The enthusiastic adoption of these technologies. I think the first thing that comes to mind is the increased exposure via the powerful AI narrative that you see everywhere. This is something that’s happened across all knowledge-based industries, not just translation, right? So last week we were at the Legalweek conference in New York, which we go to every year, and there was hardly any booth without something AI-related plastered on it. Guilty as charged. We did the same thing. Now, this, of course, can lead to good things, because you get exposure to higher-level decision makers, budgets get unlocked, and so on. But it may also lead to a set of preconceived notions where people think AI is the solution to all their problems. I have to say that our enterprise customers have shown a lot of prudence here and understand the risks involved. I think the freelance community jumped at it immediately, because the risk-to-reward ratio is different there. And the main risks that we’ve seen identified by our enterprise customers, which by contrast show neural machine translation in a positive light, include, for example, the deceptive fluency displayed by LLMs at the expense of accuracy. I call it sycophantic translation, where they really want to please the user. And I have an interesting example. I’ll try not to mention the customer name, but a life sciences company ran a comparison test: it took translation output from several services, and ChatGPT was one of them. One of the strings mentioned the name of a drug that was authorized for use in another market by the local health authority, the regulator, and none of the services knew how to translate it. But ChatGPT, while it didn’t know how to translate the drug name into English, did know that it was a hypertension drug. So in the English translation, it used the name of another hypertension drug from a different manufacturer, just to make the sentence sound good. It read well, but it was the wrong product. So I think that also gives you an insight into how important it is to control the technology that you use to understand the world.

Florian: Can I just dwell on this for a second? That’s so interesting, because that just would never have happened with a more narrow NMT model. That model would have just said, well, I don’t know.

Bart: Yes, I think there is a philosophical difference with NMT as such, where your semantic payload is already in the source segment, right? Whereas the approach that generative AI takes more broadly is, well, I’m instructed to produce something, right, and that’s what happened here. I think it also highlights the value of using smaller models in general, where you can control and adapt as needed. On the other hand, by contrast, and you may disagree, it also potentially demonstrates what kind of MT solutions are going to struggle most under pressure from GPT and similar systems, which is essentially the non-adaptable, consumer-grade cloud systems, predominantly. And at the same time, I just want to make sure I’m not overstating the case: large language models are too good not to use. They are amazing and have some fantastic capabilities, for example, the ability to use context to improve the outcome. So rather than rip out NMT and replace it with an LLM, a clever combination of these technologies can offer a way forward.

Florian: Now, let’s assume that that wrong term had made it through in a real case and created a legal issue. Then maybe eDiscovery in a court case would have been needed. I’m trying to make a nice little segue there to asking you about what you do in multilingual eDiscovery, because in my past life, we used to actually offer eDiscovery as an LSP, because it was kind of complementary when we were talking to legal clients. So this is also something that Language Weaver does, so tell me a bit more about that.

Bart: Yeah, we’ve had quite a lot of experience in this space, and it’s one of the reasons we were at that Legalweek conference I mentioned. I think we have about 40 deployments specifically for eDiscovery, and we integrate with partners like Relativity. Most of the customers we work with are law firms. I think we work with maybe 14 of the top 20 law firms in the US, but we also work with enterprise customers and government agencies, so we do a little bit of that too. eDiscovery and, broadly, litigation support is very interesting because it’s in the same category of use cases as digital forensics, compliance, media exploitation, even things like open-source intelligence or captured enemy material. So unlike localization, which is a content dissemination use case, eDiscovery belongs to the, let’s call it, content assimilation category, where you face an enormous influx of data that you need to analyze in order to gain meaningful insights, and in many cases this data is multilingual. And then once you know what’s in there, once you’ve translated and analyzed it, you can use a human-based service to support the litigation directly for things that need to stand up in court. And there’s a whole ecosystem of solution providers, Relativity, Nuix, Exterro, extending into forensics like Cellebrite, and our technology is often used to make those systems multilingual.

Florian: On the backend, being the engine there. I want to talk about Evolve, something you recently launched. You call it a linguistic AI solution. Evolve came out with big PR, so give us the elevator pitch. What kind of user should be an early adopter? Where do you see it in one or two years?

Bart: Evolve is automated post-editing, so this is the kind of ultimate supervised learning approach for the translation use case. It uses three technologies. The first is auto-adaptive neural MT, translation models that can learn on the fly. The second is machine translation quality estimation that’s been calibrated by human linguists. And the third is a secure, private large language model that’s fine-tuned to execute the post-editing task. The flow is as follows: you submit a document, the system translates it using the adaptive model, and then automatically sends the translation to the machine translation quality estimation step, which gives you one of three potential results: good, adequate, or bad. We ignore the good translations and send the bad and adequate ones to the LLM to improve them. Once it’s done editing a sentence, we don’t rest there. We pass it back to the QE model again to see if an improvement happened, and we do this up to three times. We don’t want to spend the whole day doing it, but we will rewrite for better results. And the result is something you can still review, you can still apply human post-editing to it if you want to, but the field of work you have to do is much narrower because a lot of the heavy lifting was done by the automated service. The idea of Evolve came from the drive to address what we think is currently at least the last frontier in the translation process, the human intervention, of which post-editing is probably the most intensive kind. And as you know, RWS has, what, 1,700 language specialists internally and some 35,000 in an external network. At this scale, any improvement in productivity leads to savings both in effort and in turnaround time. So we set out with the premise that even a small improvement can make a substantial difference, and we actually had a beta phase. Recently we’ve worked with some of our biggest customers to understand the impact. We decided to focus specifically on the biggest localization programs in the world, and we ran tests on content that they kindly provided, because we wanted to get empirical data, and the proof is in the pudding. The Evolve approach does indeed work very well. We already have a list of companies that are eager to try Evolve. In the press release you mentioned, you may have seen a quote from Dell, for example. So the next step, as we productize it, is to expand to more languages. This takes time. We hope to have about 20 languages supported by the end of the year.
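
The loop Bart outlines can be sketched in a few lines of Python. This is a minimal illustration of the described flow, with hypothetical callables standing in for the three components; it is not Language Weaver code.

```python
# A minimal sketch of the Evolve-style loop described above: translate,
# score with quality estimation, and let the LLM rework anything below
# "good", re-scoring after each pass, up to three passes. All names are
# hypothetical stand-ins, not the Language Weaver API.
from typing import Callable

MAX_PASSES = 3

def evolve_pipeline(
    source: str,
    translate: Callable[[str], str],              # auto-adaptive NMT model
    estimate_quality: Callable[[str, str], str],  # "good" | "adequate" | "bad"
    llm_post_edit: Callable[[str, str], str],     # secure, fine-tuned private LLM
) -> str:
    translation = translate(source)
    for _ in range(MAX_PASSES):
        if estimate_quality(source, translation) == "good":
            break  # good segments bypass the LLM entirely
        translation = llm_post_edit(source, translation)
    return translation  # still available for optional human post-editing
```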

Florian: What about speed? I mean, you mentioned all these steps, and speed and cost. If you have these complicated individual steps overlaid, is it fairly instant, or do you see it kind of populating?

Bart: So if you want to see it, it’s going to be slower, right, but that’s what we did during the beta, just to give people a feel for what is actually happening. But Evolve will also be available as an API, so you will be able to request a translation and specify from your app, from your CAT tool, from Trados initially, whether you want a regular Language Weaver MT result or an Evolve result, and we can do this automatically at scale.
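
For readers who think in code, a request along these lines is roughly what such an API call might look like. The endpoint, field names, and the workflow parameter are assumptions for illustration only; they are not the documented Language Weaver API.

```python
# Hypothetical illustration of selecting a regular MT or an Evolve result
# from one endpoint; the URL and field names are assumed, not documented.
import requests

def request_translation(text: str, api_key: str, use_evolve: bool = False) -> str:
    response = requests.post(
        "https://api.example.com/v1/translate",  # placeholder endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "sourceLanguage": "ja",
            "targetLanguage": "en",
            "text": text,
            "workflow": "evolve" if use_evolve else "standard",
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["translation"]
```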

Florian: Obviously, a lot of this is closed source. What do you think about this whole movement, especially pushed by companies like Meta, where a lot more language AI, for lack of a better term, is now being open-sourced and people are building on top of it? How do you see the open-source movement generally impacting machine translation?

Bart: I like it, but we are a provider of our own machine translation systems, right? We are not dogmatic about the approach, though. If your goal is to go to market with an enterprise solution, you will want to control a lot of aspects of it, right? And that means work both on the research side and on the engineering side. As I said, we are not dogmatic about it. I understand some providers may not have the advantage of an in-house science team like we do, and using open-source models may be the only path for them. This is in fact the path we decided to use for Evolve, because we had a lot of experience deploying models. We are very good at creating adaptable neural machine translation models. We are very good at creating human-calibrated quality estimation models. And we did experiment with the big foundational APIs, OpenAI and others, early on to see what kind of results we could get. We actually ran experiments: we integrated them with our test platform, Language Weaver, we ran translations, and we also ran preprocessing and post-processing steps, so it was interesting. Like, hey, here’s text with missing diacritics and a lot of misspellings, can you fix it before we translate it? Or, here’s some text, can you rewrite it for a different audience using a different register? And so on. I think what you want to do is get the best of both worlds. If you have the expertise in-house to build part of your solution, go for it, because it will be your differentiator. If you find a use case where you can use an open-source model and you know how to integrate it within your infrastructure, go for it as well.
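
As an illustration of the preprocessing experiments Bart mentions, here is a sketch of asking an LLM to repair text before it goes to MT. It uses the OpenAI Python client; the prompt and model choice are illustrative assumptions, not what RWS actually ran.

```python
# A sketch of the kind of LLM preprocessing step described above: restore
# diacritics and fix misspellings before translation. The prompt and model
# are illustrative choices, not the experiment RWS actually ran.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def clean_before_translation(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Restore missing diacritics and correct misspellings. "
                    "Return only the corrected text; do not translate it."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```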

Florian: Now, it used to be that machine translation was kind of a contained universe, tech-wise, conference-wise, maybe even developer-wise. And now, since these LLMs have come on, I’m feeling this massive blurring of lines between all the different categories in NLP, natural language processing, that people have grown familiar with over the past couple of decades. How do you think about this? Am I the only one thinking this? I’m sure I’m not, but everything’s getting so complicated and blurry, and a lot of the knowledge we had from before is becoming maybe less relevant.

Bart: There have always been two reasons for machine translation being a bit insular, if you will. One is the one you mentioned, the academic NLP space. The other was the localization industry, which understood machine translation as a means to a particular end within the localization use case. But if I look at our customer base for Language Weaver specifically, about two-thirds of use cases are not localization-based. And this is where we already had, before the big AI revolution, a lot of exposure to other NLP requirements. In some cases, MT sat in a much longer analytic workflow, where content would be acquired, stored, optimized in some way, translated, and then analyzed, and so on. The way we talk about it is the left and the right of translation, if you will. The left would be content generation, authoring support, et cetera, and I think that’s a natural expansion for MT providers like us. Here we have a unique position, I think, because we have access to our own structured content management technologies like Tridion or Propylon, as well as an interested and somewhat enthusiastic enterprise customer base that utilizes those solutions. The ROI for authoring is pretty clear, and generative AI holds a big promise there. And of course you have to make sure you properly tune the solution to understand both the relevant nuance of the customer and the publication structure, right? To the right is content analytics and other related categories. For example, a few years ago we added an extractive summarization capability to Language Weaver and to Edge. It works across all the languages we support, so the common use case would be: you take a report or an article in another language, run a quick translation on it, and then ask the system to summarize it for you, and it will create a list of salient points from the article. And I think, if anything, it is now much easier to contemplate additional solutions from those spaces, because you don’t need to build another point solution. I think LLM technology is a bit of a shortcut to universal capabilities: entity recognition, data cleanup, generation, and all that kind of stuff.
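
To picture the translate-then-summarize flow Bart just described, here is a self-contained sketch using a simple frequency-based extractive summarizer. The translate callable is a hypothetical stand-in for the MT call, and the toy scorer is only meant to show the shape of the workflow, not Language Weaver’s actual summarization component.

```python
# A minimal sketch of the "translate, then summarize" flow described above.
# The frequency-based scorer is a toy stand-in for a production extractive
# summarizer, and translate() stands in for the MT call.
from collections import Counter
import re

def extractive_summary(text: str, num_sentences: int = 3) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"\w+", text.lower())
    freq = Counter(words)
    # Score each sentence by the frequency of the words it contains
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    top = set(ranked[:num_sentences])
    return [s for s in sentences if s in top]  # keep original order

def translate_and_summarize(foreign_text: str, translate) -> list[str]:
    """Run MT first, then pull out the salient points of the translation."""
    return extractive_summary(translate(foreign_text))
```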

Florian: What do you think about a few of these kind of Zapier-style orchestrators that are coming out? I think the one mostly in the lead is Blackbird, but Phrase came out with one as well, Phrase Orchestrator. Do you think they will impact the MT landscape, because you can kind of just move the MT block into these complicated workflows? Or how do you feel about that?

Bart: They will impact both the MT landscape and the translation management landscape, maybe even more so. Blackbird is a great company. Love those guys. We partner both through Trados and Language Weaver. I’m very impressed with their capabilities and their pace of innovation. We have a very similar integration with another partner, this time from outside the translation industry, a Dutch company, Betty Blocks, which does this kind of low-code, no-code orchestration and automation. I think these approaches reflect the changing realities in translation. For example, nonlinear workflows, workflows with iterative translation needs, right? You translate now, publish, and then maybe improve later based on feedback. Or the general idea of bringing translation execution closer to content sources, maybe even within content sources. I think we’ll see more of these types of approaches. There are other factors at play. There is a decentralization of translation production even within big corporations. Traditional localization programs may not always be able to cover all the sub-cases that may, for a variety of reasons, fall off the wagon. And in many cases, automated translation solutions are so good that customers need internal review based on subject matter expertise, but not linguistic expertise, and it’s much easier to do that through your own workflow. So I think these are all drivers that will make Blackbird very successful, and I wish them the best.

Florian: Now, we spoke a lot about text, or mostly about text. How about multimodal machine translation, voice, and other areas? Are you looking at that at all, or is it not very much on the radar at the moment?

Bart: We are looking into that. The initial support for voice will come from the R&D we’ve been doing on Trados to support subtitle localization use cases, but now we are one team, so we’ll be able to reuse that expertise. In the past, on the pure MT side, we’ve typically partnered with ASR providers to bring solutions together, and we have had success there for some very big customers. I think it’s now more likely that you can build a solution quickly without a partner if you want to. The world of ASR has changed. There are some really interesting open-source models available, and customers come to us with some very interesting use cases, for example, things like, I don’t know, voice intercepts. So we are looking into it. I don’t want to talk about the details today, but it’s something we are focusing on.

Florian: Anything you can talk about, anything exciting, launches, innovations in 2024 that are coming up?

Bart: One thing that I want to say is that we have never had this level of opportunity, not only in the market, but also internally. Because we are now one Linguistic AI group, we have a stronger voice within the company, and we have a lot more capabilities and types of experience. We started 2024 with Evolve, and it’s going to be our main focus to bring it to the market in meaningful ways. I like how the Evolve story ties in with the auto-adaptive MT, since it’s essentially a workflow that generates its own training input, so what’s not to like? We are looking into broadening it to more types of applications and use cases. For example, we have customers that, for various reasons, may not be able to work with an LSP. Is there something we could do for them? The immediate goal is to support more languages, and in the future, and this is research we have already started, we are looking into improving the Evolve capability by adding more adaptation to its other components. So you can think of adaptive quality estimation, you can think of injecting external context into the LLM to make it adaptive as well. And I think that by the end of the year, alongside other innovations we recently brought in, like fluent terminology, we are going to have a very compelling, modern translation experience for the enterprise.