September 17, 2021
Inside DeepL and Welocalize VP Olga Beregovaya on Language Tech Evolution
Olga begins with her journey into the language services and technology industry, where she currently leads AI innovation for Welocalize. She unpacks some of the broader applications of AI, ML, and NLP outside the language industry, referencing personalization engines and AI-enabled chatbots.
The AI Innovation VP recounts the early days of implementing NLP algorithms and the evolution of the computer-assisted translation (CAT) environment. She discusses the role of humans in MT workflows and her work injecting AI into the global content transformation ecosystem.
Olga shares her thoughts on large language models, such as GPT-3 and BERT, and on the importance of synthetic data. She identifies under-utilized NLP applications in the language industry, from multilingual sentiment analysis in digital marketing to how terminology plays into MT.
She outlines key language AI trends: increasing translator productivity, advancing neural machine translation, and integrating dynamically trained MT.
First up, Florian and Esther take a deep dive into DeepL, as they discuss the leading MT company’s approach to neural networks, quality performance, enterprise solutions, new features, users, and growth.
Esther shares highlights from AI Media’s presentation at the ASX Small and Mid-Cap Conference on its multilingual access services, including the AUD 40m it raised to fund the acquisition of US-based EEG. Esther then covers Keywords Studios’ H1 results, in which its Audio, Localization, and Localization Testing units contributed 27% of total group revenues.
Florian talks about Zoom’s plans for live, multi-language transcription and translation features, which come as no surprise given that Zoom acquired German simultaneous speech translation provider Kites earlier this year.
Florian: Tell us a little bit more about your background and your route into the language services and technology industry. You have been in this space for quite some time and were head of the Association for Machine Translation in the Americas from 2016 to 2018.
Olga: I always knew that I would go into languages and linguistics, and I have never done anything else. I have two advanced degrees in linguistics. My first job in the language industry was building a lexicon for rule-based machine translation systems. Then I got into localization, and as you get into machine translation, you learn more and more about computational linguistics on the fly. Then I was in localization on the buyer’s side for some time, and then I was CEO of PROMT America for some time, driving the building of PROMT enterprise systems and deploying them with enterprise customers, so that was exclusively machine translation at the time. I then joined Welocalize and have been driving machine translation and broadening it into NLP, ML, and AI. For the past few years, I have been leading AI Innovation at Welocalize. It has just been this linear progression, doing nothing else ever.
Esther: You have got this lovely AI Innovation title, but what precisely do we mean by AI in your title? What does it consist of? Help us understand, generally, what AI is.
Olga: What is AI? We have suddenly been talking about AI over the past five years, and obviously more than ever over the past three years. If we go by the general Wikipedia definition, AI is a branch of computer science that teaches machines to mimic human reasoning and human behavior, performing tasks that are generally executed by humans. That would be the generalized definition. Now, obviously, I work for a translation or global content transformation company, one of the world’s super agencies, so everything I do in AI applies to human language or human content processing.
What I do at Welocalize is what would be referred to as AI as it applies to NLP. How can we apply it to global content transformation? I look under every rock and set the vision for innovation. Where can we take AI? Where can we take NLP? Where can we take ML? How can we transform global content and generate it faster? How can we help our customers achieve their global content goals faster and less expensively? How do we deliver on time, and how can we chase and predict that content? There is a lot of mythology around AI, and part of my job is to take a chunk of that urban mythology and make it applicable.
Florian: You mentioned initially working on rule-based MT, but outside of the language industry, what are some other applications of broader machine learning, AI, and NLP that you are seeing?
Olga: Taking a step back from NLP, let us look at broader applications of AI. First, we often use AI and machine learning in one sentence, and I think there is a tiny bit of confusion, as if machine learning and AI were interchangeable. Somebody said that “if it is written in PowerPoint, it is AI, and if it is written in Python, it is machine learning.” I love this expression, and I hope I do not sound too controversial when I say that AI has maybe become a little bit of a buzzword, whereas machine learning is the actual technology base. What do we mean by machine learning? We mean the actual techniques and knowledge that enable machines to reach a certain level of success and then self-learn and evolve. That is what we mean when we say machines learn.
AI, on the other hand, is a machine’s ability to mimic human behavior, and in the modern world we see AI everywhere. AI powers logistics. For instance, AI is behind Amazon’s warehouses; it powers the logic behind their supply chain and their logistics. If you go online to buy tickets, AI is behind selecting that flight for us. Pretty much every area has matching algorithms and personalization engines. You go on Netflix or Amazon Prime, and all the personalization engines are AI-driven, based on our past behavior and our previous selections. Recommendation and personalization engines are all driven by AI algorithms.
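To make the idea of a personalization engine concrete, here is a toy user-based collaborative filter, one common textbook technique for recommendations. It is a sketch only, not how Netflix or Amazon actually build theirs: it scores titles a user has not watched by how similar that user’s viewing history is to other users’ histories, using cosine similarity over binary watch vectors. The viewing data and the names `history`, `cosine`, and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical viewing history: user -> set of titles watched (illustrative only).
history = {
    "ana": {"Dark", "Narcos", "The Crown"},
    "ben": {"Dark", "Narcos", "Ozark"},
    "chloe": {"The Crown", "Bridgerton"},
}


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two users' binary watch vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user: str, history: dict, k: int = 3) -> list:
    """Rank unseen titles by the summed similarity of the users who watched them."""
    seen = history[user]
    scores = {}
    for other, titles in history.items():
        if other == user:
            continue
        sim = cosine(seen, titles)
        for title in titles - seen:
            scores[title] = scores.get(title, 0.0) + sim
    # Highest score first; ties broken alphabetically so results are deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

With this data, `recommend("ana", history)` ranks "Ozark" above "Bridgerton": Ben shares two of Ana’s three titles, while Chloe shares only one, so Ben’s unseen title gets the higher score.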
Now, if we zoom into NLP, natural language processing, the main areas are conversational AI and chatbots. We log onto our bank’s page and interact with AI-enabled chatbots. Practical applications of AI have evolved over the past five years, and even more so over the past three years. They might have been a little bit blurry previously, and not as evolved. All of us, I am sure, have yelled at virtual chatbots and then called our bank, begging for a human. Now we see those technologies evolving, so I would say that practical applications are way more flexible.
Florian: What were some of the first attempts at using AI and NLP in CAT tools? Tell us about some of the early bridges where AI had a use in the language industry and how it evolved from there.
Olga: I was there in the early days of deploying natural language processing systems in translators’ day-to-day work. First and foremost, there was the introduction of translation memory systems. Those were pretty successful implementations of early natural language processing algorithms based on fuzzy matching. They were translators’ first introduction to actual productivity gains from natural language processing. Then there was an attempt to evolve translation memories with subsentential alignment and subsentential fuzzy matching. We looked at chunking and then matching segments, and those obviously used more advanced natural language processing algorithms, so that was the next step in the evolution.
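The fuzzy matching behind translation memory can be sketched as a similarity score between a new segment and the source segments already stored. The sketch below assumes a word-level Levenshtein (edit) distance normalized by segment length and a hypothetical 75% match threshold; commercial CAT tools each use their own, often proprietary, scoring, so the names `fuzzy_score` and `tm_lookup` and the sample memory are illustrative only.

```python
def edit_distance(a: list, b: list) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        curr = [i]
        for j, tok_b in enumerate(b, 1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(segment: str, candidate: str) -> float:
    """Similarity in [0, 1]: 1 minus the edit distance over the longer length."""
    a, b = segment.lower().split(), candidate.lower().split()
    if not a and not b:
        return 1.0
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def tm_lookup(segment: str, memory: dict, threshold: float = 0.75):
    """Return (score, source, target) for the best match above threshold, else None."""
    best = None
    for source, target in memory.items():
        score = fuzzy_score(segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# Hypothetical English-French translation memory.
MEMORY = {
    "Click the Save button to save your changes.":
        "Cliquez sur le bouton Enregistrer pour enregistrer vos modifications.",
    "Restart the application.": "Redémarrez l'application.",
}
```

Looking up "Click the Save button to keep your changes." returns an 87.5% match (one word differs out of eight) together with the stored French translation, which the translator edits rather than translating from scratch.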
Then we came to the whole terminology management universe, and that brought us closer to more advanced natural language processing, because we started using character-based and frequency-based algorithms to extract terminology. I would say a lot of those statistical terminology extraction methods are still in use today. We started plugging in rule-based machine translation systems, and, no offense to rule-based machine translation systems, they did not deliver humongous productivity gains. Some of that rule-based legacy is still in use now.
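Frequency-based terminology extraction, in its simplest form, counts word n-grams in a corpus and keeps the frequent ones that do not start or end with a function word. The sketch below is a minimal illustration under that assumption; the stopword list and the `extract_terms` name are hypothetical, and real extractors typically add larger stopword lists, linguistic filters, and comparison against a reference corpus.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; real extractors use much larger ones.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with", "your"}


def extract_terms(text: str, max_n: int = 2, min_freq: int = 2) -> list:
    """Return (candidate term, frequency) pairs for n-grams of up to max_n words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Terms rarely start or end with a function word, so skip those n-grams.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    # Keep frequent candidates, most frequent first, ties broken alphabetically.
    frequent = [(term, c) for term, c in counts.items() if c >= min_freq]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


SAMPLE = ("Open the print queue. The print queue lists every job. "
          "Clear the print queue to cancel a job.")
```

On `SAMPLE`, the candidates are "print", "print queue", and "queue" (three occurrences each) followed by "job" (two); a terminologist would then keep "print queue" and discard the fragments.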
If we look back 15 years, that was the foundation for the modern translator ecosystem. You harvest your terminology, you leverage your translation memory system, and you plug in productivity tools such as machine translation. Is translation memory going to survive, or is machine translation becoming so good that we might not need it anymore? I am still struggling with the question of where translation memory is going. Where is the whole work surface as we know it going? Is it going to give way to actual AI-enabled translator work surfaces? The early days were the precursor to the modern-day AI-enabled translator environment as we know it.
Florian: How transferable was that expertise? For people who have been in this space for 10, 15, 20 years, how many times did they have to reinvent their expertise? Was it gradual, or was there a complete reinvention of the skill set?
Olga: We are all exposed to the CAT work surface. I follow the evolution of the CAT environment, the TMS work surface, and what the standalone CAT environment looks like. Frankly, I have not translated much in my life, but I would say that these environments look very similar to what we have always known. Obviously, the UX is evolving; the whole world’s UX is evolving. The Microsoft Ribbon as we knew it 20 years ago has changed tremendously. The CAT environment itself is probably pretty transferable. If you started translating 30 years ago, you evolved with the UX, but the whole CAT experience might not have changed much. If anything, it is becoming simpler.
Now, how you interact with MT engines is definitely changing. At Welocalize, one of the groups on my team is the program management group, and they track the evolution of MT. With rule-based MT, you interacted with output that follows the syntax and plugs in new vocabulary as you customize the engines. You did not train rule-based engines, you customized them, and you had a more or less robust syntactic and morphological structure. Statistical MT then became way more random and way less predictable in its morphology, so they needed to rewrite their training courses. Training courses for MT are changing as well, and we contribute to in-house training courses.
They rewrote the rule-based courses and retrained our post-editors, and then came neural. With neural, fluency is there, but as we know, neural MT can generate pretty random meaning. Suddenly you have your fluency, and with neural post-editing you can be completely misled by that fluency. The way you post-edit neural output is pretty different. Your post-editing skills and your interaction with a CAT platform carry over, but you do need to reinvent your skill set quite a bit as AI technology evolves. This is not only my opinion; it is empirical evidence from the field.
Esther: If we are delving into the human-in-the-loop workflow, whether that is post-editing or some other form of human-in-the-loop, where do you see that heading? Are there other, different ways that we as humans can interact with machine translation?
Olga: Let us cast the net a little bit broader. You asked me what my role is at Welocalize: it is injecting AI into the global content transformation ecosystem. I have to give credit to Alon Lavie, formerly of Safaba, then Amazon Translate, and now Unbabel. The first time I heard the notion of content drift was from Alon, when we were talking about training and retraining machine translation systems, and dynamic retraining of machine translation systems. Content is always evolving, and your machine translation systems should not only play catch-up but also predict where the content is going and be prepared to handle it. I would take it further and say that AI applications for natural, human language should do the same. In the world of NLP, as a subset of AI, systems should be able to capture and handle those new content types.
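One very simple way to watch for content drift is to measure how much of a new batch of content falls outside the vocabulary an MT engine was trained on. The sketch below assumes an out-of-vocabulary (OOV) rate as the drift signal and a hypothetical 20% threshold for flagging retraining; production dynamic retraining systems use far richer signals, and the function names here are illustrative, not any vendor’s API.

```python
import re


def tokenize(text: str) -> list:
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(texts) -> set:
    """Vocabulary of the (hypothetical) training corpus."""
    vocab = set()
    for text in texts:
        vocab.update(tokenize(text))
    return vocab


def oov_rate(new_texts, train_vocab: set) -> float:
    """Share of tokens in the new content that never appeared in training data."""
    tokens = [tok for text in new_texts for tok in tokenize(text)]
    if not tokens:
        return 0.0
    unseen = sum(1 for tok in tokens if tok not in train_vocab)
    return unseen / len(tokens)


def needs_retraining(new_texts, train_vocab: set, threshold: float = 0.2) -> bool:
    """Flag drift when the OOV rate exceeds an assumed threshold."""
    return oov_rate(new_texts, train_vocab) > threshold
```

For example, if the engine was trained on "Reset your password in the account settings." and new content reads "Enable passkeys in the account settings.", two of six tokens are unseen (about 33%), so the batch is flagged as drifting.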
Human-in-the-loop is not only about interacting with machine translation; humans are now doing annotation and post-editing and validating the outputs of summarization. Professionals in our language field are interacting with the outputs of AI in a variety of tasks. The whole language profession is evolving in so many directions, and the quality of AI output is still lagging. The quality of machine translation will always be lagging, because quality is arbitrary, while utility is practical.
How usable is your machine translation, the output of your summarization engine, the output of your sentiment analysis? The human translator or post-editor, or let us call them the language expert, will always be needed to compensate for that lag, and this is the role that I see. This is the evolution of the platform. Will it always be the CAT platform, or will it become a work surface that provides facilities for a human professional to interact with the output of various NLP applications? That is my broader view of where it is going. And back to MT, there is so much talk about human parity; all MT developers are talking about human parity. Yet we still get a lot of feedback from the field, even more so for domain-specific customized applications, suggesting that human parity is still lacking when we talk about high-visibility, high-impact content.
Esther: What I want to know is whether you see any NLP applications that are currently untapped, applications that you think the language industry could benefit from. What do you think has potential but is just not being utilized in the right way?
Olga: If we take translation as an occupation, the translator as a language professional, I would say that as of right now, for translator productivity, for transforming content from one language into another, we are pretty much there, if we take just those subsets, localization and translation. From here, we should just be evolving the tools, referencing terminology, and enhancing the translator experience when it comes to how terminology plays along with machine translation. Again, I am not sure what the role of translation memory is going to be going forward. A lot of companies are keeping ICE matches and 100% matches, and everything else goes to machine translation. Beyond that, it is about evolving machine translation further, maybe along the lines of what ModernMT is doing in terms of dynamic training.
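The routing described above, where ICE (in-context exact) matches and 100% matches stay with translation memory and everything else goes to machine translation, could be sketched as a simple rule. The `Segment` fields and route labels below are hypothetical; real TMS platforms define match categories and thresholds in their own ways.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    tm_score: float              # best TM match score in [0.0, 1.0] (hypothetical field)
    context_match: bool = False  # True if surrounding segments also match (ICE)


def route(segment: Segment) -> str:
    """Decide where a segment goes under an 'ICE and 100% stay, rest to MT' policy."""
    if segment.tm_score >= 1.0 and segment.context_match:
        return "ICE"     # in-context exact match: reuse the stored translation
    if segment.tm_score >= 1.0:
        return "TM_100"  # exact match without context: reuse, light review
    return "MT"          # fuzzy or no match: send to machine translation
```

Under this policy, an 87% fuzzy match is no longer edited from translation memory but goes straight to MT and post-editing.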
Again, maybe plugging more AI interaction into CAT work surfaces rather than keeping them traditional work surfaces. Predictive typing, for instance, has huge potential, as does predictive guessing in the work surface, things like this. Making the actual work surfaces more AI-enabled, but that is when it comes to the translator occupation. Now, what I think is under-tapped are the services we as LSPs could be providing to our customers, and the things our customers on the buyer side could be thinking about when it comes to their multilingual global content or technology ecosystem, which are underused. If you think, for instance, about multilingual digital marketing or market analysis, what I still do not see enough of is multilingual sentiment analysis.
We do get some requests from customers, and we are telling customers that we have those capabilities, that we have learned enough about NLP to provide multilingual sentiment analysis services. Think how much you can learn from the field about how people in Malaysia truly feel about your product. Think about the feedback you can get from the field, or about internal communication, or even the role of neural machine translation there. We have learned to plug machine translation into our translation workflows, but think about all the work we could do plugging machine translation into corporate chat.
There is a whole gamut of NLP applications. Take summarization: going into HR and employee comms, summarizing them, and instantly translating them for HR processing. If we look at the whole spectrum of NLP applications, there is so much that the translation industry can do that is not being done yet, not so much in content generation as in content analysis and processing, where we are just scratching the surface. The services are there, and we could do it all; we are just not doing enough of it.
Esther: I wanted to get your take on the concept of chatbots. How does one go about making chatbots multilingual? What kind of training data is required?
Olga: That is something we never thought we would be doing 10 years ago. Let us first approach it from the client perspective. We were building huge user manuals, thousand-page user manuals. Then came multi-page, super crazy online help systems with complex links, generated separately. Then came the blessing of single sourcing, where you generate and then single-source your content, and then our customers realized that we still could not produce online help. So they transformed it into wiki pages, which got simpler and shorter and then became interactive. Then came the realization that the same customer service assistance experience can be much more interactive and much more usable with the help of chatbots, and the whole customer interaction becomes your tremendous online help system. I am not saying user manuals do not exist anymore. Have they evolved? Yes, they have, and they have started giving way to chatbots more and more.
Let us imagine you bought something, and now you can log onto the page and you have your virtual assistant. Knowledge bases still exist, but how many knowledge bases have now been converted into a chatbot experience? Instead of building huge online help systems, you will be developing training data for chatbots. Those chatbots are data-hungry, and that has become a whole line of service that we provide to our customers. We did not know that those services would become multilingual and that chatbots would be equally data-hungry in English and in other languages.
How do you make them multilingual? There are two schools of thought. First, you can use machine translation: there are multiple chatbot systems where you build your chatbot in English and then plug in MT. But I recently dealt with a Japanese request where it quickly became obvious that you do not get the same level of user engagement with machine translation that you get in English. Second, you rewrite your scenarios, build a whole library of utterances for each language you build the chatbot in, and basically rebuild your multilingual chatbot from scratch. Both approaches generate substantial amounts of data to train your global chatbots. You see many companies that do nothing but generate and annotate data for global chatbots, and super agencies like ourselves have a whole line of business that does nothing but generate and annotate chatbot data. The time may come when chatbots will be trainable on unlabeled data, but for now you need annotated data to train them.
Florian: What are your thoughts around GPT-3 and these models that are out now? How do you use it at a large agency like Welocalize?
Olga: First and foremost, chatbots operate equally in the NLU and NLG space. There is a passionate debate about which one is harder: is it harder to comprehend human language, or is it more difficult to generate it? So far, the sentiment is that understanding is more difficult than generation. What I am watching now is synthetic data. Synthetic data is a huge deal; in modern systems, generated data is generally used to augment human data. Is it a threat? That is a huge question. Is it a threat to our industry that synthetic data will take over? So far, synthetically generated data is used to augment a humongous corpus of human data. You can get pretty good results with that kind of data augmentation, but so far it is more of an enabler than a replacement. To validate the quality of synthetic data, there is still some room for human-in-the-loop, although in some instances you are just ingesting synthetic data.
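Template-based generation is one simple way to produce synthetic data that augments a human-written corpus, for example chatbot training utterances. The sketch below assumes a hypothetical `track_order` intent with made-up templates and slot values; each generated example is tagged `synthetic` so a human-in-the-loop can review it, and duplicates of human utterances are dropped. Systems could instead use a generative model such as GPT-3 to produce variants; this only illustrates the augmentation pattern.

```python
from itertools import product

# Hypothetical human-written seed utterance for a "track_order" intent.
HUMAN_SEEDS = [{"text": "Where is my order?", "intent": "track_order", "source": "human"}]

# Hypothetical templates and slot values used to synthesize variants.
TEMPLATES = ["Where is my {item}?", "Can you track my {item}?", "Has my {item} {status} yet?"]
SLOTS = {"item": ["order", "package"], "status": ["shipped", "arrived"]}


def synthesize(templates: list, slots: dict, intent: str) -> list:
    """Fill every template with every combination of the slot values it uses."""
    examples = []
    for template in templates:
        names = [name for name in slots if "{" + name + "}" in template]
        for values in product(*(slots[name] for name in names)):
            text = template.format(**dict(zip(names, values)))
            examples.append({"text": text, "intent": intent, "source": "synthetic"})
    return examples


def augment(human: list, synthetic: list) -> list:
    """Append synthetic examples that do not duplicate existing utterances."""
    seen = {ex["text"].lower() for ex in human}
    extra = []
    for ex in synthetic:
        key = ex["text"].lower()
        if key not in seen:
            seen.add(key)
            extra.append(ex)
    return human + extra
```

Here three templates yield eight synthetic utterances; one duplicates the human seed, so the augmented set holds one human and seven synthetic examples, all still labeled by source for review.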
Then there is the whole paradigm of self-annotation, self-labeling. That is something we are looking into as well. How much human labeling do you need before the machine learns to annotate on its own? If we think about what BERT was created for, its most common application is enhanced Google Search. Roughly 10 to 15% of Google searches rely on BERT. Other applications are summarization and some text generation. The applications of BERT were not as aggressive when it comes to replacing human content generation.
Now, GPT-2 made a fairly big splash. Obviously, GPT-3 made a humongous splash. That is when we started talking about it writing poetry, generating human-parity-level texts, and solving problems. They tested it on some 12,000 problems and it solved roughly 3 to 6% of them, so problem-solving is left with a big question mark. Personally, I am a little bit skeptical about GPT-3 coming in and solving all the problems of humanity, and even more so about using GPT-3 in chatbots as a practically applicable response and text generation system. I would say I am not alone in this. Once this is published, I might not be making that many friends out there, because I know that GPT-3 has a lot of proponents. It most definitely has huge potential.
Ilya Sutskever from OpenAI said that GPT-4 will have roughly a trillion parameters and use some completely crazy hardware behind it. GPT-4 will likely have many more practical applications and capabilities, but I see a much narrower application for the current GPT-3, around things like code generation. We know that Microsoft is using it for writing chunks of code. GPT may work for things like this, but for solving human content problems, to the best of my knowledge, there are some success cases so far, but the failure cases are so alarming that I would say we should be pretty cautious. I am following it very closely. It can generate extremely human-like text and extremely human-like responses, but when you inject some content it is not prepared to answer, you can take it astray fairly easily. Then the randomness and bias of the response become so high that it becomes dangerous before it becomes useful, and it takes very little.
Esther: What are the key trends in language AI that you at Welocalize are looking at? What do you think are two or three important trends for the coming two to three years?
Olga: We are definitely looking forward to increasing translator productivity with further advancements in machine translation, for machine translation to come closer to human parity, and for neural machine translation to get better at accurately capturing the meaning of the sentence. First and foremost, advancements in neural machine translation, even more so when it comes to long-tail languages, because right now we are still lagging a little bit there. The economy is not standing still, and expansion into more remote regions automatically drives demand for translation in long-tail languages. Expansion of MT into long-tail languages and domains is definitely something that we are very interested in, and things like zero-shot translation between language pairs that do not include English are definitely something that we are very, very interested in and following very closely.
Next, trainability of engines: dynamically trained, not pre-trained, machine translation, and tight integration of dynamic machine translation into what I imagine should be NLP- and AI-enabled work surfaces. Then let us look outside of NLP. NLP is just a tiny chunk of what we can do with AI in our industry. AI is part of the overall digital transformation in our industry. We started talking about AI driving supply chains and logistics; the same should be happening in our industry. I am a huge fan of what Unbabel is doing with its resource assignment and crowd management. It is an awesome example of what the industry at large should be looking at: assigning resources, selecting resources on the fly, and managing the whole logistics on an AI-enabled platform. That is definitely something that we are doing at Welocalize and looking to expand further.
I know that a lot of other companies are looking at digitization and injecting AI into their overall processes. Then, as I said, data is a self-nurturing system: AI craves data, and you use AI to generate data. That is the whole concept of synthetic data generation, using natural language generation to create more data to feed into AI systems. But let us not forget the role of human-in-the-loop. We still need humans to generate the initial datasets from which further data is generated to create robust AI systems. Then there is removing bias, making generative models less biased and more accurate, and injecting reasoning into those language models. Right now they cannot reason; we want to teach them to reason, and then we can use AI for language much more widely.