Google Translation Hub — Mallika Iyer on Launch, Features, and Roadmap

SlatorPod #146 - Google's Mallika Iyer on Translation Hub

Google’s Head of Product, Translation AI, Mallika Iyer, joins SlatorPod to talk all about the company’s new Translation Hub.  

Mallika begins with her journey from software engineer to leading all of the translation products for Google Cloud, most recently Translation Hub. She shares the motivation behind launching the hub: overall demand for translation had increased, but budgets had not necessarily kept pace.

Mallika discusses the basic and advanced tiers the hub offers, with the latter including domain-specific translation, translation memory, and translation quality prediction. She explains how the hub has gained traction early on with languages of lesser diffusion in addition to the major FIGS or CCJK combinations.

Mallika talks about a case study with Avery Dennison and how they rolled out Translation Hub to all their employees to improve language communication and promote an inclusive workplace culture. She expands on the different sectors the hub serves, from those with limited budgets in the public sector to those on the opposite side in retail and manufacturing. 

Subscribe on YouTube, Apple Podcasts, Spotify, Google Podcasts, and elsewhere

The pod rounds off with Mallika laying out the Hub’s roadmap for 2023, with plans to add more document types, improve user experience, and integrate with third-party products. Translation Hub’s ultimate goal is to make the hub more accessible with new features while keeping its simplicity.

Transcript

Florian: We are keen to hear about the new Translation Hub that Google launched recently. Before that, tell us a bit more about your professional background. You went from software engineer to now leading Google’s Translation AI team. What was the transition and what brought you to Google eventually?

Mallika: My background was in computer science. I did my undergrad and my master’s in computer science. I started as a software engineer, but what was interesting to me about software engineering was solving problems. The next logical step was I started working a lot with customers of the products that I was working on. At my previous organization, I became more of a customer-facing software engineer and then I pulled back and decided I want to do some more pure code. I went back and forth and then I realized that my heart lies with solving user problems. The other interesting thing that drives me is certain problem spaces interest me either because of their complexity or because they seem hairy and large. Let us go see how we can make something somewhat concrete and crystallized that will help people. Started out with this thing called NoSQL before NoSQL even became a buzzword. We are talking about 12-plus years ago. Started out with big data before people started talking about big data everywhere and then when they did, I was like, I am out of here. Then I moved into platform as a service, containerization before that was a thing, and eventually found my way to AI because the potential in AI is enormous. The problem space, depending on what you are trying to solve, can still be very ambiguous. That is how I ended up at Google. I joined Google towards the tail end of 2017 and I took over as the head of translation for Google Cloud at the beginning of 2022. 

Florian: Can you expand a bit more on your current role as Global Head of Product, Translation AI? Where does it sit in the Google organization and what are some of the other units that you are working closely with at Translation AI?

Mallika: Absolutely. One of the biggest arms of Google is Google Cloud, and we serve the enterprise users of Google. I lead all of the translation products for Cloud and for our Cloud customers. In that capacity, my role is to make sure that we bring Google's translation research to our enterprise customers and users. In this role, I work very closely with all of the different research arms of Google because they are the ones who are working tirelessly on the latest and greatest in neural machine translation, AutoML translation, and so on and so forth. Of the multiple products that are part of Translation AI within Cloud, the most mature one, or the oldest one, is neural machine translation. It is also freely available through translate.google.com, which you might know as Google Translate. We have the enterprise version of that for Cloud, and we have built a number of additional features into that API over time. We also have AutoML translation, which is custom translation or domain-specific translation where we allow the users to bias our underlying model so that they get more domain-specific translations right out of the box. Then we have media translation, and Translation Hub, which is the newest launch. It was launched last year.

Florian: You outlined a bit of the difference between the Google Translate API and the Translation Hub. The Translation Hub launched back in October 2022. If you had to describe it in a very short 30-second elevator pitch, how does it differ for the enterprise users from the raw API or even the web-accessible Google Translate?

Mallika: First of all, Google Translate is a free translation interface. If you are a company that wants guardrails around how your data is retained, how your data is translated, and those enterprise security features that all companies rightfully care about, then the free web translation tool is not the way to go. It is great for casual users. The Translation API, on the other hand, is an API, so you have to hook it up programmatically as part of your workflow, and a decent amount of developer time is required to do that. The main difference between all these things and Translation Hub is that you just make an account on Google Cloud Platform and start adding users. We have had customers onboard their entire organization or their entire team in under 30 minutes. Most of that time was spent going to Google Cloud Platform for the very first time and putting in their credit card or other information. The actual setting up of users and assets in the Translation Hub takes very little time, maybe five minutes if you have all your assets ready to go, because it is just adding or bulk-adding users and go. We integrate with Google sign-in or with whatever your email domain is, so there is a lot of flexibility in that way.

Florian: What were some of the rationales for launching the Translation Hub? Was it we have all these existing individual features, options, ML products, and now we are combining them into one? Or was it driven by a large number of users requesting something like this?

Mallika: It was neither. What we started noticing was that the overall demand for translation, and the amount that users or companies are translating, has increased over the course of the last several years. The interesting thing is that budgets have not increased in the same way. It is like doing more with less. What we noticed is that the ones that have the budget do not suffer. The ones that need translation but do not necessarily have those kinds of elastic budgets tend to suffer from the lack of access to high-quality translation. We have all of this great tech, so why do we not make something so that translation is accessible? It should not be, here are five APIs, go build your own. That is not accessible. It is possible, but that is not what we mean when we say accessible. As we started looking at this problem, we were like, what can we do to bring the best of Google AI to ensure that our users can continue to keep up with their language and content translation demands while not compromising on quality? Can we do something to increase the speed of translation while we are at it? There is the traditional route to translation, which is very much required depending on the content you are translating, where you send content out to a language service provider and they provide a high-quality translation. These are human beings translating, so it is not going to be instantaneous, for a good reason. Does every piece of content that needs to be translated need to go through that path? If it does not, then can we somehow give back the cost advantages and speed to the user for types of content that do need to be translated, but do not need that same level of high-touch translation? That is the rationale behind why we created the Translation Hub, which is also the reason why it is very lean.
If you have tried the Translation Hub, it is intentionally a very lean workflow because it is not meant for five or six or seven reviewers looking through the content over and over again to ensure that it is fit for purpose. 

Florian: Let us also talk about some of the components and features. You have AutoML, there is a translation memory component, and there is a linguist UI. Can you list this up for us and also what type of content format and types the hub supports?

Mallika: Absolutely. What we have tried to do within Translation Hub is bring all of Google’s translation innovations together in a way that makes sense. The Translation Hub has two tiers, a basic tier and an advanced tier, and basic is what it says it is. Here is neural machine translation, you can do some customization with terminology control, and that is about it. You can retain your document format to a very large extent and you are good to go. Within the advanced tier, we expose a lot more levers for the user, including AutoML translation or custom translation, where you can automatically get the models that you have built. You can use domain-specific language models to translate so that right out of the box you will get a higher-quality translation. You can also leverage translation memory in our advanced tier. This is all in addition to neural machine translation and glossary terminology control. So in the advanced tier, we have neural machine translation, glossary and terminology control, AutoML translation, which is your custom domain models, translation memory, translation quality prediction, and the ability to have your format retained. Today we do that for PDFs, Microsoft DOCX and PowerPoint formats, Google Docs, and Slides.

Florian: Let us touch on two of those. First, there is a post-editing tool for linguists. Can you tell us a bit more about some of the features there? 

Mallika: Absolutely. Again, keep in mind this is built in an intentionally lean way, so we have two ways to do editing. A user can self-edit, or they can send the content out to a translator to post-edit. The reason for this is that, as we looked through our user base, we learned that many of them, especially where there are budgetary constraints and so on, are usually well-versed in at least one of the target languages that they need to translate this content into. They can translate some of this content themselves. We figured, let us give them the ability to self-edit, and at the same time let us ensure that they can reuse these edits if they want to by capturing them in translation memory. The nice thing about that is once you have enough high-quality content in translation memory, you can use it as a source of data to train a new AutoML model, because a TMX file can be fed into AutoML as training data.
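
Editor's sketch: to make the TMX point concrete, the Python below shows roughly what a translation memory export of edited segments could look like. It is a minimal illustration, not Translation Hub's actual export. The function name `build_tmx`, the `example-script` tool name, and the sample sentence pairs are hypothetical; only the TMX element and attribute names come from the TMX 1.4 format.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, src_lang="en", tgt_lang="es"):
    """Return a TMX 1.4 document (as a string) for (source, target) pairs.

    Hypothetical helper for illustration; not part of any Google product.
    """
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per edited segment, with one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([("Hello", "Hola"), ("Thank you", "Gracias")]))
```

A file like this, once it holds enough reviewed segment pairs, is the kind of data Mallika describes feeding into custom model training.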

Florian: You also mentioned the quality estimation feature, which is something that is ahead of even some of the more specialized industry platforms. Can you tell us more about this and how it helps users? 

Mallika: It is MTQP, so machine translation quality prediction. We launched quality prediction first for neural machine translation, and our goal there was to give users an easy way to visually get a sense of which content is lowest on the confidence score so that they can attend to that first. This is, again, in keeping with our goal of asking, is this piece of content fit for purpose, and what levers can we expose to our users that allow them to get there very fast? Once they edit the content, it will automatically be retained in translation memory if they want. If they do not want it, it will not be.
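
Editor's sketch: the "attend to the lowest confidence first" idea can be illustrated with a few lines of Python. This is an assumption-laden example, not how Translation Hub exposes MTQP. The `Segment` class, the 0.0 to 1.0 score scale, and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    translation: str
    confidence: float  # hypothetical quality-prediction score, 0.0 to 1.0


def review_queue(segments, threshold=0.7):
    """Return segments scoring below the threshold, lowest confidence first."""
    flagged = [s for s in segments if s.confidence < threshold]
    return sorted(flagged, key=lambda s: s.confidence)


if __name__ == "__main__":
    segments = [
        Segment("Wear safety goggles.", "Use gafas de seguridad.", 0.92),
        Segment("Torque to spec.", "Apriete según especificación.", 0.41),
        Segment("See the notice board.", "Vea el tablón de anuncios.", 0.65),
    ]
    for seg in review_queue(segments):
        print(f"{seg.confidence:.2f}  {seg.source}")
```

An editor working through this queue sees the least reliable segments first and can stop once the remaining content is fit for purpose.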

Florian: What if an organization is not running on Google Workspace? Is it even possible to use the hub, or not at all?

Mallika: It is absolutely possible because Translation Hub is a Google Cloud product so if you have Workspace, great. If you do not have Workspace, that is fine too. This is the reason we have two ways to sign in. We have a sign-in with Google and we have a sign-in with email and password. Independent of your domain you could be using Office 365 or you could be using something else. It does not matter. You could be using Workspace, but you do not want to leverage that for whatever reason, so you could sign in with your regular email and password. Your Translation Hub admin can set up all of these things during the process of onboarding users to the Translation Hub, so it should be pretty straightforward.

Florian: Let us talk a bit about language and language combinations that you have on the Hub. Google Translate is probably the most comprehensive technology there for translation in terms of language coverage in the target languages and all the combinations you get. What are some of the most used languages you are seeing currently on the Translation Hub, and where do you see momentum building as well? 

Mallika: I expected to see the most common languages being used, and by common languages I mean the most spoken languages in the world or the most spoken languages in the United States. If the users are United States users, maybe we will see Spanish as one of the highest. That did not seem to be the case at all, because where the hub seems to be solving user pain is in the languages where it is harder to find experts, or where it is possible to find experts but, because there are not as many of them, it is expensive. We have seen that the less frequently spoken languages are the more commonly used languages in Translation Hub, along with the usual suspects, depending on which country or region you are operating from.

Florian: That is also in line with the mission, lowering the barrier very far and having a lot more people being able to access this. Interesting, so it is not immediately the FIGS, the French, Italian, German, and Spanish.

Mallika: The usual suspects are there, but they are not the only ones. We saw quite an interesting trend there which made sense because we wanted translation to be accessible. We wanted the content to be accessible in all languages. The ones who are always disadvantaged by the high price or lack of accessibility are the ones where there are not that many of those language speakers. We are seeing an interesting trend there and I will tell you a little bit more when we go into the sectors that we see use translation more and so on and so forth.

Florian: In some of the earlier reviews we saw some industry observers mention that they are missing some connectors to third-party content repositories like Adobe Experience Manager, Drupal, Eloqua, Zendesk, Salesforce, et cetera. Are there any plans to open the hub up to those content on-ramps or how do you see that? 

Mallika: I expected this review, because there was a trade-off in how lean we wanted to keep the initial launch: we want people to focus on what is available in the hub today. There is absolutely a plan to open it up and make it more accessible and easier to integrate with third-party products. That is definitely on the roadmap. It is coming very soon, sometime over the course of 2023 for sure.

Florian: It is a never-ending list. If you look at the MarTech stack, there are 5,000 logos in there and there are so many content repositories. You had a very interesting case study, Avery Dennison, on the launch. What was their key pain point, and how did the hub help to address it for them?

Mallika: Avery Dennison is a global manufacturing company and their global comms team is a very interesting group. They are ahead of the curve in a lot of ways because they look at technology and AI and go, where can we use this to solve problems that will drastically change or enhance the quality of life of our employees or our customers or our users? Avery Dennison rolled Translation Hub out to all of their employees in their organization. Their goal was to promote employee engagement and an inclusive workspace. These are things that people say, but to do it, you have to roll out technology and make sure that these things are accessible. Language is only one part of it. There are other things when you talk about inclusivity and employee engagement and so on, but since we are talking about translation, I will stick to the language aspect of it. What they did was they rolled Translation Hub out to all of their employees and they rolled out a number of AutoML models as well. What they started seeing was their employees were translating everything, even factory information sheets that are tacked up on the notice board, and things like that. I am paraphrasing here from my peer at Avery Dennison but it was very well received. He shared publicly that they were at some point translating from 32 to 48 languages if I am remembering correctly. It was fairly simple for them to send out their global communication message from the CIO and leadership to all of their employees. Alongside that, they also mentioned that they did not get any tickets on not being able to access it or it being problematic or not working which is a dream for anyone who is in charge of making sure that any tech stack is running. Ideally, you want the number of tickets to be very low. I thought it was phenomenal what they did because they delivered what it means when it comes to language and translation and content accessibility with inclusivity and employee engagement. 
They are doing a lot of other cool things, but we will keep this to translation here.

Florian: Is it generally quite broad what you are seeing in terms of user uptake or are there particular sectors or user groups within the enterprise that are leading the way in adoption?

Mallika: What we are seeing is where there is a tremendous amount of content, we are seeing a significant amount of traction. What comes to mind when you think about that? Government, public sector, education, and on the flip side, healthcare, retail, and manufacturing. You do not think about a budgetary constraint when you think about these sectors, but translation needs are very real and nobody’s budget is increasing, so we see both sides. They are all looking at Translation Hub for somewhat similar reasons where they do not have a huge budget, but they have a lot of content to translate and need it translated yesterday and they cannot wait for two plus weeks. Or they do not want to pick and choose which content to translate because it takes XX dollars to translate one page or so on and so forth.

Florian: You already gave us a preview of the features when you talked about the additional connectivity to these third-party systems, but what else can we expect from the hub in 2023? 

Mallika: Some of the things that we are looking at are more content types. I told you what document formats we support today. We want to keep expanding that, so expect more document formats to be supported. We are also looking at additional fine-grained control within translation memory. We are looking at making AutoML even more accessible to our users. This is again very research heavy and may not be available in 2023, but we are continuously trying to make our domain-specific translation models easier and easier for users to build. That is something that we will be working on over the course of 2023 because it has a huge impact right out of the box on the quality of translation. These are some of the top things that I can share. There are a lot of little things that we will continue adding. A lot of them are around user experience, intuitive user journeys, and so on and so forth. We will add more kinds of roles for users. It is a fine balance because we do not want to make it complex. It is so easy to make something complex.

Florian: Those systems can get very complex. You do not want to rebuild that again. 

Mallika: Correct, we do not want to go in that direction. Those systems exist for a reason. Our goal is to keep it simple and lean for the users who want that simplicity. It is going to be a balancing act but we are going to try to introduce as many features as we can while keeping it simple. We will not compromise on the simplicity and ease of use of the Translation Hub.