Enterprise Localization in Times of Large Language Models with Centific’s Jonas Ryberg

SlatorPod #193 - Centific Chief Globalization Officer Jonas Ryberg

Jonas Ryberg, Chief Globalization Officer of Centific, joins SlatorPod to talk about the evolving language services landscape and Centific’s approach to navigating the changes brought about by generative AI and large language models (LLMs).

Jonas looks back on the opportunities and challenges of 2023, when the impact of ChatGPT prompted clients facing budget constraints to redefine translation quality metrics and optimize processes.

Jonas sheds light on Centific’s proprietary platform, Honeybee, an accelerator that integrates into clients’ localization workflows through APIs. The platform harnesses generative AI to streamline processes such as content creation and style assessment, and even provides a bot that translators can query instead of navigating through style guides.

Jonas explores the growing trend of multilingual content generation with the advent of LLMs. He believes that while the output from LLMs is promising, there is still a need for human expertise to review and refine the generated content.

Looking ahead to 2024, Jonas reveals Centific’s plan to bring Honeybee and JourneyMate, a customer experience tracking platform, into a SaaS-based platform, targeting both individual users with lower-cost access and custom enterprise clients.

Transcript

Florian: Today, very happy to have Jonas Ryberg on the pod. Jonas is the Chief Globalization Officer of Centific. Centific is a global digital and tech services company. They’ve got core capabilities in data, intelligence, experience, and globalization and localization. So tell us a bit more about your professional background and career journey with Centific. I think there was a previous iteration of the company, previously it was called Pactera EDGE, so just walk us through that a little bit.

Jonas: It was called something else before that. So when I joined this was back in 2011, it was called HiSoft. If I take one step further back, I was running my own translation agency before that in Sweden. I was based out of Sweden. I was doing translation and then sort of expanded into a translation agency and did that for about 10 years. And then around 2011 time frame, I felt I needed a new challenge, so I joined this company that I had no idea who they were, what they were doing, really. I found them on LinkedIn, a company called HiSoft, and they were looking for a Director of European Operations, and the objective for them was to expand into Europe at the time. So I joined as Director of European Operations, and the job was essentially to acquire a company to kickstart European setup for HiSoft at the time. And this was a great learning experience for me in the industry because I got to meet with dozens of companies, agencies, small and large across Europe. And eventually we acquired a company called Logoscript in Spain, in Barcelona, and that became our Barcelona office, and still today, that’s an absolutely critical part of Centific. From 2011 throughout the years at Centific, my role has changed. I’ve been in sales, in vendor management, in operations, and in various delivery related positions. I had a quick intermission as Director of Globalization at business intelligence company Qlik as well before I returned to Centific in early 2017. So today I’m managing all the service lines and supporting functions of those across Centific.

Florian: Were you originally a translator or how did you get into the translation business?

Jonas: Yeah, I was. I did sort of freelance translation back in my senior year in college just to make a little bit of extra money. And then when I graduated, I realized that it was relatively easy and straightforward to get jobs in translation if you just had a little bit of sense of marketing and sort of reaching out to customers and so on. So that work sort of evolved into that translation agency set up that I mentioned.

Florian: Yeah, so you know the business from the ground up, as they say. Now, Centific and HiSoft and Pactera I’ve always associated with these big globalization and localization programs over the years. Is that correct, and if so, how have these kinds of large programs changed over the past maybe five, 10 years? I ask because the company I worked for before I started Slator was sold to Lionbridge, and they had these kinds of huge global programs for globalization and localization. So again, I associated Pactera with something similar, running these programs.

Jonas: That’s correct, so we play in that same space not only, but it’s a significant part of the business. I’d say if I look at the last five years, this year has definitely brought the most change, as I guess is the case for pretty much everyone. If I look at the four years preceding this year, I think we’ve seen the same sort of slow, incremental changes as most of the industry. Smaller volumes for each handoff, more handoffs, more MTPE, more multimedia content, more marketing content in general, so basically the well known trends in the industry, right? But then this year it’s been very different. So I would say during Q1, Q2, as a response to ChatGPT, we did see a slowdown in general, in demand. Clients were sort of reconfiguring, figuring things out and during that period we also got a lot of RFPs, so a lot of requests for essentially new vendors at companies that we didn’t work at already. And all of them were asking for pretty much the same thing, two things, I would say. So one is a redefinition of quality, what does localization or language services quality mean, and how to do more with less? So I guess a response both to generative AI, LLMs, but also the fact that the main sector that we work in, which is high tech, of course, faced some challenges early this year in terms of budgets and things like that.

Florian: How much do you think of this was macro versus kind of the ChatGPT trigger? Like, were people just all right, so tech generally is tightening their budgets a bit, or was this kind of compounded, like the budget tightening plus, well, here’s a trigger, let’s go out and talk to our vendors?

Jonas: I think definitely a combination of both. There was a very real tightening of budgets with these tech companies. Obviously, we all know about the rounds of layoffs that they went through early this year, late last year, and early this year. But I also think that most of them had incentives internally to try to do more with LLMs, just see how to leverage it. And companies like Microsoft, obviously close to OpenAI and Amazon and so on, they all worked on their own LLM implementation, so they wanted to see what they could do with their vendors as well.

Florian: Give us a bit of an overview of Centific, like your division, the globalization division, like what type of clients you service? We mentioned those kind of big tech localization programs, but other than that, what type of key services, client segments, and what’s your technology look like in terms of the TMS, the CAT, the MT?

Jonas: I’d say Centific in general is pretty different to most of the companies on the Slator top 20 list or top 100 list, right? We’re not the language service provider only. We’re that, of course, as well, but it’s not only that. So we’re a leading global provider of AI-driven platform services and real world data solutions. That’s sort of the pitch, the quick statement there of who we are. And we offer localization language services, but we also offer AI data services like many other companies in the loc industry today, but also AI development, platform engineering, as well as some SaaS-based product offerings at this point. So globalization fits neatly under that platform services piece, and it corresponds to a quite significant part of the overall business. It’s hard to say exactly how much because a lot of the work is sort of in between AI, AI data, and localization language services, and I’m happy to talk more about that, but it’s a pretty wide set of capabilities beyond the globalization piece. And the scope that I manage today covers all of that, essentially, which is an opportunity for us to leverage the expertise in platform engineering, in AI and bring that into localization, and then flip side bring localization into those platform engineering or AI capabilities or services as well. In terms of the tech stack, we have a combination of both proprietary platforms and solutions, accelerators, as well as partners that we work with. So we don’t have our own TMS in a traditional sense. We have some platforms internally that function essentially as a TMS, but we haven’t invested in building a competitor to some of the big household name TMS’ in the translation, language services industry.

Florian: Maybe if you don’t have it at this point, it’s probably good to just go with something off the shelf. They’re getting pretty advanced at this point, so if you have it internally, why not keep developing it? But starting from scratch now probably wouldn’t make a lot of sense. Interesting, so you mentioned the data piece. How have these LLMs kind of changed the kind of data that clients would request, right? We had these kind of big crowdsourced, unstructured huge data sets maybe in the past, and now it seems like it’s going more towards very kind of highly curated annotated data. Have you seen that too and what type of data is in demand there?

Jonas: It’s definitely evolving and it’s evolving fast, much like on the localization side. There was a bit of a slowdown early this year when they were figuring out what to do with generative AI as opposed to traditional machine learning. That business still remains and it remains at high volumes. It’s maybe changing a little bit and it’s very sort of incremental in that it’s getting more advanced. So the more simple use cases are to some extent solved, so the more straightforward data sets are not in demand in the same way anymore. But if you look at the number of markets that an AI solution or ML solution is supposed to cover, then that’s growing. It’s also increasing in complexity in terms of things like accessibility and diversity in general. So the data sets are getting more advanced. But the biggest change this year with the impact of generative AI and large language models would be that when we look at AI and AI data specifically, we look at the pre, the during and the post for our services. So some of the services would support sort of the training, the pre-deployment of AI. Some of it would be to monitor AI and sort of help manage deployed AI solutions. But the post piece, so after the release of AI is what has really increased in demand this year. And it’s because of how LLMs are trained, right, that they are trained on huge amounts of public text data for the most part, if we’re talking about LLMs. So in that case, you need a feedback loop, so you need reinforcement learning with human feedback. So it’s shifted from pre-AI to post-AI demand to a great extent and that’s happened very fast. And the volumes on the post-AI side are already quite substantial when we look at our business. So it’s really changing how we work with, I think you mentioned, large scale sort of crowdsourced engagements, it’s changing to a different type of talent pool. You need experts now that can provide this expert feedback loop into the LLM to fine-tune it rather than train it in the first place.

Florian: I recently watched a really interesting one hour intro by Andrej Karpathy, like former Tesla AI, and kind of now I don’t know where he works now, but he’s putting out really great YouTube videos and he also mentioned the reinforcement by human feedback component. And I think he mentioned that it’s usually selecting between three, four, five, seven, whatever options. Is that what you’re doing as well, or is there a different, like mostly it’s about the AI output and then some human expert selects it?

Jonas: I think that’s the most common approach, that you would get a few different responses to a prompt and then you would grade them, essentially rank them. That would be the feedback to the LLM for fine-tuning essentially. There are a couple of other ways of doing it, especially on the language side. If you look at LLMs and deploy LLMs to support language programs, then you could look at things like acceptability and other types of putting a score to a response or an output from an LLM.
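
Editor’s illustration: the ranking approach described above is often turned into pairwise "chosen versus rejected" examples before fine-tuning. The following is a minimal Python sketch under that assumption, not a description of Centific’s tooling; the function name `ranking_to_preference_pairs` and the dictionary keys are hypothetical.

```python
from itertools import combinations


def ranking_to_preference_pairs(prompt, ranked_responses):
    """Convert one reviewer's ranking (best first) into pairwise preferences.

    Hypothetical helper for illustration only: every higher-ranked response
    is paired as "chosen" against every lower-ranked one as "rejected".
    """
    pairs = []
    # combinations() keeps input order, so `better` always outranks `worse`.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs


if __name__ == "__main__":
    # A reviewer ranked three candidate responses from best to worst.
    ranked = ["response A", "response B", "response C"]
    for pair in ranking_to_preference_pairs("Summarize the style guide.", ranked):
        print(pair["chosen"], ">", pair["rejected"])
```

A ranking of n responses yields n(n-1)/2 pairs, which is one reason a handful of ranked options per prompt can already give a useful amount of feedback.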

Florian: Is your experience historically from kind of the translation, localization, globalization side and managing these kind of pools of experts, right, translators predominantly in the localization workflow? Is that helpful as kind of a company expertise, managing this at scale for these types of tasks as well? Or would you have to kind of build it in a very different manner?

Jonas: It’s definitely helpful. It definitely helped as we started out in this space and I think it’s a natural fact today that some of the big tech companies go to former or current LSPs that have added this AI data piece because they have that experience of global workforce management and global program management. That’s quite important as well. If I compare Centific and a couple of our competitors that also have a LSP background, I think that’s where we really excel compared to more startup players in the Bay Area, Silicon Valley that might have a strong platform, but not the global program and talent management experience, so for sure it’s very helpful to have that.

Florian: Yeah, the Scale AI’s that get like billion dollar funding rounds, et cetera. You guys recently published a report called The State of Localization, kind of looked into the future of the industry. Can you just walk us through some of the key highlights and kind of challenges you found in that report?

Jonas: I’m kind of happy to say that we were pretty much on point with the predictions that we put in there and then sort of the commentary around that. At that point when we released it, it was kind of obvious that there was an economic downturn, so we did talk about that. We talked about how the loc industry would need to do more with less, so what can you do as a buyer and as a provider to address that. We talked about how quality would need to be reconsidered or we thought that it was ripe for a redefinition, that we had talked about the same quality metrics for years, if not decades in the industry, right? So we started seeing already more questions from our customers on that too, what might be a better way to look at quality. And of course, we talked about AI, ChatGPT was already around, and we had worked with AI already for years at that point. So it was clear that it would be a very important year in AI and how that would influence localization as well. And then finally we talked about the human factor, which remains of course super important even with AI, or maybe even more so with AI.

Florian: We spoke about this kind of human reinforcement learning and where the expert sits there. But in the translation, localization piece, what are you hearing from buyers? How do they think about translators at this point? At SlatorCon Zurich, for example, we spoke about how it’s transitioning from translator to language specialist. Is that something that the buyers also see? Is that something that they’re kind of coming to you and say, well, look, translator is great, but a lot of this is now AI and we need people that are in this loop. So how do they think about it?

Jonas: I don’t think we’ve seen that much commentary from our customers around the talent. It’s definitely something that is high on the agenda for us, but we haven’t had that question yet. As to the shift from translators to language specialists, I totally agree with that sentiment. It’s very much in line with how we look at the evolution of the people that we work with. But having the right talent is still super important for our clients too. It’s still something that they ask about, for sure. And sometimes when we go and we pitch an RFP or something and we talk about AI, anomaly detection and LLMs and stuff, they don’t care. They just want to make sure that we have the right translators for the job, essentially. So we get a little bit carried away sometimes talking about these interesting new things and forget about the core aspect, at least in those conversations, because still really the human factor is maybe the most important factor in order to deliver good quality.

Florian: It’s fascinating. It’s also because we’re talking to, I mean, as an industry, to so many different sets of clients. I mean, some of them are very kind of tech forward and they’re very open to adopting these latest and greatest solutions. And others are like, well, I just need perfect quality and I’m okay, maybe also to pay the price as long as you find the right people. And has it gotten harder maybe, I mean, just conceptually, has it gotten harder for you to find the right translators? And maybe there’s, I mean, I’m hearing from some universities they’re having trouble kind of signing up new people to even study translation because some of the younger folks think this is maybe not something they want to pursue as a career. Is that something you’re seeing or not really?

Jonas: Not really. I would say we should expect that to happen because people in the industry, I think, understand that there will be jobs for translators, whether language specialists or translators, I think that they will be needed. But there is a sentiment generally, especially in the AI community, that translation is a problem that is solved essentially with AI, right? But at scale, with the right quality and so on, it’s not really right. But something that maybe to counter that point of shortage of translators, what we did talk about in the report was sort of the lingering effects of the pandemic. So during the pandemic, obviously all of us, we were sort of stuck at home and people worked from home instead of going into offices. And what we observed was that there was not an insignificant number of people that liked that. They liked working from home and as they were told to come back to the office, they actually started looking for work online and they moved into translation in some cases, in many cases. So we started seeing an influx of people with a completely different career path or trajectory than we would have seen in the past, like people that studied translation and so on. So experts in other fields that moved into translation because they liked this way of working from home and so on. So it actually opened up a new influx of translators or language experts, new experts that wanted to try their hand in translation.

Florian: That’s probably fairly sustainable because while some people like to go back to the office, I think a fair number of people actually like the new way of working. So on the tech side, we came across when we did some of the research a toolkit you call Honeybee, which you say enhances the efficiency of localization tasks. What is Honeybee and who do you see as kind of its main users?

Jonas: It’s essentially an accelerator that our customers can integrate within their localization workflows through APIs and it puts together processes and technology, for example, to streamline how reference materials are used to affect the large language model generation and fine-tuning of that. And we provide a few different capabilities through Honeybee. So things like the obvious things that a lot of companies are talking about, like MTQE, content creation, of course, is something that is coming more and more, but a few other more specific solutions as well, like style assessment, looking at the source content and provide input to translators on what to consider when they translate the content. We have some other really what I think cool solutions or tools as part of this, for example, a bot that translators can use to talk to instead of having to search through style guides or term lists and so on, to really help translators do a better job or faster get to the information that they need. So it’s a collection of different solutions based on generative AI, essentially.

Florian: It’s interesting. Yeah, so you don’t have to go into some kind of old school terminology database or style guide. You can just kind of ping a bot that’s sitting somewhere next to you, like in your UI. Very much integrated.

Jonas: Exactly, integrated into the TMS or the environment that the translator would work in. And to your question about the users, translators would obviously be one type of user, and then of course our customers that want to do more with less, to that point, leveraging LLMs; it enables that through APIs. And then, of course, at Centific we use it internally as well for various processes to elevate how we work.

Florian: Now, one question that I want to explore with almost everybody who’s on the podcast is where do you see scalable use cases for multilingual content generation? So it may achieve the same aim in the target, but it’s not a translation and that’s something that LLMs have brought because of their generative abilities. That was something that just didn’t exist two years ago. Do you see any demand for this? Are any clients coming to you and say, look, we need this to achieve this in 20 languages, but we don’t really need a source anymore, we just prompt it and boom, off it goes, those 20 versions?

Jonas: Yeah and that’s where this style assessment and source analysis piece of Honeybee makes sense. So we can create a summary of the text or a text or a prompt or brief if you like, and then provide that to whoever is going to review the output of the LLM. So we generate content, we have a brief based on whatever the text is supposed to be about and then we have a human-in-the-loop component to review that as the final step. It’s mostly on the marketing side. That’s where we’ve seen success so far, marketing type content, but of course, always with a human-in-the-loop component to review it and edit it. So it just changes a little bit the workflow, but it does still require language experts and experts in the specific field.

Florian: I guess it could be language experts, but they don’t need to be fluent in any source, right, they just need to check the output and like, all right, that works for this particular brief?

Jonas: Definitely, and in fact, in some of the content creation pilots that we’ve done, we’ve seen better output when we don’t use a source, so just creating it from scratch based on a certain brief, sort of this is what we need, compared to translating or transcreating it.

Florian: Yeah, I’m testing it all the time and getting it to write things. It’s still a bit bland sometimes when I just use the off-the-shelf ChatGPT, et cetera, so I wonder when it gets a little more creative.

Jonas: That’s where you need reinforcement learning with human feedback to fine-tune it over time.

Florian: I don’t have access to that. So, one thing that I found interesting, you mentioned MTQE as kind of one of the hotter areas. So you offer MTQE, and then you offer human translation quality estimation. How do these differ? It’s a bit of a kind of niche question.

Jonas: Yeah, and I think I can try to answer it. I’m not a technical expert on it. I think specifically between MTQE and HTQE, we see more success with MTQE. Human translation quality estimation is pretty unpredictable at this point. We can’t rely on it as a way of deciding what to review and what not to review. Whereas on the MTQE side, based on the programs where we’ve used it, it’s very accurate in terms of this is something that we need to do post-editing on, and this is something that doesn’t need post-editing. So I think HTQE will need some more time before it can be deployed.
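
Editor’s illustration: the MTQE decision described above, what needs post-editing and what does not, can be pictured as a simple threshold on a quality-estimation score. This is a minimal Python sketch, assuming a hypothetical score between 0 and 1 where higher is better; the `Segment` class, the `route_segments` function, and the 0.85 threshold are illustrative assumptions, not Centific’s implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    machine_translation: str
    qe_score: float  # hypothetical quality-estimation score in [0, 1]; higher is better


def route_segments(segments, threshold=0.85):
    """Split segments into those sent to post-editing and those left as-is.

    The threshold is an assumed, tunable value; in practice it would be
    calibrated per language pair and content type against human review.
    """
    needs_post_editing = [s for s in segments if s.qe_score < threshold]
    no_post_editing = [s for s in segments if s.qe_score >= threshold]
    return needs_post_editing, no_post_editing


if __name__ == "__main__":
    batch = [
        Segment("Save your changes.", "Enregistrez vos modifications.", 0.95),
        Segment("Break a leg!", "Cassez une jambe !", 0.40),
    ]
    to_edit, to_skip = route_segments(batch)
    print(len(to_edit), "segment(s) routed to post-editing")
    print(len(to_skip), "segment(s) left without post-editing")
```

The same routing shape would apply to HTQE, but as Jonas notes, the scores there are not yet reliable enough to decide what to review.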

Florian: So HTQE, the I guess upside would be just even higher quality, like you’re measuring your eventual human output at scale and you can jump in when there’s an issue. One of the kind of theories or thesis I posited was that translation is kind of increasingly becoming a feature of many, many other SaaS products and there’s kind of a convergence of all of these abilities now, machine translation, text-to-speech, speech-to-text. How do you think about this multilingual and the translation component becoming so pervasive and all of these products that are just kind of popping up now everywhere? Because a lot of people are building on OpenAI, on Cohere, on Anthropic, and on all of these foundational models.

Jonas: I think it will be something like that, like a utility that you can access, but not necessarily have a localization department that buys only localization. And especially when we look at something like marketing, and it’s already quite common that marketing department would manage the marketing localization as opposed to product localization, where you would have a localization manager specifically for that. I think that’s going to be more common. And it’s just something that it’s an add-on that you access through APIs or whatnot. And it might be that to your point about content creation, that it’s more of a content creation thing, that I need this type of content, and then you have a language expert come in and look at that. I think it’s very likely that it will continue to converge and just become something that is part of something bigger and not necessarily a localization offering as such.

Florian: Do you feel the need to really keep on top of all of these new product launches, kind of thinner OpenAI wrappers, maybe a little bit deeper products, so you can maybe preempt your customers kind of looking at it or trying it out and kind of building their own workflows around it?

Jonas: Our experience has been this year to be on top of it for sure and be ready to answer those questions when they come, because they definitely come. I mean, they wouldn’t do a good job if they wouldn’t ask about it at least, and want to try it out. So I think it’s necessary to be on top of it. It’s probably impossible to track all of the releases across everything because there’s so much happening now. But yeah, I would say yes, definitely. I mean, this is the big trend now, and we’ll see how it evolves, if it completely disrupts everything and it becomes a different industry or not, but I would say it’s a good idea to keep track of it.

Florian: All right, so what’s on your agenda for 2024? Any new products, new, I don’t know, acquisitions? Anything coming up?

Jonas: We have a couple of solutions that we have offered to the market this year. The Honeybee solution that you mentioned. We have something else called JourneyMate, which is essentially a customer experience tracking platform. We’re bringing those solutions and a couple of others together into a platform, so a SaaS based platform for language services. An AI language platform, if you like, but it’s not necessarily a localization platform. So it’s quite aligned with your previous question about how the industry is changing and sort of the convergence of these solutions, because this is more or less, it is taking those different previous accelerator solutions into a platform concept. And the theory is that generative AI and LLMs will be in use for sure. Not everyone, but most companies will want to use it before too long. So we bring this framework through APIs where they can access different options, right? So they can access OpenAI, Google, Llama from Meta and Bedrock from Amazon and so on through that framework and use the different tools that we’ve talked about, whether it’s content creation or MTQE or whatever LLM or generative AI-based accelerator they might want to use. But then the challenge with that then, is the human-in-the-loop factor, the feedback loop factor so that’s where the solution that I mentioned, the customer experience assessment tool, comes into play. So you can use the AI solutions, but then you have a human-in-the-loop through this customer experience assessment tool to provide a feedback loop and ensure that it is actually meeting requirements and providing good quality output.

Florian: And when you’re saying SaaS, is it like a website where you can get a subscription for 5.99 per month? Or is it more kind of custom enterprise, give me a call and I’ll give you access to it?

Jonas: It will be both. I mean, we follow the typical platform pricing, that there will be a sort of lower cost access where you have access to certain features. But then of course, we work with clients that would require enterprise-type subscriptions as well.

Florian: Interesting, because that’s still a little rare in the kind of industry that established players have launched something where there’s a pricing page and people can kind of sign up in a consumer type way. I think a lot of companies have tried but kind of didn’t really fully commit to it. So yeah, all the best with that.

Jonas: It’s not going to replace the traditional language services work that we offer to our clients today, but it is definitely a response to what we see now in how the market is changing and people want to try it out. They want to access ChatGPT type solutions, even though it might be Llama or Bedrock, but they don’t know how, how to do it in a way that is secure for one and it’s also fine-tuned for their use cases, for their language and so on. So this can help with that, provide access to that and then give that feedback loop to have peace of mind regarding the quality output of it. So we’re looking to launch it early next year. We’re doing a soft launch probably before the end of the year. We’re already deploying some of these accelerators with clients and then we’ll see if it works. Much like you, I’ve seen others try in the past in the localization industry and it hasn’t always worked, but I think a lot has changed this year, so we’ll give it a shot.