Google Partners with Welocalize to Evaluate its Adaptive Translation LLM Solution

Findings show a potential path to higher-quality LLM outputs, which could result in improved accessibility to quality translation outside of traditional workflows. Specifically, Google’s adaptive translation feature outperformed the other tested methods in fluency and in customizing writing style to the customer’s preferences.

Google, a leader in the field of artificial intelligence (AI) and machine translation (MT), has made significant strides in applying large language models (LLMs) to language translation tasks. Their progression from neural machine translation (NMT) through recent developments like Imagen and PaLM to adaptive translation with LLMs showcases their long-standing leadership in the field.

Welocalize, in collaboration with Google, conducted a case study to evaluate the effectiveness of Google’s adaptive translation LLM solution. Welocalize is one of the first language services providers to conduct studies on these models.

The joint study evaluates early stages of adaptive translation, the latest feature addition to Google’s Translation API Advanced. Adaptive translation is an integrated API method that works in concert with a large language model that Google fine-tuned for text translation. It gives customers a quick and easy way to optimize translation output to better fit their styles and use cases in real time. Findings show a potential path to higher-quality LLM outputs, which could result in improved accessibility to quality translation outside of traditional workflows.

Adaptive Translation and Large Language Models 

The introduction of LLMs has revolutionized machine translation, making it more flexible and context-aware. Google capitalizes on this technological advancement by integrating generative AI models specifically fine-tuned for translation use cases into their Translation API. These models, which are specialized versions of Google’s foundation models, are designed to optimize translations for specific customer needs and are available in Public Preview through the Translation API Advanced.

Source: Translation AI portfolio presented by Google

Comments Mikaela Grace, Head of AI/ML Engineering at Welocalize, “Our joint research efforts with Google reflect our dedication to AI-based innovation in the localization space. This work on adaptive translation with LLMs represents an exciting potential path to higher-quality LLM outputs, which could result in improved accessibility to quality translation outside of traditional workflows.”

Study Objective

Welocalize’s study aimed to benchmark the quality of Google’s adaptive translation LLM solution against Google’s custom and generic MT systems. It involved selecting existing models with small data sets across various language pairs and content types, and it ensured a fair comparison by using equal data volumes for the adaptive and AutoML approaches. Three adaptive translation iterations were customized in the preview release, each with a different data size, followed by a detailed human evaluation.

Results and Findings 

The adaptive translation method excelled in accuracy, fluency, style, and locale convention, with fewer critical errors and better client style customization. However, traditional MT models like AutoML performed better in terminology and tags (e.g., HTML). Notably, the adaptive translation method using a smaller example set achieved the best overall adequacy and fluency scores, while the larger adaptive dataset strongly outperformed AutoML in accuracy, fluency, and style.

Google’s adaptive solution is best suited to content types with minimal client-specific terminology, little data available for customization, and a focus on style. Traditional MT models, particularly AutoML, outperformed adaptive models in client-specific terminology, so they may still be a better choice for technical writing.

Detailed Analysis

  • Adaptive models with 20K and 3.5K data sets scored highest in fluency.
  • The generic model had the lowest fluency scores.
  • AutoML excelled in terminology, showing the fewest errors.
  • AutoML had the highest number of accuracy errors, while adaptive models with less data had the fewest.
  • The second adaptive model (20K data set) performed best in adapting to style.
  • Adaptive translation does not require any prompt engineering and will be more easily integrated into existing workflows.

Description: One of the human evaluation methods used to compare the five models was Adequacy & Fluency, rated on a Likert scale of 1-5, where 1 is a terrible translation and 5 is a perfect translation.
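
As an illustration of how Adequacy & Fluency ratings on a 1-5 Likert scale might be aggregated per model, here is a minimal Python sketch. It is not the study's actual tooling: the model labels, the `summarize` helper, and every rating value below are hypothetical and chosen only to show the arithmetic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (model label, adequacy, fluency), each on a 1-5 Likert scale.
# These values are illustrative only and are NOT results from the study.
ratings = [
    ("generic", 3, 3),
    ("generic", 4, 3),
    ("automl", 4, 4),
    ("automl", 3, 4),
    ("adaptive_small", 5, 4),
    ("adaptive_small", 4, 5),
]


def summarize(rows):
    """Return {model: (mean adequacy, mean fluency)}, rejecting off-scale scores."""
    adequacy_by_model = defaultdict(list)
    fluency_by_model = defaultdict(list)
    for model, adequacy, fluency in rows:
        for score in (adequacy, fluency):
            if not 1 <= score <= 5:
                raise ValueError(f"score {score} for {model} is outside the 1-5 scale")
        adequacy_by_model[model].append(adequacy)
        fluency_by_model[model].append(fluency)
    return {
        model: (mean(adequacy_by_model[model]), mean(fluency_by_model[model]))
        for model in adequacy_by_model
    }


summary = summarize(ratings)
for model, (adequacy, fluency) in summary.items():
    print(f"{model}: adequacy={adequacy:.2f}, fluency={fluency:.2f}")
```

Averaging per model keeps the comparison simple. A real evaluation would typically also track the number of rated segments per model and the agreement between evaluators, which this sketch leaves out.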

Conclusion and Implications 

This comprehensive study demonstrates the effectiveness of Google’s adaptive translation LLMs in specific use cases requiring high fluency and style adaptation while also highlighting areas for improvement in handling client-specific terminology. 

Adaptive translation is a feature of Translation API Advanced, which many translation management systems (TMS) and CAT tools integrate with to access Google’s translation models for their customers. Because it is delivered as a feature of the API, it is easier to integrate into existing workflows. TMS and CAT tool developers need to update their plugins to incorporate the new feature so customers can take advantage of this advancement.

Comments Elaine O’Curran, Senior AI Program Manager at Welocalize, “General content, and especially marketing and creative content, could do very well with adaptive models. Until the terminology adaptation improves in future releases, we don’t recommend adaptive translation for terminology-heavy projects, such as user guides or help articles, that have a high concentration of UI references and other client-specific terminology.”

Adds O’Curran, “The maximum allowed data size for adaptive models is 30K sentences. This is a relatively small dataset for our traditional custom MT models. The original premise for the evaluation is that traditional MT doesn’t perform well with small datasets, and adaptive translation would be a good alternative. The results are so encouraging, however, that adaptive translation may also replace traditional MT with large datasets. We plan to test this further with marketing content, for example.”

Continuously benchmarking current leading MT solutions against LLM alternatives is crucial to help corporations stay ahead of the curve and reap the benefits of emerging GenAI solutions for their global content. This innovation will make high-performing, tuned MT more broadly available.