Unbabel’s New xTOWER LLM Explains Translation Errors and Suggests How to Fix Them

In a June 27, 2024 paper, researchers from Unbabel and Instituto de Telecomunicações introduced xTOWER, a large language model (LLM) designed to generate “high-quality” explanations for translation errors and use them to suggest improved translations.

The researchers explained that machine translation (MT) systems, despite their strong performance, often produce translations with errors. “Understanding these errors can potentially help improve the translation quality and user experience,” they said. 

Built on top of TOWERBASE (an LLM designed, trained, and optimized for MT-related tasks), xTOWER offers detailed, human-readable explanations for translation errors and suggests corrections based on this analysis.

Specifically, the process involves inputting a source text and its translation into xCOMET, which annotates the translation with error spans and assigns a quality score. The complete input (i.e., the source text and its translation), the annotated translation, and the quality score are then passed to xTOWER, which generates explanations for each error span and proposes a new corrected translation based on these explanations.
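The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ErrorSpan`, `Annotation`, `Report`, and `explain_and_correct` are hypothetical, and the `toy_annotate` and `toy_explain` functions are hard-coded stand-ins for xCOMET and xTOWER, not their real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ErrorSpan:
    text: str      # flagged substring of the translation
    severity: str  # "minor", "major", or "critical"


@dataclass
class Annotation:
    spans: List[ErrorSpan]  # error spans found in the translation
    score: float            # quality score for the whole translation


@dataclass
class Report:
    explanations: List[str]  # one explanation per error span
    correction: str          # proposed corrected translation


# Hypothetical interfaces: stand-ins for xCOMET and xTOWER, not real APIs.
Annotator = Callable[[str, str], Annotation]
Explainer = Callable[[str, str, Annotation], Report]


def explain_and_correct(source: str, translation: str,
                        annotate: Annotator, explain: Explainer) -> Report:
    # Stage 1: annotate the translation with error spans and a quality score.
    annotation = annotate(source, translation)
    # Stage 2: pass source, translation, spans, and score to the explainer.
    return explain(source, translation, annotation)


def toy_annotate(source: str, translation: str) -> Annotation:
    # Toy rule: flag one hard-coded word. A real annotator is a learned metric.
    spans = [ErrorSpan("cat", "major")] if "cat" in translation else []
    return Annotation(spans=spans, score=0.4 if spans else 0.95)


def toy_explain(source: str, translation: str, annotation: Annotation) -> Report:
    # Toy rule: one templated explanation per span, then a string replacement.
    explanations = [
        f"'{span.text}' ({span.severity}): does not match the source."
        for span in annotation.spans
    ]
    correction = translation.replace("cat", "dog") if annotation.spans else translation
    return Report(explanations=explanations, correction=correction)


report = explain_and_correct("Der Hund schläft.", "The cat sleeps.",
                             toy_annotate, toy_explain)
```

Because the explainer only receives spans and a score, the annotator can be swapped for human annotation without changing the second stage, which matches the article's point that xTOWER is agnostic about where error spans come from.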

Ricardo Rei, Senior Research Scientist at Unbabel, discussed with Slator how xTOWER addresses certain challenges encountered since the release of COMET. The first challenge is understanding the quality score and the specific translation errors highlighted. While xCOMET partially addressed this by annotating error spans with minor, major, and critical labels, it lacked explanations about the nature of these errors. xTOWER can “enhance the interpretation of xCOMET outputs,” as Rei mentioned in a recent tweet, thus offering “more insightful and detailed quality reports.”


The second challenge is fixing the identified errors. Inspired by chain-of-thought (CoT) reasoning, xTOWER first explains the errors and then performs automatic post-editing (APE), using the quality score and annotations.

xTOWER is designed to function independently of reference translations. Moreover, it is “agnostic about the source of error spans” and can handle errors obtained either manually through human annotation or automatically through tools.

Understanding Translation Errors 

The researchers asked expert translators to assess xTOWER’s explanations based on their relatedness to error spans and their helpfulness in understanding the nature of the errors and improving translation quality. 

They found that xTOWER improves error interpretability by providing explanations that relate to the identified errors. The researchers highlighted that “xTOWER can improve the interpretability of machine translation outputs in an automatic process.”

Expert translators endorsed xTOWER’s explanations as “helpful for understanding translation errors and generally useful for improving translations.” The researchers found that the corrections suggested by xTOWER improve the overall translation quality of the original translations across all language pairs, especially when the initial translations are of low quality.

They also evaluated the quality of xTOWER’s corrected translations by comparing them against those of other LLMs, such as GPT-3.5 Turbo, Mixtral 8x7B, and TOWERINSTRUCT 13B. xTOWER outperformed TOWERINSTRUCT 13B and Mixtral 8x7B, but not GPT-3.5 Turbo. However, the researchers found that xTOWER leverages error spans and explanations to fix errors better than GPT-3.5 Turbo does.

The researchers highlighted xTOWER’s “potential towards not only producing plausible and helpful explanations of automatic translations, but also leveraging them to suggest corrected translations.”

Hybrid Approach

The researchers also proposed a hybrid approach that dynamically selects between using the original translation or querying xTOWER for a correction.

In this hybrid approach, the COMET or xCOMET quality score of the original translation is calculated. If the quality score is above a certain threshold, the original translation is retained, and there is no need for xTOWER to provide a corrected translation.

However, if the quality score is below that threshold, xTOWER provides a corrected translation. In this case, if the quality score of the corrected translation is higher than that of the original translation, the corrected translation is chosen. Otherwise, the original translation is kept.
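The selection rule described in the two paragraphs above can be written as a short function. This is a minimal sketch, not the researchers' implementation: `hybrid_select`, the `score` and `correct` callables, and the threshold value are all hypothetical, and treating a score exactly equal to the threshold as "keep the original" is an assumption the article does not specify.

```python
from typing import Callable

# Hypothetical interfaces: `score` stands in for COMET or xCOMET,
# `correct` stands in for a call to xTOWER. Neither is a real API.
Scorer = Callable[[str, str], float]
Corrector = Callable[[str, str], str]


def hybrid_select(source: str, translation: str,
                  score: Scorer, correct: Corrector,
                  threshold: float) -> str:
    original_score = score(source, translation)
    # Good enough already: keep the original and skip the corrector call.
    # (Assumption: a score equal to the threshold counts as good enough.)
    if original_score >= threshold:
        return translation
    candidate = correct(source, translation)
    # Accept the correction only if it scores higher than the original.
    if score(source, candidate) > original_score:
        return candidate
    return translation


# Toy scorer and corrector with made-up numbers, for illustration only.
TOY_SCORES = {"The cat sleeps.": 0.4, "The dog sleeps.": 0.9, "A cat.": 0.1}


def toy_score(source: str, translation: str) -> float:
    return TOY_SCORES.get(translation, 0.0)


def toy_correct(source: str, translation: str) -> str:
    return translation.replace("cat", "dog")


chosen = hybrid_select("Der Hund schläft.", "The cat sleeps.",
                       toy_score, toy_correct, threshold=0.8)
```

In this sketch the corrector is invoked only on the low-scoring branch, which is where the inference savings mentioned in the article would come from.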

Rei explained to Slator that this hybrid approach mirrors a machine translation post-editing (MTPE) workflow, where post-editing is performed only when a quality estimation (QE) model confirms that the translation needs improvement.

The researchers suggested that this hybrid approach can significantly improve translation performance and reduce inference costs by only querying xTOWER when necessary.

Authors: Marcos Treviso, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, José Pombal, Tania Vaz, Helena Wu, Beatriz Silva, Daan van Stigt, André F. T. Martins