Here’s a New Dataset for Evaluating Metaphorical Language in Machine Translation

Evaluating MT Quality of Metaphorical Language

In a June 19, 2024 paper, researchers from the University of Sheffield, the University of Waterloo, the University of Manchester, the University of International Business and Economics (UIBE), and the tech company 01.AI introduced a multilingual dataset for evaluating machine translation (MT) quality of metaphorical language.

This new dataset aims to fill a gap in MT evaluation by focusing on the complexities of translating metaphors, where the intended meaning differs from the literal interpretation.

Metaphorical expressions pose significant challenges for MT systems because their meaning extends beyond individual words. As the researchers highlighted, “metaphor translation is more challenging than literal translation.” 

Despite the fact that “metaphorical expressions are widely used in daily life for communication and vivid description,” the challenge of accurately machine translating them remains largely unaddressed due to resource scarcity and difficulties in handling the variation in linguistic forms and cultural norms inherent in metaphors.

To address this challenge, the researchers created the MMTE (Metaphorical Machine Translation Evaluation) dataset, marking the “first manually annotated multilingual metaphor translation evaluation corpus.” This high-quality corpus includes sentences with metaphorical and literal expressions in English, Chinese, and Italian, along with reference translations.
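To make the corpus structure concrete, here is a minimal sketch of what a single MMTE entry might look like as a data record. The field names and example sentence are illustrative assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass

@dataclass
class MMTEEntry:
    """Hypothetical record for one corpus sentence pair (illustrative only)."""
    source_lang: str            # one of "en", "zh", "it"
    target_lang: str
    source_sentence: str
    reference_translation: str
    is_metaphorical: bool       # metaphorical vs. literal expression

# Example entry (invented for illustration)
entry = MMTEEntry(
    source_lang="en",
    target_lang="zh",
    source_sentence="He drowned in a sea of grief.",
    reference_translation="他沉浸在悲伤的海洋中。",
    is_metaphorical=True,
)
```

A record like this makes it straightforward to filter the corpus by language pair or by metaphorical vs. literal subsets when benchmarking an MT system.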

“MMTE is the first work to systematically investigate how translations are affected by metaphor in a fine-grained and multi-lingual setting,” the researchers said.

By providing this dataset, they offer a way to test MT models’ performance on metaphorical language, rather than just overall translation quality. 

Evaluation Framework

The researchers explained that traditional MT evaluation methods focus on fluency and factual accuracy, often neglecting the quality of figurative language translation, even though the “appropriate use of metaphor has been shown to dramatically improve user satisfaction.”

To that end, they proposed “the first systematic human evaluation framework for metaphor translation.” This framework allows for the evaluation of MT outputs in terms of their metaphorical expressions, enabling a more comprehensive analysis of their effectiveness in capturing the nuanced meaning conveyed by such expressions.

According to the researchers, the evaluation should focus on four key areas:

  • Metaphorical equivalence — evaluates how well the figurative meaning of the source language is preserved in the target language translation.
  • Emotion — assesses how effectively the translation conveys the emotions intended by the original metaphorical expression.
  • Authenticity — measures the naturalness and appropriateness of the translated metaphorical expressions in the target language.
  • Overall quality — considers the fluency, coherence, and fidelity of the translation to the original text, providing a holistic assessment of the translation’s effectiveness.

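The four dimensions above could be captured as a simple annotation record per translated sentence. The sketch below is an assumption about how such scores might be stored and aggregated; the 1–5 scale and field names are illustrative, not the paper's specification.

```python
from dataclasses import dataclass

@dataclass
class MetaphorTranslationScore:
    """Illustrative per-translation annotation on a hypothetical 1-5 scale."""
    metaphorical_equivalence: int  # figurative meaning preserved?
    emotion: int                   # emotional intent conveyed?
    authenticity: int              # naturalness in the target language
    overall_quality: int           # fluency, coherence, fidelity

    def mean(self) -> float:
        # Simple unweighted average across the four dimensions
        vals = (self.metaphorical_equivalence, self.emotion,
                self.authenticity, self.overall_quality)
        return sum(vals) / len(vals)

# Example: a translation that is natural but slightly flattens the emotion
score = MetaphorTranslationScore(4, 3, 5, 4)
print(score.mean())  # 4.0
```

Keeping the four scores separate, rather than collapsing them immediately, lets an evaluator report where a system fails: a translation can be fluent overall yet lose the figurative meaning entirely.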
This framework can also be adapted for automatic metric design for metaphor translation. However, the researchers have so far only offered thoughts on designing such automatic metrics rather than presenting ready-to-use ones, and they plan to develop them further in future work.

Authors: Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin