Ghent University’s MATEO Project Improves Access to Machine Translation Evaluation


Automatic machine translation evaluation compares the quality of machine translations against reference translations using evaluation metrics. A team of researchers now says that despite the existence of several automatic metrics for this purpose, a unified and user-friendly approach to utilizing these metrics has been missing.

As Bram Vanroy, Arda Tezcan, and Lieve Macken from Ghent University explained in a 2023 research paper, similar platforms exist — such as Asiya Online, an interactive BLEU score evaluator by Tilde MT, MT-ComparEval, and MutNMT — but they are either not maintained or offer limited functionalities.

MATEO (MAchine Translation Evaluation Online), a project developed at Ghent University by the Language and Translation Technology Team (LT3), aims to bridge this gap by providing an easily accessible web interface that incorporates a diverse set of automatic, reference-based MT evaluation metrics, including both established and cutting-edge methods.

The project has received funding from the European Association for Machine Translation and the Bridging Gaps initiative, supporting its continued development.

The project aims to make automatic MT evaluation accessible to experts and non-experts alike, including researchers, MT system builders, teachers, students, and individuals from the Social Sciences and Humanities. “By providing accessible evaluation tools, MATEO can streamline and simplify the MT research process, contributing to advancements in translation technology,” said Vanroy, Tezcan, and Macken.

MATEO’s open-source web interface will be hosted on dedicated research infrastructure. The tool builds on “evaluate,” a general-purpose evaluation framework by Hugging Face, extended with additional MT evaluation metrics tailored to the project.

Users can evaluate single-sentence and multi-system machine translations using metrics like BLEU, ChrF, TER, BERTScore, BLEURT, and COMET. The interface also provides bar-chart visualizations and allows users to download evaluation results in Excel format for further analysis.
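To give a sense of what a surface-level metric such as ChrF computes under the hood, here is a simplified pure-Python sketch of the character n-gram F-score. This is an illustration only, not MATEO’s implementation — production tools rely on maintained libraries (such as the metrics wrapped by Hugging Face’s “evaluate” framework), which also handle tokenization details, word n-grams (chrF++), and edge cases.

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams with whitespace removed, as chrF does.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Simplified chrF: average the character n-gram F-beta scores
    # for n = 1..max_n, scaled to 0-100. Beta = 2 weights recall
    # twice as heavily as precision, following the original metric.
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        f = (1 + beta**2) * prec * rec / (beta**2 * prec + rec)
        scores.append(f)
    return 100 * sum(scores) / len(scores) if scores else 0.0

# A perfect match scores 100; partial overlap scores in between.
print(chrf("the cat sat on the mat", "the cat sat on the mat"))  # 100.0
print(chrf("a cat sat on a mat", "the cat sat on the mat"))
```

Neural metrics like BERTScore, BLEURT, and COMET work very differently, scoring translations with pretrained language models rather than n-gram overlap, which is why an interface that bundles both families of metrics is useful for comparison.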

MT Evaluation in Translators’ Training

As the project continues to evolve, it seeks to not only streamline MT research but also become “an instructional resource for educators and students because it emphasizes the importance of evaluating language resources,” according to the authors.

MATEO has already been applied in MT classes at Ghent University, where students used the tool for assignments to improve their MT evaluation skills. Their feedback was gathered to improve the user experience of the interface, and their suggestions will be incorporated into future versions of the tool. 

Improvements planned for MATEO include enabling file uploads for system-wide evaluations, separating the translation and evaluation components, updating the translation engine, and evaluating and incorporating promising metrics from the WMT22 Metrics Shared Task. Additionally, visualizations for edit operations and various export options will be added to the interface.

MT Literacy

Another noteworthy contribution of MATEO lies in enhancing MT literacy among non-expert users. By providing an accessible evaluation tool, it empowers them to assess machine-generated translations with ease. This hands-on approach not only supports more informed quality judgments but also encourages users to think critically about the implications of relying on MT systems for their specific tasks, topics, or domains.