Automatic machine translation (MT) evaluation compares the quality of machine translations against reference translations using evaluation metrics. A team of researchers now says that despite the existence of several automatic metrics for this purpose, a unified and user-friendly way to apply them has long been missing.
As Bram Vanroy, Arda Tezcan, and Lieve Macken from Ghent University explained in a 2023 research paper, similar platforms exist — such as Asiya Online, an interactive BLEU score evaluator by Tilde MT, MT-ComparEval, and MutNMT — but they are either no longer maintained or offer limited functionality.
MATEO (MAchine Translation Evaluation Online), a project developed at Ghent University by the Language and Translation Technology Team (LT3), aims to bridge this gap by providing an easily accessible web interface that incorporates a diverse set of automatic, reference-based MT evaluation metrics, including both established and cutting-edge methods.
This project aims to make automatic MT evaluation accessible to both experts and non-experts, such as researchers, MT system builders, teachers, students, and even individuals from Social Sciences and Humanities. “By providing accessible evaluation tools, MATEO can streamline and simplify the MT research process, contributing to advancements in translation technology,” said Vanroy, Tezcan, and Macken.
MATEO’s open-source web interface will be hosted on the CLARIN.eu infrastructure. The tool uses a general-purpose evaluation framework called “evaluate” by Hugging Face, expanded with additional MT evaluation metrics tailored to the project.
Users can evaluate single-sentence and multi-system machine translations using metrics like BLEU, ChrF, TER, BERTScore, BLEURT, and COMET. The interface also provides bar-chart visualizations and allows users to download evaluation results in Excel format for further analysis.
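At their core, string-overlap metrics like ChrF score a hypothesis by how many n-grams it shares with the reference. As a rough illustration of the idea (not MATEO's or sacreBLEU's actual implementation, which also handles word n-grams, smoothing options, and corpus-level averaging), a simplified chrF-style character n-gram F-score can be sketched in pure Python:

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams, ignoring whitespace (as chrF does by default)
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: character n-gram precision/recall averaged
    over n = 1..max_n, combined into an F-beta score (beta=2 weights
    recall twice as heavily, following the original metric)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# An identical hypothesis scores 1.0; a completely disjoint one scores 0.0
print(chrf("the cat sat on the mat", "the cat sat on the mat"))
```

Neural metrics such as BERTScore, BLEURT, and COMET work very differently — they compare learned sentence representations rather than surface n-grams — which is one reason bundling them behind a single interface, as MATEO does, is useful.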
MT Evaluation in Translators’ Training
As the project continues to evolve, it seeks to not only streamline MT research but also become “an instructional resource for educators and students because it emphasizes the importance of evaluating language resources,” according to the authors.
MATEO has already been applied in MT classes at Ghent University, where students used the tool for assignments to improve their MT evaluation skills. Their feedback was gathered to improve the user experience of the interface, and their suggestions will be incorporated into future versions of the tool.
Improvements planned for MATEO include enabling file uploads for system-wide evaluations, separating the translation and evaluation components, updating the translation engine, and evaluating and incorporating promising metrics from the WMT22 Metrics Shared Task. Additionally, visualizations for edit operations and various export options will be added to the interface.
Another noteworthy contribution of MATEO lies in enhancing MT literacy among non-experts. By providing an accessible evaluation tool, it empowers such users to assess machine-generated translations with ease. This critical approach to MT not only supports more reliable results but also encourages users to think critically about the implications of relying on MT systems for their specific tasks, topics, or domains.