April 30, 2021 –
Automatic translation-quality evaluation metrics are indispensable both for fine-tuning customized machine translation (MT) models and for fundamental natural language processing (NLP) research. BLEU, a precision-based metric, remains the most popular. However, more accurate metrics such as hLEPOR, which consider recall as well as precision along with other factors, have demonstrated better correlation with human judgments. Until now, one of the obstacles to wider use of the more advanced hLEPOR metric was the lack of a public Python implementation.
AI developers from Logrus Global, working with Lifeng Han, the lead author of the original metric, have completed a Python port of the compound hLEPOR metric as presented in the original article and published it on PyPI (pypi.org), making it available to the entire Python development community.
hLEPOR gains accuracy by combining several factors: precision, recall, a sentence-length penalty, and a penalty for differences in word positions between the candidate translation and the reference. It also produces per-sentence scores as well as a document-level score, whereas BLEU is designed primarily as a corpus-level metric. The implementation is free of charge, and a uniform, single-source baseline metric that anyone can install benefits practitioners and researchers alike. Further improvements that integrate deep-learning language-model technology into the metric are also under way.
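To make the combination of factors more concrete, here is a minimal, simplified sketch of the sentence-level hLEPOR formula, written from the published definition rather than taken from the hLepor package (it does not reproduce that package's API). The score is a weighted harmonic mean of three parts: the length penalty LP, the position penalty NPosPenal = exp(-NPD), and a weighted harmonic mean of recall and precision (HPR). Several things here are assumptions for illustration only: the function names are hypothetical, tokens are pre-split words, the word alignment is a simplified nearest-position match instead of the metric's n-gram-context alignment, all weights (w_lp, w_pos, w_hpr, alpha, beta) default to placeholder values of 1.0 rather than the tuned values from the original paper, and the document-level score is taken as the plain mean of sentence scores.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def length_penalty(c: int, r: int) -> float:
    """LP: 1.0 when candidate length c equals reference length r, smaller otherwise."""
    if c < r:
        return math.exp(1 - r / c)
    if c > r:
        return math.exp(1 - c / r)
    return 1.0


def align(cand: Sequence[str], ref: Sequence[str]) -> Dict[int, int]:
    """Simplified alignment (assumption): link each candidate token to the unused
    identical reference token with the closest relative position. The original
    metric uses n-gram context to disambiguate repeated words."""
    used = set()
    links: Dict[int, int] = {}
    for i, tok in enumerate(cand):
        options = [j for j, r_tok in enumerate(ref) if r_tok == tok and j not in used]
        if options:
            best = min(options, key=lambda k: abs((k + 1) / len(ref) - (i + 1) / len(cand)))
            links[i] = best
            used.add(best)
    return links


def position_penalty(links: Dict[int, int], c: int, r: int) -> float:
    """NPosPenal = exp(-NPD); NPD averages the gaps between normalized 1-based positions."""
    npd = sum(abs((i + 1) / c - (j + 1) / r) for i, j in links.items()) / c
    return math.exp(-npd)


def hlepor_sentence(cand: Sequence[str], ref: Sequence[str],
                    w_lp: float = 1.0, w_pos: float = 1.0, w_hpr: float = 1.0,
                    alpha: float = 1.0, beta: float = 1.0) -> float:
    """Sketch of sentence-level hLEPOR with placeholder (untuned) weights."""
    if not cand or not ref:
        return 0.0
    c, r = len(cand), len(ref)
    links = align(cand, ref)
    if not links:
        return 0.0  # no matched words: HPR is 0, so the harmonic mean is 0
    precision = len(links) / c
    recall = len(links) / r
    # Weighted harmonic mean of recall (weight alpha) and precision (weight beta).
    hpr = (alpha + beta) / (alpha / recall + beta / precision)
    lp = length_penalty(c, r)
    pos = position_penalty(links, c, r)
    # Weighted harmonic mean of the three factors.
    return (w_lp + w_pos + w_hpr) / (w_lp / lp + w_pos / pos + w_hpr / hpr)


def hlepor_document(cands: Sequence[Sequence[str]], refs: Sequence[Sequence[str]]) -> float:
    """Document-level score as the mean of sentence scores (one possible aggregation)."""
    return mean(hlepor_sentence(c, r) for c, r in zip(cands, refs))
```

With this sketch, an exact match scores 1.0, while a reordered candidate such as "cat the" against "the cat" keeps perfect precision and recall but is pulled below 1.0 by the position penalty. A truncated candidate is penalized through both recall and the length penalty, which a precision-only score would not capture. The harmonic mean is used so that a single weak factor drags the overall score down.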
The library is available at https://pypi.org/project/hLepor/