MQM Council Releases Multi-Range Theory of Translation Quality Evaluation with New Scoring Models


The new MQM update offers a breakthrough in how translation quality is scored. Translation Quality Evaluation (TQE) is a cornerstone of any translation and localization process, and it has only grown in importance with the advent of machine translation (MT), particularly neural machine translation (NMT). To standardize TQE, the MQM (Multidimensional Quality Metrics) framework was developed and has since been expanded by a community of translation quality experts.

MQM provides a model for analytic TQE. It was first introduced shortly before the arrival of NMT and was originally published as a deliverable of the EU-funded QTLaunchpad project. Since 2018, the MQM Council has improved the widely used DQF subset of MQM (the subset adopted by the TAUS Dynamic Quality Framework) and updated it to become MQM Core, alongside an expanded variant, MQM Full.

The original metrics were typically implemented as scorecards that offered only a Raw Scoring Model. That model had several drawbacks: its scores were not very human-readable, it was not adaptable, and scores were hard to compare and use because threshold levels varied between metrics.
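To make the Raw Scoring Model concrete, here is a minimal Python sketch. It assumes a common analytic-scoring pattern: each error's severity multiplier is multiplied by an error-type weight, the results are summed into an absolute penalty total, and that total is normalized by the evaluation word count. The severity multipliers (minor 1, major 5, critical 25), the default weight of 1.0, and all names below are assumptions for illustration, not values taken from the paper.

```python
from dataclasses import dataclass

# Assumed severity multipliers, for illustration only.
SEVERITY_MULTIPLIERS = {"neutral": 0, "minor": 1, "major": 5, "critical": 25}


@dataclass
class AnnotatedError:
    error_type: str      # e.g. "accuracy/mistranslation" (hypothetical label)
    severity: str        # key into SEVERITY_MULTIPLIERS
    weight: float = 1.0  # error-type weight; 1.0 assumed by default


def absolute_penalty_total(errors: list[AnnotatedError]) -> float:
    """Sum of severity multiplier x error-type weight over all errors."""
    return sum(SEVERITY_MULTIPLIERS[e.severity] * e.weight for e in errors)


def raw_quality_score(errors: list[AnnotatedError], evaluation_word_count: int) -> float:
    """Raw score: 100 * (1 - penalty per evaluated word). Can fall below 0."""
    if evaluation_word_count <= 0:
        raise ValueError("evaluation_word_count must be positive")
    per_word_penalty = absolute_penalty_total(errors) / evaluation_word_count
    return 100 * (1 - per_word_penalty)


errors = [
    AnnotatedError("accuracy/mistranslation", "major"),
    AnnotatedError("fluency/grammar", "minor"),
    AnnotatedError("fluency/spelling", "minor"),
]
print(raw_quality_score(errors, evaluation_word_count=500))
```

The result is a number just below 100 with no built-in threshold. Whether the sample passes depends on a separate, metric-specific cut-off, which is why raw scores are hard to read and compare across metrics.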

The new Linear Calibrated Scoring Model enables implementers to create metrics that are comparable across content types, use cases, and service levels. Calibrating a metric means setting quality thresholds that reflect client expectations and the specific use case. The result is a set of error tolerance limits that are much easier to understand and more flexible, making the PASS/FAIL decision clearer and more human-readable.
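One way to picture calibration is as a linear rescaling of the penalty so that the error tolerance limit lands exactly on the passing threshold. The sketch below takes an absolute penalty total (the sum of weighted error penalties) and assumes hypothetical parameters: a tolerance of 10 penalty points per 1,000 words, a maximum score of 100, and a passing score of 90. The function name and formula are illustrative and are not taken from the MQM specification.

```python
def calibrated_linear_score(
    apt: float,
    evaluation_word_count: int,
    *,
    tolerance_per_ref: float = 10.0,   # hypothetical error tolerance limit
    reference_word_count: int = 1000,  # hypothetical normalization unit
    max_score: float = 100.0,
    passing_score: float = 90.0,
) -> tuple[float, bool]:
    """Linearly map a normed penalty to a score; returns (score, passed).

    The scaling factor is chosen so that a normed penalty equal to the
    tolerance limit yields exactly the passing score.
    """
    if evaluation_word_count <= 0 or tolerance_per_ref <= 0:
        raise ValueError("word count and tolerance must be positive")
    # Penalty expressed per reference_word_count words.
    normed_penalty = apt * reference_word_count / evaluation_word_count
    scaling_factor = (max_score - passing_score) / tolerance_per_ref
    score = max_score - normed_penalty * scaling_factor
    return score, score >= passing_score


# The same 7 penalty points in two different sample sizes:
print(calibrated_linear_score(7, 500))   # (86.0, False)
print(calibrated_linear_score(7, 1000))  # (93.0, True)
```

Because the tolerance limit always maps to the same passing score, a PASS/FAIL result reads the same way across content types, while the tolerance itself can be tuned per use case.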

In addition to the MQM Error Typology itself, the MQM Council now offers two separate scoring models: the Linear Calibrated Scoring Model for medium-sized text samples and the Non-Linear Scoring Model for very large samples.

Although many adopters have used the Linear Scoring Model in the past, it does not produce consistent results for texts that are either very large or very small. The Non-Linear Scoring Model takes into account the fact that human perception changes as content is consumed: human tolerance for errors falls sharply as the sample grows. As a result, a rater's perception of content quality may become more subjective over the course of an evaluation and may diverge from the statistical TQE result produced by a scoring formula. The Non-Linear Scoring Model is based on the standard MQM analytic approach and typology, but it introduces a logarithmic function to define the score, reflecting this non-linearity of human perception. It can produce accurate scores across a wide range of sample sizes, from small samples to arbitrarily large ones.
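The paper defines its own logarithmic formula, and the sketch below does not reproduce it. It only illustrates the general idea under a stated assumption: the acceptable absolute penalty total grows with the logarithm of the sample size, fitted through two hypothetical calibration points, so the tolerated penalty per word shrinks as the sample grows. The mapping of that limit onto the passing score, and all names here, are likewise hypothetical.

```python
import math
from typing import Callable


def log_tolerance_curve(n1: float, t1: float, n2: float, t2: float) -> Callable[[float], float]:
    """Fit a hypothetical tolerance curve T(n) = a*ln(n) + b through two points.

    (n1, t1) and (n2, t2) are calibration points: sample size in words and
    the maximum acceptable absolute penalty total at that size.
    """
    if n1 <= 0 or n2 <= 0 or n1 == n2:
        raise ValueError("need two distinct positive sample sizes")
    a = (t2 - t1) / (math.log(n2) - math.log(n1))
    b = t1 - a * math.log(n1)
    return lambda n: a * math.log(n) + b


def nonlinear_score(apt: float, evaluation_word_count: float,
                    tolerance: Callable[[float], float],
                    max_score: float = 100.0,
                    passing_score: float = 90.0) -> tuple[float, bool]:
    """Score a sample against a size-dependent tolerance; returns (score, passed)."""
    limit = tolerance(evaluation_word_count)
    if limit <= 0:
        raise ValueError("sample too small for this tolerance curve")
    # A penalty equal to the limit maps exactly onto the passing score.
    score = max_score - (apt / limit) * (max_score - passing_score)
    # Compare penalties directly to avoid floating-point noise at the boundary.
    return score, apt <= limit


tolerance = log_tolerance_curve(1_000, 10.0, 10_000, 15.0)
for words in (1_000, 10_000, 100_000):
    print(words, round(tolerance(words), 2), round(tolerance(words) / words, 5))
print(nonlinear_score(12, 10_000, tolerance))
```

With these calibration points, 10,000 words tolerate 15 penalty points rather than the 100 a purely linear rule of 10 points per 1,000 words would allow. The guard against non-positive limits also hints at why very small samples call for a different approach.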

The MQM Council has also tackled the problem of low inter-rater reliability (IRR) among human linguists evaluating very small samples, such as a single sentence. For such samples, the MQM Council recommends methods from Statistical Quality Control (SQC).

Today, we are pleased to announce that, after 18 months of close collaboration, the MQM Council working group has published a full and detailed paper that describes these problems and suggests methods for resolving them depending on the size of the TQE sample.

The paper is freely available on arXiv at:

In particular, the paper explains why it is impossible to assess the translation quality of a single sentence when that sentence is the entire sample, and proposes SQC as the solution. SQC does not produce a quality rating as such; instead, it provides a risk assessment by estimating producer's risk (the probability of rejecting a translation that is actually acceptable) and consumer's risk (the probability of accepting one that is actually unacceptable).
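SQC-style risk assessment can be illustrated with a classic single acceptance sampling plan: inspect n segments and accept the batch if at most c of them are defective. This is a standard textbook technique, shown here as an assumption about what such an assessment might look like rather than as the paper's method. Treating each segment as simply defective or not, and the quality levels used below (an acceptable defect rate of 2% and an unacceptable one of 10%), are hypothetical.

```python
from math import comb


def prob_accept(n: int, c: int, p: float) -> float:
    """P(at most c defective segments in a sample of n), binomial model."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def sampling_risks(n: int, c: int, aql: float, ltpd: float) -> tuple[float, float]:
    """Return (producer's risk, consumer's risk) for a single sampling plan.

    aql:  defect rate still considered acceptable (hypothetical)
    ltpd: defect rate considered unacceptable (hypothetical)
    """
    producer_risk = 1 - prob_accept(n, c, aql)  # acceptable work rejected
    consumer_risk = prob_accept(n, c, ltpd)     # unacceptable work accepted
    return producer_risk, consumer_risk


# Hypothetical plan: 50 segments, accept if at most 2 are defective.
print(sampling_risks(n=50, c=2, aql=0.02, ltpd=0.10))
# A single sentence as the whole sample: accept only if it is error-free.
print(sampling_risks(n=1, c=0, aql=0.02, ltpd=0.10))
```

With 50 segments, both risks stay near 8% and 11%. With a single sentence, the consumer's risk in this example rises to 90%, which is one way to see why one sentence cannot support a reliable quality judgment.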

The MQM Council hopes that this work will both stimulate new research and promote changes in TQE processes, enabling linguistic experts to work faster and clients to have real faith in the results of evaluation.

The MQM Council has also launched the MQM Matters Substack newsletter to share news, research on quality measurement (including AI), reviews, opinions, comments, the work of colleagues, and developments in the MQM universe.

Subscribe to the MQM Matters Substack by visiting this page: