A Quick Primer on Edit Distance, a Key Metric in Post-Editing Machine Translation

The language services industry has discussed “edit distance” for years. But as machine translation (MT) quality improves, it has moved from the fringes to the center. In many content areas and text types, MT is now adopted as the “first round” of translation. More and more linguists find themselves working on post-editing machine output rather than translating source texts from scratch.

Post-editing MT (also known as PEMT) is a skill in its own right, and the resulting translations have even been found to have their own distinctive features. One major challenge in post-editing is determining a fair rate for the service. Should it be based on the word count of the source text, the word count of the initial MT output, the hours worked, or some combination of these and other factors? In this context, edit distance provides a useful metric to inform a more standardized approach to post-editing compensation.

Organizations such as the US Department of Defense have explored edit distance since the 1990s as a metric for evaluating the quality of machine-translated text. It may also be used to measure the productivity of a post-editor.

The MT output is the starting point, and the final version of the target-language text is the end point. Edit distance is the minimum number of changes required to get from the original MT output to that final version.

Changes can include additions, deletions, substitutions, and (sometimes) changes in position, and can be counted at the character level or the word level. The absolute edit distance (i.e., the number of changes from initial text to post-edited text) is then divided by the word count of whichever version is longer, the initial or the final (or by the character count, for a character-level calculation). The result is a percentage showing how much the text has changed.
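To make the calculation concrete, here is a minimal Python sketch, offered as an illustration rather than as the method any particular tool uses. It assumes the classic Levenshtein formulation (additions, deletions, and substitutions only, with no changes in position) and simple whitespace tokenization for the word level. The function names `edit_distance` and `edit_percentage` and the sample sentences are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance: the minimum number of additions, deletions,
    and substitutions needed to turn sequence a into sequence b.
    (Assumption: changes in position are not counted as a single edit.)"""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items into nothing takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # add y
                prev[j - 1] + cost,  # substitute x with y (or keep it)
            ))
        prev = curr
    return prev[-1]


def edit_percentage(mt_output: str, post_edited: str, level: str = "word") -> float:
    """Absolute edit distance divided by the length of the longer version,
    expressed as a percentage. level is "word" or "char"."""
    if level == "word":
        a, b = mt_output.split(), post_edited.split()  # naive tokenization
    else:
        a, b = list(mt_output), list(post_edited)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty texts: nothing changed
    return 100.0 * edit_distance(a, b) / longest


mt = "the cat sit on mat"
final = "the cat sat on the mat"
print(edit_distance(mt.split(), final.split()))  # 2 word-level changes
print(round(edit_percentage(mt, final), 1))      # 33.3
```

In this example the post-editor substituted one word and added another, so two changes over a six-word final version give roughly 33%. A production tool would likely tokenize more carefully and might treat a moved phrase as a single edit.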

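The concept behind edit distance is closely related to contemporary MT quality scores. TER, for example, counts the edits needed to turn an MT output into a reference translation. BLEU also compares MT output against a reference, although it scores overlapping word sequences rather than counting edits.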

It also plays a role in automated spellcheckers, in which suggestions are words in the dictionary with the smallest edit distance from a given misspelled word. Some major translation productivity (a.k.a. CAT) tools, such as memoQ and Memsource, now have edit distance features integrated into their workflows, allowing project managers to see how many changes each translator or reviewer makes in a document.
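The spellchecker idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a character-level Levenshtein distance, a tiny made-up word list, and a hypothetical `suggest` function. Real spellcheckers typically add word frequencies, keyboard-proximity weighting, and faster lookup structures.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: list[str], max_suggestions: int = 3) -> list[str]:
    """Return the dictionary words closest to `word`, nearest first.
    Ties are broken alphabetically so the result is deterministic."""
    ranked = sorted(dictionary, key=lambda w: (edit_distance(word, w), w))
    return ranked[:max_suggestions]


# Hypothetical word list, for illustration only.
words = ["translation", "transition", "translator", "transaction"]
print(suggest("translaton", words, max_suggestions=2))
# ['translation', 'translator']
```

Both suggestions are one change away from the misspelled word (one added letter, one substituted letter), while the other two candidates are three changes away.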

For all its usefulness, though, edit distance is probably not the silver bullet that will solve the challenge of appropriately tracking and pricing human-machine work in language services.

One major drawback is that it tracks only the number of changes made, and not the time spent working. Because post-editing effort also depends on the domain of the text, the editor’s experience, the client’s demands, and the quality of the initial MT, relying solely on edit distance to measure the post-editor’s effort will not provide the full picture.

A more comprehensive calculation, then, would combine edit distance with other relevant factors. Until such a calculation is established, researchers will likely continue to investigate what makes a good post-editor, and linguists will continue to champion hourly rates.