Fuzzy Matches Can Refine Machine Translation by GPT-3, Other Large Language Models

Across the language industry, OpenAI’s ChatGPT has inspired plenty of discussion. Topics of interest include its “thoughts” on translation; its “capabilities” as a project manager; and its performance in machine translation (MT), at least compared to commercial MT products.

ADAPT Centre MT researcher Yasmin Moslem co-authored a January 30, 2023 paper that goes one step further. She and her colleagues conducted several months of experiments to find out whether GPT-3, the large language model (LLM) behind ChatGPT, is capable of “adaptive MT,” defined as improving the quality of new translations in real time based on user feedback.

User feedback can take several forms, including corrections to previous translations, terminology and style guides, and ratings of translation quality. 

The paper, Adaptive Machine Translation with Large Language Models, also explored a less common method of adaptive MT: learning from similar translations (i.e., fuzzy matches) found in approved translation memories. In particular, researchers were interested in real-time adaptation.

“Instead of asking the model to translate a sentence or providing random examples, it turns out that showing the model 1-10 domain-specific translation pairs similar to the sentence to be translated can improve the translation of the new sentence immediately,” Moslem wrote in a January 31, 2023 LinkedIn post, adding that this method is “especially useful for high-resource languages.”
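The article does not reproduce the prompt format, but the general mechanics are straightforward. The Python sketch below shows one way such a few-shot prompt could be assembled; the function name, template wording, and example sentences are illustrative assumptions, not the paper’s verbatim setup.

```python
# Illustrative sketch of few-shot prompt construction from fuzzy matches.
# Function name, template wording, and examples are assumptions for
# illustration; the paper's exact prompt format may differ.

def build_fuzzy_match_prompt(fuzzy_matches, new_source,
                             src_lang="English", tgt_lang="French"):
    """fuzzy_matches: list of (source, target) pairs retrieved from a
    translation memory. Returns a text-completion prompt."""
    lines = []
    for src, tgt in fuzzy_matches:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The new sentence goes last; the model completes the target line.
    lines.append(f"{src_lang}: {new_source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


prompt = build_fuzzy_match_prompt(
    fuzzy_matches=[
        ("Store the vaccine at 2-8 degrees Celsius.",
         "Conservez le vaccin entre 2 et 8 degrés Celsius."),
        ("Store the solution at room temperature.",
         "Conservez la solution à température ambiante."),
    ],
    new_source="Store the tablets at room temperature.",
)
print(prompt)
```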

For low-resource languages, the team was able to use fuzzy matches to improve translations from “stronger” MT systems, such as those from DeepL and Google.
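In that setup, the prompt can also carry the output of the stronger MT system for the model to refine. A minimal sketch, again with the function name and template wording as assumptions rather than the paper’s verbatim format:

```python
# Hypothetical sketch: use fuzzy matches to help an LLM improve the output
# of a stronger MT system. Template wording is an assumption for illustration.

def build_mt_improvement_prompt(fuzzy_matches, new_source, mt_output,
                                src_lang="English", tgt_lang="Kinyarwanda"):
    lines = []
    for src, tgt in fuzzy_matches:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {new_source}")
    # Include the stronger MT system's output for the model to refine.
    lines.append(f"{tgt_lang} (MT): {mt_output}")
    lines.append(f"{tgt_lang} (improved):")
    return "\n".join(lines)
```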

Researchers extracted sentence pairs similar to each segment in a test dataset of 3,070 segments covering five language combinations (English into Arabic, Chinese, French, Kinyarwanda, and Spanish). 
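The article does not detail the retrieval step itself. The sketch below uses Python’s standard-library difflib as a stand-in similarity measure; the authors’ actual method (for example, embedding-based retrieval) may differ.

```python
# Sketch of fuzzy-match retrieval from a translation memory. difflib's
# character-level ratio is a stand-in; the authors' actual similarity
# measure may differ (e.g., embedding-based retrieval).
from difflib import SequenceMatcher

def top_fuzzy_matches(new_source, translation_memory, k=5):
    """translation_memory: list of (source, target) pairs.
    Returns the k pairs whose source side is most similar to new_source."""
    return sorted(
        translation_memory,
        key=lambda pair: SequenceMatcher(None, new_source, pair[0]).ratio(),
        reverse=True,
    )[:k]
```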

They found that few-shot translation with GPT-3 using fuzzy matches resulted in the highest-quality translations — to a point.

“At some point, there might be diminishing returns of adding more similar sentences,” the authors conceded. “Increasing the number of fuzzy matches from two sentences to three, four, five, and 10 sentences incrementally improves translation quality, but with smaller quality gains.”

Putting Results into Context

For certain language pairs, GPT-3’s adaptive MT with fuzzy matches outperformed conventional encoder-decoder Transformer-based MT models: for English into French and Spanish, just five fuzzy matches were needed; for English into Chinese, at least 10. 

Results for Arabic and Kinyarwanda were not on par with those for other language pairs, since GPT-3.5 mostly supports high-resource, Latin-script languages. The authors attributed the disparity to limited language support (Kinyarwanda) and to issues with GPT-3’s tokenizer (Arabic).

Overall, though, the researchers described the results as “very promising” and noted that users might adopt one of several “pipelines” in production, depending on the level of support an LLM offers for a given language pair.
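The article does not spell those pipelines out, but the general shape might look like the sketch below; the support tiers and pipeline names are illustrative assumptions, not the paper’s taxonomy.

```python
# Illustrative sketch of pipeline selection; the tiers and pipeline names
# are assumptions for illustration, not the paper's taxonomy.
def choose_pipeline(llm_support: str) -> str:
    """llm_support: rough level of LLM support for the language pair."""
    if llm_support == "high":
        # e.g., English into French or Spanish
        return "few-shot LLM translation with fuzzy matches"
    if llm_support == "partial":
        # e.g., low-resource targets such as Kinyarwanda
        return "strong MT system first, then LLM refinement with fuzzy matches"
    return "conventional MT system only"
```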