How Well Can GPTs Translate?

How well can GPTs translate? The answer depends, in part, on whom you ask. It also depends — drumroll for the classic translator joke — on the context.

In other words, the quality of GPTs’ machine translation (MT) output varies based on a number of factors, among them content type, subject matter, and source and target languages. (The same caveats also happen to apply to non-GPT MT systems.) 

Generative pre-trained transformers, or GPTs, are the large language models (LLMs) that underlie user-facing products. OpenAI’s ChatGPT is among the best-known.

Despite being designed to handle general NLP-related downstream tasks, GPTs have impressed many in the translation industry thanks to the relatively high quality of their MT output. 

Herein lies the question — not just how well can GPTs translate, but how well can they translate compared to other available systems, especially those designed for translation? The answer to this question will determine how users choose to engage with MT. 

One example of such engagement is the April 2023 decision by Canadian newswire service TheNewswire to integrate GPT-4 into its workflow to automate English-Quebecois French translation of press releases.

Academics and companies’ R&D departments aim to quantify and standardize comparisons between GPT output and that of MT systems.

A January 2023 paper from Tencent AI Lab found ChatGPT to perform “competitively” on high-resource European languages, with poorer results for low-resource languages. 

One possible strategy around this limitation might be “pivot prompting” for distant languages: asking ChatGPT to translate source text into a high-resource pivot language before the target language.
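For readers who want to picture pivot prompting in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' implementation: the prompt wording is made up, the two-call structure is one possible reading of the idea, and `complete` is a hypothetical stand-in for whatever function sends a prompt to a model and returns its reply.

```python
from typing import Callable

# Hypothetical type: any function that sends a prompt to a language model
# and returns the text of its reply. No real API is assumed here.
Complete = Callable[[str], str]


def translate_prompt(text: str, source: str, target: str) -> str:
    """Build a plain translation prompt (the wording is illustrative only)."""
    return f"Translate the following {source} text into {target}:\n{text}"


def pivot_translate(
    text: str, source: str, pivot: str, target: str, complete: Complete
) -> str:
    """Translate source -> pivot -> target using two separate model calls."""
    # Step 1: translate into the high-resource pivot language.
    pivot_text = complete(translate_prompt(text, source, pivot))
    # Step 2: translate the pivot-language text into the final target language.
    return complete(translate_prompt(pivot_text, pivot, target))
```

Passing the model call in as a parameter keeps the sketch independent of any particular provider. The same idea could also be expressed as a single prompt that asks for the pivot translation first and the target translation second.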

Commercial systems also outperformed GPT on biomedical abstracts and Reddit comments, but GPT did well with spoken language.

“With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted, becoming comparable to commercial translation products, even for distant languages,” the researchers wrote. 

Better Together

In February 2023, Microsoft published a paper that came to much the same conclusion and suggested a hybrid approach, in which users could combine GPT models with other MT systems. 

Developers are already considering this hybrid method in their work. In an OpenAI community forum, a developer explained in November 2023 that he needed to include a translation feature in an app, and wanted help deciding between Google Translate, DeepL, and OpenAI (i.e., GPT).

“I’m inclined towards DeepL but since I’m already using OpenAI for text generation I’m wondering if it’s better to translate directly within OpenAI call,” he wrote.

Responses varied, but a good number chimed in with praise for GPT, including descriptions calling it “phenomenal” and the “best option short of actual bilingual human translators.”

An example of a hybrid workflow, supplied by another respondent, is using a combination of DeepL (or another MT system) and GPT. 

“In terms of accuracy, DeepL is better than GPT-4, but GPT-4 is sometimes better from the readability point of view,” they wrote, adding, “Having a two-step process with DeepL (pure translation) + GPT-4 (improving fluency) is working well.”
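The two-step process the respondent describes could be sketched in Python roughly as follows. This is a hedged illustration, not the respondent's code: `translate` and `complete` are hypothetical stand-ins for a dedicated MT system and a GPT-style model, and the fluency prompt is invented for the example.

```python
from typing import Callable

# Hypothetical stand-ins; no real DeepL or OpenAI API calls are assumed.
Translate = Callable[[str], str]  # wraps a dedicated MT system
Complete = Callable[[str], str]   # wraps a GPT-style model


def fluency_prompt(source_text: str, draft: str, target: str) -> str:
    """Build a post-editing prompt (the wording is illustrative only)."""
    return (
        f"Improve the fluency of this {target} translation "
        "without changing its meaning.\n"
        f"Source: {source_text}\n"
        f"Draft: {draft}"
    )


def hybrid_translate(
    text: str, target: str, translate: Translate, complete: Complete
) -> str:
    """Run MT first for the draft, then ask the model to polish it."""
    # Step 1: the MT system produces the draft translation.
    draft = translate(text)
    # Step 2: the model sees both source and draft, and rewrites for readability.
    return complete(fluency_prompt(text, draft, target))
```

Including the source text in the second step is a design choice made here so the model can check the draft against the original; a workflow that sends only the draft would also fit the respondent's description.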