Machine Translates Literature and About 25% Was Flawless, Research Claims

A neural machine translation (NMT) system was trained in the domain of literary translation and between one sixth and one third of its translations were indistinguishable from professional human translation – at least according to the people asked by researchers to evaluate the machine’s output.

During a panel discussion at SlatorCon Zürich in December 2017, NMT expert Samuel Läubli was asked when he thought NMT systems would be fluent enough to handle stylistic problems like irony. Läubli, who presented three reasons why NMT was a breakthrough that day, declined to forecast a timeline.

Yet it seems that, by switching to neural networks, machine translation is getting closer. Dr. Antonio Toral, Assistant Professor at the University of Groningen, and Prof. Andy Way, Professor in Computing and Deputy Director of the EU’s ADAPT Centre for Digital Content Technology, have published a research paper on the question.

Their research is titled “What Level of Quality can Neural Machine Translation Attain on Literary Text?”

The answer: NMT was “significantly better” than phrase-based statistical MT (PBSMT), and more importantly, “human evaluation… shows that between 17% and 34% of the translations… are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.”

Doing It over Again

Toral and Way conducted a similar study in 2015, using only PBSMT and a smaller scale of training data and results analysis. This time, they repeated the experiment with neural MT. They said the availability of online ebooks and their translations as training data encouraged them to pursue the research.

They trained PBSMT and NMT systems with over 100 million words of in-domain training data (supplemented by parallel, monolingual, and out-of-domain datasets). They then set both systems to the task of translating 12 well-known novels published from the 1920s to the present day. Specifically:

  1. Auster’s Sunset Park (2010)
  2. Collins’ Hunger Games #3 (2010)
  3. Golding’s Lord of the Flies (1954)
  4. Hemingway’s The Old Man and the Sea (1952)
  5. Highsmith’s Ripley Under Water (1991)
  6. Hosseini’s A Thousand Splendid Suns (2007)
  7. Joyce’s Ulysses (1922)
  8. Kerouac’s On the Road (1957)
  9. Orwell’s 1984 (1949)
  10. Rowling’s Harry Potter #7 (2007)
  11. Salinger’s The Catcher in the Rye (1951)
  12. Tolkien’s The Lord of the Rings #3 (1955)

They chose English to Catalan translations for two main reasons. First, it was more challenging than their initial 2015 study, in which they used PBSMT to translate from Spanish to Catalan.

Second, Catalan is a mid-size European language with plenty of available training data, yet, compared to other major languages, there is still considerable room for future novel translations if research shows that NMT is useful in assisting literary translators.

Orwell Seems Harder than Salinger

Toral and Way used automated BLEU (Bilingual Evaluation Understudy) scoring to compare the results of the PBSMT and NMT translations, and also ran a blind human evaluation with two native Catalan speakers who have advanced English skills and a background in linguistics.

In automated BLEU scoring, NMT consistently outperformed PBSMT with an overall “11% relative improvement.”
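For readers unfamiliar with the metric, BLEU scores a machine translation against a human reference by measuring overlapping n-grams, penalizing candidates that are shorter than the reference. The sketch below is a simplified sentence-level illustration in plain Python (production work would use a standard implementation such as sacreBLEU; the example sentences and the add-one smoothing are illustrative choices, not the paper's setup):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped overlap: each n-gram counts at most as often as in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        precisions.append((overlap + 1) / (total + 1))
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_avg)
```

A candidate identical to the reference scores 1.0, and partial overlaps score lower; an "11% relative improvement" simply means the NMT system's score was 1.11 times the PBSMT score.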

For human evaluation, they assessed 10 passages of 10 contiguous sentences from three of the 12 novels translated: Orwell’s 1984, Rowling’s Harry Potter #7, and Salinger’s The Catcher in the Rye.

Their findings: “In all three books, the percentage of sentences where the annotators perceive the MT translation to be of equivalent quality to the human translation is considerably higher for NMT compared to PBSMT.”

“If NMT translations were to be used to assist a professional translator (e.g. by means of post-editing), then around one third of the sentences for Rowling’s and Salinger’s and one sixth for Orwell’s would not need any correction.”

Slator reached out to Prof. Andy Way regarding the research. Asked whether this same system would perform equally well in marketing or other “more literary” domains that language service providers are active in, Prof. Way said, “I worked in industry for three years building cutting-edge MT systems for a range of leading international companies, and the one area that I used to tell the sales team to avoid was marketing material, which requires more transcreation as a solution than translation per se. So I think this will remain an area where human translation/transcreation will continue to dominate.”

Asked if more and better in-domain training data can improve performance, Prof. Way was more optimistic: “I know of no examples where additional such data is not extremely useful, so yes, absolutely!”

Download the Slator 2019 Neural Machine Translation Report for the latest insights on the state-of-the art in neural machine translation and its deployment.
