Meta Reveals How ‘Toxic’ Machine Translation Can Be

Machine translation (MT) is not yet perfect, and MT systems still produce various types of errors. Some are venial, while others can be critical because of the negative impact they may have on end users. But metrics for automatic or human evaluation do not necessarily distinguish between the two.

“Not all machine mistranslations are of equal scale of severity,” wrote Khetam Al Sharou and Lucia Specia from Imperial College London in a recent paper. “Mistranslating a date or time in an appointment, mistranslating a number or currency in a contract, or hallucinating profanity may lead to catastrophic consequences for the users,” they said.

In the same paper, Al Sharou and Specia defined critical errors as “instances of translations where the meaning in the target text deviates drastically from the source text where such translations can be misleading and may carry health, safety, legal, reputation, religious or financial implications.”

In the Findings of the WMT 2021 Shared Task on Quality Estimation, Specia et al. proposed a taxonomy for critical errors. According to the authors, one of the main categories of critical errors is “toxicity.”

What Is Toxicity?

Toxicity refers to the usage of words or phrases that “induce offensive utterances and bad sentiments,” according to a 2022 Meta study.

Toxicity may be present in the source text or it can be introduced in the target text — described as added toxicity.

“MT systems should be able to translate any source content adequately regardless of the domain or register, which includes translating language that may be regarded as toxic,” while at the same time “they should remain faithful to the source content, and should not add through the translation process any elements of toxicity that are not found in the source,” explained the authors.

Toxicity detection, for both user- and machine-generated content, has lately received significant attention, according to the same study. Toxicity can be detected with approaches based on either word lists or machine learning (ML) techniques.

Quantifying Toxicity

Most recently, researchers from Meta and the Polytechnic University of Catalonia (Universitat Politècnica de Catalunya) focused their research on toxicity, and on added toxicity in particular. They analyzed a large evaluation dataset to quantify the amount of added toxicity.

More specifically, they translated the HOLISTICBIAS dataset from English into 164 languages. HOLISTICBIAS is a new, more inclusive dataset consisting of 472,991 English sentences. It has nearly 600 descriptor terms across 13 demographic axes: ability, age, body type, characteristics, cultural, gender and sex, nationality, nonce, political ideologies, race and ethnicity, religion, sexual orientation, and socioeconomic class.

They then used the word list-based automatic toxicity detection method proposed in the paper by the NLLB Team et al. The word list includes items from the following toxicity categories: profanities, frequently used insults, pornographic terms, frequently used hate speech terms, some terms that can be used for bullying, and some terms for body parts generally associated with sexual activity. For a sentence to be labeled as toxic, it must contain at least one entry from the corresponding language’s toxicity word list.
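The labeling rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NLLB Team's implementation: the function names, the regex tokenizer, and the placeholder word lists are all hypothetical, and real word lists may contain multiword entries or cover languages that are not segmented by whitespace.

```python
import re


def tokenize(sentence: str) -> list[str]:
    # Assumption: lowercase word-level tokens are enough for matching.
    return re.findall(r"\w+", sentence.lower())


def is_toxic(sentence: str, wordlist: set[str]) -> bool:
    # A sentence is labeled toxic if it contains at least one word list entry.
    return any(token in wordlist for token in tokenize(sentence))


def has_added_toxicity(
    source: str,
    translation: str,
    source_wordlist: set[str],
    target_wordlist: set[str],
) -> bool:
    # Added toxicity: the translation is flagged but the source is not.
    return is_toxic(translation, target_wordlist) and not is_toxic(
        source, source_wordlist
    )


# Placeholder word lists, invented for illustration only.
english_wordlist = {"badword"}
spanish_wordlist = {"palabrota"}

print(has_added_toxicity(
    "I am a friend.", "Soy una palabrota.", english_wordlist, spanish_wordlist
))
```

Because the check is a plain membership test, it ignores context entirely, which is why a word that is harmless in one sentence and offensive in another is flagged in both.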

Given that toxicity word lists are context-independent and seldom exhaustive, the authors also performed a human evaluation, annotating false positives and false negatives, on a subset of eight translation directions (English into Kinyarwanda, Basque, Spanish, French, Western Persian, Catalan, Simplified Chinese, and Traditional Chinese) in order to confirm the prevalence of added toxicity.

The analysis revealed that added toxicity varies from 0% to 5% across languages, and that the languages with the most added toxicity tend to be low-resource ones. Moreover, the demographic axes with the most added toxicity include sexual orientation, gender and sex, and ability.
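As a rough sketch of how such percentages could be tallied, the Python snippet below groups per-sentence flags by language or by demographic axis. The record layout, the field names, and the function name are assumptions made for illustration, and the toy records are invented, not figures from the study.

```python
from collections import defaultdict


def added_toxicity_rate(records: list[dict], key: str) -> dict[str, float]:
    # Percentage of translations flagged for added toxicity, per group.
    totals: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        if record["added_toxicity"]:
            flagged[group] += 1
    return {group: 100.0 * flagged[group] / totals[group] for group in totals}


# Invented toy records: one entry per translated sentence.
toy_records = [
    {"language": "lang_a", "axis": "ability", "added_toxicity": True},
    {"language": "lang_a", "axis": "age", "added_toxicity": False},
    {"language": "lang_b", "axis": "ability", "added_toxicity": False},
    {"language": "lang_b", "axis": "age", "added_toxicity": False},
]

by_language = added_toxicity_rate(toy_records, "language")
by_axis = added_toxicity_rate(toy_records, "axis")
print(by_language)
print(by_axis)
```

Grouping the same flags by different keys is what allows one set of translations to be compared both across languages and across demographic axes.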

“In the future, we want to explore if the amount of toxicity in the training data may play a bigger role in correlation with added toxicity,” the authors said.

They also observed that much of the added toxicity can be due to mistranslations, hallucinations, and the stability of translations in different contexts.

“Given these findings, our recommendations to reduce added toxicity are to curate training data to avoid mistranslations, mitigate hallucination and check unstable translations,” the authors concluded.