Toxicity in MT: Causes, Challenges, and Solutions

In 2017, a Palestinian construction worker posted a photo of himself on Facebook, leaning against a bulldozer, with the caption “yusbihuhum,” which means “good morning.” However, Facebook’s machine translation rendered the word as “hurt them” in English and “attack them” in Hebrew. The Guardian reported that the police arrested him the next day and questioned him for hours until they realized their mistake.

While machine translation (MT) continues to advance and produce more accurate translations, it remains far from perfect. It still makes various types of errors, including critical ones with economic, legal, and safety consequences. Toxic MT is one category of these critical errors.

What Is Toxic MT?

Toxicity refers to “instances where the translation may incite hate, violence, profanity, or abuse against an individual or a group (a religion, race, gender, etc.) due to incorrect translations,” notes an Imperial College London study of critical errors in MT.

Toxicity may already be present in the source text, or it may be introduced in the translation when it is absent from the source, which is referred to as “added toxicity.” Added toxicity can result from mistranslation, such as an incorrect lexical choice. It can also take the form of a hallucination, where a toxic element in the translated sentence, such as a profane word, has no corresponding element in the source sentence.
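One simple way to surface added toxicity is to check a source sentence and its translation against per-language word lists and flag cases where a listed term appears only on the target side. The sketch below illustrates that idea under stated assumptions: the word lists are tiny placeholders, and the names `TOXIC_TERMS`, `toxic_hits`, and `has_added_toxicity` are hypothetical. It is not the method of any particular study or MT system.

```python
import re

# Hypothetical placeholder word lists; real lists would be curated per language.
TOXIC_TERMS = {
    "en": {"idiot", "scum"},
    "es": {"idiota", "escoria"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def toxic_hits(text: str, lang: str) -> set[str]:
    """Return the listed terms for `lang` that occur in `text`."""
    return tokenize(text) & TOXIC_TERMS.get(lang, set())


def has_added_toxicity(source: str, src_lang: str, target: str, tgt_lang: str) -> bool:
    """True when the translation contains a listed term but the source contains none."""
    return bool(toxic_hits(target, tgt_lang)) and not toxic_hits(source, src_lang)


# Usage: a harmless greeting whose translation contains a listed term is flagged.
flagged = has_added_toxicity("Buenos días a todos", "es", "Good morning, scum", "en")
```

A word-list check like this cannot judge context, so it is best treated as a first filter that sends candidates to human review rather than as a final verdict.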

Causes and Challenges of Toxic MT

Toxicity in MT has no single cause. It arises from a confluence of the factors below.

Colloquial Language

There has been an explosion of user-generated content (UGC) on social media, online forums, and other digital channels. Accurate MT is challenging because UGC often uses colloquial language or slang. People posting on social media are not particular about what and how they write. Spelling mistakes, informal contractions, abbreviations, and grammatical errors are common, making this content difficult for generic MT engines to understand and translate.

Offensive Language

Offensive language can increase the number of critical errors found in translation. Hate speech, racial and gender slurs, and extremist views are common in UGC, and these offensive words need to be detected and moderated across languages. Machine translation may produce wrong translations when a sentence contains many swear words and offensive words. MT engines handle such words with various strategies, such as literal translation, transliteration, omission, random translation, or substitution, and they may apply these strategies inconsistently.

Symbols and Special Characters

Symbols, emojis, and special characters, such as asterisks and hashtags, are used to add context to sentences or to disguise words. MT may overlook words that contain symbols and special characters. These words are either left in the source language or rendered with an incorrect meaning. So, when the content is translated, the original sentiment or intent may be lost.
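As a rough illustration of how disguised words might be caught before translation, the sketch below flags tokens in which a symbol or digit sits between letters, such as “st*pid” or “h@te”. The pattern and the helper name `find_disguised_tokens` are assumptions made for this example. The heuristic is deliberately crude: it will miss many disguises and will also flag harmless tokens such as “B2B”.

```python
import re

# Assumed heuristic: letters, then one or more symbols or digits, then letters.
# [^\W\d_] matches any Unicode letter.
DISGUISED = re.compile(r"[^\W\d_]+[*@$#!0-9]+[^\W\d_]+")


def find_disguised_tokens(text: str) -> list[str]:
    """Return whitespace-separated tokens that look like symbol-disguised words."""
    return [token for token in text.split() if DISGUISED.search(token)]


# Usage: the hashtag is left alone because its symbol is not inside the word.
suspects = find_disguised_tokens("You are so st*pid and I h@te it #MondayMood")
```

Flagged tokens could then be normalized or sent for review before the sentence reaches the MT engine, rather than being passed through untranslated.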

Low-Resource Languages

Meta’s “No Language Left Behind” study showed that added toxicity ranges between 0% and 5% across languages and that the languages with the most added toxicity are the low-resource ones, due to a lack of training data. In addition, the content with the most added toxicity relates to gender and sex, ability, and sexual orientation. It’s challenging to detect toxicity at scale for hundreds of languages, particularly less widely spoken ones.

Lack of Human Post-Editing

Automatic machine translation without any human post-editing is common, especially for UGC. The sheer volume of content that users share makes manual editing of machine-translated content impossible. With so much data posted in real-time, it becomes too expensive and too late to have human translators or reviewers correct machine-translated content.

Built-In Bias

Unfortunately, human biases can be incorporated into AI algorithms, whether intentionally or not. AI developers are subject to biases that affect how neural machine translation (NMT) systems are designed, how scenarios are framed, and how training data is labeled. This data can be skewed toward specific groups, resulting in non-inclusive and offensive translations.


NMT can also be manipulated with prompts containing specific words, phrases, or alphanumeric symbols. A University of Melbourne study revealed that attackers could exploit back-translation to make an NMT system produce toxic words, using an attack called “monolingual poisoning.” Inserting only a few words or sentences into the training data set of an NMT system can induce a specific, targeted translation behavior, such as peddling fake news.

How to Resolve Toxic MT

To combat the rise of toxic MT, the language services industry and other stakeholders must take a multipronged approach.

Machine Translation Literacy

Popular MT tools, such as Google Translate, have become so accessible and easy to use that people take translations at face value. However, users need to be critical thinkers and understand the limits of machine translation.

The language services industry must advocate for more widespread machine translation literacy through outreach programs and promotional campaigns for various stakeholders. These include students, researchers, companies, and professionals who deal with people who speak foreign languages.

Training Data

As training data for NMT systems can be a source of toxicity, organizations and language services providers (LSPs) must improve the development of training data sets. Data labels are critical to training AI algorithms. Labels can train models to recognize hate speech, offensive language, and fake news.

Metadata, which is information that describes other information, allows LSPs and AI developers to tag data with relevant attributes. These tags help AI systems find better matches when translating, which makes translations more accurate. Metadata can include information about the source content, including the target audience, language, localized keywords, and purpose of the content. Descriptive metadata also identifies the level of formality of the source content and its reading level.
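To make the idea of labeled, metadata-tagged training data more concrete, here is a minimal sketch of what one record might look like. The class names, field names, and the `select_segments` helper are hypothetical and chosen only to mirror the attributes mentioned above; real data sets will use their own schemas.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentMetadata:
    """Descriptive attributes of the source content (hypothetical schema)."""
    language: str
    target_audience: str
    purpose: str
    formality: str
    reading_level: str
    localized_keywords: list[str] = field(default_factory=list)


@dataclass
class TrainingSegment:
    """One source/target pair with its metadata and content labels."""
    source_text: str
    target_text: str
    metadata: SegmentMetadata
    labels: list[str] = field(default_factory=list)  # e.g. "offensive", "hate_speech"


def select_segments(segments, *, formality, exclude_labels=frozenset()):
    """Keep segments with the requested formality and none of the excluded labels."""
    excluded = set(exclude_labels)
    return [
        seg for seg in segments
        if seg.metadata.formality == formality and not (set(seg.labels) & excluded)
    ]
```

With records shaped like this, a team could, for example, assemble an informal-register training subset while holding back segments labeled as offensive for separate handling.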

Post-Edited MT

There are still limits to NMT, even with more robust and reliable training data sets. This means human translators are required to do post-editing work on machine-translated content. Known as MT+PE (or PEMT), machine translation plus post-editing is critical for detecting toxic translations missed by algorithms.

Machine-translated texts that are identified as poor at the LQA and MT quality estimation stage are automatically routed to human editors for translation quality assessment (TQA). TQA is used to correct machine-translated content and to improve the NMT system by incorporating the corrected translations into the training data.
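The routing step can be pictured as a simple threshold on a quality estimation score. The sketch below assumes each translation already carries a score between 0 and 1, where higher means better, and uses an arbitrary cutoff of 0.7. The `Translation` class, the `route` function, and the cutoff are illustrative assumptions, not a description of any specific production workflow.

```python
from dataclasses import dataclass

QE_THRESHOLD = 0.7  # assumed cutoff; a real workflow would tune this per language pair


@dataclass
class Translation:
    source: str
    target: str
    qe_score: float  # assumed scale: 0.0 (poor) to 1.0 (good)


def route(translations, threshold=QE_THRESHOLD):
    """Split translations into those to publish and those needing human review."""
    publish, review = [], []
    for item in translations:
        if item.qe_score >= threshold:
            publish.append(item)
        else:
            review.append(item)
    return publish, review


# Usage: the low-scoring segment is sent to human editors.
batch = [
    Translation("Good morning", "Buenos días", 0.93),
    Translation("See you later", "Te veo tarde", 0.41),
]
to_publish, to_review = route(batch)
```

Segments corrected by editors in the review queue could later be added back to the training data, closing the loop described above.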

Work With Welocalize

Welocalize can help you build high-quality, multilingual data sets. Contact us to find out