He Said, She Said: Addressing Gender in Neural Machine Translation

Artificial intelligence technology has run into a potentially delicate issue: gender bias. In November 2018, mainstream news media reported that Google’s automatic suggestion tool for Gmail would no longer suggest gender-based pronouns, to avoid autocompleting a sentence with the wrong gender.

The feature (called Smart Compose) will avoid suggesting genders because, as Gmail Product Manager Paul Lambert put it, “not all ‘screw-ups’ are equal…[gender is] a big, big thing.” Google Translate, which now largely runs on neural machine translation (NMT), had also recently addressed the question of gender bias.

He Is a Doctor, She Is a Nurse

On December 6, 2018, Google published a first blog post about its efforts to reduce gender bias in Google Translate. A few days later, on December 10, the Google AI blog went on to provide more details.

Google Translate had previously offered one translation for queries, which reflected the gender bias of the underlying training data. Translations would generally skew toward masculine pronouns for words like “strong” or “doctor” and feminine ones for “beautiful” and “nurse.”

To address the issue, Google updated its translation framework so that single-word queries from English into French, Italian, Portuguese, or Spanish would return both masculine and feminine translations. The same applied to Turkish-to-English phrase translations where the source was gender-neutral.

Longer phrases or full sentences, meanwhile, required a more complex process, one that led Google to make “significant changes” to its translation framework. Gender-neutral phrases or sentences are identified via a new machine-learned process, while masculine and feminine translations are produced through two more steps: adding gender attributes to the training data, and filtering out rejected translation suggestions.
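The three steps described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Google’s implementation: the lookup tables stand in for a trained NMT model, the neutrality check stands in for the machine-learned classifier, and every name in the code (TOY_GENDERED, is_gender_neutral, translate, and so on) is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup tables standing in for a trained NMT model.
# Keys of TOY_GENDERED are (source sentence, requested gender) pairs.
TOY_GENDERED: Dict[Tuple[str, str], str] = {
    ("o bir doktor", "female"): "she is a doctor",
    ("o bir doktor", "male"): "he is a doctor",
}
TOY_DEFAULT: Dict[str, str] = {
    "o bir doktor": "he is a doctor",
    "merhaba": "hello",
}

# Masculine word -> feminine counterpart, used by the filtering step.
GENDERED_PAIRS = {"he": "she", "his": "her", "him": "her"}


def is_gender_neutral(source: str) -> bool:
    """Step 1 (stand-in): flag gender-neutral source queries.

    A real system would use a learned classifier; this toy version
    only looks for the Turkish third-person pronoun 'o'.
    """
    return "o" in source.lower().split()


def translate_with_gender(source: str, gender: str) -> Optional[str]:
    """Step 2 (stand-in): request a translation in a specific gender."""
    return TOY_GENDERED.get((source.lower(), gender))


def differs_only_in_gender(feminine: str, masculine: str) -> bool:
    """Step 3: accept the pair only if it differs in gendered words alone."""
    f_words, m_words = feminine.split(), masculine.split()
    if len(f_words) != len(m_words):
        return False
    return all(f == m or GENDERED_PAIRS.get(m) == f
               for f, m in zip(f_words, m_words))


def translate(source: str) -> List[str]:
    """Return both gendered translations when possible, else one default."""
    if is_gender_neutral(source):
        feminine = translate_with_gender(source, "female")
        masculine = translate_with_gender(source, "male")
        if feminine and masculine and differs_only_in_gender(feminine, masculine):
            return [feminine, masculine]
    # Fall back to a single translation, as the previous system did.
    return [TOY_DEFAULT.get(source.lower(), source)]


if __name__ == "__main__":
    print(translate("O bir doktor"))
    print(translate("Merhaba"))
```

The fallback matters in this sketch: if either gendered candidate is missing or the pair fails the filter, the query gets a single translation rather than a mismatched pair.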

Google claims this new NMT system can “reliably produce feminine and masculine translations 99% of the time.”

No One Solution Fits All

Eva Vanmassenhove, PhD student at Dublin City University and team member of ADAPT Centre, noted some shortcomings in the Google approach.

“It is not translating from [the aforementioned] languages into English that is problematic, but the other way around. Different languages have different ways of expressing gender and it is important to realize that there won’t be one solution that fits all,” Vanmassenhove told Slator.

Vanmassenhove has been active in machine translation research since 2015, and has studied the gender translation problem before. She pointed out, “Even context-aware NMT systems that can take some context into account while translating would have a hard time getting this right, as (cross-genre) gender prediction, especially for languages that do not mark gender explicitly (such as English), remains an unsolved task.”

While Google’s system generally works for the aforementioned languages, according to Vanmassenhove, the limited language coverage means that in French and Spanish, for instance, ‘I am a nurse’ will still be given feminine translations while ‘I am a surgeon’ will result in masculine ones.

“Even more problematic is the following set of translations: ‘I am beautiful’ is translated into the male form in Spanish. However, ‘I am a beautiful surgeon’ is translated into a female form,” Vanmassenhove said.

“Neural networks’ strength to learn patterns and associations turns out to be its weakness too, and these types of errors and biases are particularly hard to notice and fix,” she added.
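One way to make such inconsistencies easier to spot is a small audit that labels the grammatical gender of each translated sentence. The sketch below is only an illustration: the Spanish strings are assumed outputs chosen to match the behaviour Vanmassenhove describes, not captured system output, and the word lists and function names are hypothetical.

```python
from typing import Dict

# Assumed example outputs, chosen to mirror the behaviour described
# in the article. These are NOT real output from any MT system.
CANNED_OUTPUT: Dict[str, str] = {
    "I am a nurse": "Soy enfermera",
    "I am a surgeon": "Soy cirujano",
    "I am beautiful": "Soy hermoso",
    "I am a beautiful surgeon": "Soy una cirujana hermosa",
}

# Tiny hand-written word lists; a real audit would need a morphological
# analyser rather than a fixed vocabulary.
FEMININE = {"enfermera", "cirujana", "hermosa", "una"}
MASCULINE = {"enfermero", "cirujano", "hermoso", "un"}


def gender_of(translation: str) -> str:
    """Label a Spanish sentence by the gendered words it contains."""
    words = set(translation.lower().split())
    has_feminine = bool(words & FEMININE)
    has_masculine = bool(words & MASCULINE)
    if has_feminine and not has_masculine:
        return "feminine"
    if has_masculine and not has_feminine:
        return "masculine"
    return "unclear"


def audit(outputs: Dict[str, str]) -> Dict[str, str]:
    """Map each English source sentence to the gender of its translation."""
    return {source: gender_of(target) for source, target in outputs.items()}


if __name__ == "__main__":
    for source, label in audit(CANNED_OUTPUT).items():
        print(f"{source!r}: {label}")
```

Run over many gender-neutral source sentences, a report like this would show where a system flips between masculine and feminine forms for the same adjective or profession.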

Vanmassenhove noted, however, that Google’s blog post indicates the company is only at the first stage of reducing gender bias in machine translation.

Exaggerating Biases

Touching on the wider implications of the gender translation issue, Vanmassenhove said recent research “has shown that neural models do not just reflect controversial societal asymmetries but ‘exaggerate’ them — I can imagine situations where such deviations could have a negative impact for certain groups of people.”

She offered this example: “Let’s say a search engine or a selection algorithm uses an MT system internally. How can we make sure we aren’t eliminating many perfectly good candidates or hits just because a gender-neutral term gets translated from one language into a male / female variant in another language?”

Google’s initial solution is not only limited in linguistic scope, but also only accounts for gender bias in training data.

“I believe de-biasing [the training data] has its value; but, as biases might appear on many levels (gender, race, age, minority groups), I can’t help but wonder how we would practically go about removing all possible biases,” Vanmassenhove said.

She recounted some difficulty with her own work due to similar issues. “For my Master’s thesis, I worked on clustering Dutch words together. I remember being ashamed to present the results of my clustering techniques as some of the clusters obtained were simply racist, specifically towards some minority groups in Belgium and the Netherlands. Limiting de-biasing to simply scrubbing off ‘gender’ might not be enough.”

She concluded that biases, gender included, are “significant concerns” because it is not immediately understood how MT algorithms perpetuate them, and they often go unnoticed because “neural algorithms are very good at delivering what they think we want to see.”

“Scrubbing gender-biases is a good starting point, but more measures need to be established to deal with similar problems in an appropriate way,” she said.

Slator thanks Eva Vanmassenhove and Professor Andy Way of DCU and ADAPT for their assistance.