Google Admits Neural Machine Translation Can Fool Its Search Algorithm

Google usually hates automatically generated content, unless it is unable to tell the difference. Google Senior Webmaster Trends Analyst John Mueller has said that Google's ranking of search results could be fooled by machine-translated content.

According to Search Engine Roundtable, Mueller was asked during one of Google's regular Webmasters Hangouts sessions whether Google might now rank automatically translated content, given that recent developments in neural machine translation (NMT) have made machine-translated content much more fluent.

A participant in the Hangouts session asked:

“I’m seeing a lot of websites with auto generated content… ranking with domain name extensions which are very weird and very new. And all the content is basically autogenerate[d]. And here is the tricky question because most recently, black hatters are abusing the Google Translation API: is it possible that the Googlebot is fooled by its own services because it uses artificial intelligence to translate the content and the translations are getting better and better. So when somebody creates hundreds of pages of auto translated content, the Googlebot is fooled by this content and thinks that it’s human readable, [so] that’s normal content?”

“That could always be the case. We can’t exclude that possibility completely,” Mueller admitted.

AI-Powered SEO

When it comes to search engine optimization (SEO), automatically generated content is a no-no. This includes content sloppily run through a free translation service such as Google Translate, as Google itself clarified in 2015.

Google’s ranking algorithm has been changing continuously for years, adapting to the equally fast-evolving optimization measures the SEO industry uses. The result: best practices like organic linking, deeply engaging content, and properly optimized webpages are rewarded, while shady “black hat” tactics like keyword spamming, link farming, and automatically generated spam are penalized.

Today, the top two factors that affect ranking in Google's search results are still backlinks and content. Between 2015 and 2016, Google rolled out RankBrain, an AI component of its ranking algorithm that actively learns search context, and it was reportedly the third most important factor affecting search engine results pages (SERPs). Optimizing websites for AI is a rather vague notion, however, so SEO industry experts focus on concrete ranking factors, such as being mobile-friendly, ensuring fast page load speed, and using schema markup.

With its AI-powered SERP ranking algorithms, the search giant can weed out poorly constructed, automatically generated content. One method black hat SEO uses to generate such content is to machine translate massive volumes of existing content into another language.

So what happens when machine-translated content becomes fluent enough to fool even Google’s own ranking algorithms?

It All Boils Down to Content

“I think that’s something that has both pros and cons,” Mueller said in the Hangouts session, “in that it might be used by sites that are essentially spamming content.” On the other hand, Mueller said, “it could also be used by sites that are legitimately providing translations on a website and they just start with the auto-translated version and then they improve those translations over time.”

Ultimately, Mueller concluded, “it’s more a matter of the intent.” If machine-translated content is used for automatically generated spam, then Google will still penalize the offending website. This raises the question, of course, of whether Google’s ranking algorithm can still detect content produced by NMT.


In a nutshell, Google Translate has pitted itself against Google’s ranking algorithm. While this may seem like a novel way of gauging the quality of NMT for SEO use (test whether Google thinks the content is automatically translated or not), it is worth reiterating a fundamental tenet of SEO: optimize for people, not search engines.

Even if automatically translated content can fool Google, it might still be off-putting to target users. As for how fluent NMT has become, it has outgrown current quality evaluation measures. Meanwhile, MT certainly has a place in ecommerce and marketing, as a recent economic research paper noted: “the introduction of a machine translation system has significantly increased international trade [on eBay], increasing exports by 17.5%.”
