A senior Google employee has said the search giant treats AI-generated content as “spam” — and this could possibly extend to machine translations (MT).
According to John Mueller, Senior Webmaster Trends Analyst at Google, automatically generated content (or AI content) is considered “spam” because it violates Google’s Webmaster Guidelines.
Mueller spoke on the topic during a Google SEO Office Hours hangout on April 1, 2022, in response to a question on writing tools powered by GPT-3. OpenAI’s GPT-3 was, at one time, the world’s largest language model. (Slator recently featured another model, Google’s Pathways Language Model, or PaLM.)
So why does content generated by large language models (LLMs) like GPT-3 violate Google guidelines?
According to Google, AI content is generated programmatically, and there is a high likelihood that such content is used to manipulate search rankings (i.e., for SEO purposes) rather than to help users. Google may therefore take action against it.
Mueller pointed out that Google’s position on AI content has always been clear “since almost the beginning” — content created with GPT-3-based writing tools falls into the category of automatically generated content.
He explained, “People have been automatically generating content in lots of different ways. And for us, if you’re using machine-learning tools to generate your content, it’s essentially the same as if you’re just shuffling words around, or looking up synonyms, or doing the translation tricks that people used to do.”
Mueller added, “My suspicion is that maybe the quality of content is a little bit better than the really old-school tools. But for us, it’s still automatically-generated content, and that means for us, it’s still against the Webmaster Guidelines. So we would consider that to be spam.”
Cookie-Cutter Pages
Google also flags pages with little or no original content, or what it regards as “thin content”; that is, content with little or no added value, created by site owners to improve their page ranking and attract visitors. The search giant will also take action against domains that try to rank by merely showing scraped or cookie-cutter pages that do not provide users with substantial value.
Does the same rule apply to GPT-3-generated, multilingual content? Very likely.
In another Google SEO Office Hours hangout, Mueller advised, “If you use an automatic translating tool and you just translate your whole website automatically into a different language, then probably we would see that as a lower quality website because often the translations are not that great. But if you take a translation tool and then you rework it with maybe translators who know the language, and you create a better version of that content, then that’s perfectly fine.”
In short, Google frowns upon whole websites with multilingual, automatically generated content, but is perfectly fine with such content when there is a human in the loop.
It is interesting to see what the Senior Webmaster Trends Analyst had to say about the use of Google Translate in generating content back in 2018, when Mueller and the rest of the world were just getting their bearings following the sweeping impact of machine translation.
Google Algo Limits
When asked whether Google can tell the difference between human-created and AI-generated content, Mueller declined to answer definitively and would only say, “If we see that something is automatically generated, then the web spam team can definitely take action on that.”
As it currently stands, Google’s algorithms are unable to automatically detect content generated by GPT-3 and other LLMs. At least for now, automatic detection is virtually impossible; identifying LLM-generated content would require human reviewers to go through it manually.
There are times, however, when Google’s algorithms can become overly zealous — as reported by one user who complained that the search engine was not indexing translated content. Mueller said Google can sometimes treat the translated version as a mere copy of the original and, thus, not worth indexing.
Gary Illyes, a Webmaster Trends Analyst at Google, explained in a Twitter thread on how GPT-3 underperforms why Google does not want machine-translated content in its index. (See Illyes make the case for post-edited machine translation, or PEMT, below.)
fwiw gpt-3 underperforms compared to current translation models, it was just not designed for that. and even for (short) text generation, while it’s really really impressive, the majority of its output is gibberish (60-70%, cf. Sam)
— Gary 鯨理/경리 Illyes (@methode) August 15, 2020
i should’ve mentioned that curated (human reviewed) is fine
— Gary 鯨理/경리 Illyes (@methode) August 17, 2020
In the same April 2022 Office Hours hangout, Mueller admitted that, maybe, over time, AI content will evolve “in that it will become more of a tool for people, kind of like how you would use machine translation as a basis for creating a translated version of a website — but you still, essentially, work through it manually.”
Google, the search engine used by three-quarters of the world’s online population, has spoken — and human translators and content creators aren’t disappearing from the loop anytime soon.