How do you compete with that? Foreign language service providers (LSPs) trying to make a buck in the already cut-throat Chinese market face a formidable competitor. As Bloomberg reported on March 16, 2017, Chinese Internet giant Baidu is offering clients human translations at rock-bottom rates.
In December 2016, according to Bloomberg, Beijing-based Baidu had thousands of translators work out of the company’s offices across Mainland China, translating 15 hours a day for an entire month. The same report said Baidu continues to stage such mass translation events regularly throughout the year.
Baidu leverages one of China’s traditional competitive advantages: the ability to mobilize a large number of people in a short time to achieve a very specific goal. For Baidu, that goal is to get its hands on as much parallel translation data as possible.
Parallel translation data (i.e., high-quality sentence or phrase pairs) has become a prized asset as the world’s leading tech companies jostle for leadership in the burgeoning field of artificial intelligence.
According to Bloomberg, while “new AI products aren’t contributing much to Baidu’s bottom line yet,” China’s leadership has realized it cannot afford to miss the boat on AI and designated Baidu as its national champion.
And because understanding and processing natural language is a core challenge in developing practical AI applications, translation has become one of the key yardsticks for big tech to measure progress.
As Slator reported in early 2016, Baidu has been working on neural machine translation (NMT) for some time. Baidu’s Chief Scientist Andrew Ng actually claimed in early 2017 that China was first in pioneering NMT.
According to a September 2016 Science magazine article quoted by Bloomberg, Google’s Chinese-English corpus numbers 500 million words. Baidu, by contrast, makes do with 100 million words for the same language pair. As a point of reference, language industry think tank TAUS claims its platform contains 70 billion words in 2,200 language directions.
Shanghai-based UTH International, meanwhile, has invested heavily over the past few years in compiling a vast trove of parallel data. According to a November 2016 press release, the company’s “collection includes 2.6 billion Chinese-English translation units and 1.8 billion English-other-language units.”
Henry Wang, Executive Vice President of UTH International, told Slator his company has not yet sold any data to Baidu. Asked about Baidu’s labor-intensive collection method, Wang said, “It’s not the best way, but it’s effective. After all, China has such a large population and this method could attract a lot of people to quickly contribute to what Baidu is aiming for. I believe Baidu is taking other measures as well in the meantime.”
LSPs operating in China can only hope Baidu is not offering their clients a sweetheart deal in exchange for data.