Baidu Translate: The Inside Story

Artificial intelligence is on the rise in the world of machine translation. A string of recent announcements from tech giants bolstering their machine translation engines with deep learning underscores how central the technology has become to the products of companies like Google and Microsoft. But what about Baidu, one of China’s largest internet pioneers?

Slator reached out to a Baidu representative authorized to speak for the Beijing-based company to get an exclusive look at what the Chinese tech leader has in store for its translation technology.

100 Million Translations Daily

Baidu began R&D on Baidu Translate in 2010 and launched the product in June 2011. The company felt that translation was in line with what its search users needed. While our respondent declined to give specific numbers, we learned that the Beijing-based Baidu Translate team consists of only “a few dozen people… mostly algorithm engineers and software developers.”

The team is part of the company’s Natural Language Processing (NLP) department under the management of Wang Haifeng, Baidu’s VP of Technology, who is himself a computational linguist by training.

According to the Baidu Translate representative, “machine translation is one of the directions of the Baidu NLP team, which is one of the largest NLP R&D groups in the world. It covers most of [the] directions in the NLP field, including morphological analysis, syntax parsing, semantic analysis, summarization, natural language generation, question answering, machine translation, dialogue modelling and so forth.”

Baidu Translate supports 27 languages, and the number is growing: the team is intent on adding the languages spoken in countries that “have frequent exchanges with China, either economic or cultural.”

“Deep Learning technology performs much better than phrase-based machine translation” – Baidu Translate

The machine translation engine processes around 100 million requests every day. It supports optical character recognition (OCR), similar to Google Translate and Waygo, and it can also “translate” objects. Point your camera at a plant and it will identify it as a “plant” and translate the word into your preferred supported language.

“Baidu Translate is based on deep learning technology, a completely different method [from Moses], the most popular open source phrase-based machine translation system,” our respondent said, adding that it “performs much better.”

Other machine translation experts, however, see deep learning as complementing, not replacing, statistical machine translation (SMT). As KantanMT CTO Tony O’Dowd put it in an email to Slator: “The opportunity to embed Deep Learning technology into machine translation systems, especially statistical systems, provides an augmented approach to increase translation quality for morphologically rich languages.

“This technology is another incremental improvement in statistical systems to increase the quality threshold for languages that to-date have suffered lesser quality outputs stemming from the limitations of phrase-based approaches.”

[Image: Object Translation]

After six years of research, Baidu made a breakthrough with deep learning in 2015, according to our respondent.

Close to Human Translation in the Next Five Years

Baidu “devotes a lot of time and resources in developing artificial intelligence technology” and NLP, our respondent said. The company has been using deep learning in products such as its digital secretary, Duer; its service discovery platform, Nuomi; and, of course, its core search service: “Recently, Baidu has significantly improved the ranking of the search results based on deep learning models.”

In terms of deep learning-enabled machine translation, Baidu has started showcasing its potential through presentations of technologies like the conversational translation robot Xiaodu.

“In the next five years, (…) machine translators will be able to understand texts or human speech demands more accurately and achieve breakthrough translation results.” – Baidu Translate

According to our respondent, Baidu is not expecting direct revenue from Baidu Translate, but the service is “now embedded in quite a number of products, including Baidu’s own products like Baidu Encyclopedia, Baidu Library, and the Baidu Browser.”

The company does offer commercial subscriptions priced by volume, similar to those of Yandex, Microsoft, and Google. Furthermore, Baidu announced additional investments in deep learning and machine translation towards the end of 2015. The company is also open to “all kinds of cooperation and business opportunities.”

Baidu admits to the complexity of translating human language. “As we know, machine translation is one of the difficult and complicated research areas of artificial intelligence.” Still, the company is optimistic about the future of machine translation quality, and about that of Baidu Translate in particular.

“Although there are still gaps between machine translation and human translation, the quality of machine translation, in some specific domains, is already nearly as good as that of human beings. In the next five years, with the development of artificial intelligence quickly advancing, machine translators will be able to comprehend and understand texts or human speech demands more accurately and achieve breakthrough translation results.”

Image: Shutterstock