Yandex Puts Machine Translation at the Core of New AI Strategy

On February 7, 2017, Yandex announced it had set up a Machine Intelligence and Research (MIR) division and appointed Misha Bilenko to head it. Until December 2016, Bilenko had been the Principal Researcher and Group Manager at Microsoft, where he had worked for over a decade.

Yandex is often referred to as the “Russian Google,” and its founder and CEO, Arkady Volozh, is often compared to Google’s Larry Page. The dominant search engine in Russia has also benefited from Google’s woes.

Case in point: last year, the country’s antitrust body ruled against Google and slapped it with a USD 6.8m fine for not opening up Android enough to local rivals (i.e., Yandex). Alexander Shulgin, Yandex COO, was quoted as saying that the company’s share on Android devices “started to grow again…as a result of new distribution deals” that followed the ruling.

Be that as it may, experts expect Yandex to be forever fighting an uphill battle with Google, not only in the device arena but, more crucially, in machine translation.

We reached out to Yandex, whose spokesperson immediately echoed the Google ethos of “it always becomes a matter of getting enough data,” claiming Yandex is “one of few companies in the world with access to enough data to meet today’s quality standards for machine translation.”

According to the Yandex website, machine translation is one of four major machine learning areas that make up the MIR division, the others being image recognition, voice recognition, and a machine learning platform.

Combining these capabilities, the company began testing image translation in Yandex.Translate. It has also integrated speech into its machine translation service, the spokesperson said, so users can speak the text they want translated and listen to the translations. According to the same source, Yandex’s free translation service handles upwards of 10 million requests per day.

Beats Google on Price

The company’s paid service, Yandex.Translate API, charges USD 15 per million characters for 0–50 million characters per month; the rate falls as volume grows, to as low as USD 6 per million characters beyond 500 million characters.

Google charges USD 20 per million characters for 0–1.5 billion characters per month, dropping to USD 15 per million characters once monthly volume exceeds 1.5 billion. Microsoft, meanwhile, charges USD 10 per million characters.
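To make the entry-level comparison concrete, here is a minimal Python sketch that prices a hypothetical 10 million characters per month using only the first-tier rates quoted above. The function name `monthly_cost` and the 50-million-character cut-off are illustrative assumptions. Because the article does not spell out Yandex’s intermediate tiers between 50 and 500 million characters, the sketch refuses larger volumes instead of guessing at them.

```python
# Hypothetical helper: estimates a month's bill using only the first-tier
# rates quoted in the article. Valid up to 50 million characters, where all
# three providers are still in their entry-level tier.
FIRST_TIER_RATE_USD_PER_MILLION = {
    "Yandex": 15.0,     # 0-50 million characters per month
    "Google": 20.0,     # 0-1.5 billion characters per month
    "Microsoft": 10.0,  # single rate quoted in the article
}

def monthly_cost(provider: str, characters: int) -> float:
    """Return the estimated USD cost of `characters` at first-tier rates."""
    if characters > 50_000_000:
        # Yandex's intermediate tiers are not given, so do not guess.
        raise ValueError("volume exceeds the first tier quoted in the article")
    rate = FIRST_TIER_RATE_USD_PER_MILLION[provider]
    return characters / 1_000_000 * rate

for name in FIRST_TIER_RATE_USD_PER_MILLION:
    print(f"{name}: USD {monthly_cost(name, 10_000_000):.2f}")
```

At that volume the sketch prints USD 150.00 for Yandex, USD 200.00 for Google, and USD 100.00 for Microsoft: Yandex undercuts Google, while Microsoft’s quoted rate is lower still.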

More on Neural Soon

According to the Yandex spokesperson, they started integrating neural-network-based models into Yandex in mid-2016, but their primary underlying technology is still statistical. The source promised that Yandex will “have more to share” on neural models moving forward.

Asked about the new division, the source said MIR is “a centralization of existing teams into one unified division,” and that, aside from Bilenko’s hiring, it did not open any immediate vacancies. In short, it is more internal reorganization than massive build-out.

“Yandex.Translate is based at Yandex’s headquarters in Moscow, which is one of 17 Yandex offices worldwide. More than half of the Yandex.Translate team is made up of developers, in addition to analysts, testers, project managers, and other support specialists,” the spokesperson said.

Uphill Battle

Language industry researcher Konstantin Dranch, who runs the Russian language industry website translationrating.ru, predicts Yandex will aim for leadership in free public MT. He told Slator, “They have many channels to promote it (browser, portals), so eventually they are likely to emerge larger than [competitor] PROMT, but will always fight an uphill battle with Google.”

As it stands, Yandex narrowly beats Google in search volume, with a 55.2% share according to LiveInternet.ru statistics cited by Dranch.

On Yandex’s other competitor, ABBYY, Dranch said its MT “is not publicly available, although the baseline technology, Compreno, received a lot of investment and was well publicized three years ago.”

Why Chechen When You Can Do Klingon?

Dranch noted that Yandex offers many services that mirror Google’s: maps, online storage, a marketplace, a website analytics suite, and so on.

“If you have a Yandex e-mail, a Yandex browser, do your search on Yandex, and check news there on a daily basis, it’s only natural to use Yandex Translate and not Google Translate,” he pointed out.

Asked for his opinion on how Yandex fares against Google in terms of machine translation quality for English⟷Russian, he quipped both “do equally bad or good.”

He added, “The battle for consumers is fought on the level of marketing and the ability to embed the offer into the user’s apps and devices. Google has an advantage because they control Android OS.”

As for Yandex’s opportunities in MT, Dranch’s advice is to add engine-training capabilities similar to Microsoft’s Translator Hub. “This will allow them to create corporate engines with specialized terminology and expand into the B2B space,” he explained.

Dranch said both Yandex and Google already offer the big ex-CIS languages, such as Ukrainian, Kazakh, Tajik, and Georgian. Yandex has an advantage in the many smaller languages, but in terms of traffic and financial benefit, he does not see them making a huge difference, as “80–90% of all queries are Russian to English, French and German anyway.”

Julia Epiphantseva, Head of Business Development at PROMT, told Slator that most ex-USSR languages for the B2C segment are already covered. As for B2B, specifically the use of MT in government or legal settings, she pointed out, “Online services have nothing to offer because those organizations usually need offline translation solutions.”

According to Dranch, the next biggest language is Chechen, with over a million speakers. “Perhaps Yandex will want to add it before Klingon, Quenya, and Dothraki. Then again, perhaps not,” Dranch said, referring to Yandex’s quest to beat Google at offering obscure languages.

On Yandex’s decision to fold Yandex.Translate into its broader MIR division, Dranch concluded, “Machine learning has many applications, not only translation, and it makes sense to give the new unit with a great hire a little more power.”

Editor’s Note: A previous version of this article attributed the quotes from Yandex to Misha Bilenko, Yandex Head of Machine Intelligence and Research.

Marion Marking

Communications specialist, veteran journalist, and online editor at Slator who dreams of driving a Veyron on the Autobahn