Builds on Harvard Framework to Run Neural MT at Scale

Three major trends shaping the language technology space have converged in what is likely a harbinger of things to come in the language industry.

These are the ever-increasing demand for local-language content, access to cheap and virtually unlimited computing power in the cloud, and a proliferation of open-source neural machine translation frameworks.

A travel fare aggregator and lodging reservations website leveraged all three of these drivers to build a production-level neural machine translation (NMT) system, which the company says “is becoming a very attractive solution to complement the traditional human translation services.”

A team working on the project announced the system’s launch in a research paper published on a Cornell University-run open-access science site on July 25, 2017.

Co-authors Pavel Levin, Nishikant Dhanuka, and Maxim Khalilov, all executives at the company, said the research focused on benchmarking NMT against the company’s own statistical machine translation (SMT) system, deployed earlier for two important language pairs, English-German and English-French, as well as against two general-purpose online engines (one statistical, one neural).

“We present automatic and human evaluation results of the translation output provided by each system. We also analyze the effect of sentence length on the quality of output for SMT and NMT systems,” the co-authors wrote in the paper’s abstract.

The paper received the Springer European Association for Machine Translation (EAMT) Best User Paper Award at the 20th Annual EAMT Conference held in Prague in May 2017.

Bold Claims

Using BLEU as the primary automatic metric for evaluating translation quality, validated with human Adequacy-Fluency (AF) evaluation, the research makes three main claims: NMT consistently outperforms SMT; the in-house NMT system beats a general-purpose online NMT engine in the English-German language pair; and, in what is bound to cause tempers to flare, “fluency of NMT is close to human translation level.”
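BLEU scores a machine translation by combining modified n-gram precisions (typically up to 4-grams) with a brevity penalty that punishes overly short output. As an illustration only, and not the evaluation code used in the paper, a minimal sentence-level variant with add-one smoothing can be sketched in Python:

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty. Add-one
    smoothing keeps short sentences from zeroing out the score."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngr, r_ngr = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_ngr & r_ngr).values())  # clipped n-gram matches
        total = max(sum(c_ngr.values()), 1)
        # add-one smoothing avoids log(0) when there is no overlap
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # brevity penalty: <1 when the candidate is shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(log_prec)

print(bleu("the hotel has a large pool", "the hotel has a large pool"))  # identical -> 1.0
print(bleu("the hotel has a pool", "the hotel has a large pool"))        # partial match -> between 0 and 1
```

Production evaluations would use a corpus-level implementation such as sacreBLEU rather than this per-sentence sketch.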

Citing earlier studies that showed significant drops in translation quality when NMT translates long sentences, the researchers also tested how sentence length influences NMT and SMT performance in the English-German and English-French pairs, using BLEU.

“Fluency of NMT is close to human translation level”

Their observation is twofold: “performance is degraded for longer sentences, but NMT still outperformed SMT in the English-German pair. A similar trend was observed in the English-French translations.”
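A sentence-length analysis of this kind typically starts by grouping test-set sentences into length buckets and scoring each bucket separately. A minimal sketch of that bucketing step, with illustrative edge values that are not taken from the paper:

```python
def bucket_by_length(pairs, edges=(10, 20, 30)):
    """Group (source, hypothesis, reference) triples into buckets keyed
    by source sentence length, so a metric like BLEU can be reported
    per bucket. `edges` are illustrative cut-offs, not the paper's."""
    buckets = {f"<= {e}": [] for e in edges}
    buckets[f"> {edges[-1]}"] = []
    for src, hyp, ref in pairs:
        n = len(src.split())
        for e in edges:
            if n <= e:
                buckets[f"<= {e}"].append((hyp, ref))
                break
        else:  # longer than the last edge
            buckets[f"> {edges[-1]}"].append((hyp, ref))
    return buckets

# toy usage with made-up sentences
pairs = [
    ("one two three", "hyp1", "ref1"),
    (" ".join(["w"] * 15), "hyp2", "ref2"),
    (" ".join(["w"] * 35), "hyp3", "ref3"),
]
for name, items in bucket_by_length(pairs).items():
    print(name, len(items))
```

Scoring each bucket independently makes degradation on long sentences visible in a way a single corpus-level score hides.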

Significantly Cutting Translation Costs

The company’s keen interest in machine translation is hardly surprising, as 1.4 million room nights are reportedly reserved on the platform every day. Headquartered in Amsterdam, the company also has many local offices around the world.

The paper revealed that the company offers content in 40 different languages. “One of the main use cases for translation at the company is translating property descriptions (hotels, apartments, B&Bs, hostels, etc.) from English to any of the other supported languages,” the paper noted.

By integrating these in-house MT solutions, the company believes that it can increase translation efficiency.

According to the paper, this can be achieved “by increasing the speed of translation and reducing the time it takes for a translated property description to appear online, as well as significantly cutting associated translation costs.”

10 Use Cases

The company developed its NMT system in-house over a period of six months, excluding time spent on statistical MT development and testing, according to Maxim Khalilov, Commercial Owner at the company and one of the co-authors of the report.

Khalilov told Slator the development framework is OpenNMT, a Torch-based solution developed by Harvard University and supported by Systran.

In an interview with Slator in December 2016, when OpenNMT was launched, Harvard Natural Language Processing (NLP) Group’s Alexander Rush said they “expect a mix of researchers studying how to improve translation and people in the industry looking to become familiar with new AI technology.”
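For readers unfamiliar with the framework, the Torch-based OpenNMT toolkit follows a preprocess/train/translate workflow driven by Lua scripts. The commands below are an illustrative sketch of that standard CLI with placeholder file names; they are not the company’s actual configuration:

```shell
# Build vocabularies and tensor files from parallel training/validation data
# (train.en/train.de etc. are placeholder corpus files)
th preprocess.lua -train_src train.en -train_tgt train.de \
                  -valid_src valid.en -valid_tgt valid.de \
                  -save_data data/demo

# Train a sequence-to-sequence model on the preprocessed data
th train.lua -data data/demo-train.t7 -save_model model/demo

# Translate a held-out test set with a trained checkpoint
# (the checkpoint name is a placeholder; OpenNMT names it per epoch)
th translate.lua -model model/demo_checkpoint.t7 -src test.en -output pred.de
```

A GPU flag (e.g. `-gpuid 1`) would normally be added to the train and translate steps; training on CPU is impractical at production scale.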

When asked if the company runs NMT on its own servers, Khalilov said it relies mostly on Amazon Web Services (AWS), though it uses its own graphics processing units (GPUs) for some experiments.

Moving forward, the research paper noted that the company’s future research is set on further improving its in-house NMT system in two ways: improving the treatment of unknown and rare words, and improving the ability to identify business-sensitive translation errors.

“We’ve identified around 10 use cases for machine translation within the company and we will iteratively focus on them according to our list of priorities,” Khalilov added.

For expert analysis and insights on the current state-of-the-art in neural machine translation, purchase Slator’s Neural Machine Translation 2018 Report.