November 27, 2018
Six Domain-Adaptive NMT Systems Evaluated on Medical Domain (English to German)
Berkeley, California, Nov. 26, 2018 — Intento (https://inten.to) has evaluated six domain-adaptive Neural Machine Translation systems for English-to-German translation (Medical domain) and published the report.
There are more than 20 different Machine Translation systems available on the market. They differ in features, price, and performance on specific projects. Intento, which provides middleware for using Machine Translation in a vendor-agnostic fashion, uses its solution to choose the right combination of MT models for every case. In its previous report, published in July 2018, Intento demonstrated that nine different stock MT models should be combined to achieve optimal performance on 48 popular language pairs in the General domain.
In 2018, several MT vendors presented their Domain-Adaptive NMT solutions, which make it possible to customize a baseline NMT model to a specific domain using a small parallel corpus (typically starting from 10,000 segments). Such systems reportedly offer greater benefits than stock models (source1, source2), but they also require more thorough evaluation.
In its new report, Intento showcases the approach used in MT evaluation projects for its LSP and Enterprise clients. The report addresses performance, cost of ownership, dataset size requirements, data protection, and several other questions.
Intento trained six domain-adaptive NMT systems using English-to-German Biomedical corpora of several sizes (from 10K to 1M segments), evaluated them, and compared them with stock MT engines.
Most notably, this report demonstrates a hybrid approach to MT quality assessment, with automatic scoring used to filter a set of segments for manual LQA, making the whole process much faster, more affordable and reliable than either automatic or manual LQA alone.
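The hybrid idea can be illustrated with a minimal Python sketch. This release does not state which automatic metric or filtering rule the report uses, so everything here is an assumption made for illustration: the token-overlap F1 in `token_f1` stands in for a real automatic metric, the rule of picking segments where engines disagree most is a guess, and the engine names and sentences are invented sample data.

```python
from collections import Counter


def token_f1(hypothesis: str, reference: str) -> float:
    """Stand-in automatic score (assumption): token-overlap F1 against the reference."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def select_for_manual_lqa(references, outputs_by_engine, k):
    """Return indices of the k segments whose automatic scores differ most across engines.

    The selection rule is hypothetical: segments where engines disagree are
    assumed to be the most informative ones for human reviewers.
    """
    spreads = []
    for i, ref in enumerate(references):
        scores = [token_f1(outs[i], ref) for outs in outputs_by_engine.values()]
        spreads.append((max(scores) - min(scores), i))
    # Largest spread first; ties broken by segment order.
    spreads.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in spreads[:k]]


# Invented sample data: three reference segments and two hypothetical engines.
references = [
    "der Patient hat Fieber",
    "nehmen Sie zwei Tabletten täglich",
    "die Dosis wurde reduziert",
]
outputs_by_engine = {
    "engine_a": [
        "der Patient hat Fieber",
        "nehmen Sie zwei Tabletten täglich",
        "die Dosis wurde reduziert",
    ],
    "engine_b": [
        "der Patient hat Fieber",
        "nimm zwei Pillen pro Tag",
        "die Dosis wurde verringert",
    ],
}

selected = select_for_manual_lqa(references, outputs_by_engine, k=2)
print(selected)
```

In this sketch, only the selected segments would be sent to human reviewers, which is where the savings in time and cost would come from. The first segment is skipped because both engines score identically on it.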
“We have a breakthrough moment in Machine Translation right now, probably the biggest one since the invention of Neural MT. With the domain adaptation, every LSP and enterprise may use their content assets to obtain high-quality MT models tailored to their business, with a cumulative effect on cost and turnaround,” said Konstantin Savenkov, CEO and co-founder of Intento. “The stakes are high as we’re speaking about two-digit percentage improvements and it’s something that must not be missed in this highly competitive market. To get those benefits, one needs to have clean data and to use the right portfolio of MT engines. Our tools and evaluation framework help with the technological challenges, with linguistic and domain expertise to be provided by a client.”
With the goal of full transparency and reproducibility, Intento runs the evaluation on the public UFAL Medical Corpus and describes every step of the process. The report is available for free to all interested parties at https://bit.ly/custom_nmt_nov2018.
For further information, please contact Konstantin Savenkov, Co-Founder & CEO of Intento (firstname.lastname@example.org)