Academic, research, and industry experts from Tilde, the University of Edinburgh, and Unbabel have improved Marian, the very popular neural machine translation toolkit. Now everyone can benefit from its on-the-fly domain adaptation with translation memories, terminology integration for MT, and improved GPU efficiency. These new features cut costs, reduce the post-editing effort of end users, and boost translation accuracy.
First introduced in 2017, Marian is the fastest neural machine translation framework to date and is widely used by academic, commercial, and governmental organizations across the globe, including the World Intellectual Property Organization, the European Commission, the US Air Force, eBay, and Microsoft. With the new toolkit features and improved GPU efficiency, this machine translation tool is even faster and more efficient than before.
Improved Marian toolkit – considerable cost reductions
Neural MT models significantly increase the cost of automated translation because they require very expensive hardware. The toolkit's computational efficiency was improved through code optimization and close collaboration with NVIDIA, which will considerably cut costs and make neural MT more accessible to language service providers.
Improved Marian toolkit – user-defined factors
Factors can be used to encode token metadata in a sentence via additional vocabularies and embeddings, rather than relying on learned word or sub-word embedding representations to carry the required information. These factors have a myriad of applications, such as terminology integration or information on capitalization, sub-word splitting, and morphology. Models with these factors are now fully supported by the Marian toolkit on both the source and target sides, and a usage guide and other documentation are provided.
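To make the idea of factors more concrete, the following is a minimal sketch of a capitalization factor. It is illustrative only and does not use Marian's own API or file formats: the function names `annotate` and `embed`, the factor labels `c0`, `c1`, and `c2`, and the choice to combine embeddings by summation are all assumptions made for this example.

```python
# Illustrative sketch only; not Marian's API or vocabulary format.
# Factor labels (hypothetical): c0 = lowercase, c1 = capitalized, c2 = all caps.

def annotate(tokens):
    """Split each token into a lowercased form plus a capitalization factor."""
    annotated = []
    for tok in tokens:
        if tok.isupper() and len(tok) > 1:
            factor = "c2"
        elif tok[:1].isupper():
            factor = "c1"
        else:
            factor = "c0"
        annotated.append((tok.lower(), factor))
    return annotated


def embed(token, factor, token_emb, factor_emb):
    """Combine a token embedding and a factor embedding by element-wise sum.

    Summation is one possible way to combine them and is assumed here.
    """
    return [t + f for t, f in zip(token_emb[token], factor_emb[factor])]
```

With this scheme, "Marian", "marian", and "MARIAN" share one vocabulary entry, and the capitalization is carried by a separate, much smaller factor vocabulary.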
Domain adaptation technology – less post-editing effort
Though custom MT systems have been popular with large companies and translation agencies for quite some time, they are often infeasible for individual translators or smaller businesses. Moreover, there is a lack of domain-specific data for less-resourced languages and niche domains to train custom engines. On-the-fly domain adaptation solves this problem by closing the quality gap between generic and custom MT systems. The adaptive machine translation engine learns directly from human post-edits to deliver more accurate, domain- and project-specific translations as you keep working on the remaining part of the sentence. Iterative learning from human feedback has a demonstrated ability to significantly reduce the post-editing effort of machine-translated texts.
Terminology integration – much better translation accuracy
This project has also made another contribution: dynamic terminology integration for current MT systems, which has been a hot topic for quite some time. Though terminology integration has existed for more than a decade, until recently it was not supported by the latest neural MT technologies. This functionality improves translation quality by supplying an MT system with bilingual terminology dictionaries (glossaries) that specify how particular words and phrases should be translated.
Experiments with morphologically rich Northern European (Scandinavian and Baltic) languages demonstrate that terminology integration dramatically improves MT quality in technical domains. A human evaluation campaign carried out by professional translators demonstrated a remarkable 29% improvement in absolute translation accuracy over generic MT engines for technical domains. In these languages, the new system is also able to correctly inflect glossary terms.
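As a rough illustration of how a bilingual glossary can be exposed to an MT system, the sketch below attaches the preferred target-language term to each matching source word. It is a simplified assumption, not the project's actual method: the function name `annotate_with_glossary` and the glossary format are hypothetical, and only single-word, exact-match terms are handled, whereas the system described above also copes with inflected forms.

```python
# Illustrative sketch only; not the method used in Marian or Tilde MT.
# The glossary maps a lowercased source word to its preferred target term.

def annotate_with_glossary(source_tokens, glossary):
    """Pair each source token with its glossary translation, or None."""
    annotated = []
    for tok in source_tokens:
        # Case-insensitive lookup; multi-word terms and inflected
        # forms are deliberately out of scope for this sketch.
        annotated.append((tok, glossary.get(tok.lower())))
    return annotated
```

For example, with the English-Swedish entry `{"valve": "ventil"}`, the token "valve" in "Open the valve" is paired with "ventil", while the other tokens are paired with `None`. An MT system trained on input annotated this way could then learn to use the supplied term in its output.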
The new features are already available in the latest releases of the Marian framework and in commercial solutions that rely on it, for the benefit of end users and language service providers. Tilde MT has already incorporated on-the-fly domain adaptation in its services (Tilde MT Dynamic Learning), giving translators an adaptive engine that can adjust to various domains, projects, and customers. The latest terminology improvements are also included in Tilde MT and are available for a free trial.
The “User-Focused Marian” project is co-financed by the European Union Connecting Europe Facility to improve the current Marian toolkit. Grant agreement under the Connecting Europe Facility (CEF) – telecommunications sector agreement no. INEA/CEF/ECT/A2019/1927024.