Four major institutions in the European Union have developed a new machine translation (MT) engine that will be released worldwide. The source code has also been made available to the developer community on GitHub.
The open source project, called ModernMT (MMT), received a EUR 3m grant from Horizon 2020, the EU’s framework program for research and innovation, and was released in beta in the fourth quarter of 2016.
Built as a ready-to-install application, the software seeks to address the hurdles that still hinder the adoption of MT technology by end-users, notably language service providers (LSPs) and enterprises.
Currently, it is available as a plug-in for various computer-assisted translation (CAT) tools, including SDL Trados Studio. The full version of the enterprise-grade software is set for release in the fourth quarter of 2017.
“We set out to solve four main problems when we started MMT,” says Marcello Federico, Scientific Coordinator of the project. MMT requires no initial training phase, manages context automatically (eliminating the need to build domain-specific systems), scales with both data and users, and provides a data collection infrastructure.
Europe’s Machine Translation A-Team
As for what MMT sets out to achieve, Alessandro Cattelan, VP Operations at Translated.net, one of the four partners in the project, says its core strength is how quickly it learns from and adapts to user data. “You just need to feed it some translation memory and/or translations to have it learning and adapting to a specific domain,” he explains. “This adaptivity is what yields the best results for our engine.”
Cattelan, together with CEO Marco Trombetti and software engineer Davide Caroselli, represents Translated.net, a pure-play online provider of professional translation services that has been integrating technology into the human translation process since it was founded in 1999.
Trombetti, a serial entrepreneur and investor, defines the product strategy and commercialization of MMT, while Caroselli, a Senior Java Developer, is the main product developer of MMT.
Built from the ground up, MMT also had the backing of the University of Edinburgh (UEDIN) in the UK, which offers some of the most comprehensive translation studies degree programs in Europe and is a major force in machine translation research.
Ulrich Germann and Barry Haddow, both Senior Researchers at the university, were in charge of evaluating and testing MMT.
Another renowned institution behind MMT is the Fondazione Bruno Kessler (FBK). Marcello Federico, Director of FBK’s HLT Machine Translation Research Unit, told Slator that a team of people with different experiences and skills, all strongly focused on the same goal, is crucial to the success of the project.
Federico and Nicola Bertoldi, Senior Researcher at FBK, represented the organization in the MMT project and served as Scientific Coordinator and main technology developer, respectively.
Completing the MMT project team is Netherlands-based language industry think tank TAUS, which has been at the forefront of advocating innovation, open platforms, and cross-industry cooperation in the sector.
In the MMT project, TAUS is responsible for the data collection infrastructure and the commercialization of the product.
The MMT project participants have known each other for years. FBK, Translated, and UEDIN worked together on a previous project funded by the European Commission: MateCat, which was one of the most successful projects of the EC’s Seventh Framework Programme (FP7).
“We felt our core strengths really complemented each other and it made sense to try another ambitious project together,” says Federico.
Translated, meanwhile, had been working with TAUS for many years, and the team needed a partner with a strong background in data collection and management, from both a technical and a legal point of view.
Kick-off in Rome
The MMT team officially kicked off the project with a meeting in Rome in January 2015. This was followed by several co-development workshops at FBK’s premises in Trento, northern Italy, and further meetings in Rome.
But while the team shared a common goal, Cattelan says there were different perspectives on product development. “Translated had a stronger focus on getting the product done, even with some compromises in terms of research needs. The academic partners were, instead, pushing for a more accurate development and probably longer cycles to solve specific problems,” he says.
These differing perspectives were arguably necessary and gave the project just the right kind of push.
Don’t Panic, It’s Just Neural
As Slator reported, neural machine translation (NMT) quickly became the No. 1 buzzword of the language industry in 2016.
The team admits that when Google unveiled its Neural Machine Translation (GNMT) system in September 2016, claiming major breakthroughs in translation quality, it had to pause and rethink its own strategy.
“That’s when the NMT hype hit us real hard and some of us (Davide, in particular) started having some doubts about the feasibility of the project,” Federico reveals.
Federico, however, says that after a brief bout of panic, the team had a group of translators compare GNMT output with phrase-based MT output for Chinese-to-English translation. “The result was very far from human quality,” he says.
Federico adds that the team eventually realized its own approach still had a lot of value. “Adaptive MT is still ahead of Neural MT in many areas. The Google announcement made us think about how to merge the two technologies (Neural + Adaptive) and on the need to do it sooner than we had expected,” he says.
“With the advent of deep learning, machine translation is entering a new era. I believe that we are still in a transition phase. Neural MT is giving us new powerful tools, but it will take some time until we will understand and use them at their full potential. In fact, there is still a lot of exploration going on in research, with new ideas coming out almost every week,” says Federico.
He adds, “Real-time learning and adapting Phrase-based MT is still ahead of Neural MT in many use cases. The Google announcement forced us to think earlier about how to transfer our ideas of real-time learning and adaptation to the deep learning framework. We have this on our roadmap now.”