A Recipe for Better Machine Translation

TAUS Machine Translation

We are still missing out on the full opportunities of machine translation despite seventy years of research. This article is an appeal to everyone involved in the translation ecosystem to come off the fence, realize the full benefits of MT and adopt MT-centric translation strategies. We can do better!

Today, most MT is sourced from the big tech companies such as Amazon, Google and Microsoft. They are the driving force behind the industrialization of MT with the scale and the capital to develop the massive models. 

Disturbingly enough, the massive MT models are black boxes. Even the researchers who train them can’t pinpoint exactly why one performs better than another. The model work is glamorous and cool, but the intellectual insight that would allow us to reproduce bugs and remove them is hard to get. To get models to work in production, data engineering is more vital than research. Well-executed data engineering can bring in the nuances that are required for robust performance in a real-world domain. The issue, however, is that most researchers like to do the model work, not the data work, as also pointed out in the Google Research article titled “Data Cascades in High-Stakes AI”.

Customization has become a built-in feature of many MT platforms, allowing users to upload translation data and handle their own data engineering. These features, however, as TAUS found out, require a lot of experimentation and experience.* In-domain training data have an unpredictable, often low, and sometimes even negative impact on the performance of the engines. It seems that the big tech companies treat their customization features as stop-gap measures for the time it takes until human parity is reached. Five to ten years?

To support and facilitate the industrialization of MT, the big tech MT developers can do better. This is how:

1. Don’t bet the future entirely on the brute force of the massive models.

2. Improve your customization features to better support your business customers in building production-ready engines.

MT Users

Although nothing spectacular or revolutionary took place in the past few years, the adoption of MT has still increased. The MT engines are simply plugged into the existing workflows to be used as complementary sources for translation matches. Translators see their tasks shifting more and more towards post-editing. The new technology is used primarily to support the business drive for continuous efficiency gains and lower word rates, very much in the tradition of thirty years of leveraging translation memories.

Blue-sky thinking is what we miss in the translation industry overall. Apart from a few start-up innovators, most of the actors in the translation industry adopt a defensive approach towards MT technology. The result is a generally negative sentiment with emphasis on cost reductions, compromises in translation quality, disruption in the workforce and pessimistic perspectives on the industry’s future. The problem is that we are all so deeply rooted in our traditions that we can’t see beyond the present.

MT technology can be a force multiplier for those operators in the translation industry that are capable of shifting from a defensive to a proactive approach.

To support and facilitate the industrialization of MT, MT users, LSPs and enterprises can do better. This is how:

1. Focus on data engineering. Do not accept that the output quality of the Amazon, Google, Microsoft and Systran engines, among others, is as good as it can get. Significant improvements can be made using core competencies such as domain knowledge and linguistic expertise.

2. Design end-to-end MT-centric workflows. Do not think of MT as just an add-on to your current process and workflow, but make it the core of new solutions that serve new customers and translate content that was never translated before.

3. Provide new opportunities for linguists. Post-editing is not the end-game. Create new perspectives by leveraging intellectual insights for better automation.

TAUS Recipe for Better MT

TAUS has been an industry advocate for translation automation since 2005. We have developed a unique recipe for better MT as outlined below.

1. Evaluate

The first step in every MT project is to measure and evaluate the translation quality. Most MT users are just measuring and comparing the baseline engines. TAUS takes the evaluation a step further. We train and customize different MT engines and then select the engine with the maximum achievable quality in the customer domain. See TAUS DeMT™ Evaluate.
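The selection logic in this step can be sketched in a few lines of Python. This is only an illustration, not the DeMT™ Evaluate implementation: the engine labels, the `EngineScore` structure and the scores are hypothetical, and the quality metric is assumed to be a single number where higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineScore:
    engine: str        # hypothetical engine label
    customized: bool   # True if the engine was trained on in-domain data
    score: float       # quality score on an in-domain test set (higher is better)


def select_best_engine(results):
    """Return the candidate with the highest in-domain score."""
    if not results:
        raise ValueError("no evaluation results supplied")
    return max(results, key=lambda r: r.score)


def customization_gain(results):
    """Map engine -> customized score minus baseline score (can be negative)."""
    baseline = {r.engine: r.score for r in results if not r.customized}
    custom = {r.engine: r.score for r in results if r.customized}
    return {name: custom[name] - baseline[name] for name in custom if name in baseline}


# Illustrative numbers only; they are not measurements.
results = [
    EngineScore("engine_a", False, 0.62),
    EngineScore("engine_a", True, 0.71),
    EngineScore("engine_b", False, 0.65),
    EngineScore("engine_b", True, 0.63),
]
best = select_best_engine(results)
gains = customization_gain(results)
```

In this made-up example the strongest baseline (`engine_b`) is not the strongest engine after customization, and its gain is negative. That is why comparing baselines alone can point to the wrong engine.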

2. Build

The second step is the creation of in-domain customer-specific training datasets, using a context-based ranking technique. Language data are sourced from the TAUS Data Marketplace, from the customer’s repositories or created on the Human Language Project platform. Advanced automatic cleaning features are applied. See TAUS DeMT™ Build.
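To make the idea of cleaning and ranking candidate data more concrete, here is a minimal Python sketch. It does not reproduce the TAUS ranking technique, which is not described in this article. It swaps in a simple bag-of-words cosine similarity against a small in-domain sample, plus three basic cleaning filters. All function names, thresholds and sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for real tokenization."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def clean(pairs, max_length_ratio=3.0):
    """Drop empty segments, exact duplicates and pairs with implausible length ratios."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt or (src, tgt) in seen:
            continue
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_length_ratio:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def rank_by_domain(pairs, domain_sample):
    """Sort pairs by similarity of the source side to the in-domain sample."""
    profile = Counter(t for s in domain_sample for t in tokenize(s))
    scored = [(cosine(Counter(tokenize(src)), profile), src, tgt) for src, tgt in pairs]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical customer sample and candidate pool.
domain_sample = ["Take one tablet daily with food.", "Do not exceed the stated dose."]
candidates = [
    ("Take two tablets with water.", "Nehmen Sie zwei Tabletten mit Wasser ein."),
    ("The match ended in a draw.", "Das Spiel endete unentschieden."),
    ("Take two tablets with water.", "Nehmen Sie zwei Tabletten mit Wasser ein."),
    ("OK", "Dies ist eine viel zu lange Wiedergabe eines sehr kurzen Segments."),
    ("", "Leer"),
]
ranked = rank_by_domain(clean(candidates), domain_sample)
```

A production pipeline would likely rely on sentence embeddings and language-aware filters rather than word counts, but the shape is the same: clean first, then keep the top-ranked segments as training data.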

3. Translate

The third step is then generating the improved machine translation. Demonstrated improvements in quality scores range between 11% and 25% over the baseline engines from Amazon, Google and Microsoft. In many cases, this brings the quality up to levels equal to human translation or post-edited MT. Some customers refer to DeMT™ Translate as ‘zero-shot localization’, meaning that translated content goes directly to customers without post-editing. TAUS offers DeMT™ Translate via an API to LSPs and enterprises as a white-label product.
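For readers who want to compute such an improvement figure on their own test sets, a minimal sketch follows. It assumes the percentage is a relative gain over the baseline score, which the article does not specify. The function name and the example scores are hypothetical.

```python
def relative_improvement(baseline_score, improved_score):
    """Percentage gain of improved_score over baseline_score (assumed relative, not absolute)."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (improved_score - baseline_score) / baseline_score * 100.0


# Illustrative scores only: a baseline of 0.60 and a customized score of 0.69
# give a gain of roughly 15 percent.
gain = relative_improvement(0.60, 0.69)
```

A negative result means the customized engine scored below the baseline on that test set.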

* MT customization features require a lot of experimentation and experience. See TAUS DeMT™ Evaluation Report and contact a TAUS expert to learn how to best work with MT customization.