How Metadata Is Staging the Next Revolution in AI-Enabled Translation Workflows

A recent Gartner report states that by 2025, 75% of translation work will move away from creating translations to machine translation post-editing (MTPE or PEMT). Driving this transition are advances in artificial intelligence (AI), which has grown by leaps and bounds in recent years.

AI now enables a wider range of use cases for machine translation (MT) than ever before, and the Covid-19 pandemic only accelerated MT uptake across several sectors. Progress was notable in the more creative industries, such as Media & Entertainment and Gaming, for two reasons: (a) demand was greater while most of the world was in lockdown; and (b) there was simply more room to grow, as much of the translation across these sectors was (and still is) done completely manually.

Aside from Media & Entertainment and Gaming, MT has also gained much traction over the past year in E-commerce, E-learning, and Digital Marketing, as well as in the Life Sciences and Pharmaceutical sectors.

And yet, many enterprises remain unaware of just how powerful AI-enabled translation services have become and thus fail to leverage MT to achieve greater cost savings and efficiencies across the production workflow.

Moreover, the same report noted a tendency for translation-related decision-making and processes within the enterprise to be “siloed and disconnected from a broader enterprise globalization and localization strategy.”

Why Enterprises Fail to Take Full Advantage of MT

In one word, metadata. Take, for example, subtitling as a use case. Media and game localizers may not be taking full advantage of the cost and efficiency benefits offered by MT because translation work is still constrained by the limits of popular black-box MT systems. This was highlighted in a recent white paper by AI language technology provider, AppTek, on how machine translation uses metadata to transform subtitling workflows.

Across the broader range of industries, translation and localization providers that deploy MT off the shelf will likely find that:

  • It lacks customization for a specific domain; or, if customized, it needs to be further tailored to suit specific client tones of voice, etc.
  • It may not be easily integrated into certain translation productivity (a.k.a. CAT) tools.
  • It needs further integration of glossaries or termbases.
  • It lacks control of MT output around factors such as length, gender, etc.

Even after a language service provider (LSP) customizes an out-of-the-box MT system, it may find that it needs high-quality MT for 10 different domains or genres. That means having to train, deploy, and maintain 10 separate systems, even when some remain idle for extended periods.

“The result is a high environmental footprint and increased costs. There is also a risk of ‘overfitting’ the training, making it so specific to a particular domain that its performance is worse with different data than it otherwise would be,” the paper explained.

How to Make AI & MT Work for You

For the translator wanting to adopt machine translation (MT), audiovisual localization expert, Dr Yota Georgakopoulou, recommends 10 questions translators must ask themselves before engaging in an MTPE project.

According to Georgakopoulou, “MT implementation is not easy. It needs to be meticulously planned and executed if it is to be successful.” She then highlights two important factors to ensure that MT works for translators: (1) the quality of MT technology; (2) change management (i.e., homing in on how people interact with MT).

For the enterprise, AppTek proposes a different approach: What if customization across multiple dimensions can be handled by a single machine translation model equipped with a switch to toggle between nuances of style, gender, domain, topic, length, dialect, context, and glossaries?

In Q3 2021, AppTek will launch an MT system that makes use of multi-dimensional metadata inputs to offer deeper customization at the project level, the document level, or even the individual sentence level, placing translators in the driver’s seat with more control over editable output during post-editing tasks.
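
To picture how project-, document-, and sentence-level settings might combine, here is a minimal Python sketch. It assumes, purely for illustration, that the most specific level overrides the broader ones; the function name and the metadata keys are hypothetical and are not taken from AppTek's system.

```python
from collections import ChainMap


def resolve_metadata(project=None, document=None, sentence=None):
    """Merge metadata set at three levels (assumed rule: most specific wins)."""
    # ChainMap looks keys up from left to right, so a sentence-level value
    # hides a document-level one, which in turn hides a project-level one.
    return dict(ChainMap(sentence or {}, document or {}, project or {}))


# Hypothetical settings at each level
project = {"domain": "entertainment", "style": "informal"}
document = {"topic": "cooking show"}
sentence = {"style": "formal", "length": "short"}

merged = resolve_metadata(project, document, sentence)
# "style" comes from the sentence level; "domain" is inherited from the project
```

Under this assumption, a translator could set defaults once per project and override them only for the individual sentences that need different treatment.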

The Power of Metadata

As mentioned, metadata is the key to deploying a more productive, cost-effective MT system. This is what AppTek is looking to offer translators and the broader enterprise community.

Deploying a single MT system to handle each unique domain and scenario without sacrificing translation quality is the new best practice in modern translation workflows. All the user needs to do to generate the desired translation is add an extra parameter in the API call (e.g., length = short, style = formal).
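
As a rough illustration of what "an extra parameter in the API call" could look like, the Python sketch below builds a JSON request body with optional metadata fields. The field names, the set of allowed keys, and the function itself are assumptions made for this example; they do not describe AppTek's actual API, and no network request is made.

```python
import json

# Hypothetical metadata keys, loosely based on the dimensions named in this article
ALLOWED_METADATA = {"style", "gender", "domain", "topic", "length", "dialect"}


def build_translation_request(text, source_lang, target_lang, **metadata):
    """Return a JSON request body; metadata is passed as extra keyword arguments."""
    unknown = set(metadata) - ALLOWED_METADATA
    if unknown:
        # Reject misspelled or unsupported keys instead of silently ignoring them
        raise ValueError(f"unsupported metadata: {sorted(unknown)}")
    payload = {"text": text, "source": source_lang, "target": target_lang}
    payload.update(metadata)
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


# The same sentence, this time requesting a short, formal translation
request_body = build_translation_request(
    "How are you?", "en", "de", length="short", style="formal"
)
```

The point of the sketch is that the same model and the same call are reused; only the metadata values change from one request to the next.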

The metadata can come from a variety of sources, including source provenance (i.e., data on the origin of a translated document).

“Translation is more than just taking one sentence in one language and formulating it in another. Yet, until recently, MT systems were only doing this and nothing else,” said AppTek’s Lead Machine Translation Architect, Dr Evgeny Matusov.

Matusov added, “With the addition of metadata that directly influences MT output, we are able to raise the quality and adaptability of a single MT system to the next level. The metadata provides the system with a little ‘world knowledge’ that professional translators have. It can be specified by MT users, computed from the very text being translated, like genre or topic, or predicted via separate machine learning algorithms, such as the ones that infer the gender of the speaker in case of speech translation.”

So what can be customized with metadata?

  • Style – Depending on the context, choose between formal and informal output; for example, for the tricky pronoun “you,” whose singular and plural forms are distinguished in other languages (e.g., Romance languages)
  • Gender – Customize for gendered words in MT output (possibly avoiding gender bias that could render a translation awkward or inappropriate)
  • Domain – Adapt to a wide range of genres, such as news, patents, entertainment, etc.
  • Topic – Use a more tailored, document-level style and terminology
  • Length – Produce shorter or longer translations with minimal information loss or distortion
  • Language Variety – Combine parallel training data for related languages or dialects within a single system (e.g., Castilian and Lat-Am Spanish, Canadian and European French) for an improved translation into a desired language variety or dialect
  • Extended Context – Use the context of previous or succeeding source sentences for better word sense disambiguation, leading to better translation of pronouns and consistency in term translation between different sentences
  • Glossaries – Integrate a glossary or termbase of official words, mandatory translations, or jargon, which an MT system would otherwise translate differently
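
Of the dimensions above, extended context is perhaps the easiest to picture in code. The Python sketch below pairs each source sentence with its neighbours so that a context-aware system could use them for disambiguation. The data layout and the function name are assumptions for illustration only; the article does not specify how context is actually passed to the MT system.

```python
def with_context(sentences, before=1, after=1):
    """Pair each source sentence with its preceding and succeeding sentences."""
    units = []
    for i, sentence in enumerate(sentences):
        units.append({
            "text": sentence,
            # Up to `before` sentences that precede the current one
            "previous": sentences[max(0, i - before):i],
            # Up to `after` sentences that follow the current one
            "next": sentences[i + 1:i + 1 + after],
        })
    return units


subtitles = [
    "Where is Anna?",
    "She left an hour ago.",
    "Did she take it with her?",
]
units = with_context(subtitles)
# units[1] carries "Where is Anna?" as context, which tells the system who "She" is
```

With the neighbouring lines attached, the pronoun in the second subtitle is no longer ambiguous, which is the kind of consistency gain the list item describes.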

Deploying AppTek’s technology, which makes better use of metadata, means localization providers can now train just a single model, rather than 10 or more to cover all client domains, reducing the time, effort, and cost of translation.

Enterprise and off-the-shelf CAT tools and subtitle editors that integrate the technology can now transcend the confines of traditional black-box MT systems with a simple user-controlled “flip of a switch” to the desired metadata. This way, users get more control of the MT output given back to them.

AppTek will offer translation professionals a free trial of its MT technology in Q3 2021.

Gartner Disclaimer
Gartner does not endorse any vendor, product or service depicted in our research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.