Does Google’s BERT Matter in Machine Translation?

Machine Translation · by Seyma Albarino · October 17, 2019

Since being open-sourced by Google in November 2018, BERT has had a big impact on natural language processing (NLP) and has been studied as a potentially promising way to further improve neural machine translation (NMT).

An acronym for Bidirectional Encoder Representations from Transformers, BERT is a pre-trained, contextual language model that represents words based on previous and following context.
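
To make this concrete, here is a minimal sketch, assuming the Hugging Face transformers and PyTorch libraries (the article does not name any particular toolkit), of pulling contextual representations out of a pre-trained BERT model: the same surface word receives a different vector depending on the words around it.

```python
# Minimal sketch (assumes the Hugging Face `transformers` and `torch` packages,
# which the article does not mention) of extracting contextual word
# representations from a pre-trained BERT model.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# "bank" gets a different vector in each sentence because BERT represents
# words based on the context before and after them.
sentences = ["The bank approved the loan.", "They sat on the river bank."]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```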

Quick recap: NMT basically reads in an input (with an “encoder”), and then tries to predict an output (with a “decoder”). During training, the model is fed training data consisting of input-output pairs, and it adjusts its parameters to maximize the probability of generating the correct output given the input.
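
As a rough illustration of that objective, the toy PyTorch sketch below (a hypothetical model with made-up dimensions, not any production system) scores the reference translation under the model and minimizes cross-entropy, which amounts to maximizing the probability of the correct output given the input.

```python
import torch
import torch.nn as nn

# Schematic of the NMT training objective described above: read the source
# with an encoder, predict the target with a decoder, and adjust parameters
# to maximize P(correct output | input). Toy illustration only.
class ToyNMT(nn.Module):
    def __init__(self, vocab_size=1000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.embed(src))           # "reads in an input"
        dec_states, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec_states)                    # "predicts an output"

model = ToyNMT()
src = torch.randint(0, 1000, (8, 12))      # batch of source sentences
tgt_in = torch.randint(0, 1000, (8, 10))   # target tokens, shifted right
tgt_out = torch.randint(0, 1000, (8, 10))  # target tokens to predict

logits = model(src, tgt_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt_out.reshape(-1))
loss.backward()  # minimizing this loss maximizes P(output | input)
```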


When researchers train a language model, the process is almost the same, except there is no input. “The model just tries to maximize the probability of generating sentences in the target language, without relying on any particular input,” Professor Graham Neubig of Carnegie Mellon University’s Language Technologies Institute explained to Slator.
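
The contrast can be sketched in the same toy terms (again a hypothetical illustration, not a real training setup): a language model sees only monolingual text and learns to assign high probability to each next token, with no source sentence involved.

```python
import torch
import torch.nn as nn

# Same idea as the toy NMT sketch above, but with no encoder input: the model
# only maximizes the probability of generating sentences in one language.
vocab, dim = 1000, 256
embed = nn.Embedding(vocab, dim)
lm = nn.GRU(dim, dim, batch_first=True)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (8, 16))   # monolingual sentences only
states, _ = lm(embed(tokens[:, :-1]))       # predict each following token
logits = head(states)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab),
                             tokens[:, 1:].reshape(-1))
loss.backward()  # no "input" anywhere, just the probability of the text itself
```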

With BERT, Neubig added, “a model is first trained on only monolingual text data, but in doing so it learns the general trends of that language, and can then be used for downstream tasks.”

In practice, pre-trained BERT models have been shown to significantly improve results in a number of NLP tasks, such as part-of-speech (POS) tagging.

Exactly how BERT has managed to outperform other models is unclear. As explained in a September 2019 paper by a team at the University of Massachusetts Lowell, one advantage is BERT’s self-attention mechanism (Transformer), which offers an alternative to recurrent neural networks (RNNs). Researchers are now exploring BERT’s capacity to capture different kinds of linguistic information.
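
For readers unfamiliar with the mechanism, the short sketch below shows scaled dot-product self-attention in PyTorch (a generic illustration, not BERT’s exact multi-head implementation): every position draws information directly from every other position in the sentence, where an RNN would pass it along step by step.

```python
import torch
import torch.nn.functional as F

# Generic scaled dot-product self-attention: each token's output is a
# weighted mix of all tokens in the same sentence. Toy illustration only.
def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(1, 5, 64)                  # 5 token vectors
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)     # each output attends to all 5 positions
print(out.shape)                           # torch.Size([1, 5, 64])
```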

According to SYSTRAN CEO Jean Senellart, using a masked language model like BERT for NLP tasks is relatively simple because BERT is pre-trained using a large amount of data with a lot of implicit information about language.

To handle an NLP task, Senellart said, “we take a BERT model, add a simple layer on top of the model, and train this layer to extract the information from the BERT encoding into the actual tags that we are looking for.” This is called “fine-tuning,” and it is used only to extract information that is already known to be present in the encoding.
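
A hedged sketch of that recipe, assuming the Hugging Face transformers library and a hypothetical tag inventory, might look like this: the pre-trained encoder is kept fixed and only the added layer is trained to map BERT’s encodings to the tags we are looking for.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Sketch of the "fine-tuning" recipe described above: take a pre-trained BERT
# model, add a simple layer on top, and train that layer to turn BERT's
# encodings into tags. Assumes the Hugging Face `transformers` package and a
# hypothetical tag set; illustration only.
NUM_TAGS = 17  # hypothetical POS tag inventory

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
tag_head = nn.Linear(bert.config.hidden_size, NUM_TAGS)

for p in bert.parameters():       # keep the pre-trained encoding fixed,
    p.requires_grad = False       # train only the added layer

inputs = tokenizer("BERT encodes context", return_tensors="pt")
with torch.no_grad():
    encodings = bert(**inputs).last_hidden_state   # (1, seq_len, hidden)

tag_logits = tag_head(encodings)                   # (1, seq_len, NUM_TAGS)
# In practice these logits would be scored against gold tags on a labelled
# dataset and only `tag_head` would be updated by the optimizer.
```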

So, what makes BERT relevant now? As John Tinsley, Co-founder and CEO of Iconic Translation Machines, explained to Slator, “Given that [BERT is] based on a similar approach to neural MT in Transformers, there’s considerable interest and research into how the two can be combined.”

Back In from the Cold

Progress in this stream of research represents something of a comeback for language models, which “were an integral and critical part of statistical MT. However, they were not inherent to neural models and, as such, fell by the wayside when neural MT hit the scene,” Tinsley said.

A September 2019 paper by South Korean internet company NAVER concluded that the information encoded by BERT is useful but, on its own, insufficient to perform a translation task. However, it did note that “BERT pre-training allows for a better initialization point for [an] NMT model.”
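
One common way to act on that finding, sketched below under the assumption of the Hugging Face transformers library (and not the NAVER paper’s exact setup), is to warm-start a sequence-to-sequence model from pre-trained BERT checkpoints before fine-tuning it on parallel data.

```python
from transformers import EncoderDecoderModel

# Sketch of "BERT as a better initialization point": initialize the encoder
# (and, for simplicity here, the decoder as well) of a seq2seq model from
# pre-trained BERT checkpoints. Assumes the Hugging Face `transformers`
# package; this is an illustration, not the NAVER paper's configuration.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased",   # encoder warm-started from BERT
    "bert-base-multilingual-cased",   # decoder re-purposed from BERT weights
)
# The resulting model would then be fine-tuned on source-target sentence
# pairs with the usual translation (cross-entropy) objective.
```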

Other experts who spoke to Slator seem to agree that BERT may be a jumping-off point for more custom-made solutions.

“BERT itself is perhaps not the best fit for pre-training NMT systems, as it does not predict words left-to-right, as most NMT systems do,” Neubig said. “But methods like BERT are already proving quite effective in improving translation results.”

As an example, Neubig cited a May 2019 Microsoft paper on “a new technique for pre-training in NMT that is somewhat inspired by BERT but directly tailored to match the way we do prediction in NMT” as showing “very promising results.”

However, in terms of bridging the gap between research and commercial use, “the computing and training complexity overhead involved at this point in time make it unlikely to be used for industrial applications in the near term,” said Kirti Vashee, a Language Technology Evangelist with SDL.

“NMT will evolve like SMT, but the NLP research is moving too quickly for people to justify incurring the expenditure in time, training, and computing expense in the near term without very clear evidence that it is worth doing,” Vashee added.

Leveling the Playing Field for Low-Resource Languages

Within NMT, the improvements achieved by BERT have, so far, been seen mostly in low-resource or unsupervised NMT settings, as noted in the September 2019 NAVER paper.

Rohit Gupta, a Senior Scientist for Iconic Translation Machines, predicted that, in the shorter term, BERT is “likely to have a bigger impact on lower resource languages because we can easily get monolingual data.”

Gupta added that pre-training on one language can also positively impact other languages. “For example, we can use English data to improve language modeling for Nepalese,” he said.

Part of the challenge of using pre-trained BERT to train an NMT model, though, is that “the obvious integration does not work well,” Senellart told Slator.

“You can get some improvement for languages with [limited] resources, but for a language with a lot of resources, what happens is that the encoder — and even more the decoder — loses all its prior knowledge when learning how to translate because it has a lot more obvious features to learn,” Senellart said.

Several recent papers have explored new techniques for the integration, Senellart noted, and “the main idea is to integrate the language knowledge in the encoder and decoder, but as an additional source of information (features) that the encoder or decoder can use.”
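
A hedged illustration of that idea, and not the mechanism of any specific paper, is to project BERT’s output and let the NMT encoder blend it into its own states through a learned gate, so the pre-trained knowledge arrives as an extra feature stream rather than a replacement.

```python
import torch
import torch.nn as nn

# One simple way to treat BERT's output as an *additional* source of
# information for an NMT encoder: project the BERT vectors and mix them into
# the encoder's own states via a learned gate, so the encoder keeps its prior
# knowledge. Hedged illustration of the general idea only.
class BertFusion(nn.Module):
    def __init__(self, enc_dim=512, bert_dim=768):
        super().__init__()
        self.proj = nn.Linear(bert_dim, enc_dim)
        self.gate = nn.Linear(enc_dim * 2, enc_dim)

    def forward(self, enc_states, bert_states):
        bert_feat = self.proj(bert_states)
        g = torch.sigmoid(self.gate(torch.cat([enc_states, bert_feat], dim=-1)))
        return enc_states + g * bert_feat   # encoder states plus gated BERT features

fusion = BertFusion()
enc_states = torch.randn(2, 20, 512)    # NMT encoder output (hypothetical sizes)
bert_states = torch.randn(2, 20, 768)   # token-aligned BERT features (assumed)
print(fusion(enc_states, bert_states).shape)   # torch.Size([2, 20, 512])
```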

Samuel Läubli, CTO of Swiss language technology company TextShuttle, believes integrating document-level context will be critical to advancing NMT.

“Ultimately, users are interested in translating whole documents, with consistent terminology and correct references to words in other sentences,” Läubli said. “As long as systems keep focusing on translating sentences in isolation, BERT alone won’t fix the problem.”

Speaking on behalf of TransPerfect, Director of Artificial Intelligence Diego Bartolome told Slator, “We haven’t seen BERT impact our NMT approach yet,” because TransPerfect has already optimized its own tools for its top 40 languages and has enough data to work with in those languages.

However, Bartolome said, BERT “has a role in other solutions we create,” including areas such as “question-answering (chatbots), summarization, and natural language generation, where we clearly see a value.”

TAGS

AI, artificial intelligence, BERT, Carnegie Mellon University, Diego Bartolome, Google, Graham Neubig, Iconic Translation Machines, Jean Senellart, John Tinsley, Kirti Vashee, machine learning, machine translation, Microsoft, ML, MT, natural language processing, Naver, neural machine translation, Neural MT, NLP, NMT, RNN, Rohit Gupta, Samuel Läubli, SDL, SMT, statistical machine translation, Systran, TextShuttle, training, Transformer, TransPerfect
By Seyma Albarino

Staff Writer at Slator. Linguist, music blogger and reader of all things dystopian. Based in Chicago after adventures on three continents.
