Neural Conquers Patent Translation in Major WIPO Roll-out

November 18, 2016

Technology ·

by Marion Marking

Barely two years after it was first proposed, machine translation technology based on neural networks is going mainstream. Following Google, Systran, and Microsoft, the World Intellectual Property Organization (WIPO) announced on October 31, 2016 that it was rolling out neural machine translation (NMT) on its publicly available translation tool, WIPO Translate.

WIPO, a self-funding UN agency made up of 189 member states, is based in the Swiss city of Geneva.

Work on WIPO Translate began back in 2009 with Moses, the open-source statistical machine translation framework. Two years later, the WIPO team had a Moses-based engine ready.

Initially, WIPO Translate was trained to translate patent documents between English and Chinese, Japanese, and Korean, as those three languages accounted for about 55% of worldwide patent filings in 2014.

Like Google, WIPO has chosen Chinese-English as the trailblazing language combination for its NMT roll-out. The reason: in 2015, 14% of all international patent applications were filed in Chinese, according to Francis Gurry, WIPO Director General. “This year, we expect that to go to something like 17% or 18%,” he said on the WIPO YouTube channel.

To roll out the beta version, WIPO trained the tool on a giant corpus of 60 million sentences taken from Chinese patent documents from China’s State Intellectual Property Office that had also been filed at the US Patent and Trademark Office. Next, WIPO plans to extend the tool’s coverage to patent applications in French, followed by other languages.

To find out more about the NMT production deployment, Slator spoke to Bruno Pouliquen, WIPO Senior Engineer, and Christophe Mazenc, Director of Global Databases Service.

The rapid roll-out of NMT was a matter of months, recalled Mazenc. “It took probably one or two months to train the model the first time, and then one month to integrate into our systems,” he said.

WIPO partly credits Marcin Junczys-Dowmunt and his technology AmuNMT for the fast deployment. Junczys-Dowmunt is a visiting professor at the University of Edinburgh and a WIPO contractor.

Junczys-Dowmunt’s AmuNMT, Pouliquen explained, is “a tool that can translate very fast using NMT models,” even on a CPU. Within a year, the WIPO engine became more efficient and reliable and produced better-quality output. The team had assembled patent data from Chinese and US patent applications and trained the engine with the open-source tool Nematus, which was developed by Rico Sennrich of the University of Edinburgh, the birthplace of Moses SMT.

Narrow Focus

Pouliquen is confident WIPO Translate beats Google on the narrow patent domain: “I think one key aspect of our tool is we train only on patent text. So our tool is very focused and, therefore, it is better because it is not polluted with other things.”

The narrow focus comes with restrictions. Pouliquen pointed out, “If you try to put an e-mail into our tool, you will see that the result is just disastrous; because it doesn’t know how to translate e-mail, it never learned how.”

He cited what he called an “amazing” example: the tool is unable to translate “I am.” Pouliquen said, “‘I am’ is never seen in any patent application, so the tool doesn’t know how to translate it.”

What drove WIPO to look into NMT soon after the idea of using neural networks for translation was first floated, Pouliquen said, was an awareness of the limits of phrase-based machine translation, limits they quickly found NMT could overcome. When they measured BLEU scores, he said, there was a “big jump for Chinese into English,” indicating a marked improvement from SMT to NMT.

“The difference in BLEU is very impressive,” agreed Databases Director Mazenc, to which Pouliquen added, “Some translators even told us that it was definitely better in terms of human translator evaluation.”

Additionally, Pouliquen said, WIPO was sitting on a huge stockpile of parallel data on which to train the MT engine. Therefore, “we are effectively in a good position to be one of the quickest to use the technology.”

Does Size Matter?

Pouliquen says WIPO has more reference data for its domain than even tech giants Microsoft and Google. But does the size of the corpus actually matter in NMT?

Pouliquen said, “Our corpus is so big that even with two weeks of machine-intensive training, we didn’t manage to put all the corpus inside. So it’s a bit early to give you an exact answer on that. The only thing we could say is, it doesn’t harm, definitely doesn’t harm.”

He pointed out that, other than size, the quality and timeliness of the corpus are also important. “It’s quite obvious that a quality corpus is better than a bigger [one]. And it’s also quite obvious that recent data is more important than more data.”

The WIPO engineer explained that, in the patent domain, a tool trained on the latest inventions will get a better model to decode new inventions. “Recent terminology is more important than old terminology,” he said.

Are SMT’s Days Numbered?

Asked whether he thought SMT would still be relevant in three years, Pouliquen said, “It’s like looking in a crystal ball before doing any experiment. I guess all our own models could be replaced by NMT. But if we see that, for example, Portuguese works better with SMT, we will keep SMT for Portuguese. But I think when we’ve got enough data, NMT might be better.”

WIPO is sharing its machine translation technology with other UN organizations. At UN Headquarters in New York, WIPO’s technology has been integrated into the UN’s proprietary translation productivity tool, where it provides MT suggestions for post-editing when the translation memory has no match.

This requires WIPO to train its models on a very different set of data, of course. But translation corpora are not something the United Nations lacks. In May 2016, the UN released its Parallel Corpus, consisting of nearly 800,000 documents and slightly over 1.7 million aligned document pairs. Pouliquen promises that UN translators will get to enjoy the benefits of NMT sometime in 2017.

Image: WIPO HQ in Geneva

Marion Marking

Slator consultant and corporate communications professional who enjoys exploring Asian cities.
