TAUS Launches Matching Data

January 16, 2019

Press Releases · by TAUS

Amsterdam, January 16, 2019 – TAUS launches Matching Data: a new technique for selecting language data for the training and tuning of machine translation (MT) engines. This new approach is a perfect fit for the new generation of Neural MT, which is much more sensitive to the quality of the training data. Matching Data empowers MT developers as well as Language Service Providers to efficiently compile customized corpora for building their own domain-specific translation solutions based on an example data set.

“Finding language data for MT training has always been a big challenge,” says Jaap van der Meer, director of TAUS. “Selecting data for a particular domain is almost impossible. In 2010 already we started scoping a scenario in which an example data set, a simple domain-specific translation memory, would assist our users to compile a completely personalized corpus out of the repository of many billions of segments in the TAUS Data Cloud. The technology to do this was not there yet, but now it is, thanks to the DatAptor project.”

The DatAptor project was a research project undertaken by the Institute for Logic, Language and Computation of the University of Amsterdam, led by Professor Khalil Sima’an and funded by the Dutch STW. Partners in the project were Intel, the Directorate-General for Translation of the European Commission, and TAUS. From 2013 to 2016, a team of researchers explored different approaches to make data selection from vast amounts of data seamless and more effective.

“Our dream was to make the world wide web itself the source of all data selections,” says Professor Khalil Sima’an, “but we decided to start more modest and make the very large TAUS Data repository our hunting field first. In DatAptor we learned that every domain is a mixture of many subdomains. The combinatorics of subdomains in a very large repository harbors a wealth of new, untapped selections. Therefore, if the user provides a Query corpus representing their domain of interest, the Matching Data method is likely to find a suitable selection in the repository.”

The Matching Data method inverts the typical search approach by indexing all sentences in the mixed-domain search corpora as searchable entities. As a result, Matching Data returns high-fidelity data with a matching score assigned to each individual segment. Users can decide to download compact, medium or large selections, depending on their needs.
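
The general idea of indexing repository sentences, scoring each one against a query corpus, and cutting selections of different sizes can be illustrated with a small sketch. This is not the TAUS or DatAptor implementation, whose details are not given here. The scoring formula (an IDF-weighted token overlap), the monolingual treatment of segments, the function names, and the three cut-off values are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical cut-offs for the three selection sizes (not from TAUS).
THRESHOLDS = {"compact": 0.8, "medium": 0.5, "large": 0.2}


def tokenize(text):
    """Lowercase whitespace tokenization; a real system would do far more."""
    return text.lower().split()


def build_index(repository):
    """Inverted index: map each token to the ids of segments containing it."""
    index = defaultdict(set)
    for seg_id, segment in enumerate(repository):
        for token in set(tokenize(segment)):
            index[token].add(seg_id)
    return index


def score_segments(repository, query_corpus):
    """Give every repository segment a matching score in [0, 1].

    Assumed formula: the IDF weight of the segment's distinct tokens that
    also occur in the query corpus, divided by the IDF weight of all of the
    segment's distinct tokens.
    """
    index = build_index(repository)
    n = len(repository)
    idf = {tok: math.log(1 + n / len(ids)) for tok, ids in index.items()}
    query_vocab = {tok for sent in query_corpus for tok in tokenize(sent)}

    # Walk the index once per query token instead of scanning every segment.
    matched = defaultdict(float)
    for tok in query_vocab:
        for seg_id in index.get(tok, ()):
            matched[seg_id] += idf[tok]

    scores = {}
    for seg_id, segment in enumerate(repository):
        total = sum(idf[tok] for tok in set(tokenize(segment)))
        scores[seg_id] = min(1.0, matched[seg_id] / total) if total else 0.0
    return scores


def select(repository, query_corpus, size="medium"):
    """Return (score, segment) pairs at or above the cut-off, best first."""
    scores = score_segments(repository, query_corpus)
    cutoff = THRESHOLDS[size]
    hits = [(s, repository[i]) for i, s in scores.items() if s >= cutoff]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    repo = [
        "reset your password in the chat window",
        "the quarterly revenue grew",
        "chat with support about your password",
    ]
    query = ["how do i reset my password", "chat with an agent"]
    for score, segment in select(repo, query, size="large"):
        print(f"{score:.2f}  {segment}")
```

In this sketch a lower cut-off yields a larger but noisier selection, which mirrors the compact, medium and large options described above. A production system would work on bilingual segment pairs and a far richer similarity model.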

Oracle International Product Solutions has worked with the new TAUS Matching Data service to develop a colloquial corpus for general online conversations and chats between English and Chinese, Korean, Japanese, Spanish and Brazilian Portuguese. Oracle language specialists undertook an in-depth linguistic review and gave an average quality score of 84% on the segments retrieved through Matching Data.

“Matching Data is designed to serve as an industry community service,” says Jaap van der Meer. “Anyone can initiate a new domain corpus by providing a Query Corpus. The resulting domain corpora are available in the TAUS Matching Data Library for everyone who is interested in improving their global content solutions. This release of Matching Data is the first step on our ambitious road towards an open data marketplace.”

For more information, please go to:

  • Ten Years of TAUS Data Cloud Taught Us How to Fix the Data Gap
  • Matching Data White Paper

ABOUT TAUS

TAUS, the language data network, is an independent and neutral industry organization. We develop communities through a program of events and online user groups and by sharing knowledge, metrics, and data that help all stakeholders in the translation industry develop a better service. We provide data services to buyers and providers of language and translation services.

The shared knowledge and data help TAUS members decide on effective localization strategies. The metrics support more efficient processes and the normalization of quality evaluation. The data lead to improved translation automation.

TAUS develops APIs that give members access to services like DQF, the Quality Dashboard, and the TAUS Data Market through their own translation platforms and tools. TAUS metrics and data are already built into most of the major translation technologies.
