KantanMT Upgrades Language Data Repository with 204 New Data Sets and 59 New Language Combinations

KantanMT is pleased to announce the release of an updated data resource library, KantanLibrary™, for its clients. The upgrade includes 204 new bilingual data sets across seven industry verticals and 59 new language combinations. The upgraded library enables project managers and MT engineers to quickly source high-quality bilingual data for a wider range of languages and domains when building custom Machine Translation engines.

KantanLibrary™ is a repository of high-quality training data for building machine translation engines. Clients can use the data on its own to build KantanMT engines, or use it to supplement their own training data. KantanLibrary™ data is publicly available and has been cleared for Intellectual Property Rights (IPR) by the KantanMT Professional Services Team.

The latest upgrade to KantanLibrary™ introduces 59 new language combinations, including Spanish – French, Arabic – Spanish, English – Farsi, and English to and from Irish.

“Some of our clients might wish to use Machine Translation for a particular vertical and language pair to aid their localization process and improve their ROI,” says Laura Casanellas, Product Manager at KantanMT. “However, in situations where they lack sufficient data to create a reliable MT system, our KantanLibrary™ training data sets can be a tremendous help.”

The Professional Services Team continually adds new parallel data to KantanLibrary™ for new language pairs and new domains or industry verticals. The library currently covers the following domains:

  • Legal (83 language pairs, 9,177,030,139 words)
  • Technical (58 language pairs, 1,726,635,296 words)
  • Financial (25 language pairs, 160,376,232 words)
  • Medical (31 language pairs, 289,466,969 words)
  • Automotive (7 language pairs, 3,796,120 words)
  • Patents (2 language pairs, 1,441,433,608 words)
  • General (39 language pairs, 173,720,207 words)
  • Subtitles (12 language pairs, 336,089,138 words)
  • UN (7 language pairs, 1,064,205,980 words)

View the KantanLibrary Catalogue. To request language combinations that are not listed, contact our support team (support@kantanmt.com).

About KantanMT

KantanMT.com is a leading SaaS-based machine translation platform that enables users to develop and manage custom Machine Translation engines in the cloud. The technologies offered on the KantanMT.com platform enable users to easily build MT engines in over 760 language combinations and integrate them seamlessly into localization workflows and web applications. KantanMT is based in the INVENT Building, DCU Campus, Dublin 9, Ireland.