Unlocking the Secrets of Language AI With Inter-language Vector Space

Just as neural machine translation (NMT) disrupted statistical machine translation, so has Inter-Language Vector Space (ILVS) disrupted current word alignment models. Moreover, the adoption of ILVS comes with considerable commercial implications for translation and localization workflows in terms of time savings, reduced cost, improved accuracy, and brand consistency.

ILVS is a neural network-based technology that indicates the proximity between distinct source and target words within a segment being translated. It is now being used to enhance productivity for translators, reviewers, and post-editors.

“ILVS allows word-to-word alignment to be done online very quickly. In most cases, it takes well under a second,” Dr. Rafał Jaworski told Slator. Jaworski is XTM International’s Linguistic AI Expert.

ILVS builds on extensive research by Google and Facebook into vector space algorithms. XTM supplied data from bilingual dictionaries and performed the alignment of the vector spaces.

Although vector space algorithms have been widely researched and used by Big Tech for quite some time, Jaworski pointed out that, “to the best of our knowledge, this technology has not been used directly to aid the human translation process. The inter-language aspect of ILVS makes that possible.”

He said XTM adopted this new technology and immediately put it into action by developing and releasing multiple features based on it; and, “once proven to be useful, these novel features will likely inspire others to follow.”

Jaworski added: “A fascinating thing about ILVS is that it is able to detect potential translation candidates even if they have never appeared in any dictionary. This is achieved from the alignment of vector spaces — and this process affects all words in the vector space, not only those that appear in dictionaries. For this reason, when performing the task of building multilingual terminology glossaries, for instance, ILVS can detect even highly specialized narrow-domain terms.”
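
The idea of aligning two vector spaces from dictionary pairs can be illustrated with a toy example. The sketch below is not XTM's implementation: it assumes made-up two-dimensional embeddings and fits a single least-squares rotation (a two-dimensional orthogonal Procrustes fit) from two dictionary pairs. Because the rotation is applied to the whole space, a word that is absent from the dictionary ("bird") still lands next to its translation. All vectors, words, and function names are hypothetical.

```python
import math

def fit_rotation(pairs):
    """Least-squares 2-D rotation angle that maps source vectors onto target vectors."""
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in pairs)
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in pairs)
    return math.atan2(cross, dot)

def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = vec
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def cosine(a, b):
    """Cosine similarity of two 2-D vectors."""
    return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))

# Made-up embeddings: the target space is the source space turned by 90 degrees.
source = {"dog": (1.0, 0.0), "cat": (0.8, 0.6), "bird": (0.0, 1.0)}
target = {"pies": (0.0, 1.0), "kot": (-0.6, 0.8), "ptak": (-1.0, 0.0)}

# Only two word pairs are "in the dictionary"; "bird" is not.
dictionary = [("dog", "pies"), ("cat", "kot")]

theta = fit_rotation([(source[s], target[t]) for s, t in dictionary])
mapped = rotate(source["bird"], theta)
best = max(target, key=lambda word: cosine(mapped, target[word]))
print(best)  # ptak
```

Real embedding spaces have hundreds of dimensions and need a general orthogonal mapping rather than a single angle, but the principle is the same: the dictionary fixes the mapping, and every other word is carried along with it.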

Why It Is Disruptive

According to Jaworski, the established tools for performing translation alignment (e.g., Giza++, FastAlign) work only in batch mode; that is, they process the whole bilingual corpus in one go, which takes a considerable amount of time.

Furthermore, current systems cannot align the words of a single sentence pair on demand while drawing on data from the whole corpus. By contrast, ILVS provides a way to immediately calculate the probability of word alignments. “ILVS returns the probability of alignment as opposed to Giza++ or FastAlign, which only provide the binary information: either the word matches or it does not,” Jaworski said.

He added that this information from ILVS on alignment probabilities “opens up countless opportunities, such as the identification of potential translation errors (i.e., words in translated text with low matching probability to source words), translation suggestions, and much more.”
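
To make the error-detection idea concrete, here is a minimal sketch. The article does not describe how ILVS derives its probabilities, so the scoring is an assumption: cosine similarity in a shared vector space, turned into probabilities with a softmax over the source words. The vectors, the `temperature` and `threshold` values, and the function names are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def alignment_probabilities(src_vecs, tgt_vecs, temperature=0.1):
    """For each target word, a softmax over its similarities to the source words."""
    probs = {}
    for tgt_word, tgt_vec in tgt_vecs.items():
        scores = {s: math.exp(cosine(v, tgt_vec) / temperature)
                  for s, v in src_vecs.items()}
        total = sum(scores.values())
        probs[tgt_word] = {s: score / total for s, score in scores.items()}
    return probs

def flag_low_confidence(probs, threshold=0.6):
    """Target words whose best alignment probability falls below the threshold."""
    return [t for t, row in probs.items() if max(row.values()) < threshold]

# Made-up vectors in an already aligned space: source "red car",
# target "czerwony auto banan", where "banan" matches neither source word.
src = {"red": (1.0, 0.0, 0.0), "car": (0.0, 1.0, 0.0)}
tgt = {"czerwony": (0.9, 0.1, 0.0), "auto": (0.1, 0.9, 0.0), "banan": (0.5, 0.5, 0.7)}

probs = alignment_probabilities(src, tgt)
flagged = flag_low_confidence(probs)
print(flagged)  # ['banan']
```

A graded score is what makes this kind of check possible: a yes/no alignment cannot distinguish a confident match from a marginal one.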

Who Can Immediately Benefit

ILVS was first introduced in version 12.4 of XTM Cloud, the company’s flagship translation management system (TMS) with an integrated translation productivity (a.k.a. CAT) tool.

Features built on the technology include automatic placement of inline elements, bilingual terminology extraction, and further improvements to XTM’s corpus auto-aligner.

Users of XTM Cloud 12.4 can immediately benefit from ILVS at no added cost and can expect upcoming versions 12.5 and 12.6 to bring further improvements to the technology as well as broader language coverage.

While word- and phrase-level alignment was available in previous TMS versions, according to Jaworski, “it was only powered by electronic bilingual dictionaries. Now, ILVS provides much higher coverage in terms of the number of supported languages and the number of words within each language.”

The technology draws on large-scale data resources, including a crawl of publicly available Internet text and XTM’s extensive bilingual dictionaries, to calculate the probability that a given target-language word is the correct translation of a source word, for over 250 languages.

Jaworski explained, “When we speak about 250 languages, that makes, combinatorially, 31,125 language pairs (i.e., 250 × 249/2). Before ILVS, we supported about 200 languages, which makes 19,900 language pairs.”
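
The figures in the quote follow from counting unordered pairs of languages, n × (n − 1) / 2. A quick check (the function name is illustrative only):

```python
def language_pairs(n_languages):
    """Number of unordered language pairs among n_languages languages."""
    return n_languages * (n_languages - 1) // 2

print(language_pairs(250))  # 31125
print(language_pairs(200))  # 19900
print(language_pairs(250) - language_pairs(200))  # 11225 additional pairs
```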

The data used to create ILVS came from texts publicly available on the Internet and Big Data bilingual dictionaries licensed by XTM. Jaworski emphasized: “Not a single bit of information came from the private data or material of XTM’s clients. Moreover, even the publicly available texts are not stored within ILVS. The only information that can be retrieved from ILVS is the numerical translation probability.”

Commercial Impact

A critical part of the translation / localization process is terminology management. Building terminology from existing translations is crucial to text quality and consistency — and automating the extraction of bilingual terminology is the next level of advancement.

ILVS can automate up to 90% of bilingual term extraction.

XTM used advances in computational linguistic technology including ILVS to build a reliable bilingual terminology extraction feature. Linguistic AI Expert Jaworski explained: “Automatic glossary creation first detects terminology on the source side of the translation memory. ILVS then helps find the translation of these terms. The human input merely consists of reviewing the output.”
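
The two-step flow Jaworski describes can be sketched as follows. This is an illustration under loud assumptions, not XTM's feature: term detection is reduced to a simple frequency count over source segments, and `translation_prob` is a hypothetical stand-in for whatever score ILVS returns for a source/target word pair. The translation memory and the probability table are invented.

```python
from collections import Counter

def extract_glossary(tm_segments, translation_prob, min_count=2, stopwords=frozenset()):
    """Build a {source_term: target_term} draft glossary from (source, target) segments."""
    # Step 1 (assumed heuristic): treat frequent source words as candidate terms.
    counts = Counter(
        word
        for src, _ in tm_segments
        for word in src.lower().split()
        if word not in stopwords
    )
    terms = [word for word, count in counts.items() if count >= min_count]

    # Step 2: in each segment containing the term, pick the target word with the
    # highest translation probability, then keep the most frequent winner.
    glossary = {}
    for term in terms:
        votes = Counter()
        for src, tgt in tm_segments:
            if term in src.lower().split():
                tgt_words = tgt.lower().split()
                votes[max(tgt_words, key=lambda w: translation_prob(term, w))] += 1
        glossary[term] = votes.most_common(1)[0][0]
    return glossary

# Invented translation memory (English to Italian) and scoring table.
tm = [("open the valve", "aprire la valvola"),
      ("close the valve", "chiudere la valvola")]
table = {("valve", "valvola"): 0.92}

def translation_prob(src_word, tgt_word):
    """Hypothetical scorer: look up a probability, defaulting to a low value."""
    return table.get((src_word, tgt_word), 0.01)

draft = extract_glossary(tm, translation_prob, stopwords=frozenset({"the"}))
print(draft)  # {'valve': 'valvola'}
```

The output is a draft for human review, which matches the division of labor in the quote: the machine proposes term pairs and the reviewer accepts or corrects them.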

The impact is fourfold: (1) time savings – it takes 85% less time to create glossaries; (2) reduced cost – consistent terminology means less rework and no extra costs; (3) improved quality – up to 90% accuracy based on high-quality translation memory; (4) brand consistency – resulting glossaries can now ensure consistent style across content.

For more information, visit www.xtm.cloud/artificial-intelligence.