Sept 1, 2021 – In today’s data-driven world, high-quality bilingual data is essential, both for training better machine-translation models and for capturing the domain-specific knowledge contained in human translations published on the Web and in translated documents.
Domain-specific bilingual data can be hard to obtain. Companies often already have their content translated, but the original and translated versions usually exist as separate documents or web pages rather than as aligned sentence pairs. Aligning such content manually is neither feasible nor scalable, so many applications fall short for lack of suitable data.
Paralela uses state-of-the-art AI models to align translation pairs in any combination of 110 languages, working from unstructured and unordered streams of content, including documents that may be only loosely related. The aligner captures the linguistic similarities between sentences in different languages with a high level of accuracy. As a result, alignments that previously took considerable manual effort can be produced with no manual work, and large corpora can be built quickly.
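The announcement does not describe how Paralela's models work internally. As a rough illustration of how sentences can be aligned from unordered content in general, the following minimal sketch pairs source and target sentences whose embeddings are mutual nearest neighbors by cosine similarity. The function names, the 0.7 threshold, and the toy three-dimensional vectors are all hypothetical; a real system would obtain sentence embeddings from a multilingual encoder, which is itself an assumption here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_nn_align(
    src: List[Vector], tgt: List[Vector], threshold: float = 0.7
) -> List[Tuple[int, int, float]]:
    """Hypothetical aligner: keep (i, j) only if src[i] and tgt[j] are each
    other's best match and their similarity reaches the threshold.

    Sentence order is ignored, so both inputs may be unordered streams.
    """
    if not src or not tgt:
        return []
    sims = [[cosine(s, t) for t in tgt] for s in src]
    # Best target index for every source sentence, and vice versa.
    best_tgt = [max(range(len(tgt)), key=lambda j: row[j]) for row in sims]
    best_src = [max(range(len(src)), key=lambda i: sims[i][j]) for j in range(len(tgt))]
    pairs = []
    for i, j in enumerate(best_tgt):
        if best_src[j] == i and sims[i][j] >= threshold:
            pairs.append((i, j, sims[i][j]))
    return pairs


# Toy "embeddings" (hypothetical): src = ["Hello", "Thank you", "Goodbye"],
# tgt = ["Gracias", "Hola"] (shuffled, and "Goodbye" has no translation).
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.1, 0.95, 0.0], [0.98, 0.05, 0.1]]

pairs = mutual_nn_align(src, tgt)
print([(i, j) for i, j, _ in pairs])  # [(0, 1), (1, 0)]
```

Requiring the match to be mutual lets sentences without a counterpart (here "Goodbye") drop out instead of being forced onto the closest available target.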
Paralela accepts URLs or Microsoft Word (.DOCX) documents as input and produces bilingual translation memories in TMX format, which can be used for translation, terminology mining, and training machine-translation engines.
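TMX (Translation Memory eXchange) is an open XML standard for exchanging translation memories. As a minimal sketch, not Paralela's own exporter, the code that follows writes already-aligned sentence pairs as a TMX 1.4 document using only Python's standard library. The `build_tmx` function, the `creationtool` value, and the example sentences are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# ElementTree's name for the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(
    pairs: Iterable[Tuple[str, str]], src_lang: str, tgt_lang: str
) -> ET.ElementTree:
    """Wrap aligned (source, target) sentence pairs in a TMX 1.4 tree."""
    root = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; tool name/version are hypothetical.
    ET.SubElement(root, "header", {
        "creationtool": "example-aligner",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for src_text, tgt_text in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, src_text), (tgt_lang, tgt_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(root)


tree = build_tmx([("Hello", "Hola"), ("Thank you", "Gracias")], "en", "es")
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
# To save: tree.write("memory.tmx", encoding="utf-8", xml_declaration=True)
```

ElementTree maps the XML namespace to the reserved `xml:` prefix automatically, so each `tuv` element is serialized with a standard `xml:lang` attribute.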
Content creators can also use Paralela to build corporate repositories of bilingual data from monolingual content in various languages.
The tool is already fully functional. It is used internally at Logrus Global and is available to external customers in beta as a developer-assisted service, while the developers continue to add data pre- and post-processing features.
The product is available at http://paralela.logrusglobal.com/index.php