The number of internet users in the Middle East grew by 6,378% from 2000 to 2022, and in 2022 it grew 10 times faster than in Europe. Arabic also consistently ranks among the five most-used languages globally. And yet only 1.1% of the top 10 million websites use Arabic.
To address this gap, Tarjama, a MENA-based language service provider (LSP), launched a free-to-use Arabic machine translation tool this fall. The tool is customized for business content, including financial, legal, and health translation.
Creators of Arabic content can now use Tarjama’s business-focused machine translation (MT) engine by uploading documents in various formats to the platform — freemium up to 5MB; free trial up to 10MB — and then receiving instant, accurate translations in the document’s original format (e.g., Excel, PowerPoint, HTML).
According to Chief Product Officer Rebecca Jonsson, Tarjama’s machine translation engine stands out in two areas: quality and security.
“Our MT models are trained with high-quality data. In addition, we follow a rigorous quality evaluation process that combines automatic and manual evaluation with blindfolded benchmarks,” Jonsson said.
Tarjama’s MT engine has also boosted productivity. According to Jonsson, “On average, over 50% of Tarjama’s machine translation output does not need any editing and is considered to be perfectly accurate by our translators. Add to that the segments that only require minor polishing or light editing, and our translation team is now able to focus on the remaining 15–25% that do require more work.”
Tarjama MT also consistently outperforms other engines at translating business content, in both automatic evaluations (see diagram) and blind human evaluations.
As for security, Tarjama MT strictly adheres to ISO/IEC 27001, the international standard for information security management. “This means that both free users and subscribers can be certain that all their data will remain secure at all times,” Jonsson said.
Not All Training Data Is Equal
Jonsson likened MT-engine training data to the human diet: feeding an MT engine “healthy data” results in better-quality output.
“We actually throw away more data than we keep,” Jonsson said. “Tarjama trains its MT engines on relatively small but healthy data. Rather than feeding our MT engines with huge, single data sources, we make sure our training data is as varied and high quality as possible.”
She pointed out that too much training data from the same source increases the likelihood of repeated mistakes, skewed terminology, and a model that overfits to that source. “Even if something is healthy, too much of the same thing is not a healthy diet!” Jonsson said.
Tarjama’s job, therefore, was to define healthy data. Jonsson explained: “Our research showed that, for MT, a good diet would be content that is lexically diverse, includes domain-specific terminology, has well-written source text with sentences of proper length and, of course, high-quality translations of the source.”
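Some of the criteria Jonsson lists, along with her warning about relying too heavily on a single source, can be approximated with simple heuristics. The sketch below is illustrative only and is not Tarjama’s pipeline: the function names, the whitespace tokenization, and every threshold (`min_words`, `max_words`, `max_len_ratio`, `min_ttr`, `max_per_source`) are hypothetical. Heuristics like these cannot judge translation quality itself; they only flag pairs that are likely misaligned, repetitive, badly sized, or off-domain, and they keep one large source from dominating the mix.

```python
import random
from collections import defaultdict


def is_healthy_pair(src, tgt, glossary=frozenset(), min_words=3, max_words=60,
                    max_len_ratio=2.0, min_ttr=0.5):
    """Hypothetical per-pair filter; all thresholds are illustrative assumptions."""
    src_words, tgt_words = src.split(), tgt.split()  # crude whitespace tokenization
    # Proper sentence length on both sides.
    for words in (src_words, tgt_words):
        if not (min_words <= len(words) <= max_words):
            return False
    # Very different lengths often signal misalignment or transcreation.
    longer = max(len(src_words), len(tgt_words))
    shorter = min(len(src_words), len(tgt_words))
    if longer / shorter > max_len_ratio:
        return False
    # Lexical diversity proxy: type-token ratio of the source sentence.
    lowered = [w.lower() for w in src_words]
    if len(set(lowered)) / len(lowered) < min_ttr:
        return False
    # Domain terminology: if a (lowercase, single-word) glossary is given,
    # require at least one glossary term in the source.
    if glossary and not any(w.strip(".,;:!?") in glossary for w in lowered):
        return False
    return True


def cap_per_source(pairs, max_per_source, seed=0):
    """Keep at most max_per_source pairs per source so no single source dominates.

    pairs: list of (source_name, src_text, tgt_text) tuples.
    """
    by_source = defaultdict(list)
    for pair in pairs:
        by_source[pair[0]].append(pair)
    rng = random.Random(seed)  # seeded for reproducible downsampling
    balanced = []
    for source in sorted(by_source):  # sorted for a deterministic output order
        items = by_source[source]
        if len(items) > max_per_source:
            items = rng.sample(items, max_per_source)
        balanced.extend(items)
    return balanced


def build_training_mix(pairs, max_per_source, glossary=frozenset()):
    """Filter pairs with the heuristics above, then balance across sources."""
    kept = [p for p in pairs if is_healthy_pair(p[1], p[2], glossary)]
    return cap_per_source(kept, max_per_source)
```

A per-sentence type-token ratio mostly catches repetitive junk; lexical diversity is better measured across a whole corpus, and the surviving pairs would still need human review for translation quality.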
Domain-specific data can have a positive impact on an MT engine. Tarjama’s AI team suggests the following recipe for working with domain-specific data:
- Select domains you lack.
- Avoid generic articles.
- Aim for specialized ones for richer terminology.
- Educate linguists and PMs on how the MT engine learns.
- Explain common data issues, such as transcreation and alignment.
- Ensure that the source is high quality.
- Proofread the source if needed.
- Do not just use MT.
Underpinning all of the above, the Tarjama AI team offers this advice: do not start a big project with a huge budget to review or create domain-specific data. Instead, adopt agile methodologies and start small, evaluating the impact of the data on your models after each iteration.
Tarjama has been building out its neural machine translation capabilities since 2019. Its vision is to enable companies to benefit from best-in-class MT for Arabic business translation.
“It has never been this easy to get a good, instant translation of your business documents from English into Arabic and vice versa,” Jonsson said. “And this is just the beginning. We have already integrated Tarjama MT into our proprietary translation management system (TMS), CleverSo, which empowers the translator to perform light review and post-editing.”
Tarjama’s Chief Product Officer also disclosed that the AI team is currently training models for different verticals, including e-commerce and media localization, specifically subtitling.
To learn more about Tarjama’s 100% safe, fast, high-quality English↔Arabic machine translation for official business documents, drop us a note or schedule a demo.