Is Translation Quality Evaluation a Solved Problem?

Translation Quality Evaluation

Most language industry professionals would agree that machine translation (MT) has significantly improved, whether quality is measured by BLEU, COMET, or other well-known automated metrics. But these metrics don’t eliminate the need to ensure, somehow, that the final target language product actually meets requirements and standards.

One of those standards made its appearance in February 2024. The new ISO 5060 standard was created as an evolutionary step to address specifically the way humans evaluate translation output. It applies whether the translation comes from a human, a machine, or a combination of both.

ISO 5060 includes general guidelines for translation quality evaluation that, according to the committee that created it, essentially ensure a reliable analysis by humans. The standard addresses seven broad quality categories (terminology; accuracy; linguistic conventions; style; locale conventions; audience appropriateness; and design and markup), and assigns each error a severity level of Critical, Major, or Minor.

Whether the problem is ever-present or ever-pressing, the guideline looks set to help harmonize evaluator competencies, as well as approaches and strategies for translation QA.

We were curious to know if readers think that translation quality evaluation is a solved problem. Validating the need for standards like ISO 5060, nearly three quarters (72.7%) of respondents said no, and that more research is needed. Just over a fifth (20.5%) answered that it depends on the language and text type, and a very small group (6.8%) thinks the problem is indeed solved.

Millions of Videos

Short-form videos, such as TikToks and YouTube Shorts, are immensely popular. Creators upload an average of 34 million of them per day (according to social media management platform Social Champ). And there is no shortage of options for automatically dubbing such videos into many languages using AI.

The sheer number of startups that have made AI dubbing itself their flagship product, at least initially, including a few of those listed in the Slator Language AI 50 Under 50, is on its own telling of the high level of competition taking hold in the field. AI dubbing providers are courting and winning over potential users on affordability, UI intuitiveness, quality, and scalability.

Technology juggernauts like Meta, Microsoft, OpenAI, and Google have invested heavily in all manner of foundation models and their applications, and are at least partly to thank for making those models available to clever AI dubbing entrepreneurs. The question is, what will the tech giants themselves do with AI dubbing? What of Adobe, for example, whose Dub Dub Dub feature is yet to launch? Will it have the effect on subscriber retention that Firefly AI image generation had on its Creative platform?

While we wait for answers to those questions, we asked readers if they think AI dubs for short-form videos will succeed as a standalone product category or become a feature in a bigger platform. A majority (59.4%) of respondents believe that AI dubbing will become a feature, in contrast with those who think it will succeed as a standalone product (40.6%).

Thanks. Goodbye.

Google Translate, which launched the “Contribute” feature in 2014 to let users submit corrections and improve translations, has eliminated the option from its interface.

Users will still be able to “Send Feedback” to Google about Translate’s output using a pop-up form, with the chance to attach a screenshot. They can also rate a translation by giving it a thumbs up or down. As to the reasons for the change, the company offered that “our systems have significantly evolved, allowing us to phase out Contribute.”

The news was received with concern and disappointment by some, such as a Fulfulde language volunteer who wondered if the efforts of contributors like him, aimed at preserving the language, had been in vain. Google replied that “Google would integrate all the users' contributions in the under-developed languages to better the translation output.”

We asked readers what they thought were the reasons for Google Translate to shelve “Contribute,” and most (45.9%) had no idea (and asked us to let them know when we find out). A little over a quarter of respondents (28.4%) believe that machine translation (MT) is good enough now, and the rest (25.7%) think the feature was abandoned because the “juice is not worth the squeeze.”

Read My Lips

To illustrate what people with little or no technical knowledge can now accomplish using AI, consider the audio recording in which Eric Eiswert, principal of Pikesville High School in Baltimore, US, could allegedly be heard making racist remarks toward different groups. Police investigators established it had been created using AI.

Not an isolated case, to be sure, and more worrisome when paired with equally accessible multilingual lip-sync technology, such as that offered by the likes of Sync Labs and HeyGen. A key point here is that not only can anyone use these technologies, but anyone can also now develop them using open-source resources on GitHub. And if a regular person can use them much like they use an app on a smartphone, businesses, specifically the media, might also begin churning out AI-generated content at scale for their own purposes.

At the rate these AI tools are reaching people the world over, and with millions of synthetic videos likely already circulating, no legislation has a chance to keep up. By the time an automatic audiovisual stamp is mandated for any piece of content created with gen AI, it might be too little, too late for anyone already victimized by it.

We asked readers if they believe AI-translated, lip-synced content will be widely adopted by the media by 2026, and the majority said yes (59.1%). Less than a quarter (22.7%) of respondents think it will be the case only in specific areas. The rest (18.2%) don’t think it will happen.