US Health Agency Set to Mandate Machine Translation Post-Editing for ‘Critical Text’

Professionals and lay people seem to agree: Machine translation (MT) systems are still not fit for use in certain medical situations, as evidenced by a 2021 study on the use of Google Translate in the ER.

Now, the US Department of Health and Human Services (HHS) has chimed in with its take on MT in a new proposed rule to Section 1557 of the Affordable Care Act. The proposed rule, among other things, outlines when and how MT may be used for healthcare-related communications.

Crucially, it states that MT output must be reviewed by a “qualified human translator” for content that is “critical to the rights, benefits, or meaningful access” of the patient, when accuracy is essential, and in other similar cases.

The document defines MT as text-based, automated, and instant translations between various languages, sometimes with an option for audio input or output (e.g., speech-to-text), that are produced without the involvement of a qualified human translator.

“While the technology behind machine translation has improved in accuracy, the possibilities of significant consequences from inaccurate translation continue to exist,” the rule states, adding that based on HHS’ review of the literature, “all studies indicated error rates so high as to be ‘unacceptable for actual deployment in health settings.’”

“Context-dependent nature of common words in specialized health and medical domains […] are causing subtle yet clinically significant errors and confusion” — Wenxiu Xie, Meng Ji, et al. in the International Journal of Environmental Research and Public Health

The rule also cites anecdotal instances of multiple US states and territories receiving complaints from individuals with limited English proficiency (LEP) during the Covid-19 pandemic, related to inaccurate or confusing translations on official government websites, likely generated by MT.

While HHS proposes that a qualified human translator review MT output in certain critical situations, it stops short of specifying just who might be considered “qualified” for the task.

In fact, the oldest study cited in the proposed rule, a paper from 2013, acknowledged professionally trained medical interpreters as the gold standard for communication between LEP individuals and healthcare providers, but noted that community practices increasingly turn to MT when interpreters are unavailable.

Limited Shelf Life of MT Studies

Another source, a 2018 review of 18 studies covering MT in clinical settings between 2006 and 2016, showed its age in its conclusions: “In comparison studies, statistical machine translation systems were more accurate than rule-based systems when large corpora were available.” 

A 2021 study cited by the proposed rule to Section 1557, on the other hand, attributed significant improvement in MT accuracy and quality to the advent of neural MT, widely accepted as the sine qua non in the field since roughly 2016.

That study, published in the International Journal of Environmental Research and Public Health, introduced a “risk prevention mechanism” to help healthcare providers assess the risk of “clinically significant mistakes” when using MT (specifically Google Translate for English–Chinese). 

The authors concluded that complex medical jargon is no longer the greatest challenge for MT, suggesting instead that the “context-dependent nature of common words in specialized health and medical domains […] are causing subtle yet clinically significant errors and confusion.”

That said, a 2020 meta-analysis of published research (from 2000 onward) regarding the use of raw MT in healthcare “did not come across cases where MT was the documented cause of ill-suited medical advice or other serious healthcare issues.”

The paper did, however, point to “interactive phrase dictionaries” as potentially more promising than MT in healthcare settings — although, much like MT, “there is no standardized method for evaluating the technology in these contexts.”