February 9, 2021
AI Incident Database Spotlights Worst Machine Translation Fails
In the ongoing popular (albeit shallow) debate pitting human translators against machine translation (MT), one constant is the question of quality — how to define it, how to measure it, and how to improve it.
Now, a new website, the AI Incident Database (AIID), aims to quantify the risks presented, and actual harm caused, by AI. Sean McGregor, ML architect at Syntiant and developer of the AIID, described the database as a “collective memory of [AI systems’] failings” in a November 2020 paper.
As McGregor explained, the AIID is a project of the Partnership on AI (PAI), an organization funded by tech companies and governed by a board comprising corporate partners and non-profits.
The AIID is modeled on incident databases in other industries, namely aviation and cybersecurity, which promote transparency. Its precursor, the “Where in the World is AI?” map, maintains data on harmful AI incidents dating back to 2005. (In a more proactive touch, it also offers a “Responsible AI Design Assistant” to help AI system creators evaluate the trustworthiness of their programs.)
Behind the Headlines
The AIID is powered by volunteers who submit reports (typically media coverage or research papers) of AI incidents through an app on the website. The organization currently defines AI incidents as “events or occurrences in real life that caused or had the potential to cause physical, financial, or emotional harm to people, animals or the environment.” This means that purely hypothetical scenarios, worrying though they may be, are excluded from the database: an event must actually have occurred, even if the harm it could have caused never materialized.
App users can also search for specific incidents related to their AI programs of interest. Each incident or issue is given a unique number, and any related content submitted to the AIID is accessible on that incident’s page.
One caveat is that the keyword search analyzes the entire text of each report, so some results do not focus on the topic at hand. For example, certain search results for the query “machine translation” include a single, offhand mention of the phrase in reference to an unrelated subject.
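To illustrate why full-text matching surfaces such tangential results, here is a minimal sketch in Python. It is not the AIID's actual search code, which this article does not describe. The `Report` class, both search functions, and the three sample records are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical record shape; the real AIID schema may differ.
    incident_id: int
    title: str
    text: str


def full_text_search(reports, query):
    """Return reports whose title or body contains the query (case-insensitive)."""
    q = query.lower()
    return [r for r in reports if q in r.title.lower() or q in r.text.lower()]


def title_only_search(reports, query):
    """Return reports whose title contains the query (case-insensitive)."""
    q = query.lower()
    return [r for r in reports if q in r.title.lower()]


# Made-up sample records, not actual AIID entries.
reports = [
    Report(1, "Machine translation error leads to arrest",
           "A social media post was mistranslated."),
    Report(2, "Smart speaker reacts to TV ad",
           "Unlike machine translation, voice assistants listen continuously."),
    Report(3, "Email auto-reply suggestions annoy users",
           "Suggested replies were off the mark."),
]

full_hits = [r.incident_id for r in full_text_search(reports, "machine translation")]
title_hits = [r.incident_id for r in title_only_search(reports, "machine translation")]

print(full_hits)   # [1, 2]  (report 2 matches only via an offhand mention)
print(title_hits)  # [1]
```

In this toy example, the second report appears in the full-text results only because its body mentions the phrase in passing. That is the same effect readers may notice when searching the database.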
Of 1,225 total AI incident reports, the AIID currently returns 111 results for the keyword “language,” 42 for “translation,” and 21 for “machine translation.”
In the most widely reported MT incident, Israeli police arrested a Palestinian man in 2017 after Facebook mistranslated a post reading “good morning” in Arabic as “attack them” in Hebrew. Facebook later apologized for the error, but, of course, the damage had already been done.
In another law enforcement-related incident, a US police officer used Google Translate to communicate in Spanish with a driver from Mexico suspected of carrying drugs. A US district judge ruled in 2018 that Google Translate’s “nonsensical translations” prevented the driver from understanding his rights and consenting to a search of his car.
On the training side of MT, Google Translate’s gender biases have been covered by specialist researchers as well as more general media outlets.
Aside from MT, the AIID also catalogs issues in language industry-adjacent pursuits, such as Gmail’s ire-inspiring auto-respond feature; complaints that Amazon Echo responds to television ads; and Amazon’s manipulation of search results on its website to hide adult novels.
Interest in the AIID extends beyond the AI industry, as indicated by collaboration requests from “‘Big 4’ accounting firms, international consultancies, law firms, research institutes, and individual academics,” McGregor wrote, adding that the AIID creators hope the database will grow to promote “the most beneficial intelligent systems for people and society.”