The US just earmarked USD 14m for academic research that aims to develop, yes, yet another machine translation (MT) system. The technology, called SCRIPTS (System for Cross Language Information Processing, Translation and Summarization), is an “end-to-end system for information retrieval, translation, and summarization.”
The USD 14m grant was awarded by the US Intelligence Advanced Research Projects Activity (IARPA) to a research team led by the founding director of the Data Science Institute of Columbia University, Kathleen McKeown. IARPA reports directly to the Office of the Director of National Intelligence and facilitates the transition of research to the US intelligence community. The agency itself does not deploy technologies to the field.
With so many machine translation solutions already available, what’s the pain point that would justify a USD 14m investment in developing yet another system? The Columbia University team put it this way:
“Intelligence analysts study activity in countries all around the world and must read copious documents in many foreign languages. As it is now, analysts must wade through documents manually or use a computer system unable to translate uncommonly spoken languages into English. And current software systems don’t provide good translations of low-resource languages.”
According to the same post, SCRIPTS will process text documents as well as speech from media such as videos and news broadcasts in low-resource languages like Hausa and Uyghur. Analysts will then be able to query the system, which will find and translate relevant material and provide English summaries of the information it contains.
The solution will combine elements of machine translation (MT) with related natural language processing (NLP) technologies, such as text-to-speech (TTS), and information retrieval. It is expected to translate 750 million words per day.
The US faces ever-increasing volumes of multilingual information in its foreign operations. Just last month, the US Department of Defense awarded Virginia-based Multilingual Solutions a USD 39m contract to help with translation work.
Quite a Team
Research team head McKeown is an authority in the field of NLP and is no stranger to US government-funded projects for organizations like IARPA and the Defense Advanced Research Projects Agency (DARPA).
McKeown assembled a team of researchers in relevant fields such as machine translation and text-to-speech from Columbia University, Cambridge University, the University of Maryland, Edinburgh University, and Yale.
Many members of the research team, like McKeown, have worked on similar government-funded projects before, such as DARPA’s LORELEI (Low Resource Languages for Emergent Incidents), a USD 26m MT project for lesser-known languages.
Additionally, Columbia University recently advertised for an Associate Research Scientist to work specifically on SCRIPTS. The job posting is still up as of this writing, although the earliest proposed start date is November 6, 2017.