The US Defense Advanced Research Projects Agency (DARPA), which some credit with inventing the internet’s precursor, is investing tens of millions of dollars in an automatic translation toolkit for lesser-known languages. The agency has publicly announced that it awarded phase one contracts to 13 organizations for the development of the Low Resource Languages for Emergent Incidents (LORELEI) program. Seven LORELEI contracts, which together amount to nearly $26 million, can be viewed in the FedBizOpps archives.
LORELEI is designed for “low resource” languages, which are mostly absent from the cross-referenced linguistic databases used by tools like Google Translate. Slator reached out to DARPA, but a representative told us that the program manager, Dr. Boyan Onyshkevych, was not available to comment.
Onyshkevych wrote in his description of the program that no automated translation technology exists for the low resource languages that the US government frequently encounters in its global operations. LORELEI is intended to reduce language barriers when the US responds to disasters, infectious disease outbreaks, and military conflicts. In Haiti, for example, Haitian Creole was spoken by most of the population affected by the 2010 earthquake.
Below are the 13 organizations to which DARPA has awarded phase one contracts, along with the publicly disclosed contract amounts where available:
- Appen, awarded $436,554
- Carnegie Mellon University, awarded $4,874,583
- Columbia University
- Johns Hopkins University
- Next Century Corporation, awarded $1,680,453
- Raytheon BBN Technologies, awarded $5,907,119
- University of Illinois at Urbana-Champaign
- University of Massachusetts
- University of Pennsylvania
- University of Pennsylvania Linguistic Data Consortium, awarded $7,744,259 with a total potential contract value of $9,701,002 (for phases 2, 3, and other program aspects)
- University of Texas at El Paso
- University of Washington
- University of Southern California Information Sciences Institute, awarded $4,689,063 with a total potential contract value of $9,613,015 (for phases 2 and 3)
The universities granted phase one contracts all have research centers or centers of excellence related to language research and translation.
Next Century Corporation is a US technology and software development contractor founded after the 9/11 terrorist attacks. Raytheon BBN Technologies is another US contractor that has been involved in DARPA’s earlier language translation projects. Raytheon’s statistical machine translation research began in 2003, and the contractor has been working on DARPA projects since 2005. Appen, a speech and search technology service provider, has helped with statistical machine translation for projects like Skype’s real-time translator.
The FedBizOpps archives also show a DARPA award to CRCL Inc, a nonprofit research center for computational linguistics. The nonprofit’s base contract was worth $585,704, with a total potential contract value of $2,413,799.
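For readers who want to check the "nearly $26 million" figure, the short Python sketch below adds up the base award amounts quoted in this article. It assumes the seven contracts in question are the six disclosed amounts in the list above plus the CRCL Inc base contract; the dictionary name `disclosed_awards` is purely illustrative, and potential contract values for later phases are left out.

```python
# Base award amounts quoted in this article, in US dollars.
# Assumption: these seven are the contracts behind the "nearly $26 million" figure.
disclosed_awards = {
    "Appen": 436_554,
    "Carnegie Mellon University": 4_874_583,
    "Next Century Corporation": 1_680_453,
    "Raytheon BBN Technologies": 5_907_119,
    "University of Pennsylvania Linguistic Data Consortium": 7_744_259,
    "USC Information Sciences Institute": 4_689_063,
    "CRCL Inc": 585_704,
}

# Sum the base awards only; total potential values are not included.
total = sum(disclosed_awards.values())

print(f"{len(disclosed_awards)} contracts, ${total:,} in base awards")
# prints "7 contracts, $25,917,735 in base awards"
```

Under that assumption, the disclosed base awards come to just over $25.9 million.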
In DARPA’s solicitation for research proposals, the agency outlined eligibility requirements:
- Federally funded entities should demonstrate that what they propose is not otherwise available from the private sector;
- Foreign participants should comply with nondisclosure and security regulations; and
- Organizations cannot receive awards for multiple Technical Areas, and LORELEI’s Technical Area 2 requires top secret clearance.
The LORELEI program is planned in three phases. Phase one will take 24 months, while phases two and three will each take 12 months. The program’s three technical areas are:
- Algorithm research and development environment;
- Run-time framework development; and
- Linguistic resource creation.
The program’s rapid machine translation toolkit is extremely ambitious: it is expected to understand enough of virtually any of the world’s 7,000 languages that US personnel can effectively coordinate an operation anywhere. The goal is to be able to “digest” any language and learn how to provide helpful machine-translated material “as quickly as 24 hours after an incident occurs,” and even go so far as “fully automated language capabilities within days or weeks after that.”