A world of opportunity was open to Korean-born Jason Lee after graduation. He held a master’s degree in Computer Science from Cambridge University and had completed internships at JP Morgan, Goldman Sachs, and Google.
Lee resisted the temptation to immediately monetize his résumé in big tech or high finance. Instead, he opted for a PhD in Deep Learning and Natural Language Processing at the Data Analytics Lab of the Swiss Federal Institute of Technology (ETH) under the supervision of Thomas Hofmann, a former Director of Engineering and co-site Lead at Google Zurich.
Lee could have set his sights on any one of the lab’s key research areas: machine learning, natural language processing and understanding, data mining, and information retrieval. But he chose a problem in machine translation (MT) as his first major PhD research project, a sign of how rapid progress in machine learning, along with access to massive computing power, is impacting language technology.
The addition of neural networks to MT is drawing a new batch of high-caliber researchers into the field. Lee says he has always been interested in languages and naturally gravitated toward the natural language processing (NLP) side of machine learning. Had it not been for the recent advent of neural networks in language translation, however, we think Lee would likely have chosen a field other than MT for his research.
He says, “NLP is definitely the frontier of what the current artificial intelligence (AI) state of the art can do.” And machine translation is one of the harder problems. Another hard problem is dialog systems (think customer service bots chatting with you).
Lowest Level Possible
The aim of Lee’s project was to take neural network modeling from the word or subword level down to the level of characters. To draw an analogy with image recognition, going down to the character level is like going down to the individual pixel, the smallest possible unit (the unit a model processes is called a “token” in NLP).
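To make the difference in granularity concrete, here is a minimal sketch of the same phrase split into word-level and character-level tokens. The function names and the sample phrase are ours, chosen for illustration; this is not code from Lee’s project.

```python
def word_tokens(sentence):
    """Split a sentence into word-level tokens on whitespace."""
    return sentence.split()


def char_tokens(sentence):
    """Split a sentence into character-level tokens (spaces included)."""
    return list(sentence)


sentence = "machine translation"  # illustrative sample, not from the article

words = word_tokens(sentence)
chars = char_tokens(sentence)

print(len(words), words)
print(len(chars), chars)
```

The character-level sequence is much longer than the word-level one, but it draws on a far smaller inventory of distinct symbols.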
Lee says that, to his knowledge, work in statistical machine translation (SMT) or previous research in NMT never fully went down to that level. Even Google’s latest Google Translate NMT model operates only at subword level.
This approach has a number of real, and perhaps surprising, advantages. For example, it tends not to get confused by typos (say, in user-generated content) and should have an edge in translating morphologically rich languages (think long words in Finnish, Turkish, and other agglutinating languages).
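One way to see why typos are less disruptive at character level is to look at vocabulary coverage. The sketch below is a toy illustration under our own assumptions (a one-sentence “training” text, a hypothetical `<unk>` placeholder for unknown tokens). It shows only that a misspelled word falls outside a word vocabulary while its characters remain known; it does not show that a model would translate it correctly.

```python
UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary tokens


def build_vocab(tokens):
    """Collect the set of distinct tokens seen in training."""
    return set(tokens)


def encode(tokens, vocab):
    """Replace every token missing from the vocabulary with UNK."""
    return [tok if tok in vocab else UNK for tok in tokens]


training_text = "the translation is good"  # toy training data
word_vocab = build_vocab(training_text.split())
char_vocab = build_vocab(training_text)

noisy = "the translaton is good"  # typo: missing "i"

word_view = encode(noisy.split(), word_vocab)
char_view = encode(list(noisy), char_vocab)

print(word_view)          # the misspelled word becomes UNK
print(UNK in char_view)   # every character is still in the vocabulary
```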
According to Lee, their NMT model is agnostic to the language being translated. Another benefit, therefore, is that the system performed well with so-called intra-sentence code-switching, that is, a change of language mid-sentence in the source.
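A rough intuition for how a character-level input can cope with mixed-language sentences: if characters from several languages share one index, a sentence that switches language midway is encoded exactly like any other. The sketch below assumes such a shared character index. The sample phrases, the function names, and the choice of id 0 for unknown characters are ours, and this is not a description of Lee’s actual model.

```python
def build_char_index(corpora):
    """Map every character seen in any corpus to an integer id.

    Id 0 is reserved for characters never seen (an assumption of this sketch).
    """
    chars = sorted({ch for text in corpora for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(sentence, index):
    """Turn a sentence into a list of character ids."""
    return [index.get(ch, 0) for ch in sentence]


german = "guten morgen"   # toy samples, one per language
english = "good morning"
index = build_char_index([german, english])

mixed = "guten morning"   # switches language mid-sentence
ids = encode(mixed, index)

print(ids)
print(0 in ids)  # no unknown characters in the mixed sentence
```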
The idea that an MT engine is language-agnostic may be difficult to absorb for many in an industry used to constant language and domain tweaking in machine translation.
Lee built his model on Theano, a Python-based deep-learning framework. Other frameworks used in NMT are Google’s TensorFlow and Torch, on which Systran based its latest NMT release.
Completely Data Driven
For this project, Lee visited Kyunghyun Cho at New York University, working at their new Center for Data Science. Cho is an Assistant Professor at NYU’s Department of Computer Science.
Lee calls Cho the pioneer of NMT and the 2014 paper Cho co-authored, “Neural Machine Translation by Jointly Learning to Align and Translate,” a milestone in NMT research. In an interview with the NYU blog on his appointment in 2015, Cho called machine translation “the next field/task revolutionized by deep learning.”
According to Lee, NMT is very different from previous approaches to machine translation. Among the most obvious differences, Lee says, is the lack of linguistic and domain knowledge required to run the models.
“The reason why NMT is so exciting is because it is purely data-driven. When you design a model, you don’t inject any knowledge about what it should do. You just give it examples of source and target text and the rest is just magic. It just fills the gaps and the rest in by itself. So this idea of machine translation being a proxy for the advancement of artificial intelligence is valid because it’s so not domain specific. We don’t give it any linguistic knowledge at all.”
Here, Lee echoes his mentor Cho, who told the NYU blog that “instead of relying heavily on domain/linguistic knowledge, with neural networks, we now have a fully data-driven way to understand natural languages.”
Pace of Progress
We wanted to know whether the research community had reached anything like a consensus about a potential breakthrough in machine translation quality and when it could happen. Lee says that, in the short time he has been active in the field, he has observed impressive progress, with a lot of new research coming out every month. But he cautions that, despite recent advances, it is difficult to say at this point just how far NMT can go.
Asked to share his thoughts on AI’s progress in general, Lee points out that it is important to have an ethical discussion alongside the development of AI. He highlights the recent OpenAI initiative as an example of such a forum and notes that governments are taking notice, with the Obama administration releasing a report on the future of AI on October 12, 2016.
Neural networks are indeed having a profound impact on how machine translation is being approached. No matter where one places NMT on Gartner’s hype-cycle curve, there can be little doubt that the technology is giving a major boost to a field where quality improvements had become increasingly hard-won. As more researchers like Jason Lee take up NMT, this progress can only accelerate.