Amazon Releases Sockeye 2, A New Iteration of Its Neural Machine Translation Toolkit

On August 11, 2020, researchers at Amazon detailed advances made in Sockeye 2, a new iteration of the e-commerce giant’s open-source, sequence-to-sequence toolkit for neural machine translation (NMT).

The paper describes Sockeye 2, now available on GitHub, as providing “out-of-the-box support for quickly training strong Transformer models for research or production.”

Amazon introduced the original Sockeye in July 2017, after acquiring Pittsburgh, Pennsylvania-based MT vendor Safaba. Since then, Amazon has forged ahead with machine learning offerings in areas that were once the exclusive territory of language service providers (LSPs), including machine dubbing and quality estimation of translated subtitles.

Over the past three years, Sockeye, which powers Amazon Translate, has been referenced in at least 25 scientific publications, including winning submissions to the Conference on Machine Translation (WMT) evaluations.

Amazon is not the only player contributing to Sockeye 2’s improvements over its predecessor. The paper specifically credits Intel and NVIDIA for improvements to Sockeye’s inference performance and its Transformer implementation, respectively.

The authors — five Amazon research scientists and an external advisor, University of Edinburgh professor Kenneth Heafield — attribute Sockeye 2’s significant gains primarily to a streamlined Gluon implementation; support for state-of-the-art architectures and efficient decoding; and improved model training.

By adopting Gluon, “the latest and preferred API of MXNet,” Sockeye 2 requires about 25% less Python code and trains roughly 14% faster than the original Sockeye. The simplified Gluon code base is meant to enable rapid development and experimentation.
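For readers unfamiliar with Gluon, the sketch below is a minimal, illustrative HybridBlock — not code from Sockeye 2 itself, and the layer is a generic Transformer feed-forward sublayer chosen for brevity. It shows the style the authors credit for the smaller code base: models are written imperatively in plain Python and can then be hybridized into a static graph for speed.

```python
# Minimal, illustrative MXNet Gluon block (not Sockeye 2 code).
import mxnet as mx
from mxnet.gluon import nn


class FeedForward(nn.HybridBlock):
    """Position-wise feed-forward sublayer, as used inside a Transformer layer."""

    def __init__(self, model_size=512, hidden_size=2048, **kwargs):
        super().__init__(**kwargs)
        self.expand = nn.Dense(hidden_size, flatten=False, activation="relu")
        self.contract = nn.Dense(model_size, flatten=False)

    def hybrid_forward(self, F, x):
        # F is mx.nd when run imperatively, mx.sym after hybridize()
        return self.contract(self.expand(x))


block = FeedForward()
block.initialize()
block.hybridize()  # compile to a static graph while keeping the imperative ergonomics
out = block(mx.nd.random.uniform(shape=(2, 10, 512)))  # (batch, seq_len, model_size)
print(out.shape)
```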

Inspired by the success of self-attentional models, the researchers focused on the Transformer architecture and found that “deep encoders with shallow decoders are competitive in BLEU and significantly faster for decoding.”
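A back-of-the-envelope cost model (ours, not the paper’s) helps explain why that trade-off pays off at inference time: the encoder processes the source sentence once, while the autoregressive decoder runs once for every generated target token, so moving layers from the decoder to the encoder cuts the per-token work.

```python
# Back-of-the-envelope illustration (not from the Sockeye 2 paper): counting
# layer invocations during greedy decoding. The encoder stack runs once per
# source sentence; the decoder stack runs once per generated target token.
def layer_invocations(enc_layers: int, dec_layers: int, tgt_len: int) -> int:
    return enc_layers + dec_layers * tgt_len


# A balanced 6:6 Transformer vs. a deep-encoder/shallow-decoder 12:2 model,
# each generating a 30-token translation.
print(layer_invocations(6, 6, 30))   # 186 layer invocations
print(layer_invocations(12, 2, 30))  # 72 layer invocations
```

Under this simplified count, the 12:2 configuration does far less decoder-side work while spending more capacity in the encoder, which is consistent with the paper’s finding that such models remain competitive in BLEU.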