TikTok Parent Company ByteDance Open Sources Neural Speech Translation Toolkit

ByteDance Neural Machine Translation Research 2021

Chinese technology company ByteDance, known for its short-form video-sharing app TikTok, has released an open-source toolkit for neural speech translation.

According to a December 2020 paper published on pre-print server arXiv.org, “NeurST aims at facilitating the speech translation research for NLP researchers and provides a complete setup for speech translation benchmarks, including feature extraction, data preprocessing, distributed training, and evaluation.”

One of the end goals, of course, is to extend the toolkit’s use to “advanced speech translation research and products.” NeurST is currently publicly available on GitHub.

Recent work by the three authors — Chengqi Zhao, Mingxuan Wang, and Lei Li, all of ByteDance — includes PRUNE-TUNE, a new method of domain adaptation for machine translation (MT), and multi-resolutional (MR) Doc2Doc, which the researchers used to train a neural sequence-to-sequence MT model for document-level translation.

As the paper explained, one of the shortcomings of traditional “cascade” speech translation systems is that mistakes in transcription — typically produced by automatic speech recognition (ASR) — propagate into the translation. End-to-end speech translation, on the other hand, translates directly from the audio, bypassing the transcription step and reducing latency.
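The error-propagation problem in a cascade pipeline can be illustrated with a toy sketch. Everything below is hypothetical — the stub functions and the tiny word-for-word lexicon are invented for illustration and are not part of NeurST — but the structure mirrors a cascade system: the MT step translates whatever the ASR step outputs, mistakes included.

```python
# Conceptual sketch of error propagation in a cascade speech translation
# pipeline. All functions and data here are hypothetical stubs.

def toy_asr(audio_id: str) -> str:
    """Hypothetical ASR stub: maps an audio clip id to a transcript."""
    transcripts = {
        "clip_1": "the ship sails at dawn",   # transcribed correctly
        "clip_2": "the sheep sails at dawn",  # ASR misheard "ship" as "sheep"
    }
    return transcripts[audio_id]

def toy_mt(text: str) -> str:
    """Hypothetical word-by-word MT stub (English -> German)."""
    lexicon = {"the": "das", "ship": "Schiff", "sheep": "Schaf",
               "sails": "segelt", "at": "bei", "dawn": "Morgengrauen"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def cascade_st(audio_id: str) -> str:
    # Cascade: translate whatever the ASR produced. Any transcription
    # error is passed straight through to the translation.
    return toy_mt(toy_asr(audio_id))

print(cascade_st("clip_1"))  # das Schiff segelt bei Morgengrauen
print(cascade_st("clip_2"))  # das Schaf segelt bei Morgengrauen
```

The second clip shows the failure mode: the ASR error (“sheep” for “ship”) surfaces as “Schaf” in the output, which the downstream MT model has no way to correct. An end-to-end model avoids this by never committing to an intermediate transcript.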

The authors noted that existing speech translation studies are run on disparate datasets; their goal, therefore, was to establish reproducible and reliable benchmarks for the field. They said that NeurST’s “straightforward recipes for preprocessing audio datasets” will free up developers for more advanced work on speech translation.

NeurST was put to the test on several benchmark speech translation tasks for eight European language pairs using publicly available speech translation data (namely the Augmented LibriSpeech and MuST-C corpora).

Overall, NeurST outperformed its existing counterparts, ESPnet-ST and fairseq-ST, on most language pairs. The authors hope the toolkit, which is designed to be friendly to NLP researchers, will be used to establish baselines in future studies.