Alibaba Researchers Probe Large Language Models for e-Commerce Translation

A group of researchers from the Alibaba Group, the Chinese company behind the Alibaba online stores, conducted experiments to test how neural machine translation (MT) models and large language model (LLM)-based MT behave on e-commerce content after domain-specific training. The results were published in a paper on March 6, 2024.

Alibaba researchers Kaidi Chen, Ben Chen, Dehong Gao, Huangyu Dai, Wen Jiang, Wei Ning, Shanqing Yu, Libin Yang, and Xiaoyan Cai argue in their publication that existing MT models overlook domains with specialized writing, including e-commerce and legal documents.

Seeking to improve model performance, the researchers set out to test a multistep model training methodology that specifically addresses the morphological, lexical, and syntactic characteristics of e-commerce text. These include keyword stacking and both long and short product descriptions.

e-Commerce Peculiarities

According to the researchers, conventional MT methods can cause problems when translating e-commerce content, such as low accuracy and the omission or duplication of keywords. By contrast, they argue, their “general-to-specific” (G2ST) approach to model training obtains better results.

The G2ST methodology combines two-phase fine-tuning with a contrastive enhancement step, a method in which different candidate translations are compared and the model learns to choose the better one. It works by “transferring” a general MT model to an e-commerce-specific MT model.
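
The idea of comparing candidate translations can be illustrated with a pairwise margin loss. This is only a minimal sketch: the article does not describe the loss the researchers actually used, so the function name, the `margin` parameter, and the example scores below are all assumptions made for illustration.

```python
def margin_ranking_loss(score_better: float, score_worse: float,
                        margin: float = 1.0) -> float:
    """Hypothetical pairwise loss over two candidate translations.

    The loss is zero once the better candidate outscores the worse one
    by at least `margin`; otherwise it grows with the shortfall.
    """
    return max(0.0, margin - (score_better - score_worse))


# Illustrative model scores, invented for this example.
loss_ok = margin_ranking_loss(score_better=2.0, score_worse=0.5)
loss_bad = margin_ranking_loss(score_better=0.2, score_worse=0.5)
```

In a sketch like this, minimizing the loss during training would push the model to score the preferred translation higher than the rejected one.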

To improve the models’ translation performance, the researchers first collected domain-related resources. These included aligned Chinese-English terms and a parallel corpus annotated specifically for e-commerce.

The first preparatory task consisted of expanding the model's vocabulary, particularly with domain-related word pairs, the researchers explained. Chinese-English aligned term pairs, sourced in part from ChatGPT, were used for the first model fine-tuning task.
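
Vocabulary expansion of this kind can be pictured as adding unseen domain terms to a token-to-ID mapping. The sketch below is a toy illustration, not the researchers' implementation: the `expand_vocabulary` helper, the sample term pairs, and the assumption that IDs run contiguously from 0 are all hypothetical.

```python
def expand_vocabulary(vocab: dict[str, int],
                      term_pairs: list[tuple[str, str]]) -> list[str]:
    """Add unseen terms from aligned (Chinese, English) pairs to `vocab`.

    Assumes existing IDs are contiguous from 0, so the next free ID
    equals the current vocabulary size. Returns the newly added terms.
    """
    added = []
    for zh_term, en_term in term_pairs:
        for term in (zh_term, en_term):
            if term not in vocab:
                vocab[term] = len(vocab)
                added.append(term)
    return added


# Invented sample data for illustration only.
vocab = {"<unk>": 0, "dress": 1}
pairs = [("连衣裙", "dress"), ("雪纺", "chiffon")]
added = expand_vocabulary(vocab, pairs)
```

In a real model, each new vocabulary entry would also need a corresponding embedding, which is one reason a fine-tuning pass follows the expansion.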

The resulting parallel corpus was in turn annotated for the second fine-tuning phase. The next step, contrastive enhancement, allowed the researchers to improve the lexical representation capability of the model. 

For their experiments, the Alibaba researchers applied their newly curated Chinese-English corpora to state-of-the-art (SOTA) NMT models and LLMs, including LLaMA, Qwen, GPT-3.5, and GPT-4.

Results showed that with the G2ST methodology, LLaMA2 outperformed LLaMA. The researchers used the SacreBLEU, ROUGE-1, ROUGE-2, and ROUGE-L metrics for their tests, and found the performance of other models to be comparable. However, the Qwen-14B model achieved the best translation scores.
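
To give a sense of what a metric such as ROUGE-1 measures, here is a simplified sketch that computes unigram-overlap F1 between a candidate translation and a reference. It assumes whitespace tokenization and lowercasing, which standard ROUGE toolkits handle differently, so its scores will not match those reported in the paper. The example product titles are invented.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token,
    # so a keyword repeated in the candidate is only credited once.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# A keyword omitted from the candidate lowers recall.
omission_score = rouge1_f1("red chiffon dress", "red chiffon summer dress")
# A duplicated keyword lowers precision.
duplication_score = rouge1_f1("dress dress red", "red dress")
```

Both keyword omission and keyword duplication, the e-commerce problems named by the researchers, reduce the score in this simplified version.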

The researchers intend to further test their methodology on multilingual machine translation.