Here Are 5 NLP Terms That Now Occur Naturally in Localization

Five Key Terms in Natural Language Processing

The past 10 years have seen the widespread uptake of natural language processing (NLP) in just about every area of localization. Machine translation (MT) for text-based content, a field that has reached a comfortable middle age, remains among the best-known NLP applications.

At a time when the expert-in-the-loop model has matured, as RWS’ Chief Language Officer, Maria Schnell, explained during the May 2022 SlatorPod, MT is the industry’s tool de rigueur. However, MT is being transformed as more NLP and AI scientists around the world, in both the private sector and academia, join in with rapid advances and applications.

NLP terminology is, thus, increasingly becoming part of everyday dialogue in localization, regardless of role. Other offspring of NLP are still in their infancy, but we were curious about how some buzzwords related to NLP in localization are maturing and trending.

So what are people searching for? We used Google Trends to graph term searches worldwide from 2012–2022 and found that, unsurprisingly, many peaks circle back to Google’s own NLP and AI research.

This was a data-driven exercise: a list that started with over 20 terms was filtered down to the following five, based on search volumes and significance in localization.
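For readers who want to try a similar exercise, the volume-based part of the filtering can be sketched in a few lines of Python. This is a minimal illustration, not the method used for this article: the term names and scores below are hypothetical placeholders, the helper names `shortlist` and `peak_month` are invented for the example, and the "significance in localization" criterion is an editorial judgment that code like this does not capture. It assumes only that each term comes with a monthly series of relative interest scores, such as the 0-100 values Google Trends reports.

```python
from statistics import mean


def peak_month(series):
    """Return the (month, score) pair with the highest interest score."""
    return max(series.items(), key=lambda item: item[1])


def shortlist(trends, keep=5):
    """Rank terms by mean interest score and keep the top `keep` terms."""
    ranked = sorted(
        trends,
        key=lambda term: mean(trends[term].values()),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical scores, made up purely for illustration (not real search data).
trends = {
    "term A": {"2022-03": 10, "2022-04": 80, "2022-05": 30},
    "term B": {"2022-03": 5, "2022-04": 5, "2022-05": 5},
    "term C": {"2022-03": 40, "2022-04": 20, "2022-05": 30},
}

# Print each shortlisted term with the month in which its interest peaked.
for term in shortlist(trends, keep=2):
    month, score = peak_month(trends[term])
    print(term, month, score)
```

Ranking by mean score is only one possible choice; ranking by peak score or by recent growth would produce a different shortlist from the same data.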

1. Large Language Model (LLM)

LLMs are models trained on very large amounts of text data so that they can generate language or acquire skills for specific purposes. Language modeling has become a research field in its own right, with many scientists now devoted to experimenting with monolingual and multilingual datasets.

A few LLMs are now free for the taking, a development that has in turn led to the recent creation of best practices aimed at preventing misuse of these and other AI resources, as previously reported.

The most recent peak for searches for the phrase “Large Language Model” coincided with Google’s launch of the Pathways Language Model (PaLM) in April 2022 and the launch of the BLOOM large language model by the BigScience project in July 2022, currently a dotted spike in the graph.

Related buzzwords: BERT (Bidirectional Encoder Representations from Transformers), Word Embedding, and Word Vectors.

2. Speech-to-Text (STT) Translation

Speech-to-text, or audio-to-text, translation is a process that analyzes speech automatically and then translates it “live” as text in another language displayed on a screen. Many companies now offer it as an added feature or service, notably Zoom, Google and, soon, Language I/O.

Surprisingly, Google Trends did not render a graph of the exact phrase “Speech-to-Text Translation” using the worldwide search 2012–2022 criteria. A Boolean Google Search for the term rendered 103,000 all-time results, and all top-page hits relate to translation; so we include that trend graph below.

Searches for STT increased around May 2022, when Google revealed that it would integrate this technology into Google glasses.

3. Speech-to-Speech Translation (S2ST)

Speech-to-speech translation is another NLP application that continues to evolve rapidly. It eliminates the text-generation step in spoken language conversion, which makes it possible to include languages without writing systems, as Meta did in its S2ST using large audio datasets.

Recent upticks in searches for this term match announcements of breakthroughs in S2ST by Meta in September 2021 and Google in April 2022.

4. Machine Dubbing

Machine dubbing is the process of using technology that combines translation and synthetic voices to dub content into multiple languages. It is rapidly gaining adherents globally and has recently attracted generous investment in several startups, such as Papercup and Dubverse in June 2022, which correlates with the most recent peak in the chart below.

Some peaks in searches for this term were observed around the time Google added machine dubbing to YouTube at the end of 2014 (here’s an update).

5. Neural Machine Translation (NMT)

To close, let’s revisit a buzzword whose best days are behind it. Searches for NMT peaked between September and November 2016, when Google announced it had started using NMT in Google Translate and the World Intellectual Property Organization (WIPO) released the NMT-based WIPO Translate.

A term on the decline in searches and now largely replaced by just “machine translation,” NMT saw a recent peak with Google’s addition of 24 languages to Google Translate and other features in April 2022.

Related buzzwords: Artificial Neural Network and Transformer Architecture.

While many terms used in NLP research papers might still be beyond comprehension for most people, NLP stopped being the exclusive domain of scientists and conferences long ago.

NLP glossaries and vocational mini-courses are now as abundant as open-source language models. It may not be long before localization is discussed as a function of language technology, instead of a business powered by it.