As 2023, the year AI broke through, comes to an end, the language services industry is looking ahead with cautious optimism and a “better be prepared” attitude. AI language technology is putting a new spin on everything stakeholders held near and dear in the five or so years that followed the advent of NMT.
This past “most consequential year,” as Slator’s Florian Faes called it during SlatorCon Zurich in October, brought movers and shakers back to in-person discussions about technology, discussions that were not fearful but eager to move forward as sure-footedly as possible.
Testament to that spirit was the Investors Panel, where Markus Zejermann, Managing Director at Mayfair Equity Partners, and Fernando Chueca, Managing Director at Carlyle, agreed that the language industry is strong and well suited to be the expert in control of the very technology and services that large language models (LLMs) have brought about.
These are Slator.com’s most popular stories of 2023.
1. The US State Department is Looking for 1,000 Translators and Interpreters
The most popular story of 2023 went viral in September, sparking widespread sharing and discussion. It covered the US State Department’s publication of an “intent to issue a blanket purchase agreement” for language services provided directly by about 1,000 translators and interpreters.
The “intent” was only an informational notice describing potential direct contracts with individual linguists for “interpreting, translating, and related services in support of diplomatic and foreign affairs activities at the highest levels of the U.S. Government.” Part of the article’s popularity may have had to do with compensation, which, according to the notice, could reach as much as USD 150k for a single assignment, provided a long list of requirements is met.
2. We Tested Google Bard on Machine Translation
Bard was still a baby LLM in May 2023, when it became a publicly available chatbot, and Slator naturally had to test its innate translation abilities. But Bard was only just learning, and its parent, Google, apologized at length in advance for any inaccurate or inappropriate responses.
Bard was initially trilingual in US English, Japanese, and Korean, even though it claimed (or hallucinated?) that it could translate between 133 languages. Bard also said it could subtitle movies. But when asked to translate into anything other than its three native tongues, it replied, “I am an LLM trained to respond in a subset of languages at this time, so I can’t assist you with that.”
3. Why Large Language Models Hallucinate When Machine Translating ‘in the Wild’
In March, researchers published the results of a detailed study of massively multilingual translation models and LLMs, including ChatGPT. The study was broad in scope, covering over 100 translation directions, including non-English language pairs.
According to the researchers, hallucinations occur more frequently when the target is a low-resource language, with hallucination rates exceeding 10% for some language pairs. They concluded that “models tend to rely less on the source context when translating to or from low-resource languages.”
4. Meta Warns Its Latest Large Language Model ‘May Not Be Suitable’ for Non-English Use
Meta released the Llama 2 LLM in July, about five months after Llama 1. Touted as open and free for research and commercial use, the second iteration of the LLM was trained on 40% more data than its predecessor.
However, the model was not an ideal choice for translation, admitted Meta researchers, who explained that “Most data is in English, meaning that Llama 2 will perform best for English-language use cases,” adding that “a training corpus with a majority in English means that the model may not be suitable for use in other languages.”
5. Here Are Six Practical Use Cases for the New Whisper API
OpenAI’s speech-to-text model, Whisper, is available via API and offers transcription in close to 100 languages. However, as Slator found in a March article, the model requires a lot of computing power, which may restrict its use to the few users with adequate capacity.
For those who can provide that capacity, the article discussed six use cases: transcription services, language learning tools, indexing podcasts and other audio content, customer service, market research, and voice-based search. Combining the Whisper API with the APIs for ChatGPT and other models also lets users build further applications, such as “video to quiz” and “video to blogpost.”
6. Google Explores How Large Language Models Actually Translate
A May paper analyzed the Pathways Language Model (PaLM) to understand how the LLM is able to translate. The authors found that 55% of bilingual instances were not actually translations but code-switching, references to named entities in their native language, or unrelated content. Another 40% of bilingual instances could be considered pseudo-translations, such as summaries and paraphrases.
The researchers also experimented with prompts to elicit PaLM’s translation abilities. They noted that most LLM MT research prompts with the source and target language names in English followed by a colon (e.g., “French:”), which was also the most frequently used prompt in the data.
7. Microsoft Kills Off Beloved Language Portal
The Microsoft Language Portal, a multilingual online dictionary of computer-related terms and a compendium of localization style guides and translations of UI strings, had been available to the general public since 2009. In May, the company announced that it was shutting down the portal.
The portal was popular with translators, one of whom lamented on social media at the time, “I don’t think it’s right to discontinue a terminology portal like this one while so many users and translators like us make so much use of this tool.” The portal was indeed shut down in June, but it reopened just weeks later.
8. Why MrBeast is Launching a Dubbing Company
MrBeast (Jimmy Donaldson’s YouTube moniker) announced in February that he had launched Creator Global, a dubbing services company for content creators. Since late 2021, the influencer had been testing YouTube’s multi-track audio feature on his own content in more than a dozen languages, and that experience gave him the idea to offer quick dubbing in any language.
YouTube’s multi-track audio lets creators bundle audio tracks in multiple languages into a single video, and viewers can then choose which language to listen in. According to YouTube, among creators who tested the feature, more than 15% of watch time came from “views in the video’s non-primary language.”
9. GPT-4 Launch Promises Surprising New Use Cases
OpenAI launched GPT-4 in March as the most advanced of its language models. The company described the model as “more reliable, creative, and able to handle much more nuanced instructions” than its predecessor. GPT-4 can also accept both image and text inputs.
The LLM was tested on a range of benchmarks, including simulated exams such as the bar exam and the SAT. GPT-4 also showed stronger multilingual performance than the previous model iteration: on a version of the MMLU benchmark translated into 26 languages, GPT-4 outperformed the English-language performance of GPT-3.5 and other LLMs in 24 of them, including low-resource languages such as Latvian, Welsh, and Swahili.
10. WIPO Is Hiring Translation Studies Graduates for USD 5k per Month Geneva Fellowship
In January, the World Intellectual Property Organization (WIPO) opened applications for the 2023 round of its PCT Fellowship Program for Graduate Students. The fellowship typically lasts at least three months and takes place at WIPO headquarters in Geneva, Switzerland. Fellows receive a monthly stipend of CHF 5,000 (USD 5,410).
Following an initial screening, shortlisted candidates had to take a terminology, translation, or aptitude test, depending on their chosen track.