How Do Large Language Models Work?

Large Language Models (LLMs) are a type of machine learning model that, like human brains, works based on prediction: “Based on what is already known, what will happen in a new, unknown situation?” LLMs use a type of deep neural network to generate outputs based on what they have learned from training data.

Foundation models, like those developed by OpenAI and Cohere, refer to general-purpose LLMs. Prompting can then be used to turn general models into models with specific applications. LLMs “are very easy to scale… and it turns out that when you make a really big version, you get very good performance”, Nick Frosst, Co-Founder of Cohere, told SlatorPod.

The term “large” refers to the number of parameters in a model’s neural network; some of the most successful LLMs have hundreds of billions of parameters. GPT-3, for example, has 175 billion.
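To see where a figure like 175 billion comes from, the back-of-the-envelope calculation below uses GPT-3’s published layer count and hidden size with a standard approximation for transformer parameter counts (roughly 12 × d_model² per block, plus the token embedding). This is an illustrative sketch, not an exact model card.

```python
# Rough, illustrative parameter count for a GPT-3-scale transformer.
# Layer sizes are GPT-3's published configuration; the per-block
# formula is a common approximation, not an official breakdown.
n_layers = 96      # transformer blocks
d_model = 12288    # hidden (embedding) dimension
n_vocab = 50257    # vocabulary size

per_block = 12 * d_model ** 2   # attention (~4*d^2) + feed-forward (~8*d^2)
embedding = n_vocab * d_model   # token embedding table

total = n_layers * per_block + embedding
print(f"{total / 1e9:.0f}B parameters")  # -> 175B parameters
```

The approximation already lands within a percent of the headline figure, which is why parameter counts are usually quoted in round billions.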

Frosst clarifies that while the core technology has remained essentially unchanged, LLMs have improved thanks to the exponential increase in the number of parameters and to fine-tuning on human feedback through reinforcement learning.

The training process begins with gathering, pre-processing, and cleaning the training dataset, which can come from various sources such as books, websites, articles, and other open datasets.

Next, a model must be selected and configured. The most common choice for natural language processing (NLP) applications is the transformer deep learning architecture, as used by OpenAI’s GPT and Google’s BERT. Frosst explains that transformers “have a particularly good mechanism for looking at sequences and that is very good for text”. Configuration requires certain elements to be specified (e.g., the number of layers in transformer blocks, the loss function, and hyperparameters).
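The kinds of settings to be specified at this stage can be sketched as a simple configuration object. The field names and values below are hypothetical and purely illustrative; they do not describe any particular model.

```python
from dataclasses import dataclass

# Hypothetical configuration sketch: the sort of choices fixed before
# training a transformer. Names and defaults are illustrative only.
@dataclass
class TransformerConfig:
    n_layers: int = 12           # number of transformer blocks
    n_heads: int = 12            # attention heads per block
    d_model: int = 768           # hidden (embedding) dimension
    vocab_size: int = 50257      # size of the token vocabulary
    context_length: int = 1024   # maximum input sequence length
    learning_rate: float = 3e-4  # optimizer hyperparameter
    loss: str = "cross_entropy"  # loss function for next-word prediction

cfg = TransformerConfig()
print(cfg.n_layers, cfg.d_model)  # -> 12 768
```

Scaling a model up largely means increasing values like `n_layers` and `d_model`, which is what drives the parameter counts discussed above.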

During training, the model is shown a sequence of words and taught to predict the next word. The model assigns a weighting to each part of the input data based on its significance and adjusts the weightings based on the difference between its prediction and what the next word actually is. This process of self-supervised learning is repeated until the model reaches an acceptable level of accuracy, after which it is evaluated on a test dataset that was not used during training.
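The next-word-prediction objective can be illustrated with a deliberately tiny stand-in: a bigram model that “learns” by counting which word follows which in a toy corpus, then predicts the most likely continuation. Real LLMs pursue the same objective by adjusting billions of neural-network weights rather than keeping counts, so this is a conceptual sketch only.

```python
from collections import defaultdict, Counter

# Toy illustration of next-word prediction (not a neural network):
# training = recording which word actually followed each word;
# inference = returning the most frequently observed continuation.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1   # "training": count observed next words

def predict_next(word: str) -> str:
    # "inference": the most common word seen after `word` in training
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> cat  ("cat" follows "the" twice)
```

Where this toy model stores explicit counts, an LLM stores the equivalent statistics implicitly in its weights, which is what lets it generalize to word sequences it has never seen.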

Some current uses of LLMs include conversational chatbots (ChatGPT), text generation, answering FAQs, routing customer service enquiries, classifying and categorizing large amounts of text data for more efficient processing and analysis, and virtual agents. Big tech companies are also rapidly implementing LLMs into their products.

Deploying Large Language Models in Localization

This technology is also becoming a key feature in the localization landscape and even for translation. Despite a study indicating that the translation and interpretation professions are among the most exposed to LLMs, the quality of LLM translations does not yet match the state-of-the-art machine translation widely deployed in the language industry’s expert-in-the-loop workflows.

Microsoft researchers found the quality of translations was “very competitive” for high-resource languages, but was still lacking for low-resource languages.

The GPT panel at SlatorCon Remote March 2023 saw GPT’s potential in unstructured content as it seems to be “a little better [than MT] at the free-form craziness” that reflects real-life interactions.

Researchers also found LLMs had “state-of-the-art capabilities” in the automated assessment of translation quality at the system level. However, only GPT-3.5 and larger models achieved the same accuracy as humans.

As with all new technology, some are hesitant about LLMs, primarily because of the questionable reliability of the content generated, although a preliminary set of best practices has been published in a bid to promote responsible development. The time and expense of training an LLM may also be off-putting, with estimates suggesting a single training run for GPT-3 costs USD 5m. Finally, LLMs have been shown to produce different types of hallucinations than traditional MT models, which impact MT quality, user trust, and safety.

At SlatorCon Remote March 2023, Jon Ritzdorf, Senior Manager of Global Content Solutions at Procore, emphasized the similarity between these concerns and those that surrounded neural machine translation (NMT) in 2016, and said he believes these issues will be resolved very quickly.