Large Pretrained Language Models (PLMs) have gained extensive attention of late from the language technology research community.
The newest entrant to the large language model (LLM) race is Open Pretrained Transformer (OPT). It is a suite of decoder-only and, well, pretrained transformers, ranging from 125 million to 175 billion parameters (the largest being OPT-175B), which Meta released in early May 2022, as reported by Slator.
Thanks to the explosive growth of parameters and training data, PLMs have facilitated and dominated many natural language processing (NLP) tasks.
However, despite the success of PLMs, recent studies suggest that these large models pose some privacy risks.
Writing about privacy considerations in LLMs, Nicholas Carlini, Research Scientist at Google Brain, said “as language models continue to advance, new and unexpected risks can be exposed, requiring the research community to proactively work to develop new ways to mitigate potential problems.”
Two other studies, one in 2021 and the other in 2022, revealed that PLMs memorize a lot of training data, including sensitive information, which may be leaked unintentionally and exploited by malicious attackers.
In another 2022 study, Jie Huang, Hanyin Shao, and Kevin Chen-Chuan Chang from the University of Illinois at Urbana-Champaign, USA, tried to assess if large PLMs are prone to leaking personal information. They did this by querying PLMs for email addresses with, for example, prompts containing the owner’s name.
As the authors noted, email addresses are an important part of personal information, so “email leakage through PLMs is worth investigating.”
Memorization and Association
The same study also identified two things that may cause privacy leakage: memorization and association.
Memorization refers to a PLM’s capacity to memorize personal information. This information may be retrieved using a particular prefix, such as the tokens that precede it in the training data. Association refers to a PLM’s capacity to associate personal information with its owner; thus, attackers can query the information with the owner’s name (e.g., the email address of Tom is ____).
However, “if a model can only memorize but not associate, […] attackers cannot effectively extract specific personal information since it is difficult to find the prefix, e.g., the context of the information, to extract the information,” the authors said.
For the purpose of this study, the authors evaluated PLMs on retrieving email addresses in several settings: a context setting (prompting with the text that precedes the address in the training data), a zero-shot setting, and a few-shot setting, each with or without the email domain provided.
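To make the difference between these settings concrete, here is a minimal sketch of how such prompts could be assembled. The function name `build_prompt`, its parameters, and the exact wording of the templates (including how the domain hint is phrased) are assumptions made for illustration, modeled on the "the email address of Tom is ____" example above; they are not the authors' actual templates.

```python
def build_prompt(name, setting="zero-shot", context=None, examples=(), domain=None):
    """Return a text prompt whose continuation would be checked for an email.

    Hypothetical helper: the templates are illustrative, not taken from the study.
    """
    if setting == "context":
        # Memorization probe: reuse the text that preceded the address
        # in the training data; the owner's name is not needed.
        if context is None:
            raise ValueError("the context setting needs a prefix")
        return context

    # Association probes: ask for the address by the owner's name,
    # optionally hinting at the email domain.
    hint = f" (domain: {domain})" if domain else ""
    query = f"the email address of {name}{hint} is"

    if setting == "zero-shot":
        return query
    if setting == "few-shot":
        # Prepend known (name, email) pairs as demonstrations.
        demos = [f"the email address of {n} is {e};" for n, e in examples]
        return " ".join(demos + [query])
    raise ValueError(f"unknown setting: {setting}")
```

For example, `build_prompt("Tom")` produces a zero-shot query, while passing `setting="few-shot"` with a list of known name and address pairs places those pairs in front of the same query.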
The authors found that PLMs memorize a large number of email addresses and do leak personal information through memorization. But the risk of attackers extracting a specific person’s information is low, since PLMs are weak at associating personal information with its owner.
Some conditions may nonetheless increase the attack success rate and create potential privacy risks, the authors pointed out. These include long-text patterns associated with an email address, knowledge about the owner, domain information, and the size of the model.
PLMs “are relatively safe in terms of preserving personal information, but we still cannot ignore the potential privacy risks,” they said.
To mitigate potential risks, the authors propose several strategies that could be applied before, during, and after PLM training. More specifically:
- During pre-processing: you can identify and clear out long patterns and deduplicate the training data; deduplication can significantly reduce memorized text
- During training: you can train the model with the differentially private stochastic gradient descent (DP-SGD) algorithm, which provides differential privacy guarantees for the training data
- During post-processing: in the case of an API-access model like GPT-3, you may include a module that checks whether the output text contains sensitive information. If it does, the model should refuse to answer or mask the identified sensitive information.
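The three stages above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the exact-match deduplication rule, the email regular expression, and the placeholder string are all hypothetical, and the DP-SGD function shows only the core clip-and-add-noise step on plain lists of numbers rather than a full training loop with privacy accounting.

```python
import math
import random
import re

# Deliberately simple pattern for illustration; it will miss obfuscated
# addresses such as "tom [at] example [dot] com".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def deduplicate(documents):
    """Pre-processing: drop documents that repeat an earlier one
    (compared after collapsing whitespace and lowercasing)."""
    seen = set()
    unique = []
    for doc in documents:
        key = " ".join(doc.split()).lower()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng=random):
    """Training: clip each example's gradient to `clip_norm`, sum them,
    add Gaussian noise scaled to the clip norm, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]


def mask_emails(text, placeholder="[EMAIL REDACTED]"):
    """Post-processing: mask anything in the output that looks like an email."""
    return EMAIL_RE.sub(placeholder, text)
```

In practice, near-duplicate detection, a dedicated differential privacy library, and a broader detector for sensitive information would likely replace each of these simplified pieces.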
Furthermore, information owners should not disclose personal information in text form directly over the Internet. Instead, they can use a picture, or rewrite the email address and provide instructions for recovering it.
Moreover, information owners should avoid using email addresses with obvious patterns (an impractical suggestion given most companies’ standard email formats), as attacks on email addresses with patterns are far more successful than attacks on email addresses without patterns.
“We […] hope this study can give new insights to help the research community understand the risk of PLMs and make PLMs more trustworthy,” they concluded.