Anonymizing all types of text documents and databases has become a must for all companies that want to comply with data protection regulations. But in which languages can it be done?
Companies and administrations now depend on personal data to run their operations. At the same time, they must respect the privacy of the people that data describes.
As a result, anonymization techniques have been developed, and their use is becoming more widespread. In recent years, they have become indispensable for any company that handles personal data and wants to avoid fines and comply with the data protection regulations of the countries in which it operates.
Regulations often require companies to retain the personal data needed to provide certain services. Once the data is no longer needed, and the minimum retention period required by law has passed, it can be deleted.
Despite having security measures in place, many companies have suffered security breaches involving users’ personal data. These breaches have led to large fines and penalties for failing to comply with the regulations of the countries where the organization carries out its activities.
Anonymization is required worldwide, and each country applies its own data privacy rules. This is why it is important to have a tool that can identify personal data in the source language in order to hide the corresponding information correctly.
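To make the idea of language-aware detection concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the `PATTERNS` table, the `mask` function, and the regular expressions are hypothetical, and real tools rely on far more than a handful of patterns. It shows why the language matters: a Spanish national ID number, for example, has a format that an English-only rule set would not look for.

```python
import re

# Hypothetical, deliberately tiny rule sets keyed by language code.
# These patterns are assumptions for illustration, not a complete detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

PATTERNS = {
    "en": {
        "EMAIL": EMAIL,
        "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    },
    "es": {
        "EMAIL": EMAIL,
        # Spanish DNI: eight digits followed by a letter.
        "DNI": re.compile(r"\b\d{8}[A-Za-z]\b"),
    },
}


def mask(text: str, lang: str) -> str:
    """Replace every match of the language's patterns with a [LABEL] tag."""
    for label, pattern in PATTERNS[lang].items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(mask("Write to ana@example.com today", "en"))
    print(mask("Mi DNI es 12345678Z", "es"))
```

Keeping the rules in a per-language table means a new language can be added without touching the masking logic, which is one simple way to structure a multilingual tool.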
Pangeanic is a natural language processing company specializing in document anonymization software, near-human quality private machine translation, automatic data classification, relevance and sentiment analysis, and summarization. It started out in 2005 as a translation service provider and has since grown into a language technology and natural language processing company.
The team has developed anonymization software called Masker, which enables organizations around the world to comply with the data protection regulations of the countries in which they operate (GDPR, CCPA, HIPAA, APPI). The platform is multilingual, as is to be expected from a company with a background in translation and multilingual services.
It uses machine learning techniques based on the Transformer architecture. This architecture is built around attention mechanisms, which learn which parts of a sentence are most important and need to be considered for anonymization.
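As a rough illustration of what an attention mechanism computes, the sketch below implements scaled dot-product attention weights in plain Python. It is not Masker's implementation: the function names and the two-dimensional toy vectors are assumptions chosen for readability, and in a real Transformer the vectors are learned during training rather than written by hand.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention: how strongly `query` attends to each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


if __name__ == "__main__":
    # Toy, hand-written vectors (assumed values, for illustration only).
    tokens = ["Maria", "lives", "in"]
    keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    query = [1.0, 0.0]
    for token, weight in zip(tokens, attention_weights(query, keys)):
        print(f"{token}: {weight:.2f}")
```

The token whose key vector is most aligned with the query receives the largest weight, which is the sense in which attention "learns which parts of the sentence are most important".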
Masker supports several languages, and the most recently trained model is for Japanese. Japanese is an important language for anonymization because Japan has the APPI (Act on the Protection of Personal Information), a law that applies to all business operators (individuals and entities) that handle personal information. It plays a role comparable to that of the other laws mentioned above.