The Federal Data Protection Act requires the anonymization and pseudonymization of personal data. This article explains what these two terms actually mean and how you can fulfil the legal requirements.
Update: HIPAA requirements taken into consideration
The definition of the term anonymization was found in the old Federal Data Protection Act (1990).
“Anonymization is the changing of personal data so that the individual information about personal or material circumstances can no longer be assigned to a specific or identifiable natural person, or only with an unreasonably great expense of time, costs and effort.”
Source: FDPA
The term pseudonymization is defined by the European General Data Protection Regulation (GDPR) in the same way as by the new Federal Data Protection Act (2018):
“‘Pseudonymization’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;”
Source: GDPR Article 4(5)
Both procedures aim to protect the personal data of individuals, for example patients.
In pseudonymization, the data that would allow identification are replaced, for example with a code. However, a separate key (e.g. in the form of a table) links the subject to the pseudonym, so the subject can still be re-identified by anyone who knows this key.
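The idea of a code plus a separately stored key can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the field names, the sample patients and the helper functions `pseudonymize` and `reidentify` are hypothetical and not taken from any law or library.

```python
import secrets

def pseudonymize(records, id_fields=("name", "birth_date")):
    """Replace identifying fields with a random code.

    Returns the pseudonymized records and the key table as two separate
    objects, so that they can be stored and protected separately.
    """
    key_table = {}       # pseudonym -> identifying values (keep this apart)
    pseudonymized = []
    for record in records:
        # Random code that is not derived from the person's identity
        pseudonym = secrets.token_hex(8)
        key_table[pseudonym] = {f: record[f] for f in id_fields}
        rest = {k: v for k, v in record.items() if k not in id_fields}
        pseudonymized.append({"pseudonym": pseudonym, **rest})
    return pseudonymized, key_table

def reidentify(pseudonym, key_table):
    """Re-identification is only possible for whoever holds the key table."""
    return key_table[pseudonym]

# Hypothetical sample data
patients = [
    {"name": "Jane Doe", "birth_date": "1980-04-12", "diagnosis": "J45"},
    {"name": "John Roe", "birth_date": "1975-11-03", "diagnosis": "E11"},
]
data, key_table = pseudonymize(patients)
```

Analysts would receive only `data`. The `key_table` would stay with a separate party and be subject to its own technical and organizational measures.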
This is occasionally used in hospitals to protect VIPs. In clinical trials, the analysts also often work with pseudonymized data. If there is a pressing reason to identify the original subject (the patient), this is then possible.
In many web applications in which one can freely choose a user name, which is then displayed to other users on the same platform, we speak of pseudonymization because the operator of the platform knows the key linking person and pseudonym.
In anonymization, however, all identifying characteristics are deleted. This is less trivial than pseudonymization and, in the case of genetic data, even impossible.
Fig. 1: Pseudonymization and anonymization of data
In both anonymization and pseudonymization, identifying characteristics must be deleted (in anonymization) or separated from the other personal data (in pseudonymization) in such a way that tracing the data back to the person or their protected information becomes significantly more difficult.
Usually it is not enough to remove only the information that can be traced back to the person relatively directly.
Examples of this would be the name, the exact address, the e-mail, telephone number or birth date.
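Removing such direct identifiers is the easy part and might look like the following sketch. The field names and the sample record are assumptions made for illustration only.

```python
# Hypothetical field names for the direct identifiers listed above
DIRECT_IDENTIFIERS = {"name", "address", "email", "phone", "birth_date"}

def strip_direct_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct identifiers."""
    return {k: v for k, v in record.items() if k not in identifiers}

record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "email": "jane@example.org",
    "phone": "555-0100",
    "birth_date": "1980-04-12",
    "zip": "78462",
    "sex": "F",
    "diagnosis": "J45",
}
stripped = strip_direct_identifiers(record)
```

Fields such as `zip` and `sex` survive this step. Taken individually they look harmless, but in combination they can still narrow a record down to very few people, which is why removing direct identifiers alone is usually not sufficient.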
Usually one must also falsify data, change it or group it:
Using k-anonymity one can assess the degree of anonymization or pseudonymization. The following presentation addresses the problem of de-anonymization and presents the concept of k-anonymity.
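A data set is k-anonymous if every record shares its combination of quasi-identifiers (attributes such as age or postal code that identify a person only in combination) with at least k - 1 other records. The following sketch computes k for a small, invented data set before and after grouping values. The chosen quasi-identifiers, the grouping rules and the helper names `k_anonymity` and `generalize` are assumptions for illustration.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize(record):
    """Group the age into 10-year bands and truncate the postal code to 3 digits."""
    decade = record["age"] // 10 * 10
    return {
        **record,
        "age": f"{decade}-{decade + 9}",
        "zip": record["zip"][:3] + "**",
    }

# Hypothetical sample data without direct identifiers
records = [
    {"age": 34, "zip": "78462", "diagnosis": "J45"},
    {"age": 37, "zip": "78467", "diagnosis": "E11"},
    {"age": 52, "zip": "78315", "diagnosis": "I10"},
    {"age": 58, "zip": "78333", "diagnosis": "J45"},
]

k_before = k_anonymity(records, ["age", "zip"])      # 1: every record is unique
generalized = [generalize(r) for r in records]
k_after = k_anonymity(generalized, ["age", "zip"])   # 2: each group has two records
```

A value of k = 1 means that at least one record is unique with respect to its quasi-identifiers. Grouping raises k at the cost of precision in the data.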
The US government uses the following example:
Fig. 2: Identification of patient through de-anonymization
Connections to individual patients can be made through the combination of pseudonymized or even anonymized data (figure 2, left) with other data sources, such as an electoral roll (figure 2, right). In the example above, this was not possible in the first case, because the person was too young and thus not registered to vote.
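Such a linkage can be sketched as a simple join on the attributes that both data sources share. The records, the names and the choice of postal code, birth date and sex as linking attributes are invented for illustration and are not the data from figure 2.

```python
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def link(medical_records, electoral_roll, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for medical records that match exactly one voter."""
    index = {}
    for voter in electoral_roll:
        index.setdefault(tuple(voter[k] for k in keys), []).append(voter["name"])
    matches = []
    for rec in medical_records:
        names = index.get(tuple(rec[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the patient
            matches.append((names[0], rec["diagnosis"]))
    return matches

# Hypothetical "anonymized" medical data: no names, but quasi-identifiers remain
medical = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "I10"},
    {"zip": "02139", "birth_date": "2012-03-14", "sex": "M", "diagnosis": "J45"},
]
# Hypothetical electoral roll: names plus the same quasi-identifiers
roll = [
    {"name": "A. Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "B. Sample", "zip": "02139", "birth_date": "1960-01-02", "sex": "M"},
]

reidentified = link(medical, roll)
```

In this sketch the first patient is re-identified, while the second is not, because a person born in 2012 does not appear in the electoral roll.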
The risk of identification grows through the
Note that you must operate systematic risk management and may need to call on experts such as statisticians for this.
The GDPR requires in article 32:
“Taking into account the state of the art [...] the controller and the processor will take suitable technical and organizational measures to guarantee an appropriate level of protection consistent with the risk; these measures include the following: pseudonymization and encryption of personal data;”
According to the FDPA, the saving, processing and use of personal data are generally not allowed. However, there are exceptions to this comprehensive prohibition:
The U.S. Health Insurance Portability and Accountability Act, HIPAA for short, also regulates the requirements for the confidentiality of health data. The U.S. Department of Health and Human Services has compiled information on anonymization and pseudonymization on its website.
HIPAA provides two options for sufficiently pseudonymizing data:
Handling the 18 attributes is somewhat specific to the US, and must be partially adjusted to European contexts: