It is important to understand the fundamentals of individual Privacy Enhancing Techniques (PETs). However, when applying PETs to a data set, understanding the data it contains is equally, if not more, important. Only by understanding the privacy needs of your data set can you choose and apply appropriate PETs correctly.
When we consider a data set, it is useful to classify each attribute as identifying (ID), quasi-identifying (QID), sensitive (SA) or non-sensitive (NSA). IDs are direct identifiers such as names or passport numbers. A QID, on the other hand, is a personal trait such as age or gender. On its own a QID will not allow an individual to be identified, but if enough QIDs are combined they could be used to identify one. The SA is the data we are trying to protect and might be something like an individual's health status, income or drug use history. An NSA is any remaining attribute that falls into none of these categories. In most cases we are trying to analyse the SA while preventing that information from being linked to an individual.
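The classification above can be sketched in code. This is a minimal illustration only: the column names, the `SCHEMA` mapping and both helper functions are hypothetical and are not taken from the sample data set.

```python
from enum import Enum


class AttributeType(Enum):
    ID = "identifying"
    QID = "quasi-identifying"
    SA = "sensitive"
    NSA = "non-sensitive"


# Hypothetical schema: every column name here is an assumption for illustration.
SCHEMA = {
    "name": AttributeType.ID,
    "passport_number": AttributeType.ID,
    "age": AttributeType.QID,
    "gender": AttributeType.QID,
    "health_status": AttributeType.SA,
    "favourite_colour": AttributeType.NSA,
}


def columns_of(schema, attribute_type):
    """Return the column names classified as the given type, in schema order."""
    return [col for col, kind in schema.items() if kind is attribute_type]


def drop_direct_identifiers(record, schema):
    """Return a copy of the record without its ID columns.

    This is not anonymisation on its own: the QIDs are still present and
    may identify an individual when combined.
    """
    return {col: val for col, val in record.items()
            if schema[col] is not AttributeType.ID}
```

Listing the QID columns explicitly, for example with `columns_of(SCHEMA, AttributeType.QID)`, is a useful first step because those are the columns most PETs need to act on.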
Sample Data Set
To demonstrate some of the PETs discussed here we consider an example data set.
1. Non-Perturbation
Non-perturbation techniques are a broad class of PETs that either replace or remove original values to increase the privacy of the data set. These techniques can offer high levels of privacy, but often at the cost of data utility. It is important to note that none of these techniques ensures anonymity on its own.
Masking
Local Suppression
Record Suppression
Anatomisation
Generalisation
Pseudonymisation
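Four of the techniques listed above (masking, generalisation, local suppression and record suppression) can be sketched as follows. The function names, parameters and the ten-year age band are assumptions chosen for illustration, and real implementations will differ.

```python
def mask(value, visible=2, mask_char="*"):
    """Masking: hide all but the last `visible` characters of a value."""
    text = str(value)
    if visible <= 0:
        return mask_char * len(text)
    return mask_char * max(len(text) - visible, 0) + text[-visible:]


def generalise_age(age, width=10):
    """Generalisation: replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def suppress_cell(record, column, placeholder=None):
    """Local suppression: blank one value while keeping the rest of the record."""
    out = dict(record)
    out[column] = placeholder
    return out


def suppress_records(records, should_drop):
    """Record suppression: remove every record for which should_drop is true."""
    return [r for r in records if not should_drop(r)]
```

For example, `mask("X1234567")` keeps only the last two characters and `generalise_age(34)` returns the band `"30-39"`. Each function removes detail from the data, which is where the loss of utility comes from.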
2. Perturbation
As the name suggests, perturbation methods “perturb” the data in some form, either by moving values relative to one another or by changing the values themselves. Perturbation methods are extremely powerful and can offer the highest level of privacy while also retaining high levels of utility. They are, however, more challenging to apply correctly, and it is often difficult to measure the level of privacy achieved.
Noise Addition
Permutation
Micro Aggregation
Synthetic Data
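The first three methods listed above can be sketched on a single numeric column. This is a rough illustration under stated assumptions: the function names are hypothetical, Gaussian noise is only one possible noise distribution, and the micro aggregation shown is the simplest fixed-size variant on one variable.

```python
import random


def add_noise(values, scale, rng):
    """Noise addition: add zero-mean Gaussian noise (std dev `scale`) to each value."""
    return [v + rng.gauss(0.0, scale) for v in values]


def permute(values, rng):
    """Permutation: shuffle a column so values are detached from their records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def micro_aggregate(values, k):
    """Micro aggregation: group the sorted values into runs of k and replace
    each value with the mean of its group.

    Simplification: the final group may hold fewer than k values; practical
    implementations usually merge it into the previous group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out
```

Passing a seeded generator such as `random.Random(0)` makes the output repeatable for testing. Permutation keeps the column's overall distribution and micro aggregation keeps its total, which is one reason these methods can retain utility.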
3. Cryptographic
Cryptographic techniques are powerful tools that can make both protecting a data set and later re-identifying it simple. Many of these techniques can be implemented as part of other PETs; for example, deterministic encryption can be used as part of pseudonymisation. As with other PETs, no single cryptographic technique can ensure the anonymity of a data set. Encrypted data is sometimes treated differently under certain regulations because encryption is reversible by anyone who gains access to the cryptographic key. Key management is therefore a critical part of any cryptographic PET. While the field of cryptography is vast, we focus here on techniques that allow some form of analysis to be performed on the data while it remains protected.
Deterministic Encryption
Order-preserving Encryption
Format-preserving Encryption
Homomorphic Encryption
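To make "analysis on data in its protected state" concrete, here is a toy sketch of additively homomorphic encryption. It uses the Paillier scheme, picked here purely for illustration since the text does not name a scheme. The primes are tiny, the randomness comes from Python's non-cryptographic `random` module, and all function names are hypothetical, so this is not suitable for protecting real data.

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt an integer 0 <= m < n under public key n, using generator n + 1."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


# Usage: sum two values without decrypting either one.
n, private_key = keygen(61, 53)
rng = random.Random(0)  # seeded only to make the sketch repeatable
c1 = encrypt(n, 120, rng)
c2 = encrypt(n, 300, rng)
total = decrypt(n, private_key, add_encrypted(n, c1, c2))
```

Only the holder of the private key can read `total`, which is why key management matters so much for this class of PET. In practice a vetted cryptographic library should be used rather than hand-written code.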
PETs make up a small part of the broader subject of data privacy. For more information on other topics, please see our other resources.