When data protection laws apply
By j.houghton, on 13 February 2020
The GDPR and the Data Protection Act 2018 govern the storage and processing of “personal data”, which is any data relating to a living, identifiable person. If you are working with data gathered from human participants, it is essential to be aware of these laws and how they might apply to your project. These laws do not apply to data that has been anonymised so that the individuals to whom the data relates cannot be identified from it, or to data relating to people who are no longer alive. Sometimes data is pseudonymised: for example, an individual’s name might be replaced by a code in a dataset, while separate records are kept that link the codes to the real names. Such data must still be treated as personal and potentially identifiable.
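To make the idea of pseudonymisation concrete, here is a minimal Python sketch. The function name `pseudonymise`, the field names `name` and `participant_code`, and the `P-` code format are all hypothetical choices made for illustration, not a UCL-prescribed method. It replaces each name with a random code and returns the link table separately.

```python
import secrets


def pseudonymise(records, name_field="name"):
    """Replace each name with a random code.

    Returns (coded_records, key). `key` maps real names to codes, so it
    re-identifies participants and must be stored separately and securely.
    """
    key = {}
    coded = []
    for record in records:
        name = record[name_field]
        if name not in key:
            code = "P-" + secrets.token_hex(4)
            # Regenerate in the unlikely event of a duplicate code.
            while code in key.values():
                code = "P-" + secrets.token_hex(4)
            key[name] = code
        # Copy every field except the name, then attach the code.
        new_record = {k: v for k, v in record.items() if k != name_field}
        new_record["participant_code"] = key[name]
        coded.append(new_record)
    return coded, key


# Illustrative, made-up participants.
participants = [
    {"name": "Alice Example", "score": 12},
    {"name": "Bob Example", "score": 9},
    {"name": "Alice Example", "score": 15},
]
coded, key = pseudonymise(participants)
```

Because `key` still links codes to real names, the coded dataset remains personal data for as long as the key exists.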
Levels of data sensitivity
Some data is considered more sensitive than other data. Sensitive data, or special category data, is subject to highly stringent restrictions and includes information such as ethnic origin and political party affiliation.
|Highly sensitive personal data, such as criminal record data||Highly restricted – Can only be processed under specific circumstances by persons with the appropriate authority working at specific secure locations|
|Special Category Personal Data||Restricted – Can be processed under specific conditions|
|Personal Data (Including pseudonymised data)||Some restrictions exist|
|Non-personal data (Including Anonymised data)||No Restrictions for processing|
What is anonymous?
Deciding whether data is anonymous can be extremely difficult. Even if a dataset has been cleaned of information that would easily identify an individual, it may still be possible to identify someone by combining the data with other readily available information, as researchers have previously demonstrated. Data that is anonymous in one context might be identifying in another. When in doubt, it is best to assume that data relating to an individual could be used to identify them, and to treat it accordingly.
Identifiability spectrum by Understanding Patient Data. Image reused under the CC-BY license: https://understandingpatientdata.org.uk/what-does-anonymised-mean
A dataset from which obvious identifiers such as names and addresses have been removed can, if sufficiently detailed, still be used to identify an individual, as reported recently in Nature Communications. As a general rule, the more detailed the information collected about an individual, the easier that individual becomes to identify.
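One rough way to see how detail makes people identifiable is to count how many records share each combination of indirect identifiers. The Python sketch below is an illustration only, with hypothetical function and field names, and it is not a formal test of anonymity. A record whose combination of values appears only once in the dataset is a candidate for re-identification.

```python
from collections import Counter


def combination_counts(records, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def unique_records(records, fields):
    """Return the records whose combination of field values is unique."""
    counts = combination_counts(records, fields)
    return [r for r in records if counts[tuple(r[f] for f in fields)] == 1]


# Illustrative, made-up rows with names already removed.
rows = [
    {"age_band": "30-39", "employer": "University", "attainment": "PhD"},
    {"age_band": "30-39", "employer": "University", "attainment": "BSc"},
    {"age_band": "40-49", "employer": "Hospital", "attainment": "BSc"},
]

# Using two fields, only the third row stands out.
stand_out_two = unique_records(rows, ["age_band", "employer"])

# Adding a third field makes every row unique.
stand_out_three = unique_records(rows, ["age_band", "employer", "attainment"])
```

A record that is not unique within the dataset may still be identifiable when combined with outside information, so a check like this can flag risk but cannot establish that data is anonymous.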
Strategies for anonymisation
The best way to work with personal data is not to collect it unless absolutely necessary. When designing a research project with human participants, consider which information you really need to record. Try to avoid direct identifiers (such as name, address, date of birth) and collect only indirect identifiers (such as employer, educational attainment, religion).
Where possible, try to practise data blurring. For example, instead of recording someone’s age as 33, record them as belonging to the age category 30-39.
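The age example above can be sketched in a few lines of Python. The function name `age_band` and the default band width of 10 years are assumptions made for illustration; choose bands that suit your own project.

```python
def age_band(age, width=10):
    """Blur an exact age into a band such as '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if width < 1:
        raise ValueError("width must be at least 1")
    lower = (age // width) * width  # start of the band containing this age
    return f"{lower}-{lower + width - 1}"


print(age_band(33))  # prints 30-39
```

Ideally the band is recorded at the point of collection, so that the exact age is never stored at all.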
Need more advice?
If you are working with personal data, there is plenty of advice available from UCL to help you understand your responsibilities. You can also contact the data protection team at email@example.com