Sensitive data – where and how to archive
By Kirsty, on 16 August 2022
Guest post by James Houghton, Research Data Support Officer
It is always essential to protect the personal identity of participants, or information that could jeopardise the safety of a building, an endangered species, or similar. Deleting data at the end of a project is often necessary to guarantee privacy and security, but such data can also be of immense value. Its potential usefulness should be weighed against the likelihood of an accidental release and the harm that such a release could cause.
There are options for archiving data with access controls for researchers who feel strongly that their data should be preserved. Some repositories have built-in access controls that ensure sensitive data can only be accessed by specific people who have gone through an application process. Only a few data repositories offer this feature, and those that do still have a remit that limits what data they will accept. Here are some examples.
- ReShare (UK Data Service) – a social research data repository created to share data from Economic and Social Research Council (ESRC)-funded projects, but it is also open to submissions from other sources!
- ICPSR, the Inter-university Consortium for Political and Social Research – ICPSR provides scientific data management and curation services. It holds more than 250,000 research files in the social and behavioural sciences!
- The European Genome-phenome Archive (EGA) – the EGA offers a service for the permanent archiving and sharing of personally identifiable clinical data generated by biomedical research projects or in the context of research-focused healthcare systems.
The UK Data Service, which runs the ReShare archive, provides guidance on controlling access to data and explains how to implement it.
If you are concerned about storing the data online, even with access controls, consider storing the raw data offline. You can still advertise the data's existence online by creating a repository record that announces the dataset and explains how to request access. The record will also provide a DOI so that the dataset can be cited properly. However, keeping the offline data secure can be challenging: you will need a defined process that keeps the data secure while still making it accessible on request.
Dealing with the long-term archiving of sensitive data is complicated. The UCL Research Data Management Team can assist with this. Get in touch if you need support!