Research Data Management Blog

Archive for February, 2020

Storing Data Safely and Securely

By j.houghton, on 14 February 2020

UCL offers a range of options for managing research data safely and securely. Here’s an overview of the resources and services available.

General Purpose Data Storage

You should always make sure all your data is stored with services managed by UCL IT services, as this will ensure material is automatically backed up and will be recoverable in the event of an IT failure. Never be tempted to rely on personal cloud storage accounts for keeping research data safe!

  • All staff and students have access to an N drive with up to 100GB of storage, fully backed up. Accidentally deleted files can be retrieved for up to 90 days.
  • The S (“shared”) drive lets staff share, save and modify files, and is backed up like the N drive.
  • RDSS – Research Data Storage service – is a shared space designed specifically for very large data volumes, in the terabyte range. It’s free for the first 5TB and costed for each 5TB beyond that.

Where to store sensitive data?

Encryption or user access controls can be used to store sensitive data on the S drive or RDSS. But if you are working with identifiable data which is restricted by GDPR or the Data Protection Act (2018), then you should apply for an account on the UCL Data Safe Haven. This highly secure service conforms to the ISO 27001 standard, which in practical terms means it is an appropriate environment for handling health and social care data. You will need to register a project with Information Governance before applying for an account. This service offers integration with software like REDCap for securely harvesting information.

If you want to work with highly sensitive information, such as criminal records data, then you could consider the Jill Dando Institute Research Laboratory, but be prepared to undergo stringent vetting and training processes first.

But I need a less severe option!

If dedicated highly secure services seem excessive for your purposes, there are ways of sending, receiving and managing somewhat sensitive data that still needs a level of security, while avoiding the lengthy approval processes for the more secure services. The UCL dropbox service (not to be confused with the commercial Dropbox!) allows for sending and receiving files, although sensitive data should be encrypted when using this service. Files remain online for 10 days before being automatically removed, so they are not left lying around, vulnerable to hacking.

Travelling with data

If you need to travel and take data with you on a device for a conference, field work or a visit to collaborators, consider some practical precautions first. Always make sure there is a backup of anything on the device. If you need to transport sensitive data, be sure to use encryption and read our advice on this before you travel.

Archiving and sharing data securely?

If you produce data and want to share it with other researchers at the end of the project but want to restrict the access to avoid misuse, consider the UK Data Archive. They are able to offer a range of options such as different levels of controlled access for different files. You can also access the service with a UCL login, so no need to create a new account.

If you ever have any questions about data management, contact the Research Data Support Team for advice and guidance:

Email: lib-researchsupport@ucl.ac.uk

Telephone: +44 (0) 20 7679 2095 (internal 32095); +44 (0) 20 7679 2614 (internal 32614)

When data protection laws apply

By j.houghton, on 13 February 2020

GDPR and the Data Protection Act (2018) govern the storage and processing of “personal data”, which is any data relating to a living, identifiable person. If you are working with data gathered from human participants, it is essential to be aware of these laws and how they might apply to your project. These laws do not apply to data that has been anonymised so that the individuals to whom the data relates cannot be identified from it, or to data related to people who are no longer alive. Sometimes data is pseudonymised: for example, an individual’s name might be replaced by a code in a dataset, but records are kept which link the codes to the real names. This data must still be treated as personal and potentially identifiable.
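
To make the idea of pseudonymisation concrete, here is a minimal Python sketch. It is an illustration only, not a UCL-approved procedure: the function name `pseudonymise`, the `name` field and the `P`-prefixed codes are all assumptions made for this example. It swaps each name for a random code and returns a separate key linking codes back to names. That key is exactly why the result still counts as personal data, and it would need to be stored separately and securely.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace a direct identifier with a random code (hypothetical helper).

    Returns (pseudonymised_records, key), where key maps each code back to
    the real value. The input records are not modified.
    """
    code_for = {}  # real value -> code, so repeat participants get one code
    key = {}       # code -> real value; keep this apart from the dataset
    pseudonymised = []
    for record in records:
        real_value = record[id_field]
        if real_value not in code_for:
            code = "P" + secrets.token_hex(4)
            while code in key:  # regenerate on the unlikely event of a clash
                code = "P" + secrets.token_hex(4)
            code_for[real_value] = code
            key[code] = real_value
        cleaned = dict(record)  # shallow copy so the original is untouched
        cleaned[id_field] = code_for[real_value]
        pseudonymised.append(cleaned)
    return pseudonymised, key
```

Because the codes are random rather than derived from the names, nobody can work backwards from a code to a name without the key.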

Levels of data sensitivity

Some data is considered more sensitive than others. Sensitive data, or special category data, is subject to highly stringent restrictions and includes information like ethnic origin and political party affiliation.

  • Highly sensitive personal data (such as criminal record data): highly restricted – can only be processed under specific circumstances by persons with the appropriate authority working at specific secure locations
  • Special category personal data: restricted – can be processed under specific conditions
  • Personal data (including pseudonymised data): some restrictions exist
  • Non-personal data (including anonymised data): no restrictions on processing

What is anonymous?

Deciding whether data is anonymous or not can be extremely difficult. Even if a dataset has been cleaned of information which would easily identify an individual, it is possible that, when combined with other readily available information, the data could still be used to identify someone, as researchers have previously demonstrated. Data that might be anonymous in one context might be identifying in another. When in doubt, it is best to assume that data related to an individual could be used to identify them, and treat it accordingly.

Identifiability spectrum by Understanding Patient Data. Image reused under the CC-BY license: https://understandingpatientdata.org.uk/what-does-anonymised-mean

A dataset which has obvious identifiers such as names and addresses removed can, if sufficiently detailed, still be used to identify an individual, as reported recently in Nature Communications. As a general rule, the more detailed the information collected about an individual, the easier that individual becomes to identify.
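
One rough way to see this risk in your own dataset is to count how many records share each combination of indirect identifiers. The Python sketch below is an illustration under assumptions: the function names and the example field names are made up for this post. It flags records whose combination of values is unique, which is the idea behind k-anonymity. A clean result does not prove that the data is anonymous, because it says nothing about what other information an outsider could combine it with.

```python
from collections import Counter


def combination_counts(records, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(record[field] for field in fields) for record in records)


def unique_records(records, fields):
    """Return the records whose combination of field values appears only once."""
    counts = combination_counts(records, fields)
    return [
        record
        for record in records
        if counts[tuple(record[field] for field in fields)] == 1
    ]
```

For example, calling `unique_records(records, ["employer", "age_band"])` on a list of dictionaries would return the participants who are the only person with their particular employer and age band, and who are therefore the easiest to single out.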

Strategies for anonymisation

The best way to work with personal data is to not collect it unless absolutely necessary. When designing a research project with human participants, consider which information you really need to record. Try to avoid direct identifiers (such as name, address, date of birth) and only collect indirect identifiers (such as employer, educational attainment, religion).

Where possible, try to practise data blurring. For example, instead of recording someone’s age as 33, record them as belonging to an age category 30-39.
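
The age example can be sketched in a few lines of Python. The function name `age_band` and the default band width of 10 years are assumptions chosen to match the example above, not a prescribed method.

```python
def age_band(age, width=10):
    """Return the age category containing `age`, e.g. 33 -> "30-39"."""
    if age < 0:
        raise ValueError("age must not be negative")
    lower = (age // width) * width  # round down to the start of the band
    return f"{lower}-{lower + width - 1}"
```

Here `age_band(33)` returns `"30-39"`. Ideally the band is recorded at the point of collection, so the exact age is never stored at all.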

Need more advice?

If you are working with personal data there is lots of advice available from UCL to help you understand your responsibilities. You can also contact the data protection team at data-protection@ucl.ac.uk

Meet Research IT Services!

By j.houghton, on 12 February 2020

Continuing our introductions theme for this year’s Love Data Week, you have met the new team, had an update on our newest system, and now let us reintroduce you to the Research IT Services (RITS) Team and what they do for you, your data, and beyond!

First up, RITS runs the Research Data Storage service for staff and doctoral students, where you can store your data whilst your project is ongoing. This is a great place for a number of people to collaborate and share data, especially on big files. To get access, just use the link above and register your project today!

Related to this, RITS also provide research computing platforms for staff and students, which support research at UCL through provision of specialist platforms for high performance and high throughput computing. Application forms are available at the link above.

They also support research software development tools to enable researchers to follow best practices when developing research software, including version control and automated testing, as well as various research IT applications to support research management and administration from planning through to publication.

Finally, (as if this list wasn’t long enough already!) RITS provide expertise in creating software for academic research, and collaborate with researchers across UCL on projects. Get in touch with the Research Software Development team to discuss your project!

RITS have been involved in an amazing array of projects, and their Showcase site is a brilliant way to not only get distracted from work for a while, but get an overview of the sheer range of research RITS is involved in, and a taste of the wide range of research that happens here at UCL. As a new member of staff my favourite has to be the joint project between the British Library and UCL Digital Humanities as it proves conclusively that these tools and services are for everyone!

We work very closely with RITS. You can find the Research Data Management (RDM) team alongside the RITS team at their regular drop-ins, and the two teams collaborate on a variety of training covering most of the above, including high performance computing, research software engineering and programming.

Drop-ins are for anyone who wants support or advice about anything covered by RITS or the Research Support team, such as:

  • finding the right tools and services,
  • research programming,
  • task automation and scripting,
  • high performance computing,
  • storing large datasets,
  • Research Data Storage (RDS) service induction,
  • handling sensitive data,
  • Research Data Management (including data management plans).

 

We look forward to seeing you there soon!

Once upon a time, UCL launched a research data repository…

By j.houghton, on 11 February 2020

Filled with a sense of excitement (and, dare I say it, relief), we finally launched the UCL research data repository on 5 June 2019. This was – and continues to be – an open and free platform for UCL research staff and students to publish outputs of research including datasets, software, posters, presentations, models, photographs… just to name a few.

Benefits of using the UCL Research Data Repository

  • Secure long term data preservation and curation: 10+ years
  • Storage: access and data sharing worldwide
  • Compliance: meets funders’ requirements for FAIR data
  • Multiple formats: supports almost all file types
  • Increased citations: published research data has its own DOI
  • Discoverability: enhances discovery and leads to new partnerships
  • Defines reuse: applies Creative Commons and other licences
  • Embargo: research outputs can be embargoed where necessary
  • Team collaboration: data can be added to defined project spaces.

Take a look at the UCL Repository FAQs for more hints and tips on using the Repository.

It’s time for some pictures and numbers

(Correct as of 3rd February 2020)

Over the past six months, we have had

  • 13,141 total views
  • 2,440 downloads
  • 62 items made publicly available

The award for most downloaded goes to…

1st Place: Griffiths, David; Boehm, Jan (2019): SynthCity Dataset – Complete. figshare. Dataset. https://doi.org/10.5522/04/8851658.v2

2nd Place: Quinn, Michael (2019): Writings on Political Economy Volume IV. Preliminaries and preliminary text. All text files (1-16) now uploaded.. figshare. Dataset. https://doi.org/10.5522/04/9897596.v2

3rd Place: Gibson, Adam; Tuckett, Tabitha (2019): Laparoscopy movie of pop-up flaps of de humanis corporis fabrica libri septem by Vesalius. figshare. Media. https://doi.org/10.5522/04/8224085.v1

 

Where do we go from here: Got any sensitive data?

Before we go any further, I need to clarify what is meant by ‘sensitive data’.

Essentially, we are referring to any kind of data which must be safeguarded by being kept to a higher set of security standards… clearer now? Let me give you a few examples:

  • Take researchers studying endangered species: the geolocation information of these species may be classed as sensitive. This is because an individual could use this information to commit inadvertent or malicious acts, potentially compromising the safety and security of these flora and fauna.
  • Or how about commercial data? These data could be classed as sensitive and in need of safeguarding. Take supermarket loyalty cards: with the right technical wizardry, this dataset could be linked with others, potentially risking an individual’s anonymity.
  • Health data relating to living human persons (or anything else which could be subject to data protection legislation)…
  • Data relating to national security… the list is potentially endless!

So hopefully you can begin to see my point – ‘sensitive data’ very much depends on the research context and UCL staff and students will undoubtedly define ‘sensitive’ according to their own academic domain.

UCL RDR 2.0

This is why phase 2 of the UCL Research Data Repository project is vital in helping UCL researchers to manage different kinds of data across the research data lifecycle. Phase 2 focuses on requirements gathering so that we may gain a better understanding of the kind of infrastructure researchers need to store these data – at UCL – beyond the end of their research project. If anyone would like to take part in interviews, focus groups etc., please contact researchdatarepository@ucl.ac.uk to register your interest.

See you soon!

For more information

For support, please contact researchdatarepository@ucl.ac.uk or come to one of our regular drop-in sessions.

 

Meet the new Research Data Management Team

By j.houghton, on 10 February 2020

DATA? But I don’t use spreadsheets…

RDM is the term given to the approach taken to handle research outputs across the research lifecycle, from the planning stage of a project through to preserving outputs in the medium to longer-term, with a view to potentially sharing these with others.

Historically, RDM was associated with STEM subjects and often seen as an approach reserved only for the spreadsheet users. I suppose the problem lies with the not so academically inclusive word, ‘data’. For some, images of numbers sitting neatly in rows and columns immediately appear in the mind. And the possibility of losing these researchers’ interest in adopting best practice in RDM – simply because of language choice – is all too real.

So let me address this. In this context, the word ‘data’ refers to anything that could be produced or collected during a study. Data could be: models, software, samples, specimens, artefacts, databases, photographs, protocols, manuals, workflows, presentations, posters…just to name a few.

The UCL Research Data policy states, “Data are facts, observations or experiences on which an argument or theory is constructed or tested. Data may be numerical, descriptive, aural or visual. Data may be raw, abstracted or analysed, experimental or observational.”

More inclusive terms would be ‘output’ or ‘item’. RDM is implementable by researchers across the 11 UCL faculties to handle all outputs of research. (As for the bias towards STEM, the RDM team are working hard to address this).

Greetings!

So who can researchers turn to for support in addressing the challenges associated with adopting best practice in RDM? The RDM team, of course!

The RDM team can advise on topics such as:

  • writing data/output management and sharing plans,
  • meeting UCL and (where applicable) funding agency expectations, and
  • adopting Open Research practices.

We are also the administrators of the new UCL Research Data Repository. We can help researchers to publish their research outputs, potentially enhancing the discoverability of their work and their capacity to be cited through data citations complete with a DOI.

Along with the Bibliometrics team, the RDM team are a part of Research Support. Other teams we work closely with include Open Access and Copyright.

 

People

Recently, UCL Library Services gained three new members of staff who form the RDM team.

Kirsty Wallis is the Head of Research Liaison at UCL. She is responsible for managing a number of teams with research support responsibilities, including RDM and Bibliometrics teams. She is also the internal lead for Open Science advocacy. Prior to joining UCL, Kirsty was the Scholarly Communications Manager at the University of Greenwich (London, UK) and the Research Support Librarian at the University of Kent (Canterbury, UK). She also has an MSc in Library and Information Studies from Aberystwyth University, a BSc in Biological Anthropology from the University of Kent, and a PRINCE2 certification.

Dr Christiana McMahon is a Research Data Support Officer at UCL. She is an advocate of best practice in research data management and is responsible for supporting researchers when managing data and other related research outputs across the research data lifecycle. Christiana has a Ph.D. in Health Informatics focusing on managing research data and associated metadata in public health and epidemiology from UCL (London, UK). She also has a BSc (Hons) in Biomedical Informatics from St. George’s, University of London (London, UK). Prior to joining UCL, Christiana was the Research Data Management Officer at the Institute of Cancer Research (London, UK) and the Research Data Coordinator at King’s College London (London, UK).

 

Dr James Houghton is a Research Data Support Officer at UCL with a background in life sciences research and academic publishing. James supports researchers in finding the best solutions for managing, sharing and disseminating their research data. He has a PhD in microbiology, specialising in metagenomics sequencing. James previously worked as an associate publishing manager at Springer Nature and as a postdoctoral researcher.

RDM Training and support

Along with colleagues from the Bibliometrics and Open Access teams, we provide training on:

Writing data management plans (Tuesday 11 February, 9:30am – 12pm; Thursday 20 February, 2pm-5pm; Tuesday 24 March, 10am-1pm; Monday 4 May 2pm-5pm)

Introduction to open scholarship (Friday 21 February, 2pm-5pm; Friday 27 March, 2pm-5pm)

Open publishing: sharing results and ideas (Wednesday 12 February, 2pm-4pm; Wednesday 6 May, 10am-12pm)

Pop us an email or pop in for a chat

UCL Library Services Room 312

DMS Watson Building

Malet Place

London WC1E 6BT

 

E lib-researchsupport@ucl.ac.uk

T +44 (0) 20 7679 2095 (internal 32095); +44 (0) 20 7679 2614 (internal 32614)