By James N I Houghton, on 14 February 2020
At UCL there is a range of options to support managing research data safely and securely. Here’s an overview of the resources and services available.
General Purpose Data Storage
You should always make sure all your data is stored with services managed by UCL IT services, as this will ensure material is automatically backed up and will be recoverable in the event of an IT failure. Never be tempted to rely on personal cloud storage accounts for keeping research data safe!
- All staff and students have access to an N drive, with up to 100GB of storage, fully backed up. Accidentally deleted files can be retrieved for up to 90 days.
- The S (“shared”) drive lets staff share, save and modify files, and is backed up like the N drive.
- RDSS – Research Data Storage service – is a shared space designed specifically for very large data volumes, in the terabyte range. It’s free for the first 5TB and costed for each 5TB beyond that.
Where to store sensitive data?
Encryption or user access control can be used to store sensitive data on the S drive or RDSS. But if you are working with identifiable data which is restricted by GDPR or the Data Protection Act (2018) then you should apply for an account on the UCL Data Safe Haven. This highly secure service conforms to ISO 27001, which in practical terms means it is an appropriate environment for handling health and social care data. You will need to register a project with Information Governance before applying for an account. This service offers integration with software like REDCap for securely harvesting information.
If you want to work with highly sensitive information, such as criminal records data, then you could consider the Jill Dando Institute Research Laboratory, but be prepared to undergo stringent vetting and training processes first.
But I need a less severe option!
If dedicated highly secure services seem excessive for your purposes, there are ways of sending, receiving and managing somewhat sensitive data that still needs a level of security, while avoiding the lengthy approval processes for the more secure services. The UCL dropbox service (not to be confused with commercial Dropbox!) allows for sending and receiving files, although sensitive data should be encrypted when using this service. Files remain online for 10 days before being automatically removed, so they are not left lying around vulnerable to hacking.
Travelling with data
If you need to travel and take data with you on a device for a conference, field work or visiting collaborators, consider some practical precautions first. Always make sure there is a backup of anything on the device, obviously. If you need to transport sensitive data then be sure to use encryption and read our advice on this before you travel.
Archiving and sharing data securely?
If you produce data and want to share it with other researchers at the end of the project but want to restrict the access to avoid misuse, consider the UK Data Archive. They are able to offer a range of options such as different levels of controlled access for different files. You can also access the service with a UCL login, so no need to create a new account.
If you ever have any questions about data management contact the Research Data Support Team for advice and guidance:
Telephone: +44 (0) 20 7679 2095 (internal 32095); +44(0) 20 7679 2614 (internal 32614)
By James N I Houghton, on 13 February 2020
GDPR and the Data Protection Act (2018) govern the storage and processing of “personal data”, which is any data relating to a living, identifiable person. If you are working with data gathered from human participants it is essential to be aware of these laws and how they might apply to your project. These laws do not apply to data that has been anonymised so that the individuals to whom the data relates cannot be identified from it, or to data related to people who are no longer alive. Sometimes data is pseudonymised where, for example, an individual’s name might be replaced by a code in a dataset but there are records kept which link the codes to the real names. This data must still be treated as personal and potentially identifiable.
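To make the idea of pseudonymisation concrete, here is a minimal Python sketch, not an official UCL procedure. The function name `pseudonymise`, the field names and the example participants are all hypothetical. It swaps each name for a random code and returns the linking table separately; that table can re-identify people, so it needs to be stored securely and apart from the coded dataset.

```python
import secrets


def pseudonymise(records, name_field="name"):
    """Replace each name with a random code.

    Returns the coded records and the linking table (name -> code).
    The linking table is itself personal data and must be kept secure.
    """
    key = {}
    coded = []
    for record in records:
        name = record[name_field]
        if name not in key:
            # Draw random codes until we get one that is not already in use.
            code = "P" + secrets.token_hex(4)
            while code in key.values():
                code = "P" + secrets.token_hex(4)
            key[name] = code
        new_record = dict(record)  # copy, so the input is left unchanged
        new_record[name_field] = key[name]
        coded.append(new_record)
    return coded, key


# Hypothetical example data
participants = [
    {"name": "Alice Example", "score": 12},
    {"name": "Bob Example", "score": 7},
    {"name": "Alice Example", "score": 15},
]
coded, key = pseudonymise(participants)
for row in coded:
    print(row)
```

The same person always receives the same code, so records can still be linked within the dataset. That is exactly why the result remains personal data for as long as the key exists.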
Levels of data sensitivity
Some data is considered more sensitive than others. Sensitive data, or special category data, is subject to highly stringent restrictions and includes information like ethnic origin and political party affiliation.
| Type of data | Restrictions on processing |
|---|---|
| Highly sensitive personal data such as criminal record data | Highly restricted – can only be processed under specific circumstances by persons with the appropriate authority working at specific secure locations |
| Special category personal data | Restricted – can be processed under specific conditions |
| Personal data (including pseudonymised data) | Some restrictions exist |
| Non-personal data (including anonymised data) | No restrictions on processing |
What is anonymous?
Deciding whether data is anonymous or not can be extremely difficult. Even if a dataset has been cleaned of information which would easily identify an individual, it is possible that, when combined with other readily available information, the data could still be used to identify someone, as researchers have demonstrated previously. Data that might be anonymous in one context might be identifying in another. When in doubt it is best to assume that data related to an individual could be used to identify them, and treat it accordingly.
Identifiability spectrum by Understanding Patient Data. Image reused under the CC-BY license: https://understandingpatientdata.org.uk/what-does-anonymised-mean
A dataset which has obvious identifiers such as names and addresses removed can, if sufficiently detailed, still be used to identify an individual, as reported recently in Nature Communications. As a general rule, the more detailed the information collected about an individual, the easier it becomes to identify them.
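One rough way to see this effect in your own data is to count how many records share each combination of indirect identifiers. The Python sketch below is an illustration only: the function name `smallest_group`, the field names and the rows are invented, and the check is a simple k-anonymity-style count rather than a full re-identification risk assessment. A result of 1 means at least one person is unique on those fields.

```python
from collections import Counter


def smallest_group(records, fields):
    """Return the size of the smallest group of records that share
    the same values for the given fields."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return min(counts.values())


# Hypothetical dataset with names and addresses already removed
rows = [
    {"age_band": "30-39", "postcode_area": "WC1", "employer": "University"},
    {"age_band": "30-39", "postcode_area": "WC1", "employer": "University"},
    {"age_band": "30-39", "postcode_area": "WC1", "employer": "Hospital"},
]

# With two fields, every record is shared by a group of three people.
print(smallest_group(rows, ["age_band", "postcode_area"]))

# Adding one more detail makes one record unique.
print(smallest_group(rows, ["age_band", "postcode_area", "employer"]))
```

In this toy example the smallest group shrinks from 3 to 1 as soon as the employer column is included, which is the pattern described above: each extra detail narrows the crowd a person can hide in.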
Strategies for anonymisation
The best way to work with personal data is to not collect it unless absolutely necessary. When designing a research project with human participants, consider which information you really need to record. Try to avoid direct identifiers (such as name, address, date of birth) and only collect indirect identifiers (such as employer, educational attainment, religion).
Where possible, try to practise data blurring. For example, instead of recording someone’s age as 33, record them as belonging to the age category 30-39.
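As a small sketch of data blurring, the Python function below turns an exact age into a band. The name `age_band` and the default band width of 10 are assumptions chosen to match the 30-39 example; pick whatever width suits your study.

```python
def age_band(age, width=10):
    """Blur an exact age into a band such as '30-39'."""
    if age < 0:
        raise ValueError("age cannot be negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


print(age_band(33))  # prints 30-39
```

Ideally the band is recorded at the point of collection, so the exact age never enters the dataset at all.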
Need more advice?
If you are working with personal data there is lots of advice available from UCL to help you understand your responsibilities. You can also contact the data protection team at firstname.lastname@example.org
By James N I Houghton, on 12 February 2020
Continuing our introductions theme for this year’s Love Data Week, you have met the new team, had an update on our newest system, and now let us reintroduce you to the Research IT Services (RITS) Team and what they do for you, your data, and beyond!
First up, RITS runs the Research Data Storage service for staff and doctoral students, where you can store your data whilst your project is ongoing. This is a great place for a number of people to collaborate and share data, especially on big files. To get access, just use the link above and register your project today!
Related to this, RITS also provide research computing platforms for staff and students, which support research at UCL through provision of specialist platforms for high performance and high throughput computing. Application forms are available at the link above.
They also support research software development tools to enable researchers to follow best practices when developing research software, including version control and automated testing, as well as various research IT applications to support research management and administration from planning through to publication.
Finally, (as if this list wasn’t long enough already!) RITS provide expertise in creating software for academic research, and collaborate with researchers across UCL on projects. Get in touch with the Research Software Development team to discuss your project!
RITS have been involved in an amazing array of projects, and their Showcase site is a brilliant way to not only get distracted from work for a while, but get an overview of the sheer range of research RITS is involved in, and a taste of the wide range of research that happens here at UCL. As a new member of staff my favourite has to be the joint project between the British Library and UCL Digital Humanities as it proves conclusively that these tools and services are for everyone!
We work very closely with RITS and you can find the Research Data Management (RDM) team alongside the RITS team at their regular drop-ins as well as collaborating on a variety of training covering most of the above, including high performance computing, research software engineering and programming.
Drop-ins are for anyone who wants support or advice about anything covered by RITS or the Research Support team, such as:
- finding the right tools and services,
- research programming,
- task automation and scripting,
- high performance computing,
- storing large datasets,
- Research Data Storage (RDS) service induction,
- handling sensitive data,
- Research Data Management (including data management plans).
We look forward to seeing you there soon!
By James N I Houghton, on 11 February 2020
Filled with a sense of excitement (and dare I say it) relief, we finally launched the UCL research data repository on 5 June 2019. This was – and continues to be – an open and free platform for UCL research staff and students to publish outputs of research including datasets, software, posters, presentations, models, photographs… just to name a few.
Benefits of using the UCL Research Data Repository
- Secure long term data preservation and curation: 10+ years
- Storage: access and data sharing worldwide
- Compliance: meets funders’ requirements for FAIR data
- Multiple formats: supports almost all file types
- Increased citations: published research data has its own DOI
- Discoverability: enhances discovery and leads to new partnerships
- Defines reuse: applies Creative Commons and other licences
- Embargo: research outputs can be embargoed where necessary
- Team collaboration: data can be added to defined project spaces.
Take a look at the UCL Repository FAQs for more hints and tips on using the Repository.
It’s time for some pictures and numbers
(Correct as of 3rd February 2020)
Over the past six months, we have had
- 13,141 total views
- 2,440 downloads
- 62 items made publicly available
The award for most downloaded goes to…
1st Place: Griffiths, David; Boehm, Jan (2019): SynthCity Dataset – Complete. figshare. Dataset. https://doi.org/10.5522/04/8851658.v2
2nd Place: Quinn, Michael (2019): Writings on Political Economy Volume IV. Preliminaries and preliminary text. All text files (1-16) now uploaded.. figshare. Dataset. https://doi.org/10.5522/04/9897596.v2
3rd Place: Gibson, Adam; Tuckett, Tabitha (2019): Laparoscopy movie of pop-up flaps of de humanis corporis fabrica libri septem by Vesalius. figshare. Media. https://doi.org/10.5522/04/8224085.v1
Where do we go from here: Got any sensitive data?
Before we go any further, I need to clarify what is meant by ‘sensitive data’.
Essentially, we are referring to any kind of data which must be safeguarded by being kept to a higher set of security standards… clearer now? Let me give you a few examples:
- Take researchers studying endangered species: the geolocation information of these species may be classed as sensitive. This is because an individual could use this information to commit inadvertent or malicious acts, potentially compromising the safety and security of these flora and fauna.
- Or how about commercial data? These data could be classed as sensitive and in need of safeguarding. Take supermarket loyalty cards: with the right technical wizardry, this dataset could be linked with others, potentially risking an individual’s anonymity.
- Health data relating to living human persons (or anything else which could be subject to data protection legislation)…
- Data relating to national security… the list is potentially endless!
So hopefully you can begin to see my point – ‘sensitive data’ very much depends on the research context and UCL staff and students will undoubtedly define ‘sensitive’ according to their own academic domain.
UCL RDR 2.0
This is why phase 2 of the UCL Research Data Repository project is vital in helping UCL researchers to manage different kinds of data, across the research data lifecycle. Phase 2 focuses on requirements gathering so that we may gain a better understanding of the kind of infrastructure researchers need to store these data – at UCL – beyond the end of their research project. If anyone would like to take part in interviews, focus groups etc. please contact email@example.com to register your interest.
See you soon!
By James N I Houghton, on 10 February 2020
DATA? But I don’t use spreadsheets…
RDM is the term given to the approach taken to handle research outputs across the research lifecycle, from the planning stage of a project through to preserving outputs in the medium to longer-term, with a view to potentially sharing these with others.
Historically, RDM was associated with STEM subjects and often seen as an approach reserved only for spreadsheet users. I suppose the problem lies with the not so academically inclusive word, ‘data’. For some, images of numbers sitting neatly in rows and columns immediately come to mind. And the possibility of losing these researchers’ interest in adopting best practice in RDM – simply because of language choice – is all too real.
So let me address this. In this context, the word ‘data’ refers to anything that could be produced or collected during a study. Data could be: models, software, samples, specimens, artefacts, databases, photographs, protocols, manuals, workflows, presentations, posters…just to name a few.
The UCL Research Data policy states, “Data are facts, observations or experiences on which an argument or theory is constructed or tested. Data may be numerical, descriptive, aural or visual. Data may be raw, abstracted or analysed, experimental or observational.”
More inclusive terms would be ‘output’ or ‘item’. RDM is implementable by researchers across the 11 UCL faculties to handle all outputs of research. (As for the bias towards STEM, the RDM team are working hard to address this).
So who can researchers turn to for support in addressing the challenges associated with adopting best practice in RDM? The RDM team, of course!
The RDM team can advise on topics such as:
- writing data/output management and sharing plans,
- meeting UCL and (where applicable) funding agency expectations, and
- adopting Open Research practices.
We are also the administrators of the new UCL Research Data Repository. We can help researchers to publish their research outputs potentially enhancing the discoverability of their work, and capacity for researchers to be cited, through use of data citations complete with a DOI.
Recently, UCL Library Services gained three new members of staff who form the RDM team.
Kirsty Wallis is the Head of Research Liaison at UCL. She is responsible for managing a number of teams with research support responsibilities, including the RDM and Bibliometrics teams. She is also the internal lead for Open Science advocacy. Prior to joining UCL, Kirsty was the Scholarly Communications Manager at the University of Greenwich (London, UK) and the Research Support Librarian at the University of Kent (Canterbury, UK). She also has an MSc in Library and Information Studies from Aberystwyth University, a BSc in Biological Anthropology from the University of Kent, and a Prince2 certification.
Dr Christiana McMahon is a Research Data Support Officer at UCL. She is an advocate of best practice in research data management and is responsible for supporting researchers when managing data and other related research outputs across the research data lifecycle. Christiana has a Ph.D. in Health Informatics focusing on managing research data and associated metadata in public health and epidemiology from UCL (London, UK). She also has a BSc (Hons) in Biomedical Informatics from St. George’s, University of London (London, UK). Prior to joining UCL, Christiana was the Research Data Management Officer at the Institute of Cancer Research (London, UK) and the Research Data Coordinator at King’s College London (London, UK).
Dr James Houghton is a Research Data Support Officer at UCL with a background in life sciences research and academic publishing. James supports researchers in finding the best solutions for managing, sharing and disseminating their research data. He has a PhD in microbiology, specialising in metagenomics sequencing. James previously worked as an associate publishing manager at Springer Nature and as a postdoctoral researcher.
RDM Training and support
Along with colleagues from the Bibliometrics and Open Access teams, we provide training on:
Pop us an email or pop in for a chat
UCL Library Services Room 312
DMS Watson Building
London WC1E 6BT
T +44 (0) 20 7679 2095 (internal 32095); +44(0) 20 7679 2614 (internal 32614)
By Tina Johnson, on 4 September 2019
The Research Data Management team and Research IT Services jointly run regular drop-in sessions. These sessions are open to all UCL research staff and research students.
Someone from the Research Data Management team will be there to support you with
- data management planning
- best practice in storing and sharing data
- complying with funder requirements
– at all stages of the research lifecycle.
If you’d like to come along to one of the drop-in sessions, please contact the RDM team at firstname.lastname@example.org with a summary of your research data query beforehand.
Representatives from all of the RITS service area teams will also be on hand to answer questions or help with problems related to the following areas:
- research programming
- workflow automation
- finding tools and services for your research programmes
- high performance computing
- handling large datasets
- handling personal and GDPR special category data
- data storage
For RITS queries, there’s no need to book, but the RITS team can make sure there’ll be someone there to help with your problem if you email email@example.com, ideally two days before the session.
By Tina Johnson, on 4 June 2019
The (ALLEA) E-Humanities Working Group is seeking feedback on its draft guidance for humanities researchers working with data:
Deadline: 15 July 2019
There is a growing consensus that research data needs to be FAIR – Findable, Accessible, Interoperable, Reusable. That is, it must be managed, organised, preserved, and open to scrutiny and reuse. This requires a collective rethink of the role of data in the research lifecycle and a change in organisational culture and practices, both across the UK and internationally.
By Tina Johnson, on 10 May 2019
Free workshop for all UCL researchers and staff. Registration is now open.
UCL Open Science Day: developing open scholarship at UCL
A year on from LERU’s publication of Open Science and its role in universities: a roadmap for cultural change, and following the success of last year’s workshop, UCL Open Science Day 2019 will explore what open science – or open scholarship – will mean for a UCL researcher in its different applications, and how best the UCL research community can make the practical changes needed.
Thursday 23 May 2019 9.30-4pm
Institute of Education (IOE), 20 Bedford Way, London, WC1H 0AL
This event is organised by UCL Library Services, with Scientific Knowledge Services, UCL (University College London) and in collaboration with UCL Press and LIBER (Association of European Research Libraries).
Email contact: firstname.lastname@example.org.
Blog updated 22 May 2019
By Tina Johnson, on 8 May 2019
After 2 years of collaboration with provider Figshare, the Research Data Repository is live! This free, open service will allow all UCL researchers (doctoral and beyond) to publish, preserve and share data underpinning research – or other potentially useful data. Free, open access to data is central to FAIR data principles and enables replicable research – key aspects of Open Science.
On 5 June 2019, UCL researchers, PhD students and staff joined Library Services and Research IT Services teams to celebrate the launch.
5 reasons to use the UCL Research Data Repository
- repository storage complies with research funder requirements to preserve research data for 10 years or more
- publishing data as a research output takes little further effort and makes your research more discoverable and citable
- greater impact and visibility will enhance your academic profile
- published data can be validated and tested by others – a sign of robust methodology
- sharing data is likely to become a key performance indicator as research practices become more open
Additional benefits of sharing your data publicly
- making your data available can lead to new collaborations and partnerships
- allowing data to be reused and avoiding doubling up makes the best use of funding
- published data provides great resources for education and training
The Research Data Management team plans to deliver tailored training on using the UCL Data Repository later this year at different UCL departments.
Join UCL Reproducibility
Subscribe to the UCL reproducibility mailing list for news and updates, invitations to contribute and training opportunities.
Attend a Reproducibilitea talk
Colleagues from all disciplines, sceptics and non-UCL, welcome.
Research Data Management blogs
- Making UCL Research Reproducible
- How long should I keep my data?
- Who owns my data and what happens when I leave the University?
- UCL General Data Protection Regulation
- Personal and sensitive research data and the law (useful 2016 blog post pre-GDPR)
- Where can I find UCL policies regarding research data, information security and data protection?
- Five selfish reasons to work reproducibly (Genome Biology article)
- FAIR data principles
- How do you deal with a problem like reproducibility?
This blog was updated 13 June 2019.
By Tina Johnson, on 15 April 2019
The call for Open Access to research
Progress on sharing research has been gradual since the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003) and the San Francisco Declaration on Research Assessment – or DORA (2012). Last year’s publication of Open Science and its role in universities: a roadmap for cultural change by LERU, lead-authored by Dr Paul Ayris, has helped shift the debate from theory to practice. In September 2018, cOAlition S – a consortium of 12 European research funders – called for publicly funded research to be published in compliant Open Access (OA) journals or platforms by 1 January 2020. The resulting guidance document, Plan S: Making full & immediate Open Access a reality, has been largely welcomed by the research community for its 10 principles:
Plan S principles:
- the author retains unrestricted copyright – a Creative Commons licence where possible
- robust criteria and requirements are in place for OA journal and platform services
- funders collaboratively establish and support OA journals and infrastructures
- funders or universities cover OA publication fees, not individuals as a rule
- standardised funding and capping of OA fees apply across Europe
- universities, research organisations and libraries align their policies and strategies
- monograph and book publishing practices will require more time to change
- open archives and repositories are important
- hybrid Open Access models are NOT compatible with these principles
- funders will monitor compliance and sanction non-compliance
The UCL response to Plan S
Published January 2019, the UCL response to Plan S fully endorses Open Access in scholarly publishing, calling for “a wholesale rethink of the strategy and timelines for moving to 100% Open Access”, with:
- more engagement with universities, learned societies and researchers before implementation
- more detail on how Open Access publishing could work in different subject disciplines
- a more realistic timeline of years not months to allow universities to apply DORA recommendations and set up appointment and promotions frameworks
- more detail and thought on how publishing fees and Article Processing Charges (APCs) could work, with a risk assessment
- worldwide engagement, as Europe is too small a player to make global changes
Draft UCL statement on reproducibility
On 10 April, UCL colleagues met for a UCL Research Reproducibility Town Hall discussion on the approach and actions needed to improve research standards through replicability. Under specific discussion was a draft UCL statement on reproducibility in research. Email contact: email@example.com
Join us on Thursday 23 May 2019 9.30 – 4pm
Logan Hall, Institute of Education (IOE), 20 Bedford Way, London, WC1H 0AL
This free UCL Library workshop will “explore the facets of Open Science and how these are, or could be, pursued by UCL researchers”, with morning discussions and afternoon workshops offering practical advice.
Morning talks include Registered Reports and the UKRN – Prof Chris Chambers (Cardiff University), cognitive neuroscientist, expert in registered reports and co-founder of the UK Reproducibility Network (UKRN).
Join UCL Reproducibility
Subscribe to the UCL reproducibility mailing list for news and updates, invitations to contribute and training opportunities.
The next talk is on Thursday 23 May and is part of the UCL Open Science Day 2019.
Attend a Reproducibilitea talk
Colleagues from all disciplines, sceptics and non-UCL, welcome.
- Nosek et al: ‘Estimating the reproducibility of psychological science’ (Science, 2015)
- Marcus Munafò et al: ‘A Manifesto for Reproducible Science’ (2017)
- Halsey et al: The Fickle p Generates Irreproducible Results (2015)
The ReproducibiliTea journal club is supported by the UCL Researcher-led Initiative Award, and the UK Reproducibility Network has helped to spread the club to a number of universities.
UCL Open Access policy development:
- UCL Statement on Research Integrity
- UCL Code of Conduct for Research
- UCL’s response to Plan S
- UCL Plan S Town Hall meeting January 2019 (blog)
- UCL Open Access blogs on Plan S
- Wellcome Trust Response to UCL Response to Plan S
Key documents in the Open Scholarship movement
- Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003)
- San Francisco Declaration on Research Assessment
- Plan S: Making full & immediate Open Access a reality
- LERU policy paper on open science
- How do you deal with a problem like reproducibility? (Marcus Munafò, Jisc blog)
- From coalition to commons: Plan S and the future of scholarly communication (Insights article)
- Plan T: Scrap APCs and fund Open Access with Submission Fees (The Scholarly Kitchen article)
- Most plan S principles are not contentious (Cambridge University blog)
- Implementing publisher policies (Insights case study)
- Reddit feeds on Open Access