Open@UCL Blog

Sensitive data – where and how to archive

By Kirsty, on 16 August 2022

Guest post by James Houghton, Research Data Support Officer

It is always essential to protect the personal identity of participants or information that could jeopardise the safety of a building, an endangered species, or similar. Deleting data at the end of a project is often necessary to guarantee privacy and security. But this data is sometimes of immense value. The potential usefulness of the data should be weighed against the likelihood of an accidental release and the risk of harm if such a release did occur.

There are options for archiving data with access controls for researchers who feel strongly that their data should be preserved. Some repositories have built-in access controls that ensure sensitive data can only be accessed by specific persons who have undergone an application process. Only a few data repositories offer this feature, and those that do still have a remit governing what data they can accept. Here are some examples.

  • ReShare (UK Data Service) – This site is a social data research repository created to share data from Economic and Social Research Council (ESRC)-funded projects but is open for submissions from other sources!
  • The European Genome-phenome Archive (EGA) – The EGA offers a service for permanent archiving and sharing of personally identifiable clinical data generated for biomedical research projects or in the context of research-focused healthcare systems.

The UK Data Service, which runs the ReShare archive, offers functional access controls on data and explains how to implement them.

If you are concerned about storing the data live, even with access controls, consider storing the raw data offline. The existence of the data can be advertised online by creating an entry in a repository that announces the data’s presence and explains how to access it. The repository record will also be assigned a DOI so that the dataset can be cited properly. Making sure the offline information is stored securely can be challenging, however. There needs to be a specific process to ensure the data is secure and accessible on request.

Dealing with the long-term archiving of sensitive data is complicated. The UCL Research Data Management Team can assist with this. Get in touch if you need support!

Art History theses and copyright

By Kirsty, on 9 December 2021

Guest post by Thomas Stacey, Open Access Team, UCL Library (LCCOS)

At UCL, students studying for doctoral and research master’s degrees are required to submit an electronic copy of their thesis to the Library for inclusion in UCL Discovery, our open-access repository of UCL research outputs, in order for their degree to be awarded.  The Open Access Team encourages theses to be made openly available, either immediately after award or following the completion of an embargo period. We do, however, recognise that there are a number of reasons why access may need to be restricted, such as future publication, confidentiality, the inclusion of sensitive and/or personal information, and – in the discipline of Art History in particular – the presence of third-party copyrighted images.

I have been thinking about art history theses and whether they could be made open access more easily – and crucially with all the images included where needed.

The University of Cambridge’s ‘Unlocking Research’ blog post written in 2019 by Dr Lorraine de la Verpillière provides a comprehensive background on the issues facing academics within the arts: many are forced to pay to access third-party copyrighted works for private study, and then to pay again later to publish the final research output. Within this blog post, one academic commented “The more successful I become the poorer I get”, as furthering their career by obtaining copyright permissions for images has cost them over $20,000. Even out-of-copyright artworks are affected, as galleries and museums that own the originals can create their own copyrighted reproductions and restrict others’ ability to do the same. Bridgeman Images, for example, now owns the rights to all images of artworks in Italian national museums – which can pose a huge financial challenge for many art historians.

A further obstacle for Art History students is that the principle of fair dealing within the Copyright, Designs and Patents Act 1988, which can be used to justify the inclusion of extracts of texts and figures (as part of a wider, previously-published work) in theses, cannot be applied to the reproduction of full artworks, which constitute entire copyrighted works in themselves.

Omitting the images from an art history thesis compromises the integrity of the work. Unless PhD students use images that carry Creative Commons licences or are in the public domain because they are entirely out of copyright, they will either have to obtain permissions or redact the images within their thesis accordingly. When processing thesis submissions for UCL PhD students, the Open Access Team will often be required to redact images as part of routine checks prior to any thesis file being made publicly available in UCL Discovery.

It seems there is no straightforward solution that would enable art history theses to be made open access with all images included. Dr de la Verpillière suggests that there could be more support from universities for art history students and academics regarding third-party copyright. Art institutions also need to do more in this respect. Some have started to make their image collections open access (a selection is given below), so hopefully more will do likewise soon. Even discounted permissions fees for PhD students who need to use images would be a compromise of sorts to help new academics.

To avoid delays in making theses available in UCL Discovery post-award, or redactions being made to images of artwork that are critical to the overall integrity of the thesis, the Open Access Team also recommends that relevant licence and/or permissions information is included within the thesis file, as part of the Library’s guide to copyright for research students.

Here are some art institutions with open-access image collections:

Data journals and data reports – don’t miss out on this useful publishing format!

By Kirsty, on 17 August 2021

Guest post by James Houghton – Research Data Support Officer

Why not publish a data report article?

For a researcher who produces large amounts of data or works heavily with software and code for analysis, getting proper credit for their efforts can be a problem. Traditionally, an academic article tests a hypothesis, produces and analyses results, and ends with a conclusion. This format is increasingly a poor fit for the work of many researchers, and data journals are one solution to this issue. The goal of this kind of journal is to publish a type of article usually referred to as a data report, which focusses on announcing and describing research outputs such as resources, raw data, databases or similar that can be of use to the research community in general.

Publishing with a data journal offers several benefits. First, a data report article is more formal than a publication of data files in a repository: it is a peer-reviewed publication that contributes to a researcher’s publication record, which for many is important for CVs and career advancement. Second, they allow a more detailed explanation of a dataset and any analysis or code related to it than is usually otherwise possible. Third, the appearance of an article in a recognised journal can help to drive visibility of a dataset for other researchers. In practice it may often be the case that a repository will be used to host material which is discussed at length in a paper.

For the research community more generally, data reports are a great way to discover and understand valuable contributions which they can re-use and build on. The data report guarantees there has been some level of peer-review applied to the data and, therefore, increases the confidence in the quality.

Data journals have flourished in recent years. Many publishers have introduced titles which specialise in data announcements and many other journals have begun to allow data articles as one of their accepted formats. Publishers will have their own specific guidelines for exactly what to include (or not include), but data articles will often have the following features:

  • Detailed description of the methodology of how the data was produced and processed, allowing for far more detail than generally appears in a “traditional” publication.
  • Documentation on structure and format of the data and details of how to retrieve it.
  • Comments on how the data could potentially be re-used.
  • Very limited or no results and conclusions.

The scope of a data journal varies greatly:

  • Some journals publish a wide range of data reports that cover many research areas, such as Scientific Data published by Springer Nature.
  • Others are more subject specific such as Big Earth Data published by Taylor and Francis focussing on ecology and climate science, or Journal of Open Psychology Data published by the Open Access Ubiquity Press and specialising in psychology and anthropology data.

Of course, you must always check each journal’s instructions for authors before preparing an article for submission.

Repositories and data journals should be seen as symbiotic, rather than needing to choose one or the other. An openly shared data set can be made available, and a data journal can be used as a way of announcing the existence of the resource to the community along with a detailed commentary which might not be easily supported by the repository itself. In fact, depending on the journal, hosting the data with a recognised external repository may even be a requirement for the publication process.

We won’t attempt to provide a comprehensive list of all journals that support this publication type here. There are many discipline specific and several more generalist options – but we would encourage you to investigate the options available in your subject area and tell us what you find!

Love Data Week – UCL’s Research Data Storage Service (RDSS) now open to external collaborators!

By Kirsty, on 12 February 2021

Guest post by James Wilson, Head of Research Data Services


Over the last year we’ve been making a number of improvements to the Research Data Storage Service (RDSS) to help researchers store and access their data in a way that better corresponds to how they work.

The RDSS is a managed storage service that helps researchers comply with funders’ criteria for good data management. It provides a storage space for research projects so that anyone involved in that project has a secure area in which to store and share files with their collaborators. Projects in the RDSS do not need to be formal, externally funded projects – they can be for personal research, or small unfunded collaborations between colleagues – but the service is well adapted for large projects with compute and multi-terabyte storage requirements.

That said, the service has had some limitations in the past which we have been addressing. The foremost amongst these was that you needed to be a member of UCL in order to use it. Increasingly, however, research is undertaken with collaborators around the world or in partnership with industry. Covid-19 has only accelerated this trend. We have recently added external collaborator functionality, enabling PIs to add external project members via a simple email invitation from within the interface.

We have also integrated the RDSS with UCL’s Research Data Repository – a platform that enables data and other non-traditional research outputs to be published, cited, and preserved over the long term. Researchers with a project registered in the RDSS can now move files, including very large files, across to the repository, along with contextual information.

As the volume of data in the RDSS grows, so we extend our capacity. We added an additional 600 terabytes of capacity during 2020, and will be adding a further petabyte of storage this coming term. The first terabyte of storage for any project is provided free of charge, with larger projects charged at £50 per TB per year. This gets you two copies of your data on disk in two different physical data halls at UCL’s Slough data centre. A third back-up copy is saved to tape, and there is a 30-day retention period to help protect against accidental deletion.
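
As a rough illustration of the pricing described above, the following sketch estimates an annual charge. It rests on one assumption that the post does not spell out: that the £50 rate applies only to storage beyond the first free terabyte. The function name and constants are purely illustrative and are not part of any RDSS tool, so please confirm actual charges with the service.

```python
# Illustrative estimate only. Assumes (not confirmed by the post) that the
# per-TB rate is charged solely on storage above the first free terabyte.
FREE_TB = 1.0
RATE_GBP_PER_TB_PER_YEAR = 50.0


def estimated_annual_cost_gbp(allocated_tb: float) -> float:
    """Return the estimated yearly charge in GBP for a project allocation."""
    if allocated_tb < 0:
        raise ValueError("allocated_tb must not be negative")
    chargeable_tb = max(0.0, allocated_tb - FREE_TB)
    return chargeable_tb * RATE_GBP_PER_TB_PER_YEAR


if __name__ == "__main__":
    for size_tb in (0.5, 1, 5, 20):
        print(f"{size_tb} TB: £{estimated_annual_cost_gbp(size_tb):.2f} per year")
```

Under this assumption a 5 TB project would pay for 4 TB. If the rate in fact applies to the whole allocation once a project exceeds 1 TB, the figure would be higher.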

Further information about the RDSS can be found at https://www.ucl.ac.uk/isd/services/research-it-services

Love Data Week – Sharing data? Your questions answered

By Kirsty, on 10 February 2021

Guest post by James Houghton, Research Data Support Officer


Dealing with research data, and the associated legal and administrative issues, can be confusing. This article responds to some of the frequent questions and points of confusion people have regarding research data management.

Do I always have to share data?

Not always, but in general data sharing is required unless you have a very good reason not to, and UCL expects research to be shared as widely as possible. Data sharing may be inappropriate in the following situations:

  • The project contains personal data which could compromise the privacy of individuals. In this case the Data Protection Act (2018) applies and the data cannot be shared.
  • There is a possibility that the research could be commercialised. In this case, data should not be shared before obtaining necessary patent protections.
  • Other ethical concerns for which a justification can be created. For example, data on an endangered species might be used by poachers so it would be reasonable not to share this data.

If you are ever unsure about releasing data, speak to someone before you proceed. The Library RDM team and the Data Protection Team can advise on this.

Does UCL have a data sharing policy?

Yes, and it specifies the expectations placed on all UCL staff and students on making data available.

Be aware that in addition to the UCL policy, funding agencies will have their own requirements. You need to be compliant with all policies that might apply!

So, I need to share my data. Does UCL have a platform for data sharing?

Yes, we do! UCL has its own data repository service, the UCL Research Data Repository.

I don’t have any data.

The term “data” is used as a shorthand to cover all research outputs, so even if you think you don’t have data, you probably generate something during the course of your research that should be preserved and potentially shared. Even if your field uses a different term you are probably still bound by the data sharing policy.

Here is a wide-ranging list of what could be considered “research data”:

  • Research notebooks, detailing progress of research and experiments
  • Responses to surveys and questionnaires
  • Software, code, algorithms, and models
  • Measurements from laboratory or field equipment
  • Images (such as photographs, films, scans of documents)
  • Methods, protocols, and experimental procedures
  • Databases of collected information
  • A corpus of writings
  • Audio and video recordings
  • Interview transcripts
  • Physical samples and objects

If you have an output not included in this list, it could still be classed as research data!

What on earth is metadata?

Metadata is simply data that describes other data. Here are a few examples:

  • A description of the inclusion criteria for enrolling participants in a study
  • The set of questions used in interviews
  • Any file naming conventions used to keep track of data
  • The parameters used by any equipment used to make measurements
  • The dates and times images were taken
  • Details of quality assurance steps to explain why some data points were deemed to be erroneous and unsuitable for analysis
  • Administrative information such as dates of interviews, experiments or visits to a location

This is not an exhaustive list by any means! Metadata can vary considerably between projects and research fields.

In the same way data might underpin the results of a project, metadata could be said to underpin the methods of a project. If you need to address the issue of metadata, think about what another researcher would need to know to replicate the data as closely as possible.

What resources can I access at UCL to store data safely?

All UCL IT managed storage services have automated backups in place to protect data and are recommended over using your own personal devices or individual cloud storage accounts. There are a few different options depending on your needs:

  • The personal N: drive or S: drives are fine for day-to-day storage of PDFs, office documents and non-sensitive materials.
  • The Research Data Storage Service supports high speed file transfer for large quantities of data and is extremely useful for anyone who wants to work with the high-performance computing clusters.
  • The Data Safe Haven is specifically designed to store personal data covered by the Data Protection Act 2018. This secure service helps you meet legal obligations on data security when relevant.
  • Services such as SharePoint and OneDrive can be useful for collaboration with colleagues and allow for functionality such as simultaneous editing of documents.

Need more information?

We have extended guidance on research data management available on our website and the library research data management team can be contacted to discuss specific issues at: lib-researchsupport@ucl.ac.uk

Love Data Week – Research Data Management at UCL: 2020 in review

By Kirsty, on 8 February 2021

To celebrate Love Data Week, the Research Data Management team have prepared a review of 2020, looking back over the past 12 months and reflecting on progress made in a number of areas.

Follow the link below to read the report and find out more about the Research Data Management and Sharing Plan review service, our new online training courses on writing data management plans and on open science and scholarship, and improved guidance about making research data FAIR (findable, accessible, interoperable and reusable) within the wider open science and scholarship context. You can also find out about the newly revised research data policy, which includes updated advice for UCL staff and research students on managing their research outputs.

Finally, you can find out about the number, amount and types of research outputs published using the UCL Research Data Repository, as well as the number and variety of views and downloads.

Download and explore the report on the UCL Research Data Repository.

 

Deep Dive: DOIs

By Kirsty, on 8 September 2020

In our recent blog post, PIDs 101, we covered a wide range of Persistent Identifiers (PIDs) and looked at how they link together, and what the future holds for them. This week we are drilling down to investigate Digital Object Identifiers (DOIs) in more detail.

In the last post we discussed DOIs being a unique registration number for a Digital Object, and the fact that a digital object in this context could be an article or a dataset, but it could equally be any of a number of other item types, such as on this list defined by Crossref.

How do DOIs work?

Each publisher, funder or repository that is registered to provide DOIs is given a unique registration number. This number, along with the ‘10.’ common to all DOIs, forms the first part of a DOI, called the prefix – shown below. Each registered provider is then responsible for choosing their own suffix pattern.

 

 

This is where DOIs get extra clever. Each registered provider can construct the suffixes to their own design, and these can be as simple or as complex as needed. For example, the Wellcome Trust uses DOIs for identifying grants as well as publications, and PLOS uses different suffixes to identify which articles come from which journal – for example:

 

 

 

In the three PLOS DOI examples above, the unique registration number is 1371. Each suffix starts by designating the item type, journal, and then follows with an acronym for the individual journal: pbio (PLOS Biology), pone (PLOS ONE) and pgen (PLOS Genetics). Each journal then uses article numbers in a predetermined sequence for the final part of the DOI. These numbers match the article numbers shown in the article citations. Every registered provider needs a scheme like this to generate their DOIs, as it is essential that each item receives a unique DOI.
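
To make the prefix and suffix structure described above concrete, here is a minimal sketch that splits a DOI into its parts. The helper names (parse_doi, split_plos_style_suffix) are invented for this illustration, the three-part split of the suffix assumes the PLOS-style pattern described in this post (other providers design their suffixes differently), and the article number used is a made-up placeholder rather than a real article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedDoi:
    directory: str   # the '10' common to all DOIs
    registrant: str  # the provider's unique registration number
    suffix: str      # chosen by the provider, everything after the first '/'

    @property
    def prefix(self) -> str:
        return f"{self.directory}.{self.registrant}"


def parse_doi(doi: str) -> ParsedDoi:
    """Split a DOI string into directory, registrant number and suffix."""
    prefix, slash, suffix = doi.strip().partition("/")
    if not slash or not suffix:
        raise ValueError(f"no suffix found in {doi!r}")
    directory, dot, registrant = prefix.partition(".")
    if directory != "10" or not dot or not registrant:
        raise ValueError(f"prefix does not look like '10.<number>' in {doi!r}")
    return ParsedDoi(directory, registrant, suffix)


def split_plos_style_suffix(suffix: str) -> tuple[str, str, str]:
    """Assumes a suffix of the form '<item type>.<journal acronym>.<article number>'."""
    parts = suffix.split(".")
    if len(parts) != 3:
        raise ValueError(f"suffix {suffix!r} does not follow the assumed pattern")
    return parts[0], parts[1], parts[2]


if __name__ == "__main__":
    # Placeholder article number, for illustration only.
    doi = parse_doi("10.1371/journal.pbio.0000000")
    print(doi.prefix, doi.registrant, split_plos_style_suffix(doi.suffix))
```

Only the split at the first "/" and the leading "10." are general to all DOIs. Anything read out of the suffix depends on the individual provider's scheme.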

For every DOI that is generated, it is the responsibility of the provider to send metadata and a link to the top-level webpage for the item to their registration agency. In the UK this is most likely to be Crossref or DataCite. Both agencies make the metadata and DOIs registered with them openly available via APIs, so that it can be used to build overarching databases such as Europe PubMed Central or added into other tools and services like the search interface at doi.org.
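
As a small illustration of how this open metadata can be reached, the sketch below builds the web addresses for resolving a DOI and for requesting its metadata record. It assumes the public doi.org resolver and the public REST endpoints of Crossref (api.crossref.org/works/) and DataCite (api.datacite.org/dois/). The function names are invented for this example, the DOI shown is a made-up placeholder, and no request is actually sent.

```python
from urllib.parse import quote

RESOLVER_BASE = "https://doi.org/"

# Public metadata endpoints of the two registration agencies named above.
METADATA_BASES = {
    "crossref": "https://api.crossref.org/works/",
    "datacite": "https://api.datacite.org/dois/",
}


def resolver_url(doi: str) -> str:
    """Address that redirects to the item's current landing page."""
    return RESOLVER_BASE + quote(doi, safe="/")


def metadata_url(doi: str, agency: str) -> str:
    """Address of the open metadata record held by the given agency."""
    try:
        base = METADATA_BASES[agency.lower()]
    except KeyError:
        raise ValueError(f"unknown registration agency: {agency!r}") from None
    return base + quote(doi, safe="/")


if __name__ == "__main__":
    placeholder = "10.1371/journal.pone.0000000"  # not a real article
    print(resolver_url(placeholder))
    print(metadata_url(placeholder, "crossref"))
```

Passing either metadata address to an HTTP client returns a JSON record for a registered DOI. The layout of that record differs between the two agencies, so it is not shown here.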

The different publishers, repositories, universities and funders all have a responsibility to keep the metadata of all of the DOIs they generate up to date. This is important in order for the DOI to be persistent. For example, if your chosen journal changes publisher after your article has been published, it is the responsibility of the publisher to facilitate updating the metadata of every article so that you will still be able to find your article using the DOI.

Why is having a DOI beneficial?

The purpose of a DOI is to accurately identify, link to and discriminate between online works. DOIs are unique to the work they identify and permanently link to it. This means that a DOI must link to the authoritative and authentic web presence for the work hosted on a sustainable platform.

So, having a DOI for your work (whatever it may be) means that it will always be findable: even if the journal where it was originally published no longer exists, there will always be a record of your work no matter how much time has passed. It also helps ensure that your work is cited properly, and that every mention of it is correctly attributed and easy to track. If your work has a DOI, it can be included in other tools like Altmetric or Plum Analytics. These tools track mentions of works in social media, news media, policy documents and other places.

How do I get a DOI for my work?

It is relatively unusual for journals to be unable to provide you with a DOI for your article. If your publisher does not have the facility to give you a DOI, or you wish to get a DOI for another type of material, the simplest way to go about getting one is to create a record in a repository that can provide a DOI for you.

At UCL we have the Research Data Repository (RDR) which can accept a wide range of outputs including data, figures, presentations, software, posters, even images and other media. There is the option in the record creation process to ‘Reserve’ a DOI which will become live once the record is checked and verified by the RDR team.

Outside UCL, there are also independent repositories that are able to give you a DOI. You can choose a subject repository appropriate for your data – there is lots of information available on the Research Data Management team website – or a generic one such as the UK Data Archive, Zenodo, Figshare or Dryad.

Open Access and your Research in a COVID-19 World

By Kirsty, on 6 May 2020

On 20 March, days after lockdown began, JISC and partners issued a statement calling for publishers to help in the global effort to combat COVID-19 and support institutions and students to continue their education by making resources available where possible. Since that day, numerous publishers have made temporary changes to their policies, and have begun to make more content freely available online. The Library has been maintaining a list of these newly open resources on the website, along with other help and advice for finding and using resources remotely. There are also lists of resources available from the British Library as well as a brilliant collated list of data and computational resources from the National Institutes of Health.

The Copyright Licensing Agency has also made some temporary adjustments to the licence that allows books to be scanned and shared. Please contact the Teaching & Learning Services team for more information.

In addition, there are now tools that allow you to search the web for trustworthy Open Access versions of content from inside your web browser. Just searching Google can not only bring up illegal copies of material, but also inadvertently support predatory and fake journals. The recommended tool is called Open Access Button. More information about Open Access Button is available here.

Open Access choices

Just because publishers are making things open for the time being, doesn’t mean they will stay that way. Be careful about the choices you make for your research – in the long term, will the publisher of your chosen journal stop access to your paper? When you are choosing the journal to submit your research to, take a look at the guidance provided by the Open Access team, and also check Sherpa/Romeo to find out whether you are allowed to share your work on RPS, or even on a pre-print service to get it out there even faster!

Don’t forget that you can use the Research Publications Service (RPS) as well as the Research Data Repository (RDR) to take advantage of Open Access to share all of your research outputs to get them out to the rest of the research community.

Doctoral theses in UCL’s repository

By Patrycja, on 25 October 2018

At UCL, candidates for research degrees are required to deposit an electronic copy of their final thesis in UCL’s Research Publications Service (RPS), to be made open access in UCL’s institutional repository, UCL Discovery. Students can choose to restrict public access to their thesis for a variety of reasons, such as future publication, copyright restrictions or sensitive data, but most are made open access immediately, or after a delay period of no longer than 12 months.

The requirement to submit an electronic copy of your thesis as a condition of award has been in place at UCL since 2009. In addition to that, we have retrospectively digitised theses from earlier years, as a part of a collaborative project with ProQuest. So far, about 3,500 theses have been made available in UCL Discovery as a part of this collaboration. Theses are also digitised through the British Library’s e-Theses Online Service (EThOS), upon request.

In total, there are over 10,500 theses available in UCL’s institutional repository, dating as far back as 1933. UCL theses are amongst our most-downloaded items! The most popular is a 1990 thesis, Marketing theories and concepts for the international construction industry, available here. Amongst the theses available there are some completed by notable UCL alumni:

Julian Baggini, philosopher and author of popular books on philosophy, including A Short History of Truth, The Pig that Wants to be Eaten and 99 other thought experiments, and most recently How the World Thinks. Baggini completed his PhD in 1996, and his thesis on the philosophy of identity was recently made available here: http://discovery.ucl.ac.uk/10057733/

Adam Rutherford, geneticist and author, has produced several science documentaries, and hosts the BBC Radio 4 programme Inside Science. He completed his PhD at UCL in 2002, and his thesis on the role of a specific gene (CHX10) in eye development was recently made available in UCL Discovery: http://discovery.ucl.ac.uk/10057801/

Chris Van Tulleken, together with his twin brother and fellow doctor Xand, makes programmes on various aspects of health, most recently Operation Ouch for CBBC. He is also an infectious diseases doctor and MRC Clinical Research Fellow at University College London Hospital, and completed his PhD in 2017. Chris’ thesis is available here: http://discovery.ucl.ac.uk/1567969/