
Open@UCL Blog


Getting a Handle on Third-Party Datasets: Researcher Needs and Challenges

By Rafael, on 16 February 2024

Guest post by Michelle Harricharan, Senior Research Data Steward, in celebration of International Love Data Week 2024.

ARC Data Stewards have completed the first phase of work on the third-party datasets project, aiming to help researchers better access and manage data provided to UCL by external organisations.


The problem:

Modern research often requires access to large volumes of data generated outside universities. These datasets, provided to UCL by third parties, are typically generated during routine service delivery or other activities and are used in research to identify patterns and make predictions. UCL research and teaching increasingly rely on access to these datasets, which range from NHS data to large-scale commercial datasets such as those provided by ‘X’ (formerly known as Twitter).

Currently, there is no centrally supported process for research groups seeking to access third-party datasets. Researchers sometimes use departmental procedures to acquire personal or university-wide licences for third-party datasets. They then transfer, store, document and extract the data, and take steps to minimise information risk, before using it for analysis. Obtaining third-party data involves significant overhead, including contracts, information governance (IG) compliance and finance. Delays in acquiring access to data can be a significant barrier to research. Some UCL research teams also provide additional support services for other research teams, such as sharing, managing access to, licensing and redistributing specialist third-party datasets. These teams increasingly take on governance and training responsibilities for these specialist datasets. Concurrently, the e-resources team in the library negotiates access to third-party datasets for UCL staff and students following established library procedures.

It has long been recognised that UCL’s processes for acquiring and managing third-party data are uncoordinated and inefficient, leading to inadvertent duplication, unnecessary expense, and underutilisation of datasets that could support transformative research across multiple projects or research groups. This was acknowledged in the “Data First, 2019 UCL Research Data Strategy”.

What we did:

Last year, the ARC Data Stewards team reached out to UCL professional services staff and researchers to understand the processes and challenges they faced regarding accessing and using third-party research datasets. We hoped that insights from these conversations could be used to develop more streamlined support and services for researchers and make it easier for them to find and use data already provided to UCL by third parties (where this is within licensing conditions).

During this phase of work, we spoke with 14 members of staff:

  • 7 research teams that manage third-party datasets
  • 7 members of professional services that support or may support the process, including contracts, data protection, legal, Information Services Division (databases), information security, research ethics and integrity, and the library.

What we’ve learned:

An important aspect of this work involved capturing the existing processes researchers use when accessing, managing, storing, sharing, and deleting third-party research data at UCL. This enabled us to understand the range of processes involved in handling this type of data and identify the various stakeholders involved—or who potentially need to be involved. In practice, we found that researchers follow similar processes to access and manage third-party research data, depending on the security of the dataset. However, as there is no central, agreed procedure to support the management of third-party datasets in the organization, different parts of the process may be implemented differently by different teams using the methods and resources available to them. We turned the challenges researchers identified in accessing and managing this type of data into requirements for a suite of services to support the delivery and management of third-party datasets at UCL.

Next steps:

We have been working on addressing some of the common challenges researchers identified. Researchers noted that getting contracts agreed and signed off takes too long, so we reached out to the RIS Contract Services Team, who are actively working to build additional capacity into the service as part of a wider transformation programme.

Also, information about accessing and managing third-party datasets is fragmented, and researchers often don’t know where to go for help, particularly for governance and technical advice. To counter this, we are bringing relevant professional services together to agree on a process for supporting access to third-party datasets.

Finally, respondents noted that there is too much duplication of data. Data is costly, and it is not easy to know what is already available internally for reuse. In response, we are building a searchable catalogue of third-party datasets that are already licensed to UCL researchers and that others can request access to reuse.

Our progress will be reported to the Research Data Working Group, which acts as a central point of contact and a forum for discussion on aspects of research data support at UCL. The group advocates for continual improvement of research data governance.

If you would like to know more about any of these strands of work, please do not hesitate to reach out (email: researchdata-support@ucl.ac.uk). We are keen to work with researchers and other professional services to solve these shared challenges and accelerate research and collaboration using third-party datasets.

Get involved!

The UCL Office for Open Science and Scholarship invites you to contribute to the open science and scholarship movement. Stay connected for updates, events, and opportunities. Follow us on X, formerly Twitter, and join our mailing list to be part of the conversation!

FAIR Data in Practice

By Rafael, on 15 February 2024

Guest post by Victor Olago, Senior Research Data Steward and Shipra Suman, Research Data Steward, in celebration of International Love Data Week 2024.

Image depicting the FAIR guiding principles for data resources: Findable, Accessible, Interoperable, and Reusable.

Credit: Sangya Pundir, CC BY-SA 4.0 via Wikimedia Commons

The problem:

We all know sharing is caring, and data needs to be shared if its full potential and usefulness are to be explored. Sharing makes it possible for researchers to answer questions that were not the primary research objective of the initial study. Shared data also allows other researchers to replicate the findings underpinning a manuscript, which is important for knowledge sharing, and to integrate the data with other datasets, whether already collected or collected in the future.

Several factors can hamper research data sharing, including a lack of technical skills, inadequate funding, the absence of data sharing agreements, and ethical barriers. As Data Stewards, we support appropriate ways of collecting, standardising, using, sharing and archiving research data. We are also responsible for advocating best practices and policies on data. One such best practice is promoting and implementing the FAIR data principles.

FAIR is an acronym for Findable, Accessible, Interoperable and Reusable [1]. FAIR is about making data discoverable to other researchers, but it is not the same as Open Data. Some data can only be shared with others once security considerations have been addressed. For researchers to use such data, a concept note or protocol must be in place to help the gatekeepers of that data understand what each data request is for, how the data will be processed, and the expected outcomes of the study or sub-study. Findability and Accessibility are ensured through metadata and by enforcing the use of persistent identifiers for each dataset. Interoperability relates to applying standards and coding systems such as ICD-10 and ICD-O-3 [2]. Lastly, Reusability means making it possible for the data to be used by other researchers.

What we are doing:

We are currently supporting a data reuse project at the Medical Research Council Clinical Trials Unit (MRC CTU). This project enables the secondary analysis of clinical trial data. We use pseudonymisation techniques and prepare metadata to accompany each dataset.

Pseudonymisation is the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information [3]. This reduces the risk of re-identification. When data is pseudonymised, direct identifiers are dropped while potentially identifiable information is coded. Data may also be aggregated; for example, age may be transformed into age groups. In some instances, data is sampled from the original distribution and only the sample is shared. Pseudonymised data is still personal data and must be protected in line with the GDPR [4].
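To make the steps above concrete, here is a minimal Python sketch of dropping direct identifiers, coding an identifier and aggregating age. It is an illustration under our own assumptions, not the MRC CTU workflow: the field names (`participant_id`, `nhs_number`, etc.) are hypothetical, and we assume a keyed hash (HMAC-SHA256) as the coding method, with the secret key standing in for the "additional information" that must be held separately from the shared data. A lookup table kept by the data custodian is another common way to code identifiers.

```python
import hashlib
import hmac
import secrets

# Hypothetical field names, for illustration only; a real trial dataset will differ.
DIRECT_IDENTIFIERS = {"name", "nhs_number", "address"}


def age_group(age: int, width: int = 10) -> str:
    """Aggregate an exact age into a band such as '60-69'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def pseudonym(participant_id: str, key: bytes) -> str:
    """Code an identifier with a keyed hash (HMAC-SHA256).

    Without the secret key (the 'additional information', stored separately),
    the pseudonym cannot easily be linked back to the original ID.
    The digest is truncated to 16 hex characters for readability.
    """
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise(record: dict, key: bytes) -> dict:
    """Return a copy of one record with identifiers dropped, coded or aggregated."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field == "participant_id":
            out["pseudo_id"] = pseudonym(value, key)  # code the identifier
        elif field == "age":
            out["age_group"] = age_group(value)  # aggregate exact age
        else:
            out[field] = value  # keep other (non-identifying) fields as-is
    return out


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # keep this key apart from the shared dataset
    record = {"participant_id": "RT01-0042", "name": "Jane Doe",
              "nhs_number": "0000000000", "age": 67, "arm": "high dose"}
    print(pseudonymise(record, key))
```

Because the same key always yields the same pseudonym, records belonging to one participant can still be linked across files, while anyone without the key cannot reverse the mapping. As noted above, the output remains personal data under the GDPR.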

The metadata makes it possible for other researchers to locate and request access to reuse clinical trial data at MRC CTU. Once access is approved, the extensive accompanying documentation makes reanalysis and/or integration with other datasets possible. Pseudonymisation and metadata preparation both help to promote FAIR data.

So far, we have prepared one data pack, for RT01 (‘A randomized controlled trial of high dose versus standard dose conformal radiotherapy for localized prostate cancer’), which is currently in the review phase and almost ready to share with requestors. Over the next few years, we hope to repeat and standardise this process for past, current and future cancer, HIV and other trials.

References:    

  1. 8 Pillars of Open Science.
  2. NHS Digital. National Clinical Coding Standards ICD-10 5th Edition (2022).
  3. Anonymisation and Pseudonymisation.
  4. Complete guide to GDPR compliance.


Finding Data Management Tools for Your Research Discipline

By Rafael, on 14 February 2024

Guest post by Iona Preston, Research Data Support Officer, in celebration of International Love Data Week 2024.

Various gardening tools arranged on a dark wooden background

Photo by Todd Quackenbush on Unsplash.

While there are a lot of general resources to support good research data management practices – for example UCL’s Research Data Management webpages – you might sometimes be looking for something more specific. It’s good practice to store your data in a subject-specific research data repository, where other people in your discipline are most likely to search for data, but you might not know where to begin your search. You could also be looking for discipline-specific metadata standards, so that your data is more easily reusable by colleagues in your subject area. This is where subject-specific research data management resources become valuable. Here are some resources for specific subject areas and disciplines that you might find useful:

  • The Research Data Management Toolkit for Life Sciences
    This resource guides you through the entire process of managing research data, explaining which tools to use at each stage of the research data lifecycle. It includes sections on specific life science research areas, from plant sciences to rare disease data. These sections also cover research community-specific repositories and examples of metadata standards. 
  • Visual arts data skills for researchers: Toolkits
    This consists of two different tutorials covering an introduction to research data management in the visual arts and how to create an appropriate data management plan. 
  • Consortium of European Social Science Data Archives
    CESSDA brings together data archives from across Europe in a searchable catalogue. Their website includes various resources for social scientists to learn more about data management and sharing, along with an extensive training section and a Data Management Expert Guide to lead you through the data management process. 
  • Research Data Alliance for Disciplines (various subject areas)
    The Research Data Alliance is an international initiative to promote data sharing. They have a webpage with special interest groups in various academic research areas, including agriculture, biomedical sciences, chemistry, digital humanities, social science, and librarianship, with useful resource lists for each discipline. 
  • RDA Metadata Standards Catalogue (all subject areas)
    This directory helps you find a suitable metadata scheme to describe your data, organized by subject area, featuring specific schemes across a wide range of academic disciplines. 
  • Re3Data (all subject areas)
    When it comes to sharing data, we always recommend checking whether there’s a subject-specific repository first, as that’s the best place to share. If you don’t know where to start, Re3Data is a great place to look, with a convenient browse feature for exploring the options available in your discipline.

These are only some of the discipline-specific tools available. You can find more for your discipline on the Research Data Management webpages. If you need any help and advice on finding data management resources, please get in touch with the Research Data Management team at lib-researchsupport@ucl.ac.uk.


Research Data Stewardship at UCL

By Rafael, on 13 February 2024

Guest post by James A J Wilson, Head of Research Data in Advanced Research Computing at UCL, in celebration of International Love Data Week 2024.

The image depicts a vibrant poster for International Love Data Week 2024. The main theme, 'My Kind of Data', is displayed in the centre, with the hashtag #lovedata2024 below it.

‘Research Data Steward’ is a relatively recent term for someone undertaking a range of jobs that have been done for some time, albeit sometimes without due appreciation. If you have helped researchers manage their data – helping with data management plans, adding metadata, providing services for data hosting, preparing datasets for analysis, scripting data transformations, readying data for sharing or publication, or engaging in long-term data preservation and curation – you may have unwittingly been a data steward.

As the importance of data for enabling research reproducibility and transparency becomes more widely recognized, so does the importance of good data stewardship. In 2016, the European Commission’s publication ‘Realising the European Open Science Cloud’ estimated that, “on average, about 5% of research expenditure should be spent on properly managing and stewarding data”[1]. Whilst the world and UCL are not at that level yet, the importance of managing research data more effectively has not passed the university by.

Advanced Research Computing (ARC) has established four different Research Technology professions. Besides our Research Software Engineers (who already have more than a decade of experience behind them at UCL) there are now groups of Research Infrastructure Developers, Data Scientists, and Data Stewards. None of the roles that the teams take on are new, but there are advantages to treating the people who make up those professions as members of a profession, rather than assorted and frequently rather isolated postdocs. Firstly, we now have a pool of people who can exchange experiences, impart knowledge to one another, and lend each other a bit of moral support. Secondly, it enables the development of focused career paths. No longer do research technology professionals need to kick their heels working on barely recognized tasks until they get an opportunity to break into the research big time. Their importance is recognized and can be rewarded.

There are now more than a dozen professional research data stewards in ARC. Team members develop and support services, collaborate with research teams from other departments to ensure that their data is as well managed and as FAIR as possible (Findable, Accessible, Interoperable, and Reusable), and undertake research themselves. Examples of research projects include work with eChild; preparing data packs for the Medical Research Council Clinical Trials Unit (MRC CTU); supplying the MAESaM and CAAL archaeology projects with geospatial data and mapping software expertise; and helping to prepare bids across a range of disciplines. Some projects are more infrastructure based, such as the EU-funded DICE project to establish services for data processing pipelines. Other work is focused on improving UCL’s services and their coordination, such as the ‘3rd-party data’ project, which seeks to help researchers obtain data from other organisations and enable broader awareness of and access to that data. We’re also working with departments, helping them migrate data to centrally managed storage.

The ARC Research Data Stewards are not the only people engaged in data stewardship at UCL. Many people across different projects and teams are involved in aspects of data stewardship. Most obviously, these include our close colleagues in UCL Library’s Research Data Management team, but also those working on services to provide particular datasets or metadata, plus all those on research contracts polishing and processing data in labs, libraries, and offices across Bloomsbury and beyond. We will shortly begin forming a Data Stewardship Community of Practice, to create a forum where everyone involved in this important work can exchange ideas and start to form a sense of what really constitutes ‘best practice’.

If you are based at UCL and are potentially interested in working with us, drop us a line at researchdata-support@ucl.ac.uk.


[1] European Commission, Directorate-General for Research and Innovation, Realising the European open science cloud – First report and recommendations of the Commission high level expert group on the European open science cloud, Publications Office, 2016, https://data.europa.eu/doi/10.2777/940154

Research Data Management: A year in review

By Rafael, on 12 February 2024

Guest post by Dr Christiana McMahon, Research Data Support Officer, in celebration of International Love Data Week 2024.

From that spark of an idea through to publishing research findings, the Research Data Management team have once again been on-hand to support staff and students.

What’s been happening?

A new version of the Research Data Repository is now available, simplifying the process of archiving and preserving research outputs here at UCL for the longer term.

In 2023, we published 200 items, 151 of which were datasets.

Graph to show items published in the UCL Research Repository in 2023.

We had over 120,000 downloads and over 240,000 views.

Over the past year:

  • The most downloaded record was: Griffiths, David; Boehm, Jan (2019). SynthCity Dataset – Area 1. University College London. Dataset.
  • The most viewed record was: Heenan, Thomas; Jnawali, Anmol; Kok, Matt; Tranter, Thomas; Tan, Chun; Dimitrijevic, Alexander; et al. (2020). Lithium-ion Battery INR18650 MJ1 Data: 400 Electrochemical Cycles (EIL-015). University College London. Dataset.
  • The most cited record was: Manescu, Petru; Shaw, Mike; Elmi, Muna; Zajiczek, Lydia; Claveau, Remy; Pawar, Vijay; et al. (2020). Giemsa Stained Thick Blood Films for Clinical Microscopy Malaria Diagnosis with Deep Neural Networks Dataset. University College London. Dataset.

More information is available about the UCL Research Data Repository.  Alternatively, check our FAQs.

Data Management Plan Reviews

The RDM team can review data management plans, providing researchers with feedback in line with UCL’s expectations and, where they apply, funding agency requirements. In 2023, we reviewed 32 data management plans covering more than 10 different funding agencies. More information is available on our website.

Mini-tutorial: Research data lifecycle

The RDM team often refer to the research data lifecycle, but what is it? Essentially, these are the different stages of the research process from planning and preparation through to archiving your research outputs, making them discoverable to the wider research community and members of the public.

The four stages:

1: Get ready – You’ve had an idea for a research study, so it’s time to start making plans and getting prepared. Have you considered writing a data management plan?

  • Remember, if you are in receipt of external funding, there may be data management requirements to consider.
  • Feel free to reach out to Open Science and Research Support to assist you.

2: Let’s go – You are now actively researching, putting all those research plans into action.

  • Don’t forget to revisit your data management plan and update it to reflect your latest decision making.
  • It’s also useful to consider documenting your research as you progress.

3: Ta-dah – The research is complete and it’s time to archive your research outputs to preserve them for the longer-term.

  • Aim to utilise subject-specific archives and repositories where possible.
  • Creating a metadata record in a public-facing online catalogue, with links to any related publications, can help build online networks of linked research outputs.
  • Consider making your research outputs as openly accessible as possible. Controlling or restricting access is fine as long as it is justified and there is a set data access protocol in place to handle data access requests.
  • Did you know you can archive most research outputs in the UCL Research Data Repository?

4: Wow! I think I can use this – Making your research discoverable to others for potential reuse can help to maximise research opportunities.

And so the research data lifecycle begins again!


Join us for International Love Data Week!

By Rafael, on 7 February 2024

Guest post by Iona Preston, Research Data Support Officer.

Next week (February 12-16), we’re excited to be celebrating International Love Data Week. We’ll be looking at how data is shared and reused within UCL and the wider academic community, highlighting the support available across UCL for these initiatives. This year’s theme, “My Kind of Data,” focuses on data equity, inclusion, and disciplinary communities. We’ll be blogging and posting on X throughout the week, so please join us to learn more.

International Love Data Week 2024 poster

Here’s a sneak preview of what’s coming up:

  • Did you know the Research Data Management team can review your data management plan and support you in publishing your data in our Research Data Repository? Find out more about our last year in review with Christiana McMahon, Research Data Support Officer.
  • Have you met any members of our Data Stewards team? James Wilson, Head of Research Data Services, will be explaining how you can collaborate with them to streamline the process of managing and preserving your data, thereby supporting reproducibility and transparency in your research.
  • Are you seeking tools to support best practices in data management for your specific discipline? We have some suggestions from Iona Preston, Research Data Support Officer.
  • You may have heard of FAIR data – but what does that mean in practice? Join Research Data Steward Shipra Suman and Senior Research Data Steward Victor Olago as they discuss projects where they’ve supported making data FAIR.
  • And, finally, to round off the week, Senior Research Data Steward Michelle Harricharan will talk about a project the Data Stewards are carrying out to better support UCL researchers in accessing and managing external datasets.

We look forward to engaging with you throughout the week and hope you enjoy learning more about research data at UCL.


Open Access Week: A year in review

By Kirsty, on 25 October 2023

It has become something of a tradition for there to be a post during Open Access Week that reviews the previous year. While the middle of October may seem like a strange time to take stock, it is, after all, the anniversary of the launch of the Office for Open Science & Scholarship, and we like to stop and celebrate another year.

This year we celebrated another successful conference, our first back in person! We also had another first: a workshop on equity in authorship that took place simultaneously online and in person. This work has fed into a UCL statement on Authorship that will be released in the coming months.

We also released a brand new page bringing together all of the training and support information across all the Open Science affiliated teams to make it easier to navigate and get your questions answered.

In the past year all of the teams that form part of the office have worked hard on developing new services and making improvements to existing ones.

The Open Access team have been working hard on updating RPS and the new Profiles tool that replaces IRIS. They also support both Gold and Green Open Access activity across the university.

Over 18,500 items have been uploaded to UCL Discovery in the last 12 months, bringing the total to over 166,000! Of these, there are over 23,000 theses to be explored. They have also made 3,383 papers Gold OA, 2,700 of which were using our transformative agreements with publishers.

The Research Data Management team have been working hard behind the scenes doing an overhaul of their support materials, testing new materials for training and supporting the ever-growing Research Data Repository.

In the past 12 months, we have had over 1,000 new datasets from 226 users. Notably, we have had over 200,000 downloads, which goes to show the value of sharing your data as well as your other research outputs!

The Citizen Science support service has moved on in leaps and bounds since this time last year, creating content, liaising with colleagues across the university, collaborating to launch the UCL Citizen Science Academy and this week we were able to launch the brand new Citizen Science online community.

Hopefully that gives you a taste of what we have been up to, and of the numbers behind the last year. Scroll back through the blog for more detail on our work. It’s been a great year, and here’s to the next!

Happy Open Access Week!

Open Access Week activities

By Kirsty, on 13 October 2023

Open Access Week is almost upon us!

Keep your eyes open for a series of blog posts on Creative Commons, citizen science, the recent activities of UCL Press and an exciting review of a year in open access.

This year’s theme is Community over Commercialisation. Creative Commons licences sit at the heart of this discussion. To this end, we invite you to a drop-in session on Tuesday the 24th of October to address questions around creating and using Creative Commons materials. The session is on Teams and you can join at any time. Bring along your questions or just join to discuss how CC supports equitable access to a wide range of works, from scholarly publications to open and FAIR data to images and music.

We have already announced our wonderful winners of the Open Science and Scholarship awards. UCL colleagues can also join us on Wednesday to celebrate and network with the winners; tickets are still available!

We will be posting and tweeting regularly throughout the week about the services and support available to researchers and I hope that we can get some good discussions going!

See you there!

UCL Research Data Repository: Publishing research outputs for staff and PhD students in 2022

By Harry, on 17 February 2023

Dr Christiana McMahon & Christine Buckley – Research Data Support Officers

At UCL, we have a dedicated Research Data Repository. This can be used by staff and research students to archive and preserve research outputs. This can be anything from your datasets to a poster you presented at a conference.

What have we published?

In total, we published 162 items!

Total number of views in 2022: 172,059

Total number of downloads in 2022: 117,830

What is a Data Management Plan (DMP)?

By Harry, on 15 February 2023

Dr Christiana McMahon & Christine Buckley – Research Data Support Officers

A Data Management Plan or DMP is an essential part of research data management and is usually completed in the first stage of any research project. It can help you think clearly about what data you will collect and how to store, curate, back up, archive and share this data.

You’ll find that many funders include a DMP as part of their grant applications, and we are more than happy to help review these.

You can check our recently updated webpage to learn how to create your DMP. 

How do I get support?

Just email us a copy of your plan to lib-researchsupport@ucl.ac.uk, or you can create your plan in DMPonline and request feedback.

How many DMPs have we reviewed?

Over the course of 2022, we reviewed a total of 39 plans, most of which supported grant applications submitted by researchers here at UCL.

The most popular months for sharing plans for feedback with the Research Data Management team were… April, June and October!