Open@UCL Blog

Getting a Handle on Third-Party Datasets: Researcher Needs and Challenges

By Rafael, on 16 February 2024

Guest post by Michelle Harricharan, Senior Research Data Steward, in celebration of International Love Data Week 2024.

ARC Data Stewards have completed the first phase of work on the third-party datasets project, aiming to help researchers better access and manage data provided to UCL by external organisations.

The problem:

Modern research often requires access to large volumes of data generated outside of universities. These datasets, provided to UCL by third parties, are typically generated during routine service delivery or other activities and are used in research to identify patterns and make predictions. UCL research and teaching increasingly rely on access to these datasets, which range from NHS data to large-scale commercial datasets such as those provided by ‘X’ (formerly known as Twitter), to achieve their objectives.

Currently, there is no centrally supported process for research groups seeking to access third-party datasets. Researchers sometimes use departmental procedures to acquire personal or university-wide licences for third-party datasets. They then transfer, store, document, and extract the data, and take steps to minimize information risk, before using it for various analyses. The process of obtaining third-party data involves significant overhead, including contracts, compliance (IG), and finance. Delays in acquiring access to data can be a significant barrier to research. Some UCL research teams also provide additional support services such as sharing, managing access to, licensing, and redistributing specialist third-party datasets for other research teams. These teams increasingly take on governance and training responsibilities for these specialist datasets. Concurrently, the e-resources team in the library negotiates access to third-party datasets for UCL staff and students following established library procedures.

It has long been recognised that UCL’s processes for acquiring and managing third-party data are uncoordinated and inefficient, leading to inadvertent duplication, unnecessary expense, and underutilisation of datasets that could support transformative research across multiple projects or research groups. The “Data First, 2019 UCL Research Data Strategy” acknowledged this problem.

What we did:

Last year, the ARC Data Stewards team reached out to UCL professional services staff and researchers to understand the processes and challenges they faced regarding accessing and using third-party research datasets. We hoped that insights from these conversations could be used to develop more streamlined support and services for researchers and make it easier for them to find and use data already provided to UCL by third parties (where this is within licensing conditions).

During this phase of work, we spoke with 14 members of staff:

  • 7 research teams that manage third-party datasets
  • 7 members of professional services that support or may support the process, including contracts, data protection, legal, Information Services Division (databases), information security, research ethics and integrity, and the library.

What we’ve learned:

An important aspect of this work involved capturing the existing processes researchers use when accessing, managing, storing, sharing, and deleting third-party research data at UCL. This enabled us to understand the range of processes involved in handling this type of data and identify the various stakeholders involved—or who potentially need to be involved. In practice, we found that researchers follow similar processes to access and manage third-party research data, depending on the security of the dataset. However, as there is no central, agreed procedure to support the management of third-party datasets in the organization, different parts of the process may be implemented differently by different teams using the methods and resources available to them. We turned the challenges researchers identified in accessing and managing this type of data into requirements for a suite of services to support the delivery and management of third-party datasets at UCL.

Next steps:

We have been working on addressing some of the common challenges researchers identified. Researchers noted that getting contracts agreed and signed off takes too long, so we reached out to the RIS Contract Services Team, who are actively working to build additional capacity into the service as part of a wider transformation programme.

Also, information about accessing and managing third-party datasets is fragmented, and researchers often don’t know where to go for help, particularly for governance and technical advice. To counter this, we are bringing relevant professional services together to agree on a process for supporting access to third-party datasets.

Finally, respondents noted that there is too much duplication of data. The costs for data are high, and it’s not easy to know what’s already available internally to reuse. In response, we are building a searchable catalogue of third-party datasets already licensed to UCL researchers and available for others to request access to reuse.
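To make the catalogue idea concrete, here is a minimal sketch of what a searchable record store could look like. It is an illustration only, not a description of the service being built: the field names, the `search` function, and the example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One catalogue entry. Every field name here is a hypothetical choice."""
    title: str
    provider: str
    licence_holder: str
    keywords: tuple[str, ...] = ()
    # Assumption: the licence terms record whether other teams may request reuse.
    reuse_permitted: bool = False


def search(catalogue, query):
    """Return reusable records whose title, provider, or keywords contain the query.

    Matching is a case-insensitive substring test; an empty query returns nothing.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for record in catalogue:
        haystack = " ".join((record.title, record.provider, *record.keywords)).lower()
        if record.reuse_permitted and needle in haystack:
            hits.append(record)
    return hits


# Made-up entries, for illustration only.
CATALOGUE = [
    DatasetRecord(
        "Example hospital admissions extract",
        "Example Health Provider",
        "Example research team A",
        ("health", "admissions"),
        True,
    ),
    DatasetRecord(
        "Example social media sample",
        "Example Platform",
        "Example research team B",
        ("social media",),
        False,
    ),
]
```

In this sketch, records whose licence does not permit reuse are filtered out of search results, reflecting the condition that reuse must be within licensing terms. A real catalogue might instead list such datasets and route enquiries to the licence holder.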

Our progress will be reported to the Research Data Working Group, which acts as a central point of contact and a forum for discussion on aspects of research data support at UCL. The group advocates for continual improvement of research data governance.

If you would like to know more about any of these strands of work, please do not hesitate to reach out (email: researchdata-support@ucl.ac.uk). We are keen to work with researchers and other professional services to solve these shared challenges and accelerate research and collaboration using third-party datasets.

Get involved!

The UCL Office for Open Science and Scholarship invites you to contribute to the open science and scholarship movement. Stay connected for updates, events, and opportunities. Follow us on X, formerly Twitter, and join our mailing list to be part of the conversation!

Research Data Stewardship at UCL

By Rafael, on 13 February 2024

Guest post by James A J Wilson, Head of Research Data in Advanced Research Computing at UCL, in celebration of International Love Data Week 2024.

Image: a poster for International Love Data Week 2024. The main theme, 'My Kind of Data', is displayed at the centre, with the hashtag #lovedata2024 below it.

A Research Data Steward is a relatively recent term for someone undertaking a range of jobs that have already been undertaken for some time, albeit sometimes without due appreciation. If you have helped researchers manage their data – helping with data management plans, adding metadata, providing services for data hosting, preparing datasets for analysis, scripting data transformations, readying data for sharing or publication, or engaging in long-term data preservation and curation – you may have unwittingly been a data steward.

As the importance of data for enabling research reproducibility and transparency becomes more widely recognized, so does the importance of good data stewardship. In 2016, the European Commission’s publication ‘Realising the European Open Science Cloud’ estimated that “on average, about 5% of research expenditure should be spent on properly managing and stewarding data”[1]. Whilst the world and UCL are not at that level yet, the importance of managing research data more effectively has not passed the university by.

Advanced Research Computing (ARC) has established four different Research Technology professions. Besides our Research Software Engineers (who already have more than a decade of experience behind them at UCL) there are now groups of Research Infrastructure Developers, Data Scientists, and Data Stewards. None of the roles that the teams take on are new, but there are advantages to treating the people who make up those professions as members of a profession, rather than assorted and frequently rather isolated postdocs. Firstly, we now have a pool of people who can exchange experiences, impart knowledge to one another, and lend each other a bit of moral support. Secondly, it enables the development of focused career paths. No longer do research technology professionals need to kick their heels working on barely recognized tasks until they get an opportunity to break into the research big time. Their importance is recognized and can be rewarded.

There are now more than a dozen professional research data stewards in ARC. Team members develop and support services, collaborate with research teams from other departments to ensure that their data is as well managed and as FAIR as possible (Findable, Accessible, Interoperable, and Reusable), and undertake research themselves. Examples of research projects include work with eChild; preparing data packs for the Medical Research Council Clinical Trials Unit (MRC CTU); supplying the MAESaM and CAAL archaeology projects with geospatial data and mapping software expertise; and helping to prepare bids across a range of disciplines. Some projects are more infrastructure based, such as the EU-funded DICE project to establish services for data processing pipelines. Other work is focused on improving UCL’s services and their coordination, such as the ‘3rd-party data’ project, which seeks to help researchers obtain data from other organisations and enable broader awareness of and access to that data. We’re also working with departments, helping them migrate data to centrally managed storage.

The ARC Research Data Stewards are not the only people engaged in data stewardship at UCL. Many people across different projects and teams are involved in aspects of data stewardship. These include, most obviously, our close colleagues in UCL Library’s Research Data Management team, but also those working on services to provide particular datasets or metadata, plus all those on research contracts working away at polishing and processing data in labs, libraries, and offices across Bloomsbury and beyond. We will shortly begin forming a Data Stewardship Community of Practice, to create a forum where everyone involved in this important work can exchange ideas and start to form a sense of what really constitutes ‘best practice’.

If you are based at UCL and are potentially interested in working with us, drop us a line at researchdata-support@ucl.ac.uk.

[1] European Commission, Directorate-General for Research and Innovation, Realising the European open science cloud – First report and recommendations of the Commission high level expert group on the European open science cloud, Publications Office, 2016, https://data.europa.eu/doi/10.2777/940154