Open@UCL Blog

Getting a Handle on Third-Party Datasets: Researcher Needs and Challenges

By Rafael, on 16 February 2024

Guest post by Michelle Harricharan, Senior Research Data Steward, in celebration of International Love Data Week 2024.

ARC Data Stewards have completed the first phase of work on the third-party datasets project, aiming to help researchers better access and manage data provided to UCL by external organisations.

The problem:

Modern research often requires access to large volumes of data generated outside of universities. These datasets, provided to UCL by third parties, are typically generated during routine service delivery or other activities and are used in research to identify patterns and make predictions. UCL research and teaching increasingly rely on access to these datasets, which range from NHS data to large-scale commercial datasets such as those provided by ‘X’ (formerly known as Twitter), to achieve their objectives.

Currently, there is no centrally supported process for research groups seeking to access third-party datasets. Researchers sometimes use departmental procedures to acquire personal or university-wide licenses for third-party datasets. They then transfer, store, document, extract, and undertake actions to minimize information risk before using the data for various analyses. The process to obtain third-party data involves significant overhead, including contracts, compliance (IG), and finance. Delays in acquiring access to data can be a significant barrier to research. Some UCL research teams also provide additional support services such as sharing, managing access to, licensing, and redistributing specialist third-party datasets for other research teams. These teams increasingly take on governance and training responsibilities for these specialist datasets. Concurrently, the e-resources team in the library negotiates access to third-party datasets for UCL staff and students following established library procedures.

It has long been recognized that UCL’s processes for acquiring and managing third-party data are uncoordinated and inefficient, leading to inadvertent duplication, unnecessary expense, and underutilisation of datasets that could support transformative research across multiple projects or research groups. This was recognised in the “Data First, 2019 UCL Research Data Strategy”.

What we did:

Last year, the ARC Data Stewards team reached out to UCL professional services staff and researchers to understand the processes and challenges they faced regarding accessing and using third-party research datasets. We hoped that insights from these conversations could be used to develop more streamlined support and services for researchers and make it easier for them to find and use data already provided to UCL by third parties (where this is within licensing conditions).

During this phase of work, we spoke with 14 members of staff:

  • 7 research teams that manage third-party datasets
  • 7 members of professional services that support or may support the process, including contracts, data protection, legal, Information Services Division (databases), information security, research ethics and integrity, and the library.

What we’ve learned:

An important aspect of this work involved capturing the existing processes researchers use when accessing, managing, storing, sharing, and deleting third-party research data at UCL. This enabled us to understand the range of processes involved in handling this type of data and identify the various stakeholders involved—or who potentially need to be involved. In practice, we found that researchers follow similar processes to access and manage third-party research data, depending on the security of the dataset. However, as there is no central, agreed procedure to support the management of third-party datasets in the organization, different parts of the process may be implemented differently by different teams using the methods and resources available to them. We turned the challenges researchers identified in accessing and managing this type of data into requirements for a suite of services to support the delivery and management of third-party datasets at UCL.

Next steps:

We have been working on addressing some of the common challenges researchers identified. Researchers noted that getting contracts agreed and signed off takes too long, so we reached out to the RIS Contract Services Team, who are actively working to build additional capacity into the service as part of a wider transformation programme.

Also, information about accessing and managing third-party datasets is fragmented, and researchers often don’t know where to go for help, particularly for governance and technical advice. To counter this, we are bringing relevant professional services together to agree on a process for supporting access to third-party datasets.

Finally, respondents noted that there is too much duplication of data. The costs for data are high, and it’s not easy to know what’s already available internally to reuse. In response, we are building a searchable catalogue of third-party datasets already licensed to UCL researchers and available for others to request access to reuse.

Our progress will be reported to the Research Data Working Group, which acts as a central point of contact and a forum for discussion on aspects of research data support at UCL. The group advocates for continual improvement of research data governance.

If you would like to know more about any of these strands of work, please do not hesitate to reach out (email: researchdata-support@ucl.ac.uk). We are keen to work with researchers and other professional services to solve these shared challenges and accelerate research and collaboration using third-party datasets.

Get involved!

The UCL Office for Open Science and Scholarship invites you to contribute to the open science and scholarship movement. Stay connected for updates, events, and opportunities. Follow us on X, formerly Twitter, and join our mailing list to be part of the conversation!

Finding Data Management Tools for Your Research Discipline

By Rafael, on 14 February 2024

Guest post by Iona Preston, Research Data Support Officer, in celebration of International Love Data Week 2024.

Various gardening tools arranged on a dark wooden background

Photo by Todd Quackenbush on Unsplash.

While there are a lot of general resources to support good research data management practices – for example UCL’s Research Data Management webpages – you might sometimes be looking for something a bit more specific. It’s good practice to store your data in a subject-specific research data repository, where other people in your research discipline are most likely to search for data. However, you might not know where to begin your search. You could also be looking for discipline-specific metadata standards, so your data is more easily reusable by academic colleagues in your subject area. This is where subject-specific research data management resources become valuable. Here are some resources for specific subject areas and disciplines that you might find useful:

  • The Research Data Management Toolkit for Life Sciences
    This resource guides you through the entire process of managing research data, explaining which tools to use at each stage of the research data lifecycle. It includes sections on specific life science research areas, from plant sciences to rare disease data. These sections also cover research community-specific repositories and examples of metadata standards. 
  • Visual arts data skills for researchers: Toolkits
    This consists of two different tutorials covering an introduction to research data management in the visual arts and how to create an appropriate data management plan. 
  • Consortium of European Social Science Data Archives
    CESSDA brings together data archives from across Europe in a searchable catalogue. Their website includes various resources for social scientists to learn more about data management and sharing, along with an extensive training section and a Data Management Expert Guide to lead you through the data management process. 
  • Research Data Alliance for Disciplines (various subject areas)
    The Research Data Alliance is an international initiative to promote data sharing. They have a webpage with special interest groups in various academic research areas, including agriculture, biomedical sciences, chemistry, digital humanities, social science, and librarianship, with useful resource lists for each discipline. 
  • RDA Metadata Standards Catalogue (all subject areas)
    This directory helps you find a suitable metadata scheme to describe your data, organized by subject area, featuring specific schemes across a wide range of academic disciplines. 
  • Re3Data (all subject areas)
    When it comes to sharing data, we always recommend you check if there’s a subject-specific repository first, as that’s the best place to share. If you don’t know where to start finding one, this is a great place to look, with a convenient browse feature to explore available options within your discipline.

These are only some of the different discipline-specific tools that are available. You can find more for your discipline on the Research Data Management webpages. If you need any help and advice on finding data management resources, please get in touch with the Research Data Management team at lib-researchsupport@ucl.ac.uk.

Research Data Stewardship at UCL

By Rafael, on 13 February 2024

Guest post by James A J Wilson, Head of Research Data in Advanced Research Computing at UCL, in celebration of International Love Data Week 2024.

The image depicts a vibrant poster for International Love Data Week 2024. The main theme, 'My Kind of Data', is displayed at the centre, with the hashtag #lovedata2024 below it.

A Research Data Steward is a relatively recent term for someone undertaking a range of jobs that have already been undertaken for some time, albeit sometimes without due appreciation. If you have helped researchers manage their data – helping with data management plans, adding metadata, providing services for data hosting, preparing datasets for analysis, scripting data transformations, readying data for sharing or publication, or engaging in long-term data preservation and curation – you may have unwittingly been a data steward.

As the importance of data for enabling research reproducibility and transparency becomes more widely recognized, so does the importance of good data stewardship. In 2016, the European Commission’s publication ‘Realising the European Open Science Cloud’ estimated that, “on average, about 5% of research expenditure should be spent on properly managing and stewarding data”[1]. Whilst the world and UCL are not at that level yet, the importance of managing research data more effectively has not passed the university by.

Advanced Research Computing (ARC) has established four different Research Technology professions. Besides our Research Software Engineers (who already have more than a decade of experience behind them at UCL) there are now groups of Research Infrastructure Developers, Data Scientists, and Data Stewards. None of the roles that the teams take on are new, but there are advantages to treating the people who make up those professions as members of a profession, rather than assorted and frequently rather isolated postdocs. Firstly, we now have a pool of people who can exchange experiences, impart knowledge to one another, and lend each other a bit of moral support. Secondly, it enables the development of focused career paths. No longer do research technology professionals need to kick their heels working on barely recognized tasks until they get an opportunity to break into the research big time. Their importance is recognized and can be rewarded.

There are now more than a dozen professional research data stewards in ARC. Team members develop and support services, collaborate with research teams from other departments to ensure that their data is as well managed and as FAIR as possible (Findable, Accessible, Interoperable, and Reusable), and undertake research themselves. Examples of research projects include work with eChild; preparing data packs for the Medical Research Council Clinical Trials Unit (MRC CTU); supplying the MAESaM and CAAL archaeology projects with geospatial data and mapping software expertise; and helping to prepare bids across a range of disciplines. Some projects are more infrastructure based, such as the EU-funded DICE project to establish services for data processing pipelines. Other work is focused on improving UCL’s services and their coordination, such as the ‘3rd-party data’ project, which seeks to help researchers obtain data from other organisations and enable broader awareness of and access to that data. We’re also working with departments, helping them migrate data to centrally managed storage.

The ARC Research Data Stewards are not the only people engaged in data stewardship at UCL. Many people across different projects and teams are involved in aspects of data stewardship. These include, most obviously, our close colleagues in UCL Library’s Research Data Management team, but also those working on services to provide particular datasets or metadata, plus all those on research contracts working away at polishing and processing data in labs, libraries, and offices across Bloomsbury and beyond. We will shortly begin forming a Data Stewardship Community of Practice, to create a forum where everyone involved in this important work can exchange ideas and start to form a sense of what really constitutes ‘best practice’.

If you are based at UCL and are potentially interested in working with us, drop us a line at researchdata-support@ucl.ac.uk.

[1] European Commission, Directorate-General for Research and Innovation, Realising the European open science cloud – First report and recommendations of the Commission high level expert group on the European open science cloud, Publications Office, 2016, https://data.europa.eu/doi/10.2777/940154

Research Data Management: A year in review

By Rafael, on 12 February 2024

Guest post by Dr Christiana McMahon, Research Data Support Officer, in celebration of International Love Data Week 2024.

From that spark of an idea through to publishing research findings, the Research Data Management team have once again been on-hand to support staff and students.

What’s been happening?

A new version of the Research Data Repository is now available, simplifying the process of archiving and preserving research outputs here at UCL for the longer-term.

In 2023, we published 200 items, 151 of which were datasets.

Graph to show items published in the UCL Research Repository in 2023.

We had over 120,000 downloads and over 240,000 views.

Over the past year…

  • The most downloaded record was: Griffiths, David; Boehm, Jan (2019). SynthCity Dataset – Area 1. University College London. Dataset.
  • The most viewed record was: Heenan, Thomas; Jnawali, Anmol; Kok, Matt; Tranter, Thomas; Tan, Chun; Dimitrijevic, Alexander; et al. (2020). Lithium-ion Battery INR18650 MJ1 Data: 400 Electrochemical Cycles (EIL-015). University College London. Dataset.
  • The most cited record was: Manescu, Petru; Shaw, Mike; Elmi, Muna; Zajiczek, Lydia; Claveau, Remy; Pawar, Vijay; et al. (2020). Giemsa Stained Thick Blood Films for Clinical Microscopy Malaria Diagnosis with Deep Neural Networks Dataset. University College London. Dataset.

More information is available about the UCL Research Data Repository. Alternatively, check our FAQs.

Data Management Plan Reviews

The RDM team can review data management plans, providing researchers with feedback in line with UCL’s expectations and funding agency requirements where these apply. In 2023, we reviewed 32 data management plans covering over 10 different funding agencies. More information is available on our website.

Mini-tutorial: Research data lifecycle

The RDM team often refer to the research data lifecycle, but what is it? Essentially, these are the different stages of the research process from planning and preparation through to archiving your research outputs, making them discoverable to the wider research community and members of the public.

The four stages:

1: Get ready – You’ve had an idea for a research study so it’s time to start making plans and getting prepared. Have you considered writing a data management plan?

  • Remember, if you are in receipt of external funding, there may be data management requirements to consider.
  • Feel free to reach out to Open Science and Research Support to assist you.

2: Let’s go – You are now actively researching, putting all those research plans into action.

  • Don’t forget to revisit your data management plan and update it to reflect your latest decision making.
  • It’s also useful to consider documenting your research as you progress.

3: Ta-dah – The research is complete and it’s time to archive your research outputs to preserve them for the longer-term.

  • Aim to utilise subject-specific archives and repositories where possible.
  • Creating a metadata record in a public-facing online catalogue with links to any related publications can be useful for building online networks of linked research outputs.
  • Consider making your research outputs as openly accessible as possible, remembering that controlling or restricting access is fine as long as it is justified and there is a set data access protocol in place to facilitate a data access request.
  • Did you know you can archive most research outputs in the UCL Research Data Repository?

4: Wow! I think I can use this – Making your research discoverable to others for potential reuse can help to maximise research opportunities.

And so the research data lifecycle begins again!

Join us for International Love Data Week!

By Rafael, on 7 February 2024

Guest post by Iona Preston, Research Data Support Officer.

Next week (February 12-16), we’re excited to be celebrating International Love Data Week. We’ll be looking at how data is shared and reused within our UCL and academic community, highlighting the support available across UCL for these initiatives. This year’s theme, “My Kind of Data,” focuses on data equity, inclusion, and disciplinary communities. We’ll be blogging and posting on X throughout the week, so please join us to learn more.

International Love Data Week 2024 poster

Here’s a sneak preview of what’s coming up:

  • Did you know the Research Data Management team can review your data management plan and support you in publishing your data in our Research Data Repository? Find out more about our last year in review with Christiana McMahon, Research Data Support Officer.
  • Have you met any members of our Data Stewards team? James Wilson, Head of Research Data Services, will be explaining how you can collaborate with them to streamline the process of managing and preserving your data, thereby supporting reproducibility and transparency in your research.
  • Are you seeking tools to support best practices in data management for your specific discipline? We have some suggestions from Iona Preston, Research Data Support Officer.
  • You may have heard of FAIR data – but what does that mean in practice? Join Research Data Steward Shipra Suman and Senior Research Data Steward Victor Olago as they discuss projects where they’ve supported making data FAIR.
  • And, finally, to round off the week, Senior Research Data Steward Michelle Harricharan will talk about a project the Data Stewards are carrying out to better support UCL researchers in accessing and managing external datasets.

We look forward to engaging with you throughout the week and hope you enjoy learning more about research data at UCL.

Open Science & Scholarship Awards Winners!

By Kirsty, on 26 October 2023

A huge congratulations to all of the prize winners and a huge thanks to everyone that came to our celebration yesterday! It was lovely to hear from a selection of the winning projects and celebrate together. The OOSS team and the UKRN Local leads Sandy and Jessie had a lovely time networking with everyone.

Just in case you weren’t able to join us to hear the prize winners talk about their projects, Sandy has written short profiles of all of the winning projects below.

Category: Academic staff

Winner: Gesche Huebner and Mike Fells, BSEER, Built Environment

Gesche and Mike were nominated for the wide range of activities that they have undertaken to promote open science principles and activities in the energy research community. Among other things, they have authored a paper on improving energy research, which includes a checklist for authors; delivered teaching sessions on open, reproducible research to their department’s PhD students as well as staff at the Centre for Research Into Energy Demand Solutions, which inspired several colleagues to implement the practices; created guidance on different open science practices aimed at energy researchers, including professionally filmed videos; and developed a toolkit for improving the quality, transparency, and replicability of energy research (i.e., TReQ), which they presented at multiple conferences. Gesche and Mike also regularly publish pre-analysis plans of their own research, make data and code openly available when possible, publish preprints, and use standard reporting guidelines.

Honourable mention: Henrik Singmann, Brain Sciences

Henrik was nominated for their consistent and impactful contribution to the development of free and open-source software packages, mostly for the statistical programming language R. The most popular software tool he developed is afex, which provides a user-friendly interface for estimating one of the most commonly used statistical methods, analysis of variance (ANOVA). afex, first released in 2012 and actively maintained since, has been cited over 1800 times. afex is also integrated into other open-source software tools, such as JASP and JAMOVI, as well as teaching materials. With Quentin Gronau, Henrik also developed bridgesampling, a package for principled hypothesis testing in a Bayesian statistical framework. Since its first release in 2017, bridgesampling has already been cited over 270 times. Other examples of packages for which they are the lead developer or key contributor are acss, which calculates the algorithmic complexity for short strings, MPTinR and MPTmultiverse, as well as rtdists and (together with their PhD student Kendal Foster) fddm. Further promoting the adoption of open-source software, Henrik also provides statistics consultation sessions at his department and uses open-source software for teaching the Master’s level statistics course.

Honourable mention: Smita Salunke, School of Pharmacy

Smita is recognised for their role in the development of the Safety and Toxicity of Excipients for Paediatrics (STEP) database, an open-access resource compiling comprehensive toxicity information of excipients. The database was established in partnership with the European and United States Paediatric Formulation Initiatives. To create the database, numerous researchers shared their data. To date, STEP has circa 3000 registered users across 44 countries and 6 continents. The STEP database has also been recognised as a Research Excellence Framework (REF) 2021 impact case study. Additionally, the European Medicines Agency frequently refer to the database in their communications; the Chinese Centre for Drug Evaluation have also cited the database in their recent guidelines. Furthermore, the Bill and Melinda Gates Foundation have provided funds to support a further 10 excipients for inclusion in STEP. The development and evaluation of the STEP database have been documented in three open-access research papers. Last but not least, the database has been integrated into teaching materials, especially in paediatric pharmacy and pharmaceutical sciences.

Category: Professional Services staff

Winner: Miguel Xochicale, Engineering Sciences and Mathematical & Physical Sciences

Miguel hosted the “Open-source software for surgical technologies” workshop at the 2023 Hamlyn Symposium on Medical Robotics, a half-day session that brought together experts from software engineering in medical imaging, academics specialising in surgical data science, and researchers at the forefront of surgical technology development. During the workshop, speakers discussed the utilisation of cutting-edge hardware; fast prototyping and validation of new algorithms; maintaining fragmented source code for heterogeneous systems; developing high-performance medical image computing and visualisation in the operating room; and benchmarks of data quality and data privacy. Miguel subsequently convened a panel discussion, underscoring the pressing need for additional open-source guidelines and platforms that ensure that open-source software libraries are not only sustainable but also receive long-term support and are seamlessly translatable to clinical settings. Miguel made recordings of the talks and presentations available on GitHub, along with a work-in-progress white paper that is curated by them and links to forums inviting others to join their community.

Honourable mention: Marcus Pedersen, PHS

The Global Business School for Health (GBSH) introduced changes to its teaching style, notably, a flipped classroom. Marcus taught academics at their department how to use several mostly freely available learning technologies, such as student-created podcasts, Mentimeter, or Microsoft Sway, to create an interactive flipped classroom. Marcus further collected feedback from students documenting their learning journey and experiences with flipped teaching to evaluate the use of the tools. Those insights have been presented in a book chapter (Betts, T. & Oprandi, P. (Eds.). (2022). 100 Ideas for Active Learning. OpenPress @ University of Sussex) and in talks for UCL MBA and Master’s students as well as at various conferences. The Association for Learning Technology also awarded Marcus the ELESIG Scholar Scheme 23/24 to continue their research.

Category: Students

Winner: Seán Kavanagh, Chemistry

Seán was nominated for his noteworthy contribution to developing user-friendly open-source software for the computational chemistry/physics research community. They have developed several codes during their PhD, such as doped, ShakeNBreak and vaspup2.0 for which they are the lead developer, as well as PyTASER and easyunfold for which they are a co-lead developer. Seán not only focuses on efficient implementation but also on user-friendliness along with comprehensive documentation and tutorials. They have produced comprehensive video walkthroughs of the codes and the associated theories, amassing over 20,000 views on YouTube and SpeakerDeck. It is important to note that software development is not the primary goal of Seán’s PhD research (which focuses on characterizing solar cell materials), and so their dedication to top-quality open-source software development is truly commendable. Additionally, Seán has consistently shared the data of all his publications and actively encourages open-access practices in his collaborations/mentorship roles, having assisted others in making their data available online and building functionality in their codes to save outputs in transferable and interoperable formats for data.

Honourable mention: Julie Fabre, Department of Neuromuscular Diseases

Julie is recognized for developing the open-source toolbox bombcell, which automatically assesses large amounts of data that are collected simultaneously from hundreds of neurons (i.e., groups of spikes). This tool considerably reduces labour per experiment and enables long-term neuron recording, which was previously intractable. As bombcell has been released under the open-source copyleft GNU General Public License 3, all future derived work will also be free and open source. Bombcell has already been used in another open-source toolbox with the same licence, UnitMatch. The toolbox’s code is extensively documented, and Julie adopted the Open Neurophysiology Environment, a standardised data format that enables quick understanding and loading of data files. In 2022, Julie presented bombcell in a free online course. This course was attended by over 180 people, and the recorded video has since been viewed over 800 times online. Bombcell is currently regularly used in a dozen labs in Europe and the United States. It has already been used in two peer-reviewed publications, and in two manuscripts that are being submitted for publication, with more studies underway.

Honourable mention: Maxime Beau, Division of Medicine

Maxime is recognized for leading the development of NeuroPyxels, the first open-source library to analyze Neuropixels data in Python. NeuroPyxels, hosted on a GitHub public repository and licensed under the GNU general public license, is actively used across several neuroscience labs in Europe and the United States (18 users have already forked the repository). Furthermore, NeuroPyxels relies on a widely accepted neural data format; this built-in compatibility with community standards ensures that users can easily borrow parts of NeuroPyxels and seamlessly integrate them with their application. NeuroPyxels has been a great teaching medium in several summer schools. Maxime has been a teaching assistant at the “Paris Spring School of Imaging and Electrophysiology” for three years, the FENS course “Interacting with Neural Circuits” at Champalimaud for two years, and the UCL Neuropixels course for three years where NeuroPyxels has been an invaluable tool to get students started with analysing neural data in Python.

Honourable mention: Yukun Zhou, Centre for Medical Image Computing

Yukun was nominated for developing open-source software for analysing images of the retina. The algorithm, termed AutoMorph, consists of an entire pipeline from image quality assessment to image segmentation to feature extraction in tabular form. A strength of AutoMorph is that it was developed using openly available data and so its underlying code can be easily reproduced and audited by other research groups. Although only published 1 year ago, AutoMorph has already been used by research groups from four continents and led to three new collaborations with Yukun’s research group at UCL. Moreover, AutoMorph has been run on the entire retinal picture dataset in the UK Biobank study with the features soon being made available for the global research community. Yukun has been complimented on the ease with which any researcher can immediately download the AutoMorph tools and deploy them on their own datasets. Moreover, the availability of AutoMorph has encouraged other research groups, who are conducting similar work, to make their own proprietary systems openly available.

Category: Open resources, publishing, and textbooks 

Winner: Talia Isaacs, IOE, UCL’s Faculty of Education and Society

Talia is recognized for their diverse and continuous contributions to open access publishing. As Co-Editor of the journal Language Testing, they spearheaded SAGE’s CRediT pilot scheme, requiring standardized author contribution statements; they approved and supported Special Issue Editors’ piloting of transparent review for a special issue on “Open science in Language Testing”, encouraged authors to submit pre-prints, and championed open science in Editor workshops and podcasts. Additionally, in 2016, Multilingual Matters published Talia’s edited volume as their first open access monograph. Talia also discussed the benefits of open access book publication in the publisher’s blog. As a result, the publisher launched an open access funding model, matching funding for at least one open access book a year. Further showcasing their dedication to open science, Talia archived the first corpus of patient informed consent documents for clinical trials on UK Data Service and UCL’s research repository, and delivered a plenary on “reducing research waste” at the British Association for Applied Linguistics event. They have also advocated for the adoption of registered reports at various speaking events, in an Editorial Board presentation, and in a forthcoming article, an editorial, and a social media campaign.

Honourable mention: Michael Heinrich and Banaz Jalil, School of Pharmacy

Banaz and Michael were nominated for co-leading the development of the ConPhyMP-Guidelines. Ethnopharmacology is a flourishing field of medical/pharmaceutical research. However, results are often non-reproducible. The ConPhyMP-Guidelines are a new tool that defines how to report the chemical characteristics of medicinal plant extracts used in clinical, pharmacological, and toxicological research. The paper in which the guidelines are presented is widely used (1,613 downloads / 8,621 views since Sept 2022). An online tool, launched in August 2023 and accessible via the Society for Medicinal Plant and Natural Product Research (GA) website, facilitates the completion of the checklist. Specifically, the tool guides researchers in selecting the most relevant checklists for conducting and reporting research accurately and completely.

Honourable mention: Talya Greene, Brain Sciences 

Talya is recognized for leading the creation of a toolkit that enables traumatic stress researchers to move toward more FAIR (Findable, Accessible, Interoperable and Reusable) data practices. This project is part of the FAIR theme within the Global Collaboration on Traumatic Stress. Two main milestones have so far been achieved: 1) In collaboration with Bryce Hruska, Talya has collated existing resources that are relevant to the traumatic stress research community to learn about and improve their FAIR data practices. 2) Talya also collaborated with Nancy Kassam-Adams to conduct an international survey with traumatic stress researchers about their attitudes and practices regarding FAIR data in order to identify barriers and facilitators of data sharing and reuse. The study findings have been accepted for publication in the European Journal of Psychotraumatology. Talya has also presented the FAIR toolkit and the findings of the survey at international conferences (e.g., the International Society for Traumatic Stress Studies annual conference, the European Society for Traumatic Stress Studies Biennial Conference).

Sensitive data – where and how to archive

By Kirsty, on 16 August 2022

Guest post by James Houghton, Research Data Support Officer

It is always essential to protect the personal identity of participants or information that could jeopardise the safety of a building, an endangered species, or similar. Deleting data at the end of a project is often necessary to guarantee privacy and security. But this data is sometimes of immense value. The potential usefulness of data could be weighed against the likelihood of an accidental release and the risk of harm if an unintentional release did occur.

There are options for archiving data with access controls for researchers who feel strongly that their data should be preserved. Some repositories have built-in access controls that ensure sensitive data can only be accessed by specific persons who have undergone an application process. Only a few data repositories offer this feature, and those that do still have a remit controlling what data they can accept. Here are some examples.

  • ReShare (UK Data Service) – This site is a social data research repository created to share data from Economic and Social Research Council (ESRC)-funded projects, but it is open for submissions from other sources!
  • The European Genome-phenome Archive (EGA) – The EGA offers a service for permanent archiving and sharing of personally identifiable clinical data generated for biomedical research projects or in the context of research-focused healthcare systems.

The UK Data Service, which runs the ReShare archive, provides functional data access controls and explains how to implement them.

If you are concerned about storing the data live, even with access controls, consider storing the raw data offline. The existence of the data can be advertised online by creating an entry in a repository that announces the data’s presence and explains how to access it. The repository record will also be assigned a DOI so that the dataset can be cited properly. Making sure the offline information is stored securely can be challenging, however. There needs to be a specific process to ensure the data is secure and accessible on request.

Dealing with the long-term archiving of sensitive data is complicated. The UCL Research Data Management Team can assist with this. Get in touch if you need support!

Bookings now open for UCL Open Science Conference 2022

By Kirsty, on 15 March 2022

We are very pleased to finally be able to announce that bookings are officially open for the UCL Open Science conference 2022!

The conference is taking place online across two days, and as a special trial run this year we have selected one session to be run as a hybrid event, which will be available online and in person on the UCL campus. If you want to attend the conference online and the Citizen Science session in person, you will need a ticket for both.

Tickets are free and open to everyone who is interested. Sessions will be recorded and the recordings will be shared on the blog and via social media after the event.

Download the programme

DAY 1 – 6th April

Morning Session: 10.00-12.30 ONLINE

What does Open Science mean to me?

Here at UCL, the phrase ‘Open Science’ routinely refers to the steps taken to open up the research process to the benefit of the wider research community and beyond. Consequently, members of the UCL community are being actively encouraged to embrace open science practices – and the cultural changes that inevitably follow. As a result, we are well placed to explore the opportunities this brings, including greater transparency of the research process, maximising the research potential of existing resources, and embedding a greater sense of trustworthiness and accountability in research.

However, it seems the deeper we delve into the concept of Open Science, the more we seek to contextualise this phrase and question what it means to an individual’s working practices.

Kickstart your research with Open Data and Code

This session will look at some of the approaches you can take to go beyond simply sharing your data and code and instead make them Open and FAIR – Findable, Accessible, Interoperable, and Reusable. Assuming little prior knowledge, we will hear from researchers and research technology professionals about how they approach making research software open source, techniques for openness when dealing with computational research, and the role that can be played by Electronic Lab Notebooks and data repositories in the Open Science ecosystem.

Afternoon Session: 1.30-3.30pm ONLINE & IN PERSON:

How does Citizen Science change us?

Recent research about the impact of citizen science projects tends to focus on how public ‘participation’ in scientific research enhances knowledge outcomes for projects, or enhances the scientific literacy of participating citizen scientists. The benefits to participating individuals and communities are often assumed, and very little literature examines the personal dilemmas and challenges that individuals negotiate, or how citizen science projects change the behaviour of policymakers.

We aim to explore these gaps by inviting different perspectives on the question “How does citizen science change us?” Discussions will examine how participation in citizen science projects impacts on the different individuals involved – the citizen scientists, academic researchers, community members, policymakers – and ask how impacts on individuals can translate into wider political, societal and organisational transformations.

This session will be online using the same link as the main conference. If you want to join this session in person, please also register on Eventbrite.

DAY 2 – 7th April – ONLINE 10.00-12.30

10.10-11.20 UKRI Town Hall

The new UKRI Open Access policy has dominated discussions of the future of Open Access in the last year. This session proposes to allow the audience free rein to openly discuss the new policy with key members of the team at UKRI. After a brief presentation of the policy and guidance as it stands, the audience will be invited to pose their questions in an open forum.

11.20-12.30 Open Science and the Global South

Open Access publishing has been broadly embraced as a solution to the issue of paywalls which are often barriers to accessing research articles and, therefore, barriers to research itself. Open Access publishing removes the cost for those that may wish to read an article, but the publication process must still be paid for. Finding sustainable ways of doing this is a challenge, especially for institutions based in the global south where budgets may be more limited.

 

What can I do with my data at the end of a project?

By Kirsty, on 18 February 2022

The important thing to remember about data as you reach the end of a project is that while sharing your data openly can have some brilliant outcomes, both for you as a researcher and for the development of your research area, sometimes it isn’t possible to be truly open. This doesn’t mean that there is nothing you can do.

The core principles you should consider when planning and managing your data are those of FAIR. These principles apply whether your data is being made open or not – FAIR and Open are not mutually exclusive.

  • Your outputs should be Findable – this means they should be discoverable by the wider academic community and the public. If your outputs can be made open, this is a case of choosing an appropriate repository, but this still applies if your data cannot be shared. For example, if your data is commercially sensitive or cannot be fully anonymised, it should still be made known that the data exists so that interested parties can still find out about the project. They may approach you for further details or to collaborate.
  • They should be Accessible – you should make sure that you, or the system you choose, use unique identifiers, high quality metadata and a clear use of language and access protocols. This goes for data you might want to share, which could be the whole dataset, a derived subset of the data, the data that underlies a specific publication or even just a record of the project as discussed above.
  • They should be Interoperable – this means that any data structures or file types you use need to be able to be opened and used by others. This is an important consideration, as your data, code, annotations or any other file needs to be reusable into the future when the original programme you used may not be around anymore.
  • Finally, they should be Reusable – enabling the repurposing of research outputs to maximise their potential. This means that file types need to be operational long after publication as discussed above, but also that files need to be annotated in such a way that someone else can use them accurately to either reproduce or build upon your research. If data cannot be shared, you can still take action. If you have used commercial or clinical datasets, or personally identifiable data that needs to remain restricted, you can share how you processed the data to reach your published conclusions, or the code you used. This will enable researchers who access or collect the same data from different participants to reproduce your research, without access to the same exact data.

When combined, these four elements help lower barriers to research outputs and facilitate secondary researchers finding, understanding, reusing and repurposing your research to realise additional research opportunities and maximise existing resources, even if you can’t share the data in full.
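As a rough illustration of the Findable point above, here is a minimal Python sketch of a metadata-only record for a dataset that cannot be shared in full. The field names, the example values and the `missing_fields` helper are all hypothetical and chosen for illustration; they do not follow any particular repository's schema, so check what your chosen repository actually asks for.

```python
import json

# Hypothetical minimal set of fields for a metadata-only record.
# This is an illustrative assumption, not a formal metadata standard.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "description",
    "access",
    "access_contact",
    "file_formats",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record: the data stay restricted, but the record says that
# they exist, what they are, and who to approach about them.
record = {
    "title": "Interview transcripts (restricted)",
    "creators": ["A. Researcher"],
    "description": "Transcripts cannot be fully anonymised; this record describes the data only.",
    "access": "restricted",
    "access_contact": "data-owner@example.org",
    "file_formats": ["csv", "txt"],
}

print(json.dumps(record, indent=2))
print("Missing fields:", missing_fields(record))
```

A check like this could be run before depositing a record, so that a restricted dataset still comes with enough description for others to find it and ask about it.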

Whatever happens to your data at the end of a project should have been in the plan from the beginning. You should be aware from the start how much of your data can be shared, and you will have thought about where and how you were going to share it. Of course, if you aren’t sure, there are teams that can help make these decisions, and a wide range of advice available, even about less common topics like finding data to form part of your project, how to negotiate a Material Transfer Agreement in order to use it, how to securely destroy sensitive data or even how to cite data appropriately, just to name a few!

With this post we bring our Love Data Week full circle, back to the teams that can support you, and the importance of a good plan. Thank you for joining our activities this week. We hope you enjoyed them.

Data Sharing Highlights

By Kirsty, on 17 February 2022

At UCL there is a recognition that there is more to publishing research than simply books and papers in print and PDF format.

One of the ways this is supported is through the UCL Research Data Repository. Our institutional publishing platform can be accessed here: https://rdr.ucl.ac.uk/

The RDR can be used to publish all kinds of data, in a raw or visual format. All items are given a DOI and can be referenced to the same standard as a journal article or book. Many published items are supporting data for other publications, but a large number are also standalone items.

Visual Archaeological Data

A great example of using the RDR to present and share visual information is this archaeological data record: https://doi.org/10.5522/04/11385852.v1

This record is part of a project to document and catalogue archaeological sites across Central Asia. The record includes photographs of the site, height map data and 3D models, along with some descriptive information about the site itself and co-ordinates which specify the location. The 3D models are also hosted live on the Sketchfab website, but including them in this data record provides an extra level of preservation. While the Sketchfab website might become defunct, this data record becomes part of the UCL permanent collection and will be retained. The files could be used to reconstruct the 3D models if needed.

Sharing Research Methods

Sharing data is important, but sharing research methodology can be a really powerful way to improve reproducibility and transparency. The UCL RDR can also be used to share methods and designs used in research. It’s worth highlighting that although the RDR has “data” in the name it is very flexible in terms of what can be shared. Questionnaires, study designs, posters and presentations are all accepted.

As an example of sharing research methods, this Survey of Clinical Trial units published with the repository contains not only an anonymised set of survey responses but also the exact survey document used. This means there is a clear record of the exact questions used to generate the data, which could easily be used to recreate a similar survey at a later date, for example to see how trends have shifted over time.

Access this record here: https://doi.org/10.5522/04/7992998.v1

Beyond the RDR: Publishing data from the Natsal Surveys

The UCL RDR is just one platform for sharing data, and is not always the most appropriate for a particular project. A great example of presenting data in a way that makes it available to explore for a general audience is an online interactive that’s been developed using data from the last Natsal survey.

The British National Surveys of Sexual Attitudes and Lifestyles (or ‘Natsal’ for short) are a project led by UCL. Three surveys have been completed so far, approximately one every decade, the first in 1990. Since then over 45,000 people have been interviewed, each one randomly selected from across the country so that the data can be considered as broadly representative of the British population. First carried out in response to the HIV and AIDS epidemic, the Natsal surveys have become the leading source of reliable information about sex in Britain. Each of the surveys provides a snapshot of the nation’s sexual behaviour, experiences, and attitudes, and together they paint a comprehensive picture of how the nation’s sex lives change over time. Interviews for the next and fourth Natsal are due to start this summer.

The Natsal research team have been sharing their data through the UK Data Service for many years, providing a useful resource for other researchers. Data released from the last survey goes one step further to make sure the data are accessible beyond professional researchers and statisticians. Anyone will be able to explore the latest Natsal data using a freely available, visually engaging interactive explorer, developed as a collaboration between UCL and the Open University. It also includes a taster animation and an interactive activity, ‘No sex please, we’re British!’, which allows anyone to see how their own views correspond to the survey’s findings for the population as a whole.