X Close

Centre for Advanced Research Computing

Home

ARC is UCL's research, innovation and service centre for the tools, practices and systems that enable computational science and digital scholarship

Menu

Archive for April, 2024

The importance of collaboration: The latest engagement between DiRAC and ARC

By Connor Aird, on 26 April 2024

When time is scarce on a research project, it is important to continuously plan and effectively collaborate with the whole team. A good example of this is the DiRAC project, Spontaneous Symmetry Breaking in 3d Models of Fermions with Prof Simon Hands (PI) which, due to a funding deadline, had to be delivered in 5 weeks. This project aims to explore the phase diagram of a relativistic field theory of fermions using a code base developed by Prof Hands et al, known as thirring-rhmc. However, the collaboration with ARC and Prof Hands covered a much smaller scope.  

Aims 

Our aim was to migrate the work of a PhD student (Dr Jude Worthy) into the default branch of the thirring-rhmc code base. Once this was completed, the intention was that some performance improvements could be investigated as part of the project. Jude’s work implemented a higher accuracy but consequently lower performance formulation (Wilson kernel) of something already implemented in the code base (Shamir kernel). This reduction in performance is the reason for the desire to gain some performance improvements. However, from the PI’s initial comments, it was clear that the key aim remained the code migration – “…I’m increasingly convinced it only makes sense to pursue this research program further if an improved formulation is employed, so the Shamir -> Wilson transition as essential”. 

Obstacles 

Several obstacles threatened the success of this project. Development on the original version of thirring-rhmc had continued throughout Jude’s PhD but unfortunately git had not been used to develop the Wilson kernel. Therefore, the two codes had diverged significantly with no clear indication as to what degree. Due to this divergence, it was vital to develop a continuous testing suite to have any chance of success. However, the outputs of thirring-rhmc are statistical in nature and can, whilst remaining correct, vary significantly with only slight changes to the code. Therefore, a lot of domain specific knowledge would be required to design these tests. 

What we did 

This project’s strict time constraints required us to take a methodical approach to planning our work. For each task, we defined a clear definition of done and ensured we understood how that individual piece of work helped progress towards our key aim. Continuously planning our tasks in this way was essential to our success. 

The lack of clarity around what changes in the Wilson kernel were significant meant our first task was to set up reliable unit tests. With these tests in place, we could confidently alter the code and catch any breaking changes we might introduce. Helpfully, some stale tests were already present in the repository. With Simon’s domain knowledge, we were able to update these existing tests to create a working test suite. When these tests failed and highlighted issues we couldn’t solve independently, we were able to quickly reach a solution through regular communication with Simon. Simon’s domain knowledge was an invaluable asset throughout the project. As a bonus, we were able to demonstrate the confidence regular testing gave us when carrying out large refactors and migrations. This will hopefully increase the chances of Simon’s research team continuing to maintain and build upon these tests, therefore preventing the tests going stale again. This is a great example of how close collaboration between RSEs and Researchers can benefit both parties. 

This close collaboration and communication with Simon helped to quickly increase our knowledge of the code base and research domain. Due to this better understanding, we identified the likely causes of two known issues with Jude’s code. Most notably, we identified that the inflated value of an input parameter was a key reason for the Wilson kernels reduced performance.  

Conclusion 

To conclude, RSEs and Researchers work best together when they effectively communicate. Siloing the domain knowledge of these two parties only reduces the chances of success. Our projects are collaborations and can only succeed if we work in this way from the very beginning. 

Research Integrity in an AI-Enabled World

By Samantha Ahern, on 5 April 2024

Over the last 15 months there has been much debate, hype and concern relating to capabilities of tools and platforms leveraging Large Language Models (LLMs) and media generators. Broadly termed Generative AI. The predominant narrative in Higher Education has been around the perceived threat to academic integirty and associated value to degrees. As such a lot of focus and discussion has focused on taught students, assessment design and “AI-proof” assessment. This has been coupled with concerns relating to the inability to reliably detect generated content, and the disproportionate number of false positives related to non-native English speakers text submitted to various platforms.

AI generated image of a researches using AI in front of the UCL porticoHowever, despite the proliferation of Generative AI enabled research tools and platforms, numerous workshops offering increased research output productivity and publications asking authors to declare whether or not these tools were used in producing outputs there has been limited discussion with relation to staff and research integrity.

Coupled with the publication of initial findings from a study on staff use of these tools by Watermayer, Lanclos and Phipps that included use to complete “little things like health and safety stuff, or ethics, or summarizing reports” and potential safety risks from fine-tuning models as reported in the Stanford Univeristy published policy briefing Safety Risks from Customizing Foundation Models via Fine-Tuning a workshop focusing on the interplay of Generative AI and research integrity and ethics was proposed as an AIUK Fringe event.

Research Integrity in an AI-Enabled World took place on Monday 25th March 2024. The aim was to explore how we think Generative AI enabled tools and platforms, could and should impact on the research process, and what the integrity and ethics implication are. Eventually aim would be to produce a policy white paper.

The event was organised so that there was a series of thought provoking talks in the morning, followed by a world-cafe style session in the afternoon. The event was held under the Chatham House Rule to enable open and frank discussion of the topic and arising issues.

The first set of talks predominantly focused on ethical issues. There were discussions on authorship, and the nature of authorship where multiple actors are involved e.g. training data creators, platform developers and prompters.  Bias in image generation, reinforcing misconceptions and stereotypes. Culminating in a talk on the University of Salfords evolving approach to Generative AI and research ethics.

The second set of talks was focused on current capabilities, limitations and implications of using Generative AI enabled tools in the research pipeline, predomintly focusing on qualitative analysis. This session included a discussion around evidence synthesis and the need to find more efficient methods whilst maintaining reliability and a breadth of knowledge, and different approaches using “traditional” machine learning approaches versus use of large language models. Enhanced capabilities of Computer Aided Qualitative Data Analysis Systems and implications for methodological approaches were also introduced and discussed. The session concluded with a talk from Prof Jeremy Watson about the work currently being undertaken by the UK Committee on Research Integrity’s AI working group, of which he is member. Key themes currently under consideration by UKCORI are:

  • Governance
  • Roles and Responsibilities
  • Skills and Training
  • Public Understanding and Expectations
  • Attribution and Ownership – IP, etc.
  • Understanding Data Inputs and Models
  • Need for Research in AI and Integrity

During the world-cafe session participants addressed the following questions:

  • What do we mean by Research Integrity in an AI-Enabled Research Environment?
  • Are there degrees of Research Integrity based on discipline and how embedded AI use is in the research process?
  • What are the key ethical and legal considerations?

Including the following participant proposed questions:

  • Generative AI is extremely good at in-filling uncertainty, where details of images become filled with bias. Should the responsibility of bias be equally on a prompter who enables this by omission?
  • Recalibration of government and private funded RI in AI? Isn’t this the foundation of biases for RI?

Outputs from the world cafe session will be analysed over the next few weeks, and workshop participants were invited to contribute to the development of workshop outputs.

Key themes that emerged from the event include:

  • Transparency
  • Criticality
  • Responsibility
  • Fitness for purpose
  • Data protection and privacy
  • Digital divide – privilege and harms
  • Training – education

Social media post about the workshopThe workshop was well received by participants, with the participants rate their overall experience of the event as 4.71 out of 5.

The speaker sessions were rated as very good by over 70% of participants. With the world cafe being mentioned as a highlight of the event.

 

 

 

As the proposer, organising and the host of the event I can’t help but still wonder:

  • Can we ethically and with genuine integrity use tools which are fundamentally ethically flawed?
  • Why are we accepting of these issues?
  • How should we be pushing back?

I will leave you with these words from Arudhati Roy with which I opened the event: