Centre for Advanced Research Computing

ARC is UCL's research, innovation and service centre for the tools, practices and systems that enable computational science and digital scholarship

The importance of collaboration: The latest engagement between DiRAC and ARC

By Connor Aird, on 26 April 2024

When time is scarce on a research project, it is important to plan continuously and collaborate effectively with the whole team. A good example is the DiRAC project Spontaneous Symmetry Breaking in 3d Models of Fermions, with Prof Simon Hands as PI, which had to be delivered in 5 weeks because of a funding deadline. The project aims to explore the phase diagram of a relativistic field theory of fermions using a code base developed by Prof Hands et al., known as thirring-rhmc. However, the collaboration between ARC and Prof Hands covered a much smaller scope.

Aims 

Our aim was to migrate the work of a PhD student (Dr Jude Worthy) into the default branch of the thirring-rhmc code base. Once this was complete, we intended to investigate performance improvements as part of the project. Jude’s work implemented a higher-accuracy but consequently lower-performance formulation (the Wilson kernel) of something already implemented in the code base (the Shamir kernel). This reduction in performance is what motivated the search for performance improvements. However, from the PI’s initial comments, it was clear that the key aim remained the code migration: “…I’m increasingly convinced it only makes sense to pursue this research program further if an improved formulation is employed, so the Shamir -> Wilson transition as essential”.

Obstacles 

Several obstacles threatened the success of this project. Development on the original version of thirring-rhmc had continued throughout Jude’s PhD, but unfortunately git had not been used to develop the Wilson kernel. The two codes had therefore diverged significantly, with no clear indication of how far. Because of this divergence, it was vital to develop a continuous testing suite to have any chance of success. However, the outputs of thirring-rhmc are statistical in nature and can, whilst remaining correct, vary significantly with only slight changes to the code. Designing these tests would therefore require a lot of domain-specific knowledge.
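
To illustrate the difficulty, here is a minimal Python sketch of a tolerance-based check. It is not the test suite used in this project. It assumes the output is a list of independent samples of some observable and compares the sample mean against a reference value within a chosen number of standard errors. The names `agrees_with_reference` and `fake_observable`, and the `n_sigma` threshold, are hypothetical; real Monte Carlo output is typically autocorrelated, which this sketch ignores.

```python
import math
import random
import statistics


def mean_and_standard_error(samples):
    """Return the sample mean and its standard error (assumes independent samples)."""
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, std_err


def agrees_with_reference(samples, reference, n_sigma=4.0):
    """True if the sample mean lies within n_sigma standard errors of the reference."""
    mean, std_err = mean_and_standard_error(samples)
    return abs(mean - reference) <= n_sigma * std_err


def fake_observable(seed, n=2000, true_mean=1.5):
    """Hypothetical stand-in for a simulation output: noisy draws around true_mean."""
    rng = random.Random(seed)
    return [rng.gauss(true_mean, 0.3) for _ in range(n)]


if __name__ == "__main__":
    run_a = fake_observable(seed=1)
    run_b = fake_observable(seed=2)
    # Two runs with different random streams almost never match exactly...
    print("identical outputs:", run_a == run_b)
    # ...so each run is compared with a reference value within a tolerance instead.
    print("run_a consistent with 1.5:", agrees_with_reference(run_a, 1.5))
    print("run_b consistent with 1.5:", agrees_with_reference(run_b, 1.5))
```

An exact-equality check would fail whenever the random number stream changes, while a tolerance chosen without domain knowledge risks either hiding real bugs or flagging correct results.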

What we did 

This project’s strict time constraints required us to take a methodical approach to planning our work. For each task, we agreed a clear definition of done and made sure we understood how that piece of work moved us towards our key aim. Continuously planning our tasks in this way was essential to our success.

The lack of clarity around which changes in the Wilson kernel were significant meant our first task was to set up reliable unit tests. With these tests in place, we could confidently alter the code and catch any breaking changes we might introduce. Helpfully, some stale tests were already present in the repository. With Simon’s domain knowledge, we were able to update these existing tests to create a working test suite. When these tests failed and highlighted issues we couldn’t solve independently, we were able to reach a solution quickly through regular communication with Simon. Simon’s domain knowledge was an invaluable asset throughout the project. As a bonus, we were able to demonstrate the confidence that regular testing gave us when carrying out large refactors and migrations. This will hopefully make it more likely that Simon’s research team continues to maintain and build upon these tests, and so prevent them from going stale again. This is a great example of how close collaboration between RSEs and Researchers can benefit both parties.

This close collaboration and communication with Simon helped us to quickly increase our knowledge of the code base and research domain. With this better understanding, we identified the likely causes of two known issues with Jude’s code. Most notably, we identified that the inflated value of an input parameter was a key reason for the Wilson kernel’s reduced performance.

Conclusion 

To conclude, RSEs and Researchers work best together when they communicate effectively. Siloing the domain knowledge of these two parties only reduces the chances of success. Our projects are collaborations and can only succeed if we work in this way from the very beginning.
