X Close

Centre for Advanced Research Computing

Home

ARC is UCL's research, innovation and service centre for the tools, practices and systems that enable computational science and digital scholarship

Menu

The importance of collaboration: The latest engagement between DiRAC and ARC

By Connor Aird, on 26 April 2024

When time is scarce on a research project, it is important to continuously plan and effectively collaborate with the whole team. A good example of this is the DiRAC project, Spontaneous Symmetry Breaking in 3d Models of Fermions with Prof Simon Hands (PI) which, due to a funding deadline, had to be delivered in 5 weeks. This project aims to explore the phase diagram of a relativistic field theory of fermions using a code base developed by Prof Hands et al, known as thirring-rhmc. However, the collaboration with ARC and Prof Hands covered a much smaller scope.  

Aims 

Our aim was to migrate the work of a PhD student (Dr Jude Worthy) into the default branch of the thirring-rhmc code base. Once this was completed, the intention was that some performance improvements could be investigated as part of the project. Jude’s work implemented a higher accuracy but consequently lower performance formulation (Wilson kernel) of something already implemented in the code base (Shamir kernel). This reduction in performance is the reason for the desire to gain some performance improvements. However, from the PI’s initial comments, it was clear that the key aim remained the code migration – “…I’m increasingly convinced it only makes sense to pursue this research program further if an improved formulation is employed, so the Shamir -> Wilson transition as essential”. 

Obstacles 

Several obstacles threatened the success of this project. Development on the original version of thirring-rhmc had continued throughout Jude’s PhD but unfortunately git had not been used to develop the Wilson kernel. Therefore, the two codes had diverged significantly with no clear indication as to what degree. Due to this divergence, it was vital to develop a continuous testing suite to have any chance of success. However, the outputs of thirring-rhmc are statistical in nature and can, whilst remaining correct, vary significantly with only slight changes to the code. Therefore, a lot of domain specific knowledge would be required to design these tests. 

What we did 

This project’s strict time constraints required us to take a methodical approach to planning our work. For each task, we defined a clear definition of done and ensured we understood how that individual piece of work helped progress towards our key aim. Continuously planning our tasks in this way was essential to our success. 

The lack of clarity around what changes in the Wilson kernel were significant meant our first task was to set up reliable unit tests. With these tests in place, we could confidently alter the code and catch any breaking changes we might introduce. Helpfully, some stale tests were already present in the repository. With Simon’s domain knowledge, we were able to update these existing tests to create a working test suite. When these tests failed and highlighted issues we couldn’t solve independently, we were able to quickly reach a solution through regular communication with Simon. Simon’s domain knowledge was an invaluable asset throughout the project. As a bonus, we were able to demonstrate the confidence regular testing gave us when carrying out large refactors and migrations. This will hopefully increase the chances of Simon’s research team continuing to maintain and build upon these tests, therefore preventing the tests going stale again. This is a great example of how close collaboration between RSEs and Researchers can benefit both parties. 

This close collaboration and communication with Simon helped to quickly increase our knowledge of the code base and research domain. Due to this better understanding, we identified the likely causes of two known issues with Jude’s code. Most notably, we identified that the inflated value of an input parameter was a key reason for the Wilson kernels reduced performance.  

Conclusion 

To conclude, RSEs and Researchers work best together when they effectively communicate. Siloing the domain knowledge of these two parties only reduces the chances of success. Our projects are collaborations and can only succeed if we work in this way from the very beginning. 

Randomising Blender scene properties for semi-automated data generation

By Ruaridh Gollifer, on 12 December 2023

Blender is a free and open-source software for 3D geometry rendering. Uses include modelling, simulation, animation, virtual reality applications, and more recently synthetic datasets generation. This last application is of particular interest in the field of medical imaging, where often there is limited real data that can be used to train machine learning models. By creating large amounts of synthetic but realistic data, we can improve the performance of models in tasks such as polyp detection in image guided surgery. Synthetic data generation has other advantages since using tools like Blender gives us more control and we can generate a variety of ground truth data from segmentation masks to optic flow fields, which in real data would be very challenging to generate or would involve extensive time consuming manual labelling. Another advantage of this approach is that often we can easily scale up our synthetic datasets by randomising parameters of the modelled 3D geometry. There can be challenges to make the data realistic and representative of the real data. 

The Problem 

The aim was to develop an add-on that would help researchers and medical imaging experts determine which range of parameter values make realistic synthetic images. Prior to the project, the dataset generation involved a more laborious process of manually creating scenes in Blender with parameters changed manually for introducing variation in the datasets. A more efficient process was needed during the prototyping of synthetic dataset generation to decide what range of parameters make sense visually, and therefore in the future, to more easily extend to other use cases.

What we did 

In collaboration with the UCL Wellcome / EPSRC Centre for Interventional and Surgical Sciences (WEISS), research software engineers from ARC have developed a Blender add-on to randomise relevant parameters for the generation of datasets for polyp detection within the colon. The add-on was originally developed to render a highly diverse and (near) photo-realistic synthetic dataset of laparoscopic surgery camera views. To replicate the different camera positions used in surgery as well as the shape and appearance of the tissues, we focused on randomising three main components of the scene: camera transforms (camera orientation and location), geometry and materials. However, we allowed for more flexibility beyond these 3 main groups of parameters, implementing utilities to randomise other user-defined properties. The software also allows the following features: 1) setting the minimum and maximum bounds through an input file, 2) setting a randomisation seed for reproducibility, 3) exporting output parameters for a chosen number of frames to an output file. The add-on includes testing through Pytest, documentation for users and developers, example input and output files and a sample Blender scene.

The outcomes 

Version 1.0.0 of the Blender Randomiser is available under a BSD 3-Clause License. The GitHub repo is public where the software can be downloaded and installed with instructions provided on how to use the add-on. Examples of what can be produced in Blender can be found at the UCL Research Data Repository (N.B. these examples were produced manually prior to completion of this project).

Developer notes are also available to allow contributions. 

 

Sofia Minano and Ruaridh Gollifer

Simulating light propagation through matter.

By Sam Cunliffe, on 31 October 2023

Observing how light interacts with materials allows us to develop non-invasive medical imaging techniques, that rely on these interactions to assemble an image or infer an appropriate diagnosis.

Light interacts with materials in many different ways. One of the most commonly observed interactions is dispersion; which causes white light to split into individual colours, creating phenomena like rainbows (light from the sun dispersing through raindrops). Another commonly observed interaction is refraction; which causes light to change direction as it passes between two materials, responsible for straight objects like straws appearing to be disjointed when placed into water. To completely describe what is going on in these interactions, we have to use a system of equations known as Maxwell’s equations. We also have to consider some additional parameters that describe the particular material(s) that the light is interacting with. In their most general form, Maxwell’s equations are very complex but have the advantage that almost all materials and interactions can be modelled by them. Solving these equations is, in general, impossible to do with pen and paper, so we need software to do this for us.

Software like this has a wide variety of applications in biomedical optics; notably optical coherence tomography (non-invasive medical imaging of the eye), multiphoton microscopy, and wavefront shaping. For example; we can use this software to model light propagating in the retina: simulating a retina scan. Then we can perform a retina scan for a patient in real life, and use our simulation to better understand the scan. Retinal scans often hint at a particular change to the retina, without being definitive, in the early stages of disease. We can use our simulation to test what types of changes to a retina can lead to observed signatures in an image and therefore help in achieving a diagnosis.

The Problem

In collaboration with the UCL Medical Physics and Biomedical Engineering department, developers from ARC have worked to open up a legacy C and MATLAB library which simulates light propagating through matter. This software was initially developed as part of a PhD thesis approximately 20 years ago and has been continuously developed since then. However, the need to rapidly answer research questions led to the code becoming less sustainable and harder for others to use. Whilst the core functionality was already there; the library needed updating to a more modern language and aligning with the FAIR4SW principles.

What we did

The aim of the project was to be able to provide users with a program that they can give custom input which describes the material they want to simulate, pass this to the software and receive an output they can use in further analysis. We wanted users not to have to worry about the internal workings of the software; only having to download the library code, build and install it once, and be ready for future analyses. We used modern build tools to standardise the build and install of the software, we aimed to make our instructions as straightforward and operating-system-independent as possible. We also set up automated testing of the software and wrote example scripts that users can modify to easily create input files in the correct format.

The outcomes

Version 1.0.1 of the Time Domain Maxwell Solver (TDMS), is now available under a GPL-3.0 license. You can download from GitHub, and install and run on all operating systems. The project has a public-facing website and a growing collection of examples. We also have developer documentation so anyone can contribute in the future.

TDMS 1.0.1 now has a number of new features, including the option to switch between different solver methods (how the simulation is performed), select custom regions over which to compute (to save wasting computation time), and the ability select different techniques for extracting output information through interpolation.

The ARC software engineers were a joy to work with. They brought knowledge of modern software engineering practice and quickly understood the code, and the underlying physics, as required to very effectively re-engineer the code. This collaboration with ARC will hopefully allow for a new range of users to access TDMS and significantly increase its impact.

Will Graham and Sam Cunliffe