
Centre for Advanced Research Computing


ARC is UCL's research, innovation and service centre for the tools, practices and systems that enable computational science and digital scholarship


Archive for the 'Technologies' Category

k-Plan now available to researchers!

By Sam Cunliffe, on 11 December 2023

One of ARC’s longest-running collaborations is with the Biomedical Ultrasound Group. Over the past three years, we’ve been developing a graphical user interface to simulate ultrasound treatment plans!

The k-Plan Logo

This software is called k-Plan, and licences are now available for sale through UCL’s commercial partner, BrainBox (who also sell ultrasound transducers).

Screenshot of the k-Plan GUI

If you’re interested in medical ultrasound and think this software might help you, you can read the full UCL press release, or see some more snapshots of k-Plan in action.

The people behind the work…

Our collaboration is managed and led by Bradley Treeby. As well as me, there’s a full roster of research software engineers who’ve worked hard at various times over the last three years to make this happen:

  • Panayiotis Georgiou, ex-UCL now ARM.
  • Timothy Spain, ex-UCL now NERSC, 🇳🇴.
  • Ilektra Christidi, ARC, UCL.
  • Alessandro Felder, ARC, UCL.
  • Orod Razeghi, ex-UCL now University of Cambridge.
  • Idil Ozdemir, ARC, UCL.
  • Connor Aird, ARC, UCL.

We also have collaborators from the Brno University of Technology who work behind the scenes on the middleware and back-end of k-Plan and run the planning simulations in the cloud.

Using continuous integration efficiently

By Matt Graham, on 3 November 2023

Use of continuous integration (CI) is an important part of creating robust and reproducible research software. However, running automated jobs on services such as GitHub Actions and GitLab CI/CD comes with an associated energy, and therefore environmental, cost.

As an example, in the 30 days from 2nd October to 1st November 2023, the UCL/TLOmodel repository ran GitHub Actions workflow jobs which used 1937 hours of runner time. As a (very rough) back-of-the-envelope estimate, assuming an average runner power consumption of 12W[1], this equates to a total monthly CI energy usage for this project of 23kWh, which is about 10% of a typical UK household’s monthly electricity usage, or the monthly energy usage of around eight UCL employees’ laptops[2].

Below are some simple ways to reduce GitHub Actions and GitLab CI/CD runner usage without compromising the gains that automated testing and deployment bring. These approaches also have the side benefit, for private GitHub repositories, of reducing usage of the free Actions minutes quota.

  1. Automatically cancel redundant jobs
    For jobs triggered by pushing commits to a pull or merge request branch, we may push new commits and trigger new job runs while previous runs are still in progress. Typically we only care about the job results for the latest commit, so the earlier runs are redundant. The concurrency.group and concurrency.cancel-in-progress properties in GitHub Actions workflows can be used to automatically cancel in-progress job runs at different grouping levels – for example for workflows triggered for the same pull request, branch or tag (see the workflow sketches after this list). In GitLab, jobs which have the interruptible property set to true will be automatically cancelled when the Auto-cancel redundant pipelines project setting is enabled. Both GitHub and GitLab also allow manually skipping job runs when pushing new commits by including a token such as [skip ci] or [ci skip] in the commit message.
  2. Use filters and rules to only run jobs when needed
    Commonly some checks are only relevant to a subset of the files in a repository. For example, if a branch only changes a Markdown README file we probably do not need to run the test suite for the source code in the repository. In GitHub Actions we can use the optional paths and paths-ignore properties of the push and pull_request triggers to only run workflow jobs when the changed files do or do not match one or more patterns. Similarly, in GitLab we can use rules to set conditions on when a job runs, with the rules:changes property in particular allowing runs to be limited to when files matching specified patterns have changed. It may also make sense to have jobs only run when pull requests are no longer drafts and are marked as ready to review. This can be achieved in GitHub Actions using a combination of the on.pull_request.types property and an if condition on the job that checks github.event.pull_request.draft is false.
  3. Set the scheduled job frequency appropriately
    Both GitHub Actions and GitLab CI/CD pipelines allow running jobs on a schedule using crontab syntax. While we often refer to these as nightly jobs, it is worth considering carefully what frequency is appropriate for such scheduled jobs given the use case and wider context of the project, and whether running them at, for example, a weekly frequency might be sufficient. Running daily jobs to check whether a package’s tests pass against the latest compatible versions of upstream dependencies may make sense for a package with a large userbase and development team, where breaking changes in upstream packages are likely to be encountered in practice quickly and where there is likely to be sufficient developer capacity to resolve the issues in a similar timeframe. On the other hand, for a package with fewer users and contributors, having a test job identify such breaking changes at a weekly frequency may be sufficient.
  4. Cache job dependencies
    A common step in CI jobs is setting up dependencies on the runner, often using a language-specific package manager like npm or pip. Rather than redownloading and building dependencies afresh on each run, we can use caching features to reuse the dependencies built in previous jobs (provided the new run uses the same versions of the dependencies). The GitHub cache action provides a generic approach for caching in GitHub Actions jobs, while language-specific setup actions such as setup-python have built-in support for dependency caching. GitLab similarly provides support for caching dependencies.
  5. Set timeouts to avoid misbehaving jobs running for long times
    The default timeout for GitHub Actions jobs is 6 hours. If you expect a job to run in much less time than that, setting a more conservative timeout can prevent misbehaving jobs which hang or run very slowly from inadvertently using a large amount of runner time. The default timeout on GitLab is shorter, at 60 minutes, but can similarly be adjusted.
  6. Chain jobs intelligently and fail fast
    Commonly we run an assortment of tests and checks as part of CI workflows, with a correspondingly wide range of run times. Code formatting and linting checks are typically the quickest to run, while test suites may contain both quick unit tests and integration and system tests which take longer. Ensuring the faster checks and tests run first, and halting further jobs if they fail, avoids wasting compute time running slow tests unnecessarily when a fast check has already failed. Taking this one step further, for very quick checks such as linting and formatting, frameworks like pre-commit can be used to ensure checks are run automatically when committing changes, reducing the chance of CI jobs failing and needing to be rerun in the first place. In GitHub Actions the jobs.<job_id>.needs property can be used to specify that other jobs must complete successfully before a job runs. When running a matrix strategy job, which creates one job run for each of a set of configurations (corresponding to, for example, different versions, operating systems or groups of tests), the jobs.<job_id>.strategy.fail-fast property can be set to true to cancel any other still-in-progress job runs in the matrix if a single run fails. GitLab pipeline jobs also have a needs property for specifying dependencies.
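To make the above concrete, here is a minimal GitHub Actions workflow sketch that pulls several of these settings together: concurrency cancellation, path filters, skipping draft pull requests, a weekly schedule, built-in dependency caching, timeouts, and chained fail-fast jobs. It is illustrative only rather than taken from a particular ARC project; the file name, job names, paths and Python commands are assumptions to be adapted to your own repository.

# .github/workflows/ci.yml – illustrative sketch only
name: CI

on:
  push:
    branches: [main]
    paths: ["src/**", "tests/**"]   # skip runs for e.g. README-only changes
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    paths: ["src/**", "tests/**"]
  schedule:
    - cron: "0 6 * * 1"             # weekly (Monday 06:00 UTC) rather than nightly

# cancel in-progress runs for the same workflow and branch/PR when new commits are pushed
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    if: github.event.pull_request.draft == false   # skip draft pull requests
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip   # built-in pip caching (expects a requirements.txt or pyproject.toml for the cache key)
      - run: pip install pre-commit && pre-commit run --all-files

  test:
    needs: lint   # only run the slower tests if the quick checks passed
    if: github.event.pull_request.draft == false
    runs-on: ${{ matrix.os }}
    timeout-minutes: 30
    strategy:
      fail-fast: true   # cancel the rest of the matrix if one run fails
      matrix:
        os: [ubuntu-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip
      - run: pip install ".[test]" && pytest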
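A roughly equivalent sketch for GitLab CI/CD, again with illustrative image, paths and commands, might look like this:

# .gitlab-ci.yml – illustrative sketch only
variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

default:
  image: python:3.11
  cache:
    paths:
      - .cache/pip   # reuse downloaded packages between runs

stages:
  - check
  - test

lint:
  stage: check
  interruptible: true   # let "Auto-cancel redundant pipelines" cancel superseded runs
  timeout: 10 minutes
  script:
    - pip install pre-commit
    - pre-commit run --all-files

tests:
  stage: test
  interruptible: true
  needs: ["lint"]   # only run the slower tests once the quick checks pass
  timeout: 30 minutes
  rules:
    - changes:   # skip runs for e.g. README-only changes
        - src/**/*
        - tests/**/*
  script:
    - pip install ".[test]"
    - pytest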

  1. Estimating the power usage of a CI job runner is complicated, as jobs will typically be run on virtual machines (VMs) on cloud-hosted servers, generally with multiple VMs running in parallel on each server and potentially multiple job runners per VM. The analysis in Aldossary and Djemame (2019) suggests a VM with 4 virtual central processing units (CPUs), running on a host with an eight-core Intel Xeon E3-1230 V2 CPU and 16GB of DDR3 RAM, can be attributed a mean power consumption of around 40W when running an active load at 80% CPU utilization (see Figure 7). If we assume four job runners per VM (one per virtual CPU), this corresponds to around 10W per runner. To account for the additional power overhead of data centre infrastructure such as cooling, we assume a power usage effectiveness of 1.2 (based on figures provided by Microsoft for Azure), giving an average overall power draw of 12W per runner. ↩︎

  2. This is based on an assumption of an average power draw of 20W (under a mix of idling and working at load) for a Dell Latitude 5410 laptop, with a 36.5-hour working week and four weeks in a month, giving an estimate of 2.92kWh used per laptop per month. ↩︎

Simulating light propagation through matter

By Sam Cunliffe, on 31 October 2023

Observing how light interacts with materials allows us to develop non-invasive medical imaging techniques that rely on these interactions to assemble an image or infer an appropriate diagnosis.

Light interacts with materials in many different ways. One of the most commonly observed interactions is dispersion, which causes white light to split into individual colours, creating phenomena like rainbows (light from the sun dispersing through raindrops). Another is refraction, which causes light to change direction as it passes between two materials and is responsible for straight objects like straws appearing disjointed when placed in water. To describe completely what is going on in these interactions, we have to use a system of equations known as Maxwell’s equations, together with some additional parameters that describe the particular material(s) the light is interacting with. In their most general form, Maxwell’s equations are very complex, but they have the advantage that almost all materials and interactions can be modelled by them. Solving these equations is, in general, impossible to do with pen and paper, so we need software to do it for us.
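For reference, in their standard macroscopic form (the form usually used when modelling light in matter) Maxwell’s equations can be written as

\nabla \cdot \mathbf{D} = \rho_f, \qquad
\nabla \cdot \mathbf{B} = 0, \qquad
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad
\nabla \times \mathbf{H} = \mathbf{J}_f + \frac{\partial \mathbf{D}}{\partial t},

where the electric and magnetic fields E and B are related to D and H through material parameters such as the permittivity and permeability, and it is these parameters that encode how a particular material disperses, refracts or absorbs the light.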

Software like this has a wide variety of applications in biomedical optics, notably optical coherence tomography (non-invasive medical imaging of the eye), multiphoton microscopy, and wavefront shaping. For example, we can use this software to model light propagating in the retina, simulating a retina scan. We can then perform a retina scan for a patient in real life and use our simulation to better understand the scan. In the early stages of disease, retinal scans often hint at a particular change to the retina without being definitive. We can use our simulation to test what types of changes to a retina can lead to the observed signatures in an image, and therefore help in reaching a diagnosis.

The Problem

In collaboration with the UCL Medical Physics and Biomedical Engineering department, developers from ARC have worked to open up a legacy C and MATLAB library which simulates light propagating through matter. This software was initially developed as part of a PhD thesis approximately 20 years ago and has been continuously developed since then. However, the need to rapidly answer research questions led to the code becoming less sustainable and harder for others to use. Whilst the core functionality was already there, the library needed updating to a more modern language and aligning with the FAIR4SW principles.

What we did

The aim of the project was to provide users with a program to which they can pass custom input describing the material they want to simulate, and from which they receive an output they can use in further analysis. We wanted users not to have to worry about the internal workings of the software: they should only need to download the library code, build and install it once, and be ready for future analyses. We used modern build tools to standardise building and installing the software, and aimed to make our instructions as straightforward and operating-system-independent as possible. We also set up automated testing of the software and wrote example scripts that users can modify to easily create input files in the correct format.

The outcomes

Version 1.0.1 of the Time Domain Maxwell Solver (TDMS) is now available under a GPL-3.0 license. You can download it from GitHub, and install and run it on all major operating systems. The project has a public-facing website and a growing collection of examples. We also have developer documentation so that anyone can contribute in the future.

TDMS 1.0.1 has a number of new features, including the option to switch between different solver methods (how the simulation is performed), the ability to select custom regions over which to compute (to avoid wasting computation time), and the ability to select different techniques for extracting output information through interpolation.

The ARC software engineers were a joy to work with. They brought knowledge of modern software engineering practice and quickly understood the code, and the underlying physics, as required to very effectively re-engineer the code. This collaboration with ARC will hopefully allow for a new range of users to access TDMS and significantly increase its impact.

Will Graham and Sam Cunliffe

A world map for website health checks with Grafana

By t.band, on 30 June 2021

UCL’s Geochronology software “IsoplotR” is available as an R package (just the calculations or with a web GUI) or online in a number of different locations. We already have Prometheus installed on one of our machines collecting metrics from various pieces of software we have running, and we have Grafana to display nice dashboards of these metrics. But could we add a little world map with green or red dots showing whether the various IsoplotR installations are up or down?

IsoplotR does not export Prometheus metrics, but polling the /version.txt HTTP endpoint on IsoplotR’s web interface should be enough to tell if it is running. So it looks like we need some sort of exporter to translate from “can I hit /version.txt” to something Prometheus is happy with. There is of course a standard, official way to do this, but it is not obvious from the documentation. The answer is the Blackbox Exporter.

How do we use the Blackbox Exporter? Once again, the documentation is not clear. Do we configure Blackbox? Or do we configure Prometheus? Or both? My guess that configuring Prometheus would work nicely was borne out; just cargo-culting the example configuration and changing the list of endpoints to poll was enough.

I decided to run Blackbox from a Docker image, but the documentation here says to run:

docker run --rm -d -p 9115:9115 --name blackbox_exporter \
    -v `pwd`:/config prom/blackbox-exporter:master \
    --config.file=/config/blackbox.yml

But this just results in errors in the log, because we haven’t made a blackbox.yml file (and don’t want to). So instead we run:

docker run --rm -d -p 9115:9115 --name blackbox_exporter \
    prom/blackbox-exporter:master

That’s better!

curl "http://127.0.0.1:9115/probe?module=http_2xx"\
"&target=http://chinageology.org:8080/version.txt"

Produces some nice Prometheus output.
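For reference, the cargo-culted scrape configuration looks roughly like this. It is adapted from the Blackbox Exporter’s example Prometheus configuration; the job name is an arbitrary placeholder, and the targets shown are just the IsoplotR endpoints mentioned in this post.

scrape_configs:
  - job_name: blackbox-isoplotr      # any name will do
    metrics_path: /probe
    params:
      module: [http_2xx]             # the Blackbox module to probe with
    static_configs:
      - targets:
          - http://isoplotr.es.ucl.ac.uk/version.txt
          - http://chinageology.org:8080/version.txt
          # ...one entry per IsoplotR installation
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target # pass the website URL as the ?target= parameter
      - source_labels: [__param_target]
        target_label: instance       # keep the URL as the instance label (used later for the map)
      - target_label: __address__
        replacement: 127.0.0.1:9115  # actually scrape the Blackbox Exporter itself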

Having added the configuration to Prometheus and restarted it, we can check that it works from the Prometheus web front end. For that I need a PromQL query. Eventually I figured out that the Prometheus output we got in the last stage was a huge clue, so I tried this:

probe_success

and got a list of websites with their up/down status!

Now on to Grafana. We will need to add a world map plugin. I installed Grafana as a snap in Ubuntu Server, so the commands needed to install this plugin were different from the documentation:

sudo grafana.grafana-cli plugins install grafana-worldmap-panel

snap restart grafana

Now, looking at my Grafana browser tab, I was expecting the Worldmap panel to be available. But it wasn’t. I could see it in the plugins list, but the new panel type wasn’t available to choose. It took me way too long to hit F5 and see if a refreshed page would show it to me. It did.

Now the tricky bit. We know I want “probe_success” as the PromQL query, but the Worldmap panel has confusing options. Should I be using Table or Time Series output? Table didn’t work for me, producing a Data error “TypeError: this.datapoints is undefined”. So, Time Series it is. Worldmap has options for how to get the locations: “country”, “state”, “geohash”… none of these are right; I don’t have anything in the query or values returned that mentions the location of the website polled. We probably have to go with “json endpoint” which allows us to provide a mapping from … something … to a location. But map from what?

Eventually this is what I figured out: the Worldmap panel looks at the legend of the data and tries to turn that into a location. If you choose “json endpoint” you can supply your own mapping. So, I set the query legend to:

{{instance}}

Then I added a json file in /var/www/html where an nginx installation on the same machine would see it and serve it up. The file looks a bit like this:

[
 {
  "key": "http://isoplotr.es.ucl.ac.uk/version.txt",
  "latitude": 51.525304,
  "longitude": -0.133806,
  "name": "London"
 },
 {
  "key": "http://isoplotr.geo.utexas.edu/version.txt",
  "latitude": 30.285623,
  "longitude": -97.736181,
  "name": "Austin"
 },
 ...
]

And I checked I could load this file in a browser from another machine (this might be overkill). Then in the Worldmap panel settings I could set “Location Data” to “json endpoint” and “Endpoint url” to the URL that serves up the JSON. And it worked! A few tweaks to make it look nice (like setting a single threshold to 1 and the colors to red and green) and we have our map!

So, not the most elegant or robust solution (I’ll have to maintain that isoplotr-locations.json file by hand instead of being able to keep the locations in with the Prometheus configuration), but it looks nice.