X Close

Centre for Advanced Research Computing


ARC is UCL's research, innovation and service centre for the tools, practices and systems that enable computational science and digital scholarship


Archive for November, 2023

First Julia workshop at ARC

By David Pérez-Suárez, on 15 November 2023

Last Friday (the 10th of November) we run our first ever Julia workshop. After years of having an expert in our team – Mosè – who has been introducing the rest of the team to this wonderful language and even convinced some collaborators to use it on their projects, we’ve done the jump to teach it to the UCL community.

paperplane imageThis first workshop was limited to a reduced number of learners (10-12). Seven of which attended (most of them physicists!), with our team of three (myself instructing, with Mosè and Tuomas helping — also all physicists 😅) made the learners’ experience very positive.

Originally, we were going to use the Carpentries Julia lesson available in the incubator. However, Mosè and I decided against it as the expected previous knowledge was higher than what we were aiming for. Therefore, we created our own lesson!

Our lesson started with the basics, different types of numbers, strings and how all them fit in the family of types in Julia. We introduced some of the quirks Julia surprises you with when you come from a different language. This was key in our lesson! We started to write a function as if it was Python — which was what we expected to be the most familiar for our cohort. From there, we were introducing new concepts and syntax to make our code more “julianic” (I’ve come up with that term, so it may not be the one used by the Julia community). We covered the basics (types, function, conditionals, loops and plotting) during the morning session. After lunch, we went to introduce how to use other libraries to solve polynomials and ordinary differential equations. We even introduced unit testing and had time to learn how to work with CSV files with DataFrames and gave a quick overview of Pluto.

During the preparation of the material and the class, I was constantly supported by Mosè, bouncing lots of ideas and suggestions. We’ve even found a bug in one of the libraries we were going to use that they fixed instantly after Mosè reported it.

The class went smoothly. We encountered some problems with the installation of Julia and some unexpected slowness when installing libraries (we reported it after the workshop, and it was also fixed straight away!). This is some of the feedback we’ve received at the end of the day:

  • Great course, learned a lot.
  • The course has been great. The pace is good and it allows us to ask any questions we have.
  • Comparison’s to Python really helped me appreciate the advantages of Julia. Paper plane example was great.
  • Very good course, covered all the right topics for a 1-day intro session.

Personally, I don’t remember a class that has gone so well! With very little difficulties, covering everything we were planning to do and answering very interesting questions from our learners. It may have been due to the small number of learners, or because of their previous programming experience, or the similar background across all of them, or maybe, it’s because Julia is easy to learn 😉. Whatever reason it is, I really want to repeat it, with a larger class and a more varied background of learners. There’s no reason for only letting the physicists have fun with Julia, right.

So, if you are interested in learning Julia, be sure that we will repeat more sessions like this one! This may be too basic for you? Don’t worry, we are also planning to run a more advanced workshop focused on Julia for HPC during Term 2’s reading week. Keep an eye out for our future announcements.

Now that we have started, we won’t stop!

Using continuous integration efficiently

By Matt Graham, on 3 November 2023

Use of continuous integration (CI) is an important part of creating robust and reproducible research software, however running automated jobs on services such as GitHub Actions and GitLab CI/CD comes with an associated energy, and so environmental, cost.

As an example, in the 30 days from 2nd October to 1st November 2023, the UCL/TLOmodel repository ran GitHub Actions workflow jobs which used 1937 hours of runner time. As a (very rough) back-of-the-envelope estimate, assuming an average runner power consumption of 12W[1] this equates to a total monthly CI energy usage for this project of 23kWh, which is about 10% of a typical UK’s households monthly electricity usage or the monthly energy usage of around 8 UCL employee’s laptops[2].

Below are some simple ways to reduce GitHub Actions and GitLab CI/CD runner usage without compromising the gains that automated testing and deployment brings. These approaches also have the side benefit for private GitHub repositories of reducing usage of the free Actions minutes quota.

  1. Automatically cancel redundant jobs
    For jobs triggered by pushing commits to a pull or merge request branch, we may push new commits and trigger new job runs while previous runs are still in progress. Typically we will only care about the job results on the latest commit, and so the previous job runs will be redundant. The concurrency.group and concurrency.cancel-in-progress properties in GitHub Actions workflows can be used to automatically cancel already in progress job runs at different grouping levels – for example for workflows triggered for the same pull request, branch or tag. In GitLab, jobs which have the interruptible property set to true will be automatically cancelled when the Auto-cancel redundant pipelines project setting is enabled. Both GitHub and GitLab also allow manually skipping job runs when pushing new commits by including a token such as [skip ci] or [ci skip] in the commit message.
  2. Use filters and rules to only run jobs when needed
    Commonly some checks are only relevant to a subset of the files in a repository. For example if a branch only involves changes to a Markdown README file we probably do not need to run a test suite for the source code in a repository. In GitHub Actions we can use the optional paths and paths-ignore properties of the push and pull-request triggers to only run workflow jobs when the files changed do or do not match one or more patterns. Similarly in GitLab we can use rules to set conditions on a when a job runs, with specifically the rules:changes property allowing limiting runs to when files matching specified patterns are changed. It may also make sense to have jobs only run when pull requests are no longer drafts and are marked as ready to review. This can be achieved in GitHub Actions using a combination of the on.pull_requests.types property and using an if condition on the job that checks github.event.pull_request.draft is false.
  3. Set the scheduled job frequency appropriately
    Both GitHub Actions and GitLab CI/CD pipelines allow running jobs on a schedule using crontab syntax. While we sometimes refer to nightly jobs, it is worth considering carefully what the appropriate frequency is for such scheduled jobs based on the use case and wider context of the project, and whether running jobs at for example a weekly frequency might be sufficient. For example running daily jobs to check whether tests for a package pass against the latest compatible versions of upstream dependencies may make sense for a a package with a large userbase and development team where breaking changes in upstream packages are both likely to be encountered in practice quickly, and where there is likely to be sufficient developer capacity to resolve the issues in a similar timeframe. On the other hand for a package with fewer users and contributors, having a test job identify such breaking changes at a weekly frequency may be sufficient.
  4. Cache job dependencies
    A common step in CI jobs is setting up dependencies on the runner, often using a language specific package manager like npm or pip. Rather than redownloading and building dependencies afresh on each run, we can use caching features to reuse the dependencies built in previous jobs (providing the new run uses the same versions of dependencies). The GitHub cache action provides a generic approach for performing caching in GitHub Actions jobs, while language specific setup actions such as setup-python have built in support for setting up dependency caching. GitLab similarly provides support for caching dependencies.
  5. Set timeouts to avoid misbehaving jobs running for long times
    The default timeout for GitHub Actions jobs is 6 hours. If you expect a job to run in much less time than that, setting a more conservative timeout can avoid misbehaving jobs which hang or run very slowly, from inadvertently using a large amount of runner time. The default timeout on GitLab is shorter at 60 minutes but can similarly be adjusted.
  6. Chain jobs intelligently and fail fast
    Commonly we will run an assortment of test and checks as part of CI workflows with a corresponding wide range of run times. Code formatting and linting checks will typically be the quickest to run, while test suites may contain both quicker unit tests as well as integration and system tests which can take longer to run. Ensuring faster running checks and tests run first, and halting further jobs from running if these fail, will avoid wasting compute time running slow tests unnecessarily before changes to fix the faster running checks are made. Taking this one step further, for very quick checks such as linting and formatting, frameworks like pre-commit can be used to ensure checks are run automatically when committing changes, reducing the chance of CI jobs failing and needing to be rerun in the first place. In GitHub Actions the jobs.<job_id>.needs property can be used to specify that other jobs must successfully complete before another job runs. When running a matrix strategy job which creates one job run for each of a set of configurations (corresponding to for example different versions, operating systems or groups of tests) the jobs.<job_id>.strategy.fail-fast property can be set to true to cancel any other still in progress job runs in the matrix if a single run fails. GitLab pipeline jobs also have a needs property for specifying dependencies.

  1. Estimating the power usage of a CI job runner is complicated as typically jobs will be run on virtual machines (VMs) on cloud-hosted servers, with generally there being multiple VMs running in parallel on each server and potentially multiple job runners per VM. The analysis in Aldossary and Djemame (2019) suggests a VM with 4 virtual central processing units (CPUs) running on a host with eight-core Intel Xeon E3-1230 V2 CPU and 16GB of DDR3 RAM when running with an active load with 80% CPU utilization can be attributed a mean power consumption of around 40W (see Figure 7). If we assume four job runners per VM (one per virtual CPU) this corresponds to around 10W per runner. To account for the additional power overhead for data centre infrastructure such as cooling, we assume a power usage effectiveness of 1.2 (based on figures provided by Microsoft for Azure), giving an average overall power draw of 12W per runner. ↩︎

  2. This is based on an assumption of an average power draw of 20W (under a mix of idling and working at load) for a Dell Latitude 5410 laptop with a 36.5 hour working week and four weeks in a month giving an estimate of 2.92kWh used per laptop per month. ↩︎