November | 2024 | UCL Centre for Advanced Research Computing

Archive for November, 2024

The ARC TRE

By George Svarovsky, on 28 November 2024

The Advanced Research Computing Trusted Research Environment (ARC TRE) is available for UCL researchers. It can be found at https://tre.arc.ucl.ac.uk/.

What is the ARC TRE?

The ARC TRE provides compute environments, with desktops for each researcher, for projects working on sensitive data. You can think of each project space as a secured virtual room with computers and a network – and strict rules about who can bring data in and take data out. It’s a safe setting, where safe projects with safe people can work on safe data and produce safe outputs.

We’re now onboarding projects for which the environment is suitable, via the Information Governance assurance process. Suitability of the environment will expand through 2025, with certification for NHS data early in the year, and ISO27001 certification in mid-year; so please do consider the environment for projects in planning.

How is it different?

In designing a next-generation environment for research computing on sensitive data, ARC had some ambitious principles in mind.

One size does not fit all. The environment allows projects to choose the compute resources they need, and also the tools, and even their internal governance and fine-grained data permissions. Changes in one project do not affect any other project.

For science to be reproducible, its software setup must be too. The TRE provides projects with container technology (Docker/Podman and Snap), a key enabler for making software setup repeatable. Not everyone will be familiar with this technology, but with this design decision ARC is doubling down on its mission: to support researchers in making best use of the tools, practices and systems that enable computational science and digital scholarship. Our Collaborations & Consultancy team are ready to help.

Unbounded research needs unbounded compute. ARC is investing in research compute, and these platforms should be just as available for use with sensitive data as for any other research. The TRE uses cloud practices to provision the required security, initially on a public cloud platform, AWS, which gives virtually unlimited capacity, and in future on ARC’s own research compute and storage platforms, which will give access to high-performance compute at much lower cost.

Who can use it?

The ARC TRE is running under a new Information Security Management System (ISMS), which is in the process of being operationalised at ARC. This means the environment is not yet ready to take NHS data, or data that explicitly requires ISO27001 certification, but both certifications are planned for the first half of 2025. So, if your data does not require these certifications, or you are planning research for later in 2025, you can consider using the ARC TRE. In any case, the Information Governance assurance team or the TRE team will be happy to advise on the suitability of the environment for your research.

In terms of costs, the ARC TRE has a straightforward and permissive pricing structure:

For basic usage (a moderately powerful t3a.medium desktop per user, and 100GB of project storage), the ARC TRE is free.
If you require more than this, we will ask you to include some “facility” costs in your funding. We use a transparent model based on AWS pricing. As an example, requesting a p3.2xlarge (with a V100 GPU with 16GB VRAM) will cost about £140 a month.
We will not rescind access or charge you extra, even if you end up using more compute than originally estimated, or if the AWS price fluctuates. (In the case of accidental or unreasonable overuse, we’ll have a polite conversation!)

Early adopters of the ARC TRE might find they are talking to the team, and influencing the development of the environment, more than they expect! The project is run using Agile and DevOps, and we will continuously improve the platform (always taking great care to ensure projects in flight are not adversely affected).

What’s in it for ARC?

ARC is UCL’s research, innovation and service centre for the tools, practices and systems that enable computational science and digital scholarship. We expect concrete learning from and for each TRE project: benefiting both ARC and the research teams.

As ARC collaborates on different projects with their own configurations and procedures to solve their own unique challenges, we will be building up an armoury of configurations and procedures that can be used again. A configuration that worked well with one dataset will be invaluable for another project with a similar dataset. Data owners will find that we can meet their needs repeatedly and repeatably.

As we expect to manage this service through many hundreds of projects, we will be able to carry the knowledge gained from one project into future projects, and to work out how to gather and organise this information to continuously improve our information security and user experience.

Project Example: Machine-Learning Assisted Diagnosis

We are engaged with a research and innovation group with an ambition to provide a mobile phone-based solution for assisting the diagnosis of jaundice in babies, a prime indicator of liver disease later in life. This project will be supported by the TRE through its innovation journey from research to spin-out:

Consented patient photographs are captured and ingressed into the TRE, and medical records are ingressed from the hospital.
Machine learning (ML) models can be developed in a TRE Project using the babies’ sensitive data.
The ML models can continue to be refined in the TRE while and after the production system is built on the public AWS cloud, with new model versions being egressed and deployed.
Learnings and components from the TRE Project can be shared and reused, including best practices, and hospital data pipeline integration.

What about the Data Safe Haven?

The current Data Safe Haven was designed over ten years ago. Its environment meets many research requirements, but it was not designed to support more modern data handling methodologies. The ARC TRE will provide a modern computing environment with the additional flexibility and power to support those UCL researchers for whom the Data Safe Haven is limiting, providing the same level of Information Governance assurance. Development is under way and is being carried out in an iterative fashion, with new features being rolled out over time.

UCL ARC and ISD are committed to supporting the Data Safe Haven until the ARC TRE is ready to take on all existing researchers. As new features are available in the TRE, research studies will be migrated in a controlled and supported manner. At this stage it is hard to put a timescale on this. Any changes will be fully supported with training and assistance.

What’s next?

As mentioned, the team is set up to continuously improve the ARC TRE. We have several capabilities that we know will be needed, and we will work with researchers to refine these requirements and to discover new ones.

Machine ingress. We’re already working on a secure API so that data can be fed into Projects without human interaction.

More shared resources. ARC and researchers can make contributions to the resources available to all ARC TRE projects, including common software tools and reference data. We expect to greatly expand the available resources over the coming months.

Inter-project sharing. We intend to allow projects with the requisite information governance to exchange data without it having to leave the environment.

Project templates. We strongly encourage projects in the ARC TRE to use software version control for their project materials. This is an essential tool for collaboration; and provides a history to aid transparency and recall of decisions. ARC will demonstrate how to re-use existing project setup repositories, which can become templates for multiple projects.

Archiving. The ARC TRE is not intended for long-term storage of research data; instead, it’s a compute environment for active research. However, many projects require archival of sensitive data. For now, researchers can leave their projects on the TRE, where the data is securely backed up. We’re working with the ARC Data Stewards to ensure that project archival requirements can be met optimally in future.

Windows desktops. The ARC TRE provides Red Hat Enterprise Linux desktops. Our initial user feedback has been (slightly surprisingly) positive about this decision – but we know that many projects in the Data Safe Haven make use of Windows tools. A managed Windows environment for sensitive data will continue to exist for as long as it’s needed – design of the long-term solution is ongoing.

Thanks

A great many people have been involved in this project, at ARC, ISD, ISG and beyond, and we are grateful to one and all.

The core technical team of Tom Young (lead), Robert Vickerstaff, and Sunny Park; enabled by delivery manager Rupert Roker. Architecture and security management system led by Tim Machin, ably supported by Finley Bacon who also took on the service management; and the ARC information governance team Martin Donnelly, Preeti Matharu and Victor Olago. Ongoing and valued expertise from Jack Hindley, Dan Bretherton, Trevor Peacock, Anthony Peacock (no relation), David Solomon, Werner Niekirk, Ben Thomas, and others who have been pestered about one thing or another since January. User representation from Nick Owen, Socrates Varakliotis, John Watts, Michelle Harricharan, and many more. And the leadership that sponsored and fought for the project: Donna Swan, Rachel Cooper, and James Hetherington.

Want to know more?

The environment is visible to anyone with a UCL login, at https://tre.arc.ucl.ac.uk/

Information on Sensitive Data and Trusted Research Environments

Homepage of the Trusted Research Environment Assurance process

George Svarovsky is a Principal Research Software Engineer working in UCL’s Centre for Advanced Research Computing. George is the Product Owner for the Trusted Research Environment, working with the service’s users to understand their needs and ensure that the service his team builds meets those needs. If you would like to get in touch with George to talk about how the TRE can best support your research then please contact the team.

Filed under Uncategorized

Tags: data-safe-haven, research-software, secure-data-environment, TRE, trusted-research-environment


UCL ARC at the 2024 International RSE Conference

By Jonathan Cooper, on 18 November 2024

This year’s RSE conference took place from 3rd to 5th September in Newcastle. Around 20 ARC staff attended in person, with several others joining in the hybrid experience remotely. The latter has been steadily improving year on year, at least for contributing to the formal parts of the conference programme. The informal conversations are still much better in person, and this aspect was particularly appreciated by our newer RSEs. This blog post summarises our joint impressions of the conference, based on a debriefing discussion we had in our weekly “Collaboration Hour” later in September, and edited by Jonathan Cooper. We have raced against the conference committee to get this blog post out before the conference materials are published!

The programme had a mix of technical topics and sessions devoted to how we work as RSEs, and indeed as wider “Research Technical Professionals” (RTPs) as well. RTP is a newly coined phrase used by UKRI among others to refer to the wide range of specialist roles that exist alongside traditional researchers, encompassing all the ARC professions and more. Several initiatives are aiming to leverage the success of the RSE movement to advance other professions in a similar way, some of which we are involved in or proposed at the conference, notably in the RSE leaders and aspiring leaders satellite event on the Monday. This has grown massively since its inception as a safe space for those struggling to create RSE groups to share the pain and learn from each other! Now there are many kinds of RSE group, many individuals in different RTP leadership roles, and much more wide-ranging discussions as a result. I particularly appreciated the session on the skills that leaders need in this environment – what people have found helpful and how we should be growing the next generation.

A similar topic was covered in the RSE Competencies workshop, although this covered all areas of RSE skills and tried to categorise these. We ended up with more non-technical skills than those focused on specific technologies. The work is ongoing with monthly community meetings, aiming to build a toolkit that will help people advance their careers: identify skills they need and avenues for training and professional development.

Several sessions focused on project management and Agile approaches. We heard from Manchester how they are adapting Scrum for their research projects, notably how they categorise projects according to how large they are and how engaged the researchers are, and therefore the different sorts of tweaks they’ve made to Scrum in each case. These seem to be fairly similar to how we operate in ARC, but in a more formalised structure. We contributed to the discussion session they ran on the following day (led by Sarah Jaffa, formerly of ARC!) with Monika and me doing a double act presenting a high-level view of our approach and a summary of Kanban. An important theme of that session was that project management is as much about self-care as it is about delivering on the goals of the project, and these aspects need to be well balanced or both suffer. A special interest group (SIG) is being set up dedicated to project management, and we have continued discussions within ARC too, with a recent blog post on adapting the agile values and principles to a research context.

Others in the group focused more on the technical programme. Mutation testing was one highlight, described as a bit like test coverage, except that your code is randomly changed to see what breaks. If no tests fail after a change, then you may have revealed an untested code path that needs to be tested and fixed. It’s good for catching edge cases that haven’t been thought of, but it does take time to run. We noted that it works best as part of a wider array of testing approaches, for example property-based testing with tools such as Hypothesis (randomising code inputs rather than the code itself).
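To make the mutation testing idea concrete, here is a minimal hand-rolled sketch. It is not how any real mutation testing tool works internally; the function `is_adult`, its mutant, and the two toy test suites are all hypothetical names invented for illustration. A real tool would generate many mutants automatically rather than one by hand.

```python
def is_adult(age):
    """Original implementation under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the comparison operator '>=' has been changed to '>'."""
    return age > 18


def weak_suite(fn):
    """Tests that never probe the boundary value 18."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """The same tests plus the boundary case."""
    return weak_suite(fn) and fn(18) is True


def mutant_killed(suite):
    """A mutant is 'killed' if the suite passes on the original but fails on the mutant."""
    return suite(is_adult) and not suite(is_adult_mutant)


print(mutant_killed(weak_suite))    # False: the mutant survives, so the boundary is untested
print(mutant_killed(strong_suite))  # True: the boundary test catches the change
```

The surviving mutant in the first case is the signal that a test is missing, here for the boundary value. The cost mentioned above comes from re-running the whole suite once per mutant.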

A couple of talks covered best practices for setting up development environments, and suggested that this is perhaps one aspect that distinguishes an RSE from a CS researcher. The approaches range from use of pip and pipenv in Python to things like dev containers and Nix. These are important for reproducibility. The Netherlands eScience Centre Python project template had a nice feature that allows projects created from a template to be updated when the template itself changes.

Several talks looked at performant Python. We were surprised (perhaps unfairly) at how much impact simply upgrading to the latest Python version can have. Tools like numba and approaches such as vectorisation were well known, but tips for using list comprehensions, sometimes in preference even to Pandas apply operations, were appreciated and will be useful for several of our projects.
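As an illustration of the list comprehension tip, the sketch below computes the same derived column three ways. It assumes pandas is installed; the column names and the `body_mass_index` helper are made up for this example, and no timings are claimed because the relative speed depends on the data and the pandas version.

```python
import pandas as pd


def body_mass_index(weight_kg, height_m):
    """Hypothetical per-row calculation used only for illustration."""
    return weight_kg / height_m ** 2


df = pd.DataFrame(
    {"weight_kg": [60.0, 80.0, 100.0], "height_m": [1.5, 2.0, 2.0]}
)

# Row-wise apply: builds a Series object for every row, which is often slow.
via_apply = df.apply(
    lambda row: body_mass_index(row["weight_kg"], row["height_m"]), axis=1
)

# List comprehension over the zipped columns: same result, less per-row overhead.
via_comprehension = pd.Series(
    [body_mass_index(w, h) for w, h in zip(df["weight_kg"], df["height_m"])],
    index=df.index,
)

# Vectorised arithmetic on whole columns, where the calculation allows it.
via_vectorised = df["weight_kg"] / df["height_m"] ** 2
```

The comprehension is most useful when the per-row logic cannot easily be expressed as column arithmetic; when it can, the vectorised form is usually the first thing to try.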

As you might expect from ARC we had significant involvement in the high-performance computing sessions, including Tuomas running a “birds of a feather” (BOF) event for the HPC RSE community and giving several talks. Talks not by us covered a range of topics, including the age-old comparison of the merits of different languages and porting between them, the newer frameworks aiming to ease GPU programming, portability between different hardware, and debugging parallel programs. We enjoyed trying out the Grace Hopper chips in IsambardAI, and discussing how to utilise HPC in the most environmentally sustainable way. The conclusion from Archer2 is that given the CO₂ released by manufacturing HPC systems, the best option is to run them as intensively as possible since this maximises the research done for a given carbon cost – and indeed that personal lifestyle changes may be a better option for minimising your impact!

Green RSE was the focus for another BOF which some ARC staff attended. A SIG is being set up for this, trying to raise awareness of what RSEs can do and consider what training might be helpful. This is something we want to get involved with more at ARC, starting with an inventory of our current state in conjunction with the department’s Green Team.

The Fortran satellite event was very well run. It revealed that many people want similar improvements to the Fortran ecosystem to support automated testing and the like. We have recently started an initiative along those lines at ARC so will be trying to work with the wider community on this and avoid duplication of effort, having now met some relevant people.

Some talks focused more on particular research domains. Given ARC’s current efforts developing Trusted Research Environments (TREs), we were interested in the Turing’s approach. They worry less about packages coming into the secure environment, and advocate simply proxying CRAN, PyPI and the like, while making sure that the infrastructure is set up securely enough that things can’t get out unless you want them to. So if something does get in and cause havoc, it shouldn’t be able to egress any data, and it should only affect a single study or project. This is also the approach we take in ARC’s TRE.
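The "let packages in, let nothing out" idea amounts to a default-deny rule with a short allow-list. The sketch below shows only the logic of such a rule; it is not the Turing’s or ARC’s implementation, the host names are hypothetical placeholders, and in a real TRE this would be enforced by network controls rather than by application code.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: only the package proxies are reachable from inside a project.
ALLOWED_PROXY_HOSTS = frozenset(
    {"pypi-proxy.tre.example", "cran-proxy.tre.example"}
)


def is_outbound_request_allowed(url: str) -> bool:
    """Default-deny: permit only HTTPS requests to allow-listed proxy hosts."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_PROXY_HOSTS


# Installing a package through the proxy is permitted...
print(is_outbound_request_allowed("https://pypi-proxy.tre.example/simple/numpy/"))
# ...but a compromised package trying to send data elsewhere is not.
print(is_outbound_request_allowed("https://attacker.example/upload"))
```

Because everything not explicitly listed is refused, a malicious package that does get in has no route by which to send data out.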

The unconference session was a highlight for some, particularly the discussion of developing software as a medical device. This covered trying to work with people in the institution to come up with processes, but also trying to figure out what the role of RSEs and the Society of RSE should be in that process. Are we just followers or the ones coming up with the process? How much of the regulatory side is our responsibility, and who else do we need to work with? No firm answers from that, but lots of questions! A prototype tool we developed for one project may be useful here if we can get funding and/or collaborators to continue the work.

Everyone found the keynote talks inspiring, especially the one from Anne-Marie Imafidon. The test-driven development workshop got a mention as being really clearly presented with great materials, as did an interesting C++ graphics library called Morphologica.

We are all looking forward to next year’s conference in Warwick. On a personal level, I’m especially pleased that it is likely not to conflict with INSET days and I’ll be able to be there in person again.

Filed under Community, Event, Opinion


Alignment of the Agile Manifesto to a Research Context

By Monika Byrne Svata, on 14 November 2024

This article proposes aligning the language of the original Agile Manifesto – written over 20 years ago, for software development in a commercial setting – with our current context of digital research projects involving research software engineering, data science, data stewardship, and research infrastructure development.

This work was inspired by discussion about the wording of the Agile Manifesto during the regular Agile Training for Research projects that we run in UCL’s Centre for Advanced Research Computing (ARC) for senior staff in our Collaborations team. To gain wider input from colleagues we devoted two ARC “Collaboration Hour” sessions to this topic, with additional conversation held on Slack, some email input, and a period where a draft of this article was available for internal comment.

We hold that the core ideas behind Agile, such as responding to change and valuing interactions between people, are valid and beneficial in a research context. However, the specific expression of these ideas can probably be improved on, in true Agile fashion! Our aim is to make it easier to apply the Agile principles in the management of our Collaborations projects, by removing the cognitive dissonance caused by language inspired by a different context. By publishing this article, we hope that others will see a similar benefit, and we invite feedback from the community.

The Original Agile Manifesto

The manifesto originated in February 2001 at a meeting of representatives from emerging ‘lightweight’ software development approaches, in response to the need for an alternative to documentation-driven, heavyweight software development processes.

Although there are many frameworks to aid the application of Agile approaches in particular settings, the manifesto emphasises that a change of culture within organisations and teams is the key element of, and a precondition for, successfully implementing Agile ways of working.

“While the Manifesto provides some specific ideas… there is a deeper theme of values based on trust and respect for each other and promoting organizational models based on people, collaboration, and building the types of organizational communities in which we would want to work.”

“So, in the final analysis, the meteoric rise of interest in – and sometimes tremendous criticism of – Agile Methodologies is about the mushy stuff of values and culture.”

For a fuller history, visit the Agile Manifesto website.

The original Agile Manifesto contains 4 Agile Values, and 12 Agile Principles.

Below we give the original text of each alongside our updated version and discuss the reasons for our proposed revisions.

Key Terms

Although the wording of each of the values and principles has been considered separately, to make sure that it reflects the best of both the original meaning and its application to research/academia, we found it useful to give an initial consideration and space for discussion to some of the repeated key terms and the reality of research projects.

Original wording	Discussion about new wording
Customer	‘Customer’ implies negotiation and a zero-sum game, rather than a collaboration with a common goal. This also applies to the term ‘Client’. ‘End user’ is a specific term that might not correctly reflect the reality of a research project or correctly describe the collaborators. ‘Collaborators’, ‘our collaborators’, ‘all collaborators’ feel like the terms best describing this role.
Valuable software / Working Software	Terms like ‘valuable software’, ‘working software’ or ‘digital artefacts’ are too limiting, as the outputs of collaborations projects are often other than software (e.g. research, teaching/education, service, etc.) The suggested terms that felt acceptable included ‘desired outcome’, ‘academic output’, ‘the research’, ‘research outcome’.
Developers	Research Technical Professionals – RTP
Business people	Depending on the context of the individual principles, terms like ‘researchers’ or ‘domain experts’ felt appropriate.
Major current areas of pain for research projects	The original context of the Agile Manifesto, expressed in the Agile Values, was that it was responding to the reality of rigid overplanning and over-documenting, where any change, learning, or other deviation from the original assumptions was seen as disruptive and a risk. As the reality of research projects in 2024 carries different issues and risks, we wanted to keep these in mind, so that the values address these. Some of the pain points of research projects highlighted in the discussion: insufficient documentation (leaving ‘breadcrumbs’ behind); the scale and ambiguity of the research outcomes; parallel working on multiple projects; limited longevity of the projects and teams due to grant work.

Revised Agile Values

Below is the original wording of Agile Values followed by the new wording that is the result of ARC-wide discussion, and in our view best represents their application to research projects.

Original:

Individuals and interactions over processes and tools.
Working software over comprehensive documentation.
Customer collaboration over contract negotiation.
Responding to change over following a plan.

That is, while there is value in the items on the right, we value the items on the left more.

Agile Values for Research Projects:

In these statements, while there is value in the items on the right, we value the items on the left more.

Individuals and interactions supported by suitable processes and tools.
Working solutions supported by adequate documentation.
Collaboratively responding to change supported by agile planning.

Discussion Points:

To highlight the importance of all elements of delivery (including documentation, tools, processes, planning etc.), we agreed to move the sentence stressing this point to the start. For the same reason, we replaced the word ‘over’ with ‘supported by’.
To denote that the processes and tools are in service of the main outcome, we added the word ‘suitable’.
The term ‘comprehensive’ documentation has been updated to ‘adequate’ documentation to reflect that the detail, format, and amount of documentation needs to be fit for purpose rather than a goal or outcome in its own right.
‘Contract negotiation’ in research is different from that in a business setting, being typically less adversarial and restricted to agreement with funders. The concept as evoked in the original values applies more to the process of requirements elicitation and jointly planning for the project delivery, so we agreed to merge the values related to contracts and to planning, with the overarching theme of collaborative work. This is to stress that the nature of scoping, planning and delivery of research projects is collaborative and evolving, rather than a fixed result of prior negotiations.

Revised Agile Principles

For each principle we set out the original wording followed by the new wording that is the result of the ARC-wide discussion and best represents their application to research projects.

There was general agreement and no challenge to this principle in the discussion. Everyone in the team is familiar with team or sprint retrospectives and broadly in agreement about their usefulness.
The challenge in this space might be in details of the practice of retrospectives (or similar techniques) – their frequency, who runs this meeting, who attends the meeting – to make sure that it brings the intended benefits, and in the ways the learnings are actively fed back into the working practices of the team.

Filed under Agile Project Management, Collaboration, Transforming Research Communities

