The ARC TRE
By George Svarovsky, on 28 November 2024
The Advanced Research Computing Trusted Research Environment (ARC TRE) is available for UCL researchers. It can be found at https://tre.arc.ucl.ac.uk/.
What is the ARC TRE?
The ARC TRE provides compute environments, with desktops for each researcher, for projects working on sensitive data. You can think of each project space as a secured virtual room with computers and a network – and strict rules about who can bring data in and take data out. It’s a safe setting, where safe projects with safe people can work on safe data and produce safe outputs.
We’re now onboarding projects for which the environment is suitable, via the Information Governance assurance process. Suitability of the environment will expand through 2025, with certification for NHS data early in the year, and ISO27001 certification in mid-year; so please do consider the environment for projects in planning.
How is it different?
In designing a next-generation environment for research computing on sensitive data, ARC had some ambitious principles in mind.
One size does not fit all. The environment allows projects to choose the compute resources they need, and also the tools, and even their internal governance and fine-grained data permissions. Changes in one project do not affect any other project.
For science to be reproducible, its software setup must be too. The TRE provides projects with container technology (Docker/Podman and Snap), a key enabler for making software setup repeatable. Not everyone will be familiar with this technology, but with this design decision ARC is doubling down on its mission: to support researchers making best use of the tools, practices and systems that enable computational science and digital scholarship. Our Collaborations & Consultancy team are ready to help.
Unbounded research needs unbounded compute. ARC is investing in research compute, and these platforms should be just as available for use with sensitive data as for any other research. The TRE uses cloud practices to provision the required security, initially on a public cloud platform, AWS – giving virtually infinite capacity – and, in future, onto ARC’s research compute and storage platforms – giving access to high-performance compute at much lower cost.
Who can use it?
The ARC TRE is running under a new Information Security Management System (ISMS), which is in the process of being operationalised at ARC. This means the environment is not yet ready to take NHS data, or data that explicitly requires ISO27001 certification, but both certifications are planned for the first half of 2025. So, if your data does not require these certifications, or you are planning research for later in 2025, you can consider using the ARC TRE. In any case, the Information Governance assurance team or the TRE team will be happy to advise on the suitability of the environment for your research.
In terms of costs, the ARC TRE has a straightforward and permissive pricing structure:
- For basic usage (a moderately powerful t3a.medium desktop per user, and 100GB of project storage), the ARC TRE is free.
- If you require more than this, we will ask you to include some “facility” costs in your funding. We use a transparent model based on AWS pricing. As an example, requesting a p3.2xlarge (with a V100 GPU with 16GB VRAM) will cost about £140 a month.
- We will not rescind access or charge you, even if you end up using more compute than originally estimated, or if the AWS price fluctuates. (In the case of accidental or unreasonable overuse, we’ll have a polite conversation!)
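As a rough illustration of how a project might budget for the "facility" costs described above, here is a minimal sketch. It assumes a simple rate-times-hours model; the `ExtraResource` class, the function name and all numbers are hypothetical placeholders, not actual ARC or AWS prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtraResource:
    """A resource requested beyond the free basic tier (hypothetical model)."""
    name: str
    hourly_rate_gbp: float   # placeholder value, NOT a real AWS or ARC price
    hours_per_month: float   # expected usage, as estimated by the project


def estimate_monthly_facility_cost(resources: list[ExtraResource]) -> float:
    """Sum rate x hours over all extra resources, rounded to the nearest penny."""
    total = sum((r.hourly_rate_gbp * r.hours_per_month for r in resources), 0.0)
    return round(total, 2)


# A project that stays within the basic tier requests nothing extra.
basic_only = estimate_monthly_facility_cost([])

# Illustrative GPU request: both numbers are made up for this example.
gpu_project = estimate_monthly_facility_cost(
    [ExtraResource("gpu-instance", hourly_rate_gbp=2.0, hours_per_month=40.0)]
)

print(basic_only, gpu_project)
```

An estimate like this only informs the funding request. As noted above, projects are not charged if actual usage ends up higher than the estimate.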
Early adopters of the ARC TRE might find they are talking to the team, and influencing the development of the environment, more than they expect! The project is run using Agile and DevOps, and we will continuously improve the platform (always taking great care to ensure projects in flight are not adversely affected).
What’s in it for ARC?
ARC is UCL’s research, innovation and service centre for the tools, practices and systems that enable computational science and digital scholarship. We expect concrete learning from and for each TRE project: benefiting both ARC and the research teams.
As ARC collaborates on different projects with their own configurations and procedures to solve their own unique challenges, we will be building up an armoury of configurations and procedures that can be used again. A configuration that worked well with one dataset will be invaluable for another project with a similar dataset. Data owners will find that we can meet their needs repeatedly and repeatably.
As we expect to manage this service through many hundreds of projects, we will be able to carry the knowledge gained from one project into future projects, and to work out how to gather and organise this information to continuously improve our information security and user experience.
Project Example: Machine-Learning Assisted Diagnosis
We are engaged with a research and innovation group with an ambition to provide a mobile phone-based solution for assisting the diagnosis of jaundice in babies, a prime indicator of liver disease later in life. This project will be supported by the TRE through its innovation journey from research to spin-out:
- Consented patient photographs are captured and ingressed into the TRE, and medical records are ingressed from the hospital.
- Machine learning (ML) models can be developed in a TRE Project using the babies’ sensitive data.
- The ML models can continue to be refined in the TRE while and after the production system is built on the public AWS cloud, with new model versions being egressed and deployed.
- Learnings and components from the TRE Project can be shared and reused, including best practices, and hospital data pipeline integration.
What about the Data Safe Haven?
The current Data Safe Haven was designed over ten years ago. Its environment meets many research requirements, but it was not designed to support more modern data handling methodologies. The ARC TRE will provide a modern computing environment with the additional flexibility and power to support those UCL researchers for whom the Data Safe Haven is limiting, providing the same level of Information Governance assurance. Development is under way and is being carried out in an iterative fashion, with new features being rolled out over time.
UCL ARC and ISD are committed to supporting the Data Safe Haven until the ARC TRE is ready to take on all existing researchers. As new features are available in the TRE, research studies will be migrated in a controlled and supported manner. At this stage it is hard to put a timescale on this. Any changes will be fully supported with training and assistance.
What’s next?
As mentioned, the team is set up to continuously improve the ARC TRE. We have several capabilities that we know will be needed, and we will work with researchers to refine these requirements and to discover new ones.
Machine ingress. We’re already working on a secure API so that data can be fed into Projects without human interaction.
More shared resources. ARC and researchers can make contributions to the resources available to all ARC TRE projects, including common software tools and reference data. We expect to greatly expand the available resources over the coming months.
Inter-project sharing. We intend to allow projects with the requisite information governance to exchange data without it having to leave the environment.
Project templates. We strongly encourage projects in the ARC TRE to use software version control for their project materials. This is an essential tool for collaboration, and provides a history to aid transparency and recall of decisions. ARC will demonstrate how to re-use existing project setup repositories, which can become templates for multiple projects.
Archiving. The ARC TRE is not intended for long-term storage of research data; instead, it’s a compute environment for active research. However, many projects require archival of sensitive data. For now, researchers can leave their projects on the TRE, where the data is securely backed up. We’re working with the ARC Data Stewards to ensure that project archival requirements can be met optimally in future.
Windows desktops. The ARC TRE provides Red Hat Enterprise Linux desktops. Our initial user feedback has been (slightly surprisingly) positive about this decision – but we know that many projects in the Data Safe Haven make use of Windows tools. A managed Windows environment for sensitive data will continue to exist, for as long as it’s needed – design of the long-term solution is ongoing.
Thanks
A great many people have been involved in this project, at ARC, ISD, ISG and beyond, and we are grateful to one and all.
The core technical team of Tom Young (lead), Robert Vickerstaff, and Sunny Park; enabled by delivery manager Rupert Roker. Architecture and security management system led by Tim Machin, ably supported by Finley Bacon who also took on the service management; and the ARC information governance team Martin Donnelly, Preeti Matharu and Victor Olago. Ongoing and valued expertise from Jack Hindley, Dan Bretherton, Trevor Peacock, Anthony Peacock (no relation), David Solomon, Werner Niekirk, Ben Thomas, and others who have been pestered about one thing or another since January. User representation from Nick Owen, Socrates Varakliotis, John Watts, Michelle Harricharan, and many more. And the leadership that sponsored and fought for the project: Donna Swan, Rachel Cooper, and James Hetherington.
Want to know more?
The environment is visible to anyone with a UCL login, at https://tre.arc.ucl.ac.uk/
Information on Sensitive Data and Trusted Research Environments
Homepage of the Trusted Research Environment Assurance process
George Svarovsky is a Principal Research Software Engineer working in UCL’s Centre for Advanced Research Computing. George is the Product Owner for the Trusted Research Environment, working with the service’s users to understand their needs and ensure that the service his team builds meets those needs. If you would like to get in touch with George to talk about how the TRE can best support your research then please contact the team.