Data Management Planning for Secure Services (DMP-SS)


Archive for the 'Workshop' Category

Data management planning – Report on 18th July 2012 Workshop

By F D ( Tito ) Castillo, on 14 January 2013

Data management planning

As a research community, we all appreciate that our research data are important assets. A great deal of time and money is dedicated to collecting and processing them. But their value beyond the initial research lies in the ease with which they can be shared and re-used to support further research. Researchers and the institutions they are part of need to plan for the complete data life-cycle – from data collection through to archiving – to facilitate this and ensure their data can realize their full potential.

Data management plans: why do we need them?

For researchers, data management planning should be an integral part of their research planning, allowing research projects to be more accurately costed and resourced.

For institutions, having standard data management processes and clear guidance for researchers on data management allows the institution to be confident its research retains value and safeguards against reputational damage.

UK research councils are placing increasing emphasis on the importance of publicly-funded research being shared in a timely way. To ensure that grant applicants are able to meet this requirement, a data management plan (DMP) is now required for grant applications by most research councils.

Meeting data management requirements

Data management poses a number of challenges to researchers and to institutions. Data storage requires infrastructure with capacity we could hardly have anticipated a few decades ago, and with it come the associated costs of hardware, facilities and energy.
Ensuring that data collected has adequate metadata using suitable metadata standards (for example Data Documentation Initiative (DDI), widely used in social science) is another element that needs consideration if data is to be easy to find and share. Not all areas of research have adopted common metadata frameworks as yet, although in some academic areas these are already well established.

Then there are issues around security that need to be considered, including controlling access to data, protecting confidential data and ensuring data is backed up appropriately.

Different areas of research have different requirements in each of these areas. Some, such as astrophysics, require far greater storage capacity and computational power, while others, such as health, need to consider patient confidentiality and often require far higher levels of data security.

What standards are required for data management?

Research funding councils do not stipulate specific standards for data management at this stage. They do require that it meet generally accepted standards and follow best practice.

Relating to this, it seems unlikely that funders will have the resources to check that researchers are complying with their DMPs. However, researchers who do not follow best practice risk significant damage to their reputations, and institutions risk losing valuable assets. It is in everyone’s best interests that good data management systems and standards are supported.

There are certified standards for data management and data security that could be used in some areas of research. Although an externally audited information security management system is not currently a requirement, internationally recognized certification would provide independent assurance of a high level of risk management and data security.

Making DMPs more than just a box-ticking exercise

DMPonline

To support researchers in data management planning, the Digital Curation Centre (DCC) has developed DMPonline, a web-based tool for creating DMPs. The tool was developed to incorporate the data management requirements of all the UK research councils. By mapping each council’s requirements to its 118 questions, DMPonline allows researchers to create DMPs tailored to a particular funder.
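The mapping idea can be pictured as a lookup from a funder to the subset of checklist questions that funder asks about. The following is a minimal sketch only: the four questions, the funder names and the mappings are invented for illustration and are not the real DCC checklist or DMPonline's data model.

```python
# Hypothetical checklist: question id -> question text (not the real DCC wording).
CHECKLIST = {
    1: "What data will be created?",
    2: "How will the data be documented?",
    3: "How will the data be stored and backed up?",
    4: "How will the data be shared?",
}

# Hypothetical funder requirements: funder -> ids of the questions it asks about.
FUNDER_QUESTIONS = {
    "FunderA": [1, 3, 4],
    "FunderB": [2, 4],
}


def tailored_plan(funder):
    """Return the checklist questions a given funder requires, in checklist order."""
    wanted = set(FUNDER_QUESTIONS[funder])
    return [CHECKLIST[qid] for qid in sorted(CHECKLIST) if qid in wanted]


if __name__ == "__main__":
    for question in tailored_plan("FunderA"):
        print(question)
```

Keeping the question list and the funder mappings separate means a new funder can be added without touching the checklist itself.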

Developed on an open source platform, DMPonline has the potential to integrate with a number of different systems. It also has the capabilities for research councils to manage their specific guidance for each question.

Bridging the gap between researchers and local data services

Data management is not a task that can be undertaken by researchers in isolation from service managers and data resource planners. For data management to be effective it requires everyone to work together through every phase of data management planning, and this is where well thought through DMPs would be invaluable.
For researchers in large institutions, finding the right person to speak to about data storage and processing can be time-consuming and confusing. Having clear guidance on who to contact could save time and streamline the production of DMPs.

Researchers also need to understand what the best practice for managing and securing their particular type of data is. The resilience and accessibility of data have implications for data storage costs. For example, could your data be backed up to tape rather than to another server? Retrieving it from tape in the event of system failure would take a couple of days, but the cost of the storage would be significantly less.

DMP tools like DMPonline have the potential to be used by institutions to bridge the gap between researchers and data services. They could either be hosted by the institution, or hosted by DMPonline and customized to incorporate the institution’s administrative requirements as well as those of particular funders. Contact details for the relevant person for each aspect of data management planning could then be provided in the guidance notes associated with specific questions. Additional questions around capacity and timings could be incorporated, allowing IT departments to better manage their resources.

There is also the potential for institution- or department-based systems to share information with DMPonline.

Data preservation: what’s involved?

Archiving data in an immutable form is the final phase in the data life-cycle. Most funders require that data be preserved for between five and ten years beyond the time-frame of the project, although some require longer. However, there is no specific guidance for this.

To facilitate sharing and re-use, data need to be in a format that won’t degrade and they need to be deposited in data repositories that are relatively easy to search and retrieve files from. This requires agreement on common metadata standards and the inclusion of the metadata in catalogues that other researchers can find easily.

A number of leading research publications now also require details on data management for published articles and for data to be available for scrutiny for a certain time after the publication date.

Although the cost of archiving data is far less than that of storing ‘live’ data, how this preservation is funded is under debate. Some councils do expect to see costs for data preservation included in the grant application, and some have their own data depositories, while other funders see data preservation as the responsibility of the institution.

Data management at UCL

At UCL, initial estimates from the newly formed Research Data Project are that the institution will require 2 petabytes (1 petabyte = 1,000 terabytes) for collecting and processing ‘live’ data, and another 2 petabytes for archiving research data at the end of 2014. However, this will not comprise all the institution’s data, much of which is currently held on departmental networks. Exactly how the system will be funded in the long-term is still being considered.
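As a quick check on the scale of that estimate, the two figures can be added and converted using the same decimal convention the report uses (1 petabyte = 1,000 terabytes). This is a small illustrative sketch; the function and constant names are made up for the example.

```python
TB_PER_PB = 1_000  # decimal convention stated in the report


def total_terabytes(live_pb, archive_pb):
    """Combine live and archive estimates (in petabytes) into terabytes."""
    return (live_pb + archive_pb) * TB_PER_PB


if __name__ == "__main__":
    # 2 PB for 'live' data plus 2 PB for archived data.
    print(total_terabytes(2, 2))  # 4000
```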
UCL generates vast quantities of research data, but researchers are often not clear on where their data needs to be stored or who to talk to about DMPs. Besides upgrading Legion (UCL’s current platform for computationally intensive research), there are a number of initiatives underway to investigate options for improving data management and to cope with the volume of data that is being produced.

The Research Data Project is in the early stages of considering how to increase the institution’s capacity for the storage of live data and for archiving data. Many departments do not have the funds to support this themselves. The Economic and Social Research Council (ESRC) now also requires all institutions to have a road-map of how they will be supporting data management in the future and UCL is developing a Research Data Policy which defines standards and identifies roles and responsibilities in the creation, storage and archiving of research data at UCL.

On a departmental level, epiLab-SS (a secure computing service at the UCL MRC Centre of Epidemiology for Child Health) is looking at how data management planning and information security can be integrated, and creating a single secure system for data collection and management. epiLab-SS is currently entering the second stage in its bid to become an ISO27001:2005 Certified Data Centre (certification awarded Sept 2012). The JISC-funded DMP-SS (Secure Service) project is looking at using DDIv3 as a metadata standard for marking up the entire data lifecycle for epidemiology and public health research, from planning through collection to archiving. The information regarding data management could then be shared with DMPonline to generate DMPs.

Looking at data management challenges beyond the institution, UCL’s Medical School is part of a bid to set up an MRC e-Health Informatics Research Centre which will look at how NHS and research data can be integrated to support research (awarded in July 2012 and due to launch 1st May 2013).

In conclusion

Managing data is a complex undertaking. It would seem that infrastructure, costs and responsibilities for data preservation are issues that will take some time to resolve within institutions, especially those as large as UCL.

However, DMPs have the potential to support better resource planning and project costing if they become more than just a box-ticking exercise. For this to happen institutions need to put in place systems that encourage communications between researchers and data services and ensure that researchers understand the importance of data security and what this entails.


This report is based on the presentations and discussions around data management planning and public health research held at the Institute of Child Health on 18 July 2012. Organiser: Dr Tito Castillo, Senior Information Systems Consultant, MRC Centre of Epidemiology for Child Health

Data Management Planning Workshop – Faculty of Population Health Sciences

By F D ( Tito ) Castillo, on 1 May 2012

Date: Wednesday 18th July 2012

Time: 1pm – 4pm

Venue: Leolin Price Lecture Theatre, Institute of Child Health

Public health research data takes time, effort and considerable resources to collect yet the associated data management practices can vary considerably and often datasets are misplaced or difficult to reuse. Management of public health data must also consider potentially complex legislative and ethical dynamics that demand effective information governance at all stages in the data life-cycle.

Recent initiatives in the medical and related sciences and across funding organisations (National Science Foundation, National Institutes of Health, MRC, BBSRC, Cancer Research UK, Wellcome Trust, and ESRC) have highlighted the need for researchers and support staff to demonstrate both the capability and intention to manage their public health research datasets effectively. As an example, all MRC grant applicants are now expected to submit a Data Management Plan (DMP) as part of their funding proposals. Recently the Digital Curation Centre (DCC) has proposed an adaptable checklist of key questions to be addressed by researchers at different stages in this life-cycle.

The DMP-SS project, based at UCL, seeks to explore how this checklist can be applied through the use of relevant information standards. The project addresses both the structural and procedural aspects of research data management planning.

The intention of this workshop is to:

  1. present both DMPOnline (a tool developed by the DCC) and the MRC’s latest requirements on data management planning;
  2. describe on-going initiatives within UCL, including the DMP-SS project, and how they relate to population health research;
  3. seek to identify key priorities to develop these UCL initiatives further.

Agenda

13:00 Introduction
13:05 MRC Data management plans (Veerle Van den Eynden, MRC Data Support Service Project Manager)
13:20 DCC DMPOnline (Martin Donnelly, Digital Curation Centre)
13:40 DMP-SS (Tito Castillo, MRC Centre of Epidemiology for Child Health)
14:00 Group introduction and discussion
14:20 Coffee break
14:40 UCL research data (Max Wilkinson, Head of Research Data Services, UCL)
15:00 Platform Technologies (Jacky Pallas, Platform Technologies, UCL)
15:10 MRC e-Health bid (Spiros Denaxas, CALIBER Project, UCL)
15:20 LSHTM perspective (Frieda Midgely & Gareth Knight, LSHTM)
15:40 Group discussion – priorities and challenges

For registration go to: http://dmpss.eventbrite.com/

Data Management Planning Workshop at ICH

By F D ( Tito ) Castillo, on 23 November 2011

Summary

On Friday 18th November we held a workshop that brought together all interested parties in the JISC Enhancing DMPonline Projects funding stream. The day focused on developing a mutual understanding of the potential challenges and opportunities for extension of the Digital Curation Centre’s DMPOnline tool.

Present

DMP-SS Team (UCL Institute of Child Health & MRC Unit for Lifelong Health & Ageing)

  • Tito Castillo – Principal Investigator, MRC Centre of Epidemiology for Child Health
  • Stelios Alexandrakis – Project manager and Lead Developer, MRC Centre of Epidemiology for Child Health
  • Kevin Garwood – Software Developer
  • Michael Waters – Research & Innovations

Digital Curation Centre (DMPOnline)

  • Martin Donnelly – Project Manager
  • Adrian Richardson – Lead Software Developer

Oxford University (Oxford DMPOnline project)

  • David Shotton – Principal Investigator
  • Richard Jones – Cottage Labs

Observers

  • Arofan Gregory – Metadata Technology & Open Data Foundation
  • Veerle Van den Eynden – MRC Data Support Service Project Manager, Medical Research Council
  • Jonathan Tedds – Senior Research Liaison Manager, University of Leicester
  • Simon Hodson – JISC Programme Manager

Issues

After a thought-provoking discussion we were left with a few key messages to consider and act upon:

  1. Is the existing DCC centralised hosting model for Data Management Plans going to be acceptable across the wider research community? The existing DCC model involves the management of a centralised repository of data management plans. It was suggested in the meeting that this may prove to be unacceptable to many researchers and that we will need to examine the possibility of using federated models.
  2. Can we express the current DMPOnline data model in a form that is suitable for incorporation into DDI? Arofan Gregory agreed to lead an initiative to develop a more formal model of the current DCC checklist that may be used in future to inform the further development of the DDI ‘life-cycle’ standard. Arofan’s clear view was that significant elements of the existing DDI model do not explicitly support the DMP concepts, although it would be highly desirable to consider their inclusion in a later version of DDI.
  3. What is the possible relationship between the IRAS (Integrated Research Application System) and DMP, and would there be opportunities to interoperate? The IRAS system is used for submission of a range of approval applications and appears to make use of some form of common interchange standard. The question was asked as to whether the DMP process could support the IRAS system in some way, and how widespread the use of IRAS is at the moment.
  4. Is it possible for the DMP-SS project team to storyboard the proposed interaction between a DMP registry and their proposed Information Security Management System (ISMS)? The relationship between the proposed ISMS and DMP registry is still unclear and it is important to begin to describe the proposed information flows and use cases in support of the ongoing development. Stelios Alexandrakis agreed that he would begin this work and Tito Castillo agreed to share the current mapping that has been carried out between the DMP checklist and ISO-27001 controls.
  5. DDI ‘life-cycle’ training opportunity. Arofan Gregory noted that there is a possibility of DDI training being commissioned by UK Data Archive during 2012; however, this has yet to be finalised. There was widespread interest among the group in attending such a course if and when it takes place.

Agenda

10:00 Welcome and introductions
10:20 The background to the JISC programme (Simon Hodson)
10:40 DMPOnline history and strategic plan (Martin Donnelly)
11:10 Break
11:30 Introduction to the DDI Standard and associated tools (Arofan Gregory)
12:15 Discussion
12:30 Lunch
13:20 DMP-SS Project discussion (led by Tito Castillo)
14:20 Break
14:30 Oxford DMPOnline discussion (led by David Shotton)
15:30 Opportunities for collaboration and enhancement
16:00 Close