
Data Management Planning for Secure Services (DMP-SS)


Just another Blogs.ucl.ac.uk site


Data management planning – Report on 18th July 2012 Workshop

By F D (Tito) Castillo, on 14 January 2013

Data management planning

As a research community, we all appreciate that our research data are important assets. A great deal of time and money is dedicated to collecting and processing them. But their value beyond the initial research lies in the ease with which they can be shared and re-used to support further research. Researchers and the institutions they are part of need to plan for the complete data life-cycle – from data collection through to archiving – to facilitate this and ensure their data can realize their full potential.

Data management plans: why do we need them?

For researchers, data management planning should be an integral part of their research planning, allowing research projects to be more accurately costed and resourced.

For institutions, having standard data management processes and clear guidance for researchers on data management allows the institution to be confident that its research retains its value, and safeguards against reputational damage.

UK research councils are placing increasing emphasis on publicly funded research being shared in a timely way. To help grant applicants meet this requirement, most research councils now require a data management plan (DMP) as part of a grant application.

Meeting data management requirements

Data management poses a number of challenges to researchers and to institutions. Data storage requires infrastructure with capacity we could hardly have anticipated a few decades ago, and with it come the associated costs of hardware, facilities and energy.
Ensuring that the data collected have adequate metadata, using suitable metadata standards (for example the Data Documentation Initiative (DDI), widely used in the social sciences), is another consideration if data are to be easy to find and share. Not all areas of research have yet adopted common metadata frameworks, although in some fields these are already well established.

Then there are issues around security that need to be considered, including controlling access to data, protecting confidential data and ensuring data is backed up appropriately.

Different fields of research have different requirements in each of these respects. Some, such as astrophysics, require far greater storage capacity and computational power, while others, such as health research, must consider patient confidentiality and often require far higher levels of data security.

What standards are required for data management?

Research funding councils do not currently stipulate specific standards for data management. They do, however, require that data management meet generally accepted standards and follow best practice.

At the same time, it seems unlikely that funders will have the resources to check that researchers are complying with their DMPs. However, researchers who do not follow best practice risk significant damage to their reputations, and institutions risk losing valuable assets. It is in everyone’s best interests that good data management systems and standards are supported.

There are certified standards for data management and data security that could be used in some areas of research. Although externally audited information security management systems are not currently a requirement, internationally recognized certification would provide independent assurance of a high level of risk management and data security.

Making DMPs more than just a box-ticking exercise

DMPonline

To support researchers in data management planning, the Digital Curation Centre (DCC) has developed DMPonline, a web-based tool for creating DMPs. The tool was developed to incorporate the data management requirements of all the UK research councils. By mapping each council’s requirements to its 118 questions, DMPonline allows researchers to create tailored DMPs.
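DMPonline’s internal data model is not described here, so the following is only a hypothetical Python sketch of the general idea: a shared question bank in which each question records which funders ask for it, filtered to produce a funder-specific plan. The class, the question texts and the funder names (FunderA, FunderB) are illustrative placeholders, not DMPonline’s actual questions or any council’s real requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str
    text: str
    # Funders whose requirements this question covers (hypothetical mapping).
    funders: frozenset


# Hypothetical question bank; the real DMPonline bank is far larger.
QUESTION_BANK = [
    Question("Q1", "What data will you collect or create?", frozenset({"FunderA", "FunderB"})),
    Question("Q2", "Which metadata standards will you use?", frozenset({"FunderA"})),
    Question("Q3", "How will confidential data be protected?", frozenset({"FunderB"})),
    Question("Q4", "How long will the data be preserved?", frozenset({"FunderA", "FunderB"})),
]


def tailored_plan(funder, bank):
    """Return, in bank order, only the questions mapped to the given funder."""
    return [q for q in bank if funder in q.funders]


plan_b = tailored_plan("FunderB", QUESTION_BANK)
print([q.qid for q in plan_b])  # ['Q1', 'Q3', 'Q4']
```

Keeping one shared bank and filtering it means a question common to several funders is maintained once rather than duplicated for each funder.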

Built on an open-source platform, DMPonline has the potential to integrate with a number of other systems. It also allows research councils to manage their own specific guidance for each question.

Bridging the gap between researchers and local data services

Data management is not a task that researchers can undertake in isolation from service managers and data resource planners. To be effective, it requires everyone to work together through every phase of data management planning, and this is where well-thought-through DMPs would be invaluable.
For researchers in large institutions, finding the right person to speak to about data storage and processing can be time-consuming and confusing. Having clear guidance on who to contact could save time and streamline the production of DMPs.

Researchers also need to understand what best practice is for managing and securing their particular type of data. The resilience and accessibility of data have implications for storage costs. For example, could your data be backed up to tape rather than to another server? Retrieving it from tape after a system failure would take a couple of days, but the storage would cost significantly less.
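The trade-off described above can be expressed as a simple rule: choose the cheapest storage option whose recovery time the project can tolerate. The sketch below is hypothetical Python; the tier names, costs and restore times are placeholders, not figures from the workshop or from any UCL service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageTier:
    name: str
    cost_per_tb_year: float  # placeholder cost units, not real prices
    restore_hours: float     # time to recover data after a system failure


def cheapest_tier(tiers, max_restore_hours):
    """Return the cheapest tier that can restore data within the allowed time, or None."""
    acceptable = [t for t in tiers if t.restore_hours <= max_restore_hours]
    return min(acceptable, key=lambda t: t.cost_per_tb_year, default=None)


# Placeholder figures for illustration only; "a couple of days" for tape is taken as 48 hours.
TIERS = [
    StorageTier("server replica", cost_per_tb_year=100.0, restore_hours=1.0),
    StorageTier("tape", cost_per_tb_year=20.0, restore_hours=48.0),
]

print(cheapest_tier(TIERS, max_restore_hours=72).name)  # tape
print(cheapest_tier(TIERS, max_restore_hours=4).name)   # server replica
```

A project that can wait a few days for recovery ends up on the cheaper tier; one that needs data back within hours does not, whatever the price difference.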

Institutions could use DMP tools such as DMPonline to help bridge the gap between researchers and data services. The tools could either be hosted by the institution, or hosted by DMPonline and customized to incorporate the institution’s administrative requirements as well as those of particular funders. Contact details for the right person on each aspect of data management planning could then be included in the guidance notes for the relevant questions. Additional questions about capacity and timing could also be incorporated, allowing IT departments to manage their resources better.
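How an institution would layer its own guidance on top of a funder template depends on the tool. The following hypothetical Python sketch only illustrates the idea: institution-specific notes, including a contact, are attached to individual funder questions, and extra institutional questions (for example about storage capacity and timing) are appended. All names, question texts and the contact address are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuidedQuestion:
    qid: str
    text: str
    guidance: tuple = ()  # guidance notes shown alongside the question


def customise(questions, institution_notes, extra_questions):
    """Add institution notes to matching funder questions, then append
    institution-only questions. The input questions are left unchanged."""
    customised = [
        GuidedQuestion(q.qid, q.text, q.guidance + tuple(institution_notes.get(q.qid, ())))
        for q in questions
    ]
    return customised + list(extra_questions)


funder_questions = [
    GuidedQuestion("Q1", "Where will the data be stored during the project?", ("Funder guidance text.",)),
    GuidedQuestion("Q2", "How will the data be shared?"),
]
# Hypothetical contact; example.org is a reserved placeholder domain.
notes = {"Q1": ["Contact research-storage@example.org about central storage."]}
extras = [GuidedQuestion("INST1", "How many terabytes will you need, and from what date?")]

plan = customise(funder_questions, notes, extras)
for q in plan:
    print(q.qid, q.guidance)
```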

There is also the potential for institution- or department-based systems to share information with DMPonline.

Data preservation: what’s involved?

Archiving data in an immutable form is the final phase in the data life-cycle. Most funders require data to be preserved for between five and ten years beyond the end of the project, although some require longer. However, there is no specific guidance on this.
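One common way to check that archived files really have remained unchanged is to record a cryptographic checksum for each file at deposit and re-check it later. The source does not say how any particular archive does this; the following is a minimal Python sketch using SHA-256 from the standard library, with hypothetical function names and a throwaway example file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files that were altered, removed or added since the manifest was made."""
    current = build_manifest(root)
    return sorted(
        name for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "survey.csv").write_text("id,age\n1,34\n")
    manifest = build_manifest(root)
    unchanged = changed_files(root, manifest)
    (root / "survey.csv").write_text("id,age\n1,35\n")
    after_edit = changed_files(root, manifest)

print(unchanged, after_edit)  # [] ['survey.csv']
```

The manifest would be stored alongside the archive, so that any later check can show whether the preserved files still match what was deposited.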

To facilitate sharing and re-use, data need to be in formats that will not degrade, and they need to be deposited in repositories that make it relatively easy to search for and retrieve files. This requires agreement on common metadata standards and the inclusion of metadata in catalogues that other researchers can easily find.

A number of leading research journals now also require details of data management for published articles, and require the data to remain available for scrutiny for a set period after publication.

Although archiving data costs far less than storing ‘live’ data, how preservation should be funded is still under debate. Some councils do expect to see data preservation costs included in the grant application, and some have their own data repositories, while other funders see data preservation as the responsibility of the institution.

Data management at UCL

At UCL, initial estimates from the newly formed Research Data Project are that the institution will require 2 petabytes (1 petabyte = 1,000 terabytes) for collecting and processing ‘live’ data, and another 2 petabytes for archiving research data, by the end of 2014. However, this will not cover all of the institution’s data, much of which is currently held on departmental networks. Exactly how the system will be funded in the long term is still being considered.
UCL generates vast quantities of research data, but researchers are often unclear about where their data should be stored or whom to talk to about DMPs. Besides upgrading Legion (UCL’s current platform for computationally intensive research), a number of initiatives are underway to investigate options for improving data management and coping with the volume of data being produced.

The Research Data Project is in the early stages of considering how to increase the institution’s capacity for storing live data and archiving data. Many departments do not have the funds to support this themselves. The Economic and Social Research Council (ESRC) now also requires all institutions to have a roadmap showing how they will support data management in the future, and UCL is developing a Research Data Policy that defines standards and identifies roles and responsibilities for the creation, storage and archiving of research data at UCL.

At departmental level, epiLab-SS (a secure computing service at the UCL MRC Centre of Epidemiology for Child Health) is looking at how data management planning and information security can be integrated, and is creating a single secure system for data collection and management. At the time of the workshop, epiLab-SS was entering the second stage of its bid to become an ISO27001:2005 Certified Data Centre (certification was awarded in September 2012). In parallel, the JISC-funded DMP-SS (Data Management Planning for Secure Services) project is looking at using DDIv3 as a metadata standard for marking up the entire data life-cycle of epidemiology and public health research, from planning through collection to archiving. The information regarding data management could then be shared with DMPonline to generate DMPs.

Beyond the institution, UCL’s Medical School was part of a successful bid to set up an MRC e-Health Informatics Research Centre, which will look at how NHS and research data can be integrated to support research (the centre was awarded in July 2012 and is due to launch on 1 May 2013).

In conclusion

Managing data is a complex undertaking. It would seem that infrastructure, costs and responsibilities for data preservation are issues that will take some time to resolve within institutions, especially those as large as UCL.

However, DMPs have the potential to support better resource planning and project costing if they become more than just a box-ticking exercise. For this to happen, institutions need to put in place systems that encourage communication between researchers and data services, and ensure that researchers understand the importance of data security and what it entails.


This report is based on the presentations and discussions around data management planning and public health research held at the Institute of Child Health on 18 July 2012. Organiser: Dr Tito Castillo, Senior Information Systems Consultant, MRC Centre of Epidemiology for Child Health