By F D ( Tito ) Castillo, on 17 July 2013
The Data Management Plan (DMP) editor has just been released by Metadata Technology on their OpenMetadata portal.
More content will be added soon, but for the time being the page has links to download the editor application, the source code repository and a developer’s guide.
By F D ( Tito ) Castillo, on 21 May 2013
Following our earlier post reporting that the epiLab-SS service now meets the NHS criteria for information security and governance (Level 2), we can confirm that AIMES Grid Services CIC Ltd, the data centre provider for epiLab-SS, has recently been notified that its submission to the NHS Information Governance Toolkit team has been reviewed and found to meet the requirements. This means that, in addition to its pre-existing ISO-27001 certification and G-Cloud Assured Services, AIMES now also meets the NHS criteria for information security and governance (Level 2).
The AIMES status can be viewed here.
The epiLab-SS status can be found here.
This will add to the dual certification (cloud/institution) model of information security assurance that we have been collaborating on and we look forward to improving it even further during future projects.
By F D ( Tito ) Castillo, on 3 May 2013
The following report has been written by Arofan Gregory (Metadata Technology) as a result of his company’s engagement in the DMP-SS project. It provides a helpful position paper on the potential role of the Data Documentation Initiative standard (DDI) for the representation of research data management plans.
By F D ( Tito ) Castillo, on 23 April 2013
The epiLab-SS secure service has recently been notified that it has successfully achieved “Level 2” compliance with the NHS Information Governance Toolkit. This toolkit, based on the ISO-27001 information security standard, is a standardised assurance process that is mandated for all NHS organisations. Universities and other academic research groups have recently been required to adopt the toolkit to address aspects of personal information handling, in particular where access to unconsented identifiable datasets is required. More information on the epiLab-SS compliance can be found at the following link.
By F D ( Tito ) Castillo, on 26 March 2013
The value of information
Academic research involves the collection and management of information from disparate sources to build upon or refine a body of knowledge. Although research in itself should have some intrinsic value to society, the costs of the associated activities can be considerable. These costs are not merely financial: they also involve time and effort, as well as potential ethical compromises “for the greater good”, as in animal experimentation or placebo-controlled trials.
Since research is costly, it follows that the component parts that are derived from or support this activity must be of value. In everyday life most people understand the need to protect valuables and typically carry out their own personal risk assessment to determine how to secure their possessions; in many cases locking doors, shredding papers or employing trusted third-party services. Generally, this is done without consciously thinking about the process, adopting societal norms (or ‘standards’) in respect of most security-related decisions.
Organizations are not individuals, and cannot carry out this instinctive risk assessment without a helping hand from man-made constructs. James Reason, in his 2000 BMJ article “Human error: models and management”, elegantly describes the need to apply a system approach, based on the assumption that human error is inevitable and that the conditions within which humans work should be adapted, rather than embarking on futile attempts to change the human condition.
“When an adverse event occurs, the important issue is not who blundered, but how and why the defences failed.” Reason, J (2000)
Just as Reason’s paper has fundamentally affected our approach to risk management in UK healthcare, so it should also highlight the wider issues relating to the risk of information security incidents across all aspects of the research data life-cycle. It clearly articulates the rationale for well-understood standards to support information security: what would commonly be referred to as an Information Security Management System (ISMS).
A standard for Information Security (ISO-27001)
Although it is perfectly possible to implement an ISMS without reference to existing standards, it is highly desirable to use one. A standard provides a well-established framework drawn from the past experience (and mistakes) of others. More importantly, standards offer reference points against which systems may be benchmarked and audited. Although it is not possible to measure security directly, it is possible to measure conformance to a prescribed standard. By adopting a suitable information security standard and being audited successfully against it, an organization can assure others that appropriate controls, with associated governance, are in place.
The internationally recognized information security standard is called ISO-27001 and forms the ‘requirements’ of an ISMS. Each of these requirements specifies things that ‘shall’ be done. ISO-27002 is the associated Code of Practice for information security management, which describes what ‘should’ be done to implement the standard. The subtle distinction is that this second document simply provides recommendations for implementing an ISO-27001-compliant ISMS.
ISO-27001 provides a taxonomy of 138 security controls, grouped into security categories, plus an introductory clause on risk assessment and treatment. Each security category contains one or more controls designed to meet a stated control objective. The controls described within the standard are not an exhaustive list and, depending on the results of risk assessment, not all controls will be required for a given ISMS.
Properties of an ISO-27001 ISMS
Any meaningful discussion of information security must begin with a simple question: ‘what are we seeking to secure?’ Although this may seem a trivial question, it is of fundamental importance: the scope of the system must be defined; in other words, the boundaries must be clearly described for the information to be secured.
The development of an ISMS that complies with the complete ISO-27001 standard is a major challenge for any organization, and success depends on clearly defining the scope of such a system; too small and the process is rarely cost-effective, too large and it may be unachievable. In practice, an initial high-level risk assessment and cost-benefit analysis should help to identify the appropriate focus for such a system.
The cornerstone of an ISMS is effective risk assessment. Risk assessments are difficult to carry out and there is no silver bullet. The key point is that risk assessment is part of an on-going process of continuous improvement. In basic terms there are a series of steps that need to be followed.
- Identify the information assets that need to be protected.
- Identify any vulnerabilities that relate to these assets.
- Identify threats that need to be guarded against.
- Estimate the likelihood of threats exploiting vulnerabilities (otherwise known as risks).
To be systematic you need to define a threshold level of ‘acceptable risk’ above which additional controls will be required.
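The steps above can be sketched as a minimal risk register in code. The scoring scale, asset names and acceptable-risk threshold below are illustrative assumptions, not values prescribed by ISO-27001:

```python
# Minimal sketch of the risk-assessment steps above, using a simple
# qualitative likelihood x impact score.  All example values are hypothetical.
from dataclasses import dataclass

@dataclass
class Risk:
    asset: str          # information asset to be protected
    vulnerability: str  # weakness that could be exploited
    threat: str         # event that could exploit the vulnerability
    likelihood: int     # 1 (rare) .. 5 (almost certain)
    impact: int         # 1 (negligible) .. 5 (severe)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

ACCEPTABLE_RISK = 6  # threshold above which additional controls are required

def needs_treatment(risks):
    """Return the risks whose score exceeds the acceptable threshold."""
    return [r for r in risks if r.score > ACCEPTABLE_RISK]

register = [
    Risk("survey dataset", "unencrypted laptop", "theft", 3, 4),
    Risk("consent forms", "open filing cabinet", "unauthorised reading", 2, 2),
]
for r in needs_treatment(register):
    print(f"{r.asset}: score {r.score} -> select additional controls")
```

Re-running the register after each treatment cycle is what makes the process iterative rather than a one-off exercise.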
The ISO 27000 series documents provide a taxonomy of 138 controls, along with guidance on their implementation. A key facet of all controls is that each must be owned by someone (i.e. a responsible party or organization), and it should be possible to define the means by which the effectiveness of each control can be assured and audited. The list of 138 controls is not intended to be exhaustive, and it is important to consider additional controls, where required, that are not explicitly referred to in the standard.
Statement of Applicability (SoA)
ISO-27001 prescribes the creation of a summary document that itemizes all 138 controls, plus any additional controls, and clearly states whether each control has been selected, with reference to where evidence of the control can be found. Where controls have not been selected, the reasons should be clearly stated. The SoA acts as a summary reference document that, taken in conjunction with the Scope Statement, should provide an auditor with a high-level view of an ISMS.
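The shape of an SoA can be sketched as structured data. The control identifiers, names and justifications below are illustrative placeholders, not quotations from Annex A of the standard:

```python
# A Statement of Applicability reduced to its essentials: every control is
# listed and marked as selected or not, with a justification and a pointer
# to evidence.  Control IDs and names here are invented for illustration.
soa = [
    {"id": "A.x.1", "name": "Access control policy", "selected": True,
     "justification": "Needed to restrict access to identifiable data",
     "evidence": "Access Control Policy v1.2"},
    {"id": "A.x.2", "name": "Clock synchronisation", "selected": False,
     "justification": "No multi-host audit-trail correlation in scope",
     "evidence": None},
    {"id": "A.x.3", "name": "E-commerce controls", "selected": False,
     "justification": "", "evidence": None},
]

def audit_gaps(soa):
    """Controls excluded without a stated reason, or selected without a
    pointer to evidence -- either would be flagged at audit."""
    return [c["id"] for c in soa
            if (not c["selected"] and not c["justification"])
            or (c["selected"] and not c["evidence"])]

print(audit_gaps(soa))  # the third entry lacks a justification
```

The same selected/justification/evidence triple is what an auditor walks through, entry by entry, when comparing the SoA against the Scope Statement.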
Like many similar management systems, an ISMS is dynamic and should follow the plan-do-check-act cycle (also known as the Deming Cycle). Made popular by Dr W. Edwards Deming, the father of modern quality control, the approach involves a process of continuous improvement through multiple iterations. It is worth noting that other management system standards, like ISO-9001, apply similar cyclical process models, and a suitably-designed ISMS should be able to accommodate many of the requirements of these other systems.
The standard outlines the requirements of each of these four steps in the cycle concisely, within just four pages, before going on to provide requirements for:
- Documentation (including document and record control)
- Management responsibility, in respect of commitment, provision of resources, and programmes of training and awareness
- Internal audit
- Management review
- Continuous Improvement, including corrective and preventive action
In practice, the dynamic aspect of the management of an ISMS is often the most difficult part to get right but this is where the iterative technique allows for successive improvement over time.
1. Reason J. Human error: models and management. BMJ 2000; 320(7237): 768-770.
2. BSI. Information technology. Security techniques. Information security management systems. Requirements. BS ISO/IEC 27001:2005 (BS 7799-2:2005). BSI; 2005.
3. BSI. Information technology. Security techniques. Code of practice for information security management. BS ISO/IEC 27002:2005 (BS 7799-1:2005, BS ISO/IEC 17799:2005). BSI; 2005.
By F D ( Tito ) Castillo, on 14 January 2013
Data management planning
As a research community, we all appreciate that our research data are important assets. A great deal of time and money is dedicated to collecting and processing them. But their value beyond the initial research lies in the ease with which they can be shared and re-used to support further research. Researchers and the institutions they are part of need to plan for the complete data life-cycle – from data collection through to archiving – to facilitate this and ensure their data can realize their full potential.
Data management plans: why do we need them?
For researchers, data management planning should be an integral part of their research planning, allowing research projects to be more accurately costed and resourced.
For institutions, having standard data management processes and clear guidance for researchers on data management allows the institution to be confident its research retains value and safeguards against reputational damage.
UK research councils are placing increasing emphasis on the importance of publicly-funded research being shared in a timely way. To ensure that grant applicants are able to meet this requirement, a data management plan (DMP) is now required for grant applications by most research councils.
Meeting data management requirements
Data management poses a number of challenges to researchers and to institutions. Data storage requires infrastructure with capacity we could hardly have anticipated a few decades ago, and with it come the associated costs of hardware, facilities and energy.
Ensuring that collected data have adequate metadata, using suitable metadata standards (for example the Data Documentation Initiative (DDI), widely used in social science), is another element that needs consideration if data are to be easy to find and share. Not all areas of research have yet adopted common metadata frameworks, although in some academic fields these are already well established.
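As a rough illustration of what such metadata looks like, the fragment below builds a much-simplified, DDI-Codebook-style description of a single variable using only the Python standard library. It follows the general shape of DDI Codebook but is an illustrative sketch, not schema-valid DDI:

```python
# Build a simplified DDI-Codebook-style fragment describing one variable.
# Element names follow the general shape of DDI Codebook; this is an
# illustrative example only, not validated against the DDI schema.
import xml.etree.ElementTree as ET

codebook = ET.Element("codeBook")
data_dscr = ET.SubElement(codebook, "dataDscr")   # data description section
var = ET.SubElement(data_dscr, "var", name="bweight")
ET.SubElement(var, "labl").text = "Birth weight in grams"
ET.SubElement(var, "varFormat", type="numeric")

xml_fragment = ET.tostring(codebook, encoding="unicode")
print(xml_fragment)
```

Even this tiny fragment shows the point of a common standard: any tool that understands the structure can find the variable name and its human-readable label without custom parsing.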
Then there are issues around security that need to be considered, including controlling access to data, protecting confidential data and ensuring data is backed up appropriately.
Different areas of research have different requirements in each of these areas. Some, such as astrophysics, require far greater storage capacity and computational power, while others, such as health, need to consider patient confidentiality and often require far higher levels of data security.
What standards are required for data management?
Research funding councils do not stipulate specific standards for data management at this stage. They do require that it meet generally accepted standards and follow best practice.
In practice, it seems unlikely that funders will have the resources to check that researchers are complying with their DMPs. However, researchers who do not follow best practice risk significant damage to their reputations, and institutions risk losing valuable assets. It is in everyone’s best interests that good data management systems and standards are supported.
There are certified standards for data management and data security that could be used in some areas of research. Although an externally audited information security management system is not currently a requirement, internationally recognized certification would provide independent assurance of a high level of risk management and data security.
Making DMPs more than just a box-ticking exercise
To support researchers in data management planning, the Digital Curation Centre (DCC) has developed DMPonline, a web-based tool for creating DMPs. The tool was developed to incorporate the data management requirements of all the UK research councils. By mapping each council’s requirements to its 118 questions, DMPonline allows researchers to create tailored DMPs.
Developed on an open-source platform, DMPonline has the potential to integrate with a number of different systems. It also allows research councils to manage their specific guidance for each question.
Bridging the gap between researchers and local data services
Data management is not a task that can be undertaken by researchers in isolation from service managers and data resource planners. For data management to be effective it requires everyone to work together through every phase of data management planning, and this is where well thought through DMPs would be invaluable.
For researchers in large institutions, finding the right person to speak to about data storage and processing can be time-consuming and confusing. Having clear guidance on who to contact could save time and streamline the production of DMPs.
Researchers also need to understand what best practice is for managing and securing their particular type of data. The resilience and accessibility of data have implications for storage costs. For example, could your data be backed up to tape rather than to another server? Retrieving it from tape in the event of system failure would take a couple of days, but the cost of the storage would be significantly less.
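As a back-of-the-envelope illustration of that trade-off (the per-terabyte prices below are invented placeholders, not quotes from any provider):

```python
# Hypothetical comparison of annual backup costs for disk vs tape.
# All prices are invented for illustration only.
DISK_COST_PER_TB_YEAR = 120.0   # assumed: replicated online disk storage
TAPE_COST_PER_TB_YEAR = 15.0    # assumed: offline tape, days to restore

def annual_backup_cost(size_tb, cost_per_tb):
    """Annual cost of backing up size_tb terabytes at a given unit price."""
    return size_tb * cost_per_tb

size = 50  # terabytes of research data
disk = annual_backup_cost(size, DISK_COST_PER_TB_YEAR)
tape = annual_backup_cost(size, TAPE_COST_PER_TB_YEAR)
print(f"disk: {disk:.0f}/yr, tape: {tape:.0f}/yr, saving: {disk - tape:.0f}/yr")
```

Whatever the real figures are, the decision is the same: weigh the recovery-time requirement against the recurring storage cost, per dataset, in the DMP.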
DMP tools like DMPonline have the potential to be used by institutions to bridge the gap between researchers and data services. They could either be hosted by the institution, or hosted by DMPonline and customized to incorporate the institution’s administrative requirements as well as those of particular funders. Contact details for the relevant person on different aspects of data management planning could then be provided as part of the guidance notes associated with specific questions. Additional questions around capacity and timings could be incorporated, allowing IT departments to better manage their resources.
There is also the potential for institution- or department-based systems to share information with DMPonline.
Data preservation: what’s involved?
Archiving data in an immutable form is the final phase in the data life-cycle. Most funders require that data be preserved for between five and ten years beyond the time-frame of the project, although some require longer. However, there is no specific guidance on how this should be done.
To facilitate sharing and re-use, data need to be in a format that won’t degrade and they need to be deposited in data repositories that are relatively easy to search and retrieve files from. This requires agreement on common metadata standards and the inclusion of the metadata in catalogues that other researchers can find easily.
A number of leading research publications now also require details on data management for published articles and for data to be available for scrutiny for a certain time after the publication date.
Although the cost of archiving data is far less than that of storing ‘live’ data, how this preservation is funded is under debate. Some councils do expect to see costs for data preservation included in the grant application, and some have their own data repositories, while other funders see data preservation as the responsibility of the institution.
Data management at UCL
At UCL, initial estimates from the newly formed Research Data Project are that the institution will require 2 petabytes (1 petabyte = 1,000 terabytes) for collecting and processing ‘live’ data, and another 2 petabytes for archiving research data, by the end of 2014. However, this will not comprise all the institution’s data, much of which is currently held on departmental networks. Exactly how the system will be funded in the long term is still being considered.
UCL generates vast quantities of research data, but researchers are often not clear on where their data needs to be stored or who to talk to about DMPs. Besides upgrading Legion (UCL’s current platform for computationally intensive research), there are a number of initiatives underway to investigate options for improving data management and to cope with the volume of data that is being produced.
The Research Data Project is in the early stages of considering how to increase the institution’s capacity for the storage of live data and for archiving data. Many departments do not have the funds to support this themselves. The Economic and Social Research Council (ESRC) now also requires all institutions to have a road-map of how they will be supporting data management in the future and UCL is developing a Research Data Policy which defines standards and identifies roles and responsibilities in the creation, storage and archiving of research data at UCL.
At a departmental level, epiLab-SS (a secure computing service at the UCL MRC Centre of Epidemiology for Child Health) is looking at how data management planning and information security can be integrated, creating a single secure system for data collection and management. epiLab-SS entered the second stage of its bid to become an ISO27001:2005 certified service (certification was awarded in September 2012), and the JISC-funded DMP-SS (Secure Service) project is looking at using DDIv3 as a metadata standard for marking up the entire data life-cycle for epidemiology and public health research, from planning through collection to archiving. The information regarding data management could then be shared with DMPonline to generate DMPs.
Looking at data management challenges beyond the institution, UCL’s Medical School is part of a bid to set up an MRC e-Health Informatics Research Centre, which will look at how NHS and research data can be integrated to support research (the bid was awarded in July 2012, with the centre due to launch on 1 May 2013).
Managing data is a complex undertaking. It would seem that infrastructure, costs and responsibilities for data preservation are issues that will take some time to resolve within institutions, especially those as large as UCL.
However, DMPs have the potential to support better resource planning and project costing if they become more than just a box-ticking exercise. For this to happen institutions need to put in place systems that encourage communications between researchers and data services and ensure that researchers understand the importance of data security and what this entails.
This report is based on the presentations and discussions around data management planning and public health research held at the Institute of Child Health on 18 July 2012. Organiser: Dr Tito Castillo, Senior Information Systems Consultant, MRC Centre of Epidemiology for Child Health
By F D ( Tito ) Castillo, on 1 October 2012
On Friday 28th September 2012 the epiLab-SS secure research environment passed its Stage 2 assessment as meeting the requirements of the ISO-27001 standard for Information Security. The resulting certificate, due to be formally issued by LRQA within weeks, is the result of a rigorous third-party audit of the epiLab-SS Information Security Management System (ISMS). The auditor followed up his initial (Stage 1) assessment of the structural elements of the ISMS to examine in more detail the dynamic functional elements of the system and its wider context within UCL, involving interviews with a range of senior management personnel.
A critically important element in the process involved the demonstration that the ISMS design had been adapted to meet the needs of the domain of epidemiology research, handling personal identifiable and sensitive data safely and securely. Our application of data management plans as a mechanism for assuring engagement of researchers with the ISMS has proved to be invaluable in this respect. These plans have allowed researchers to clearly enumerate all information assets and highlight concerns, vulnerabilities and legal obligations at key stages during their use of the service.
This achievement is highly significant since it demonstrates an effective and cost-efficient approach to provision of secure data handling services within an academic context and means that UCL has become one of the few academic institutions in the UK to provide independent assurance of information security provision for research datasets. We have been able to implement a secure private cloud-based service, using an accredited UK government G-Cloud data centre (AIMES Grid Services CIC Ltd) with end-to-end ISO-27001 certification.
By F D ( Tito ) Castillo, on 18 July 2012
On Friday 13th July 2012 the epiLab-SS secure service underwent a Stage 1 ISO27001:2005 audit by LRQA. The auditor examined the associated Information Security Management System (ISMS) that has been developed in conjunction with our cloud-based service. The service is already hosted within an ISO27001-certified data centre (AIMES Grid Services CIC Ltd) offering thin-client access to virtual desktops. Our risk assessment identified the need to develop a formal ISMS in respect of information security practices for users of this service at UCL. This ISMS is an example of the use of data management plans to underpin the risk assessment and continual improvement process for information security, and we have chosen to adopt the MRC Data Management Plan template as a standard approach for all registered research projects.
Although this is only the first of two stages of initial audit, the signs are looking good. We satisfied the auditor that our ISMS contained no major non-conformities and, as such, was suitable for progressing to a Stage 2 audit in late September 2012. If the Stage 2 audit is successful, the epiLab-SS system will be certified as ISO27001 compliant, demonstrating an effective model for the use of cloud-based secure services for research datasets that could be replicated in other university research units.
By F D ( Tito ) Castillo, on 1 May 2012
Date: Wednesday 18th July 2012
Time: 1pm – 4pm
Venue: Leolin Price Lecture Theatre, Institute of Child Health
Public health research data takes time, effort and considerable resources to collect, yet the associated data management practices can vary considerably and datasets are often misplaced or difficult to reuse. Management of public health data must also consider potentially complex legislative and ethical dynamics that demand effective information governance at all stages in the data life-cycle.
Recent initiatives in the medical and related sciences and across funding organisations (National Science Foundation, National Institutes of Health, MRC, BBSRC, Cancer Research UK, Wellcome Trust, and ESRC) have highlighted the need for researchers and support staff to demonstrate both the capability and the intention to manage their public health research datasets effectively. As an example, all MRC grant applicants are now expected to submit a Data Management Plan (DMP) as part of their funding proposals. Recently, the Digital Curation Centre (DCC) has proposed an adaptable checklist of key questions to be addressed by researchers at different stages in this life-cycle.
The DMP-SS project, based at UCL, seeks to explore the implementation of this checklist through the application of relevant information standards. The project addresses both the structural and procedural aspects of research data management planning.
The intention of this workshop is to:
- present both DMPOnline (a tool developed by the DCC) and the MRC’s latest requirements on data management planning;
- describe on-going initiatives within UCL, including the DMP-SS project, and how they relate to population health research;
- seek to identify key priorities to develop these UCL initiatives further.
- MRC Data management plans – Veerle van den Eynden, MRC Data Support Service Project Manager
- DCC DMPOnline – Martin Donnolly, Digital Curation Centre
- DMP-SS – Tito Castillo, MRC Centre of Epidemiology for Child Health
- Group introduction and discussion
- UCL research data – Max Wilkinson, Head of Research Data Services, UCL
- Platform Technologies – Jacky Pallas, Platform Technologies, UCL
- MRC e-Health bid – Spiros Denaxas, CALIBER Project, UCL
- LSHTM perspective – Frieda Midgely & Gareth Knight, LSHTM
- Group discussion – priorities and challenges
For registration go to: http://dmpss.eventbrite.com/
By F D ( Tito ) Castillo, on 20 April 2012
Metadata Technology have been making good progress on the development of a generic metadata model for DMPs that enables:
- Generic description of the structure of a wide range of data management plans
- Capture of plan instances based on the above structure
- Association of a plan instance with identifiable external resources (such as DDI study units or ISO 27001 collections)
- Population of plan elements with snippets of information/text sourced from the external resources (e.g. a survey abstract, a list of data files/variables, a backup policy)
- Association of concepts with plan elements, enabling similarities across plans (e.g. UK and US) to be described
- Serialization of plans in XML for storage, management, exchange, or publication purposes
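The capabilities above can be sketched as a toy model: a plan structure, plan instances, references to external resources, and XML serialization. The class and field names are assumptions for illustration and do not reflect Metadata Technology’s actual implementation:

```python
# Toy sketch of a generic DMP model: a reusable plan structure, plan
# instances, associations with external resources, and XML serialization.
# All names here are illustrative assumptions.
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

@dataclass
class PlanElement:
    question: str                 # e.g. "What data will be created?"
    answer: str = ""              # may be populated from an external resource
    resource_refs: list = field(default_factory=list)  # e.g. DDI study-unit IDs

@dataclass
class PlanStructure:
    name: str                     # e.g. "DCC checklist", "NSF template"
    questions: list = field(default_factory=list)

    def new_instance(self):
        """Create an empty plan instance from this structure."""
        return [PlanElement(q) for q in self.questions]

def to_xml(structure_name, elements):
    """Serialize a plan instance as a simple XML document."""
    plan = ET.Element("dmp", template=structure_name)
    for e in elements:
        el = ET.SubElement(plan, "element")
        ET.SubElement(el, "question").text = e.question
        ET.SubElement(el, "answer").text = e.answer
        for ref in e.resource_refs:
            ET.SubElement(el, "resource", ref=ref)
    return ET.tostring(plan, encoding="unicode")

dcc = PlanStructure("DCC checklist", ["What data will be created?"])
instance = dcc.new_instance()
instance[0].answer = "Survey microdata"
instance[0].resource_refs.append("ddi:study-unit:1234")
print(to_xml(dcc.name, instance))
```

Separating the structure from its instances is what lets the same editor adjust to UK, US, or other plan templates, as described below.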
Based on the above model, they are in the process of building a set of generic editing tools that can:
- Adjust to any data management plan structure (not only the UK version)
- Be used in standalone mode (i.e. as a simple text editor)
- Integrate into specific private or public environments to populate plan components with metadata from external resources (DDI, ISO 27001, others)
Because the core application is being built on the OpenMetadata Framework, it can also serve DDI editors and comes with the necessary features for storage, integration with web services, and so on.
So the DMP-SS/DMPOnline management toolkit is being built as planned, but this work will also result in generic DMP editors that can be used for UK, US, or other data management plans.
For a full description of the current information model see: DMP Data Model Report
For a review of the structures of various proposed research data management plans, including Digital Curation Centre (full and simplified) and the 16 that exist in the USA (13 from NSF and 1 from NIH) see: https://docs.google.com/spreadsheet/ccc?key=0AgxevpgZKzIudHE1MVZvYWYwWDRaenBkdnotVXBMWkE.