
Open@UCL Blog




Copyright and Text & Data mining – what do I need to know?

By Kirsty, on 6 July 2021

Text and Data Mining (TDM) is a broad term covering advanced techniques for the computer-based analysis of large quantities of data of all kinds (numbers, text, images, etc.). It is a crucial tool in many areas of research, notably including Artificial Intelligence (AI). TDM can reveal significant new facts, relationships and insights through the detailed analysis of vast amounts of data, in ways that were not previously possible. One example would be mining the medical research literature to investigate the underlying causes of health problems and the efficacy of treatments.

Copyright exceptions that facilitate TDM matter because the swathes of material that need to be mined are often protected by copyright. That is true, for example, of “literary works” of all kinds and, in many cases, of images. Researchers frequently have lawful access to the material but are prevented from applying TDM techniques, because copying it onto the required computer platform risks legal action for infringement by the copyright owners. “Copying” is, of course, one of the acts restricted by copyright law, and in general, the greater the amount and variety of material, the greater the copyright risk.

It is worth remembering that when the Government introduced a Text and Data Mining exception in 2014, the UK was ahead of the game: other countries did not generally have such an exception in their legislation at that time. Since then, other jurisdictions have caught up and, in some cases, overtaken the UK. Cutting-edge research is a highly competitive area, and researchers working in a country with a generous TDM exception will have a distinct advantage.

From an Open Science perspective, the existing exception remains significant because it enables research projects that require computer analysis of large quantities of copyright-protected material, particularly in the context of AI.

Let’s take a closer look at the UK TDM exception and what it allows us to do, before comparing it briefly with the more recent EU exceptions. The UK exception is to be found in Section 29A of the Copyright, Designs and Patents Act 1988.

What does the exception allow us to do?

Copying copyright-protected works in order to carry out “text and data analysis” (“computational analysis” in the wording of the exception). Copying is needed because researchers must have the material on a specific platform in order to analyse it. The exception is needed because, without it, the researcher would require permission from the owner of copyright in each item. Without permission (or an exception), researchers would be infringing copyright by copying a vast swathe of protected material, which would often make the research impractical to carry out.

Who may do this?

Absolutely anyone: the exception says “a person.” This is wonderfully broad and one of the more favourable aspects of the UK exception. For example, you don’t need to be working for or studying at a particular type of institution to benefit from the exception.

Are there conditions?

You must have lawful access to the material. A prime example would be the text of academic journals: we have lawful access to large numbers of e-journals because UCL Library subscribes to them. The exception would allow a UCL researcher to download large amounts of content from those e-journals and carry out detailed analysis using specialised tools. It is important to note that the exception cannot be overridden by contract terms. It follows that a term in an e-journal contract that seeks to prevent TDM would have no force in circumstances where the exception applies. This makes the exception a much more useful tool than it would otherwise be.

As you might expect, copies made for TDM purposes may not be used for any other purpose, or shared with others, under the exception.

Significantly, the analysis must be “…for the sole purpose of research for a non-commercial purpose.” This is a major restriction, which rules out many situations where TDM might be used, for example research by a pharmaceutical company developing new drugs to be marketed commercially. A major issue with the exception is that it can be unclear at what point “non-commercial” shades into “commercial.” A project that starts out as academic research may take on commercial significance further down the line, and a piece of research with no commercial aspects may be funded by commercial sponsors. This is an important constraint in the legislation, and in real-life situations it can be difficult to be sure whether it is met. It can stand in the way of joint projects between HEIs and commercial organisations.

Still, in situations where we can claim that the research has no commercial aspect, the exception is potentially very useful. In addition to material that is already digital, it can cover projects in which copyright-protected print material must be digitised before it can be analysed. It can also be very useful where the copyright status of the source material is unclear since, provided the exception applies, there is no need to investigate the complexities of copyright in the material any further.

The new EU TDM exception, or rather exceptions

The EU Directive on Copyright in the Digital Single Market (DSM Directive) offers two new exceptions, which EU member states are obliged to transpose into national law. They can be found in Articles 3 and 4 of the Directive.

There are important differences from the UK approach when it comes to the question: who may carry out TDM? Article 3 provides an exception that benefits two defined categories of organisation: “research organisations” and “cultural heritage institutions.” These groups include, for example, universities, museums and publicly accessible libraries. Commercial organisations are excluded. It seems that independent researchers who are not associated with an organisation would also be excluded, even though their research might be “non-commercial.” As with the UK legislation, this exception cannot be overridden by contract terms and is therefore a powerful tool. The Directive addresses public-private research collaborations in its recitals (see, for example, recital 11): such collaborations are not excluded from benefiting from the Article 3 exception.

Article 4 offers a separate TDM exception, which is available to anyone (including commercial organisations) but is limited in a specific way: if rights holders have expressly reserved the right to carry out TDM on their works, those works cannot be mined under the exception. In other words, the DSM Directive goes one step further than the UK by offering an exception that commercial organisations (or anyone else) can use to mine lawfully accessible works, but it does not apply if the rights holder has expressly ruled out TDM. By contrast, commercial organisations would not be able to use the UK exception unless they can claim that the specific research is for a non-commercial purpose.

Guest post by Chris Holland, UCL Copyright Support Officer. For more information or advice contact: copyright@ucl.ac.uk

REF submission guidance: what it means for open access

By Catherine Sharp, on 26 July 2018

The new draft REF submission guidance includes two sections on open access: paragraphs 107-116 on the intent of the REF open access policy, and paragraphs 213-245 on the detail of the requirements. For the most part, the new provisions restate the requirements that will by now be familiar to all academics. One or two changes and adjustments may be helpful for UCL authors, though.

1. A key change that will affect UCL’s REF submission as a whole is that a small percentage – 5% – of the total number of articles and conference papers that an institution submits may be non-compliant. This is very welcome, because it will allow us to submit some older outputs that were accepted before authors were used to the open access requirements. In due course, UCL will introduce guidelines for submitting non-compliant outputs, but any such outputs will be selected very carefully. Note that this provision should be treated with caution, and authors should not rely on it for any existing or new papers.

More widely, the new guidance emphasises that the environment section will allow units of assessment to demonstrate where they have gone beyond the REF requirements. This is one of the reasons that UCL’s monthly compliance reports to departments include all articles and conference papers. Academics should continue to upload all papers to RPS within 3 months of first online publication (ideally within 3 months of acceptance), regardless of whether they will be submitted to the REF.

2. There is a second important change for users of arXiv and other preprint services. Where a paper has been uploaded to a preprint service, the version in the preprint service is the same as the accepted manuscript, and it was uploaded before the paper was published online, the paper complies with the open access policy. This is particularly good news for users of arXiv, but it does not mean that every paper in arXiv complies with the policy.

UCL’s Open Access Team already assesses papers in arXiv and uploads them to RPS (to demonstrate compliance with the policy) where they are Gold open access, where the arXiv version is identified as the accepted manuscript, or where the publisher allows the published version to be used in RPS/UCL Discovery. This additional provision means that if authors confirm that the version in arXiv is the same as the accepted manuscript/published version, the paper can be marked as compliant in RPS. If this applies to any of your papers, please contact the Open Access Team (open-access@ucl.ac.uk).

3. The guidance restates the exceptions to the policy, including the exception where a new member of staff uploaded their manuscript to their previous institution’s repository. UCL’s advice remains the same: where this exception applies, academics should contact the Open Access Team (open-access@ucl.ac.uk) so that we can check whether the paper complies and record the exception if not.

4. There is no change to the timing requirements. The strict requirement is that papers are uploaded to RPS within three months of acceptance (defined as the ‘firm’ accepted date), but there is an exception (which UCL’s Open Access Team will apply) for papers that miss this deadline but are uploaded within 3 months of first online publication (the ‘early online’ date). In practice, if a record does not appear in RPS within one month of first online publication, authors are advised to create a manual record and upload their manuscript to it. There is a guide to creating manual records on our webpages.

Please contact the Open Access Team for more information.