Copyright and Text & Data mining – what do I need to know?
By Kirsty, on 6 July 2021
Text and Data Mining (TDM) is a broad term used to cover any advanced techniques for computer-based analysis of large quantities of data of all kinds (numbers, text, images etc). It is a crucial tool in many areas of research, including notably Artificial Intelligence (AI). TDM can be used to reveal significant new facts, relationships and insights from the detailed analysis of vast amounts of data in ways which were not previously possible. An example would be mining medical research literature to investigate the underlying causes of health issues and the efficacy of treatments.
The importance of having copyright exceptions in place to facilitate TDM arises from the fact that the swathes of material which need to be mined are often protected by copyright. That would be true, for example, of “literary works” of all kinds and of images in many cases. It is frequently the case that researchers will have lawful access to the material but will be prevented from applying TDM techniques because copying the material onto the required computer platform risks an infringement action brought by the copyright owners. “Copying” is of course one of the acts restricted by copyright law and in general the greater the amount and variety of material, the greater the copyright risk.
It is worth remembering that when the Government created an exception for Text and Data Mining in 2014, it meant that the UK was ahead of the game. Other countries did not generally have an exception in their legislation at that time. Since then, other jurisdictions have caught up and, in some cases, overtaken the UK. Cutting-edge research is a highly competitive area and researchers working in a country which benefits from a generous TDM exception will have a distinct advantage.
The existing exception is still significant from the Open Science perspective in enabling research projects where computer analysis of large quantities of copyright-protected material is required, particularly in the context of AI.
Let’s take a closer look at the UK TDM exception and what it allows us to do, before comparing it briefly with the more recent EU exceptions. The UK exception is to be found in Section 29A of the Copyright, Designs and Patents Act 1988.
What does the exception allow us to do?
It allows the copying of copyright-protected works in order to carry out “text and data analysis” (“computational analysis” in the wording of the exception). The need to copy arises because researchers must have the material to be analysed on a specific platform in order to carry out the analysis. The need for the exception then arises because, without it, the researcher would require permission from the owner of copyright in each item. Without permission (or an exception), the researchers would be infringing copyright by copying a vast swathe of protected material. That in turn would often make the research impractical to carry out.
Who may do this?
Absolutely anyone: the exception says “a person.” This is wonderfully broad and one of the more favourable aspects of the UK exception. For example, you don’t need to be working for or studying at a particular type of institution to benefit from the exception.
Are there conditions?
You must have lawful access to the material. A prime example would be the text of academic journals. We have lawful access to large numbers of e-journals because UCL Library subscribes to them. The exception would allow a UCL researcher to download large amounts of content from e-journals to carry out detailed analysis using specialised tools. It is important to note that the exception cannot be overridden by contract terms. It follows that a term in an e-journal contract seeking to prevent TDM would have no force, in circumstances where the exception applies. This makes the exception a much more useful tool than it would otherwise be.
As you might expect, copies made for TDM purposes may not be used for any other purpose or shared with others under the exception.
Significantly, the analysis must be “…for the sole purpose of research for a non-commercial purpose.” This is a major restriction, which would rule out many situations where TDM might be used, for example research by a pharmaceutical company developing new drugs which will be marketed commercially. A major issue with the exception is that it can be unclear at what point “non-commercial” shades into “commercial.” A project which starts out as academic research may take on commercial significance down the line, and a piece of research with no commercial aspects may be funded by commercial sponsors. It is an important constraint in the legislation, and one whose application can be difficult to be sure about in real-life situations. It can stand in the way of joint projects by HEIs and commercial organisations.
Still, in situations where we can claim there is no commercial aspect to the research, the exception is potentially very useful. In addition to material which is already digital, it can cover projects where copyright-protected print material must be digitised before it can be analysed. It can also be very useful in situations where the copyright status of the source material is unclear since, provided the exception applies, there is no need to investigate further the complexities of copyright in the material.
The new EU TDM exception or rather exceptions
The EU Directive on Copyright in the Digital Single Market (DSM Directive) offers two new exceptions, which EU member states are obliged to transpose. They can be found in Articles 3 and 4 of the Directive.
There are important differences from the UK approach in the answer to the question: who may carry out the TDM? Article 3 provides an exception which benefits two defined categories of organisations: “Research organisations” and “Cultural heritage organisations.” Included within those groups are, for example, universities, museums and publicly funded libraries. Commercial organisations are excluded. It seems that independent researchers not associated with an organisation would also be excluded, even though their research might be “non-commercial.” In common with the UK legislation, this exception cannot be overridden by contract terms and is therefore a powerful tool. The Directive addresses the question of public-private research collaborations in its recitals, e.g. recital 11. They are not excluded from benefitting from the Article 3 exception.
Article 4 offers a separate TDM exception which is available to anyone (including commercial organisations) but which is limited in a specific way: if rights owners explicitly reserve the right to carry out TDM on their works, then those works cannot be mined under the exception. In other words, the EU DSM Directive goes one step further than the UK by offering an exception which can be used by commercial organisations (or by anyone else) to mine lawfully accessible works, but it does not apply if the rights owner has explicitly ruled out TDM. By contrast, commercial organisations would not be able to use the UK exception unless they can claim the specific research is for a non-commercial purpose.
Guest post by Chris Holland, UCL Copyright Support Officer. For more information or advice contact: firstname.lastname@example.org