Open@UCL Blog

Text and Data Mining (TDM) and Your Research: Copyright Implications and New Website Guidance

By Rafael, on 13 May 2024

This is the second blog post of our collaborative series between the UCL Office for Open Science and Scholarship and the UCL Copyright team. Here, we continue our exploration of important aspects of copyright and its implications for open research and scholarship. In this instalment, we examine Text and Data Mining (TDM) and its impact on research along with the associated copyright considerations.

Data processing concept illustration

Image by storyset on Freepik.

The development of advanced computational tools and techniques for analysing large amounts of data has opened up new possibilities for researchers. Text and Data Mining (TDM) is a broad term referring to a range of ‘automated analytical techniques to analyse text and data for patterns, trends, and useful information’ (Intellectual Property Office definition). TDM has many applications in academic research across disciplines.

In an academic context, the most common sources of data for TDM include journal articles, books, datasets, images, and websites. TDM involves accessing, analysing, and often reusing (parts of) these materials. As these materials are, by default, protected by copyright, there are limits on what you can do as part of TDM. In the UK, you may rely on section 29A of the Copyright, Designs and Patents Act 1988, a copyright exception that permits making copies for text and data analysis for non-commercial research. You must have lawful access to the materials (for example, via a UCL subscription or an open licence). However, publishers often impose technological barriers that prevent you from copying large amounts of material for TDM purposes, and you must not try to circumvent these measures. Understanding what you can do with copyright materials, what may be more problematic, and where to get support if in doubt should help you manage these barriers when you use TDM in your research.

The copyright support team works with e-resources, the Library Skills librarians, and the Office for Open Science and Scholarship to support the TDM activities of UCL staff and students. New guidance, available on the copyright website as a TDM libguide, addresses questions that often arise during TDM, including:

  • Can you copy journal articles, books, images, and other materials? What conditions apply?
  • What do you need to consider when sharing the outcomes of a TDM analysis?
  • What do publishers and other suppliers of the TDM sources expect you to do?

To learn more about copyright (including how it applies to TDM):

Get involved!

The UCL Office for Open Science and Scholarship invites you to contribute to the open science and scholarship movement. Stay connected for updates, events, and opportunities. Follow us on X (formerly Twitter) and LinkedIn, and join our mailing list to be part of the conversation!

 

 

How Creative Commons licences support open scholarship

By Kirsty, on 23 October 2023

Happy Open Access Week 2023!

This year’s theme is ‘community over commercialisation’. It is about adopting research and education practices that place priority on the interests of the public. In the context of scholarly communications, it is about making access to scholarly knowledge open and accessible to diverse communities, in transparent and sustainable ways.

For this to be achieved, the outcomes of academic research and education – research data, preprints, published articles, monographs, educational resources – must be open to access but also open to reuse: free access to an article, an online tutorial or a dataset has great benefits, but the potential for users of these materials to share them with others, adapt, add to and improve upon them is what makes innovation and creativity possible.

Creative Commons licences (CC licences) support open and reusable research by offering a standardised way for authors to grant others certain permissions to reuse their works. In this post we highlight some key points about CC licences and discuss how they benefit both creators and users of copyright-protected materials.

What is Creative Commons?

Creative Commons (CC) is “an international non-profit organisation dedicated to helping build and sustain a thriving commons of shared knowledge and culture”. The organisation is active in supporting, educating and advocating for a more open culture; but it is most known for its licences.

How do Creative Commons licences work?

If you are the author of pretty much any creative work – a journal article, an image, a music composition, a website, a book – making your work available under a CC licence helps you:

  • As the copyright owner of the work, give ‘blanket’ permission to others to copy and share your work, while requiring that they attribute you as the author.
  • Decide what further uses you give blanket permission for. Do you allow others to make adaptations (e.g. to translate your book, adapt a teaching resource for a new audience, or change your artwork)? Do you allow others to reuse your work for a commercial purpose?
  • Decide if you would like to ensure that future adaptations of your work (if you are allowing them) are also made available under the same licence, keeping them as ‘open’ as yours.

Image attribution: Barbara Klute und Jöran Muuß-Merholz für wb-web unter CC BY-SA 3.0. The English version is a translation and enhancement by Jöran Muuß-Merholz under the same license. CC BY-SA 3.0, via Wikimedia Commons

The combination of these criteria (attribution, which is required by all licences; allowing or not allowing derivatives; allowing or not allowing commercial reuse; and requiring or not requiring sharing under the same licence, known as ‘share-alike’) creates a set of six licences that creators can choose from.
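The arithmetic behind “six” may not be obvious, since three yes/no choices could suggest eight combinations. Attribution is fixed, and share-alike only has an effect when adaptations are allowed, so the choices reduce to commercial reuse (allowed or not) times adaptations (allowed, allowed only under the same licence, or not allowed): 2 × 3 = 6. The short Python sketch below enumerates those combinations; the function name and the option labels are illustrative assumptions, not part of any Creative Commons tool.

```python
from itertools import product


def cc_licences() -> list:
    """Enumerate the CC licence codes implied by the choices described above.

    Attribution (BY) is always required. Share-alike (SA) and
    no-derivatives (ND) are treated as one three-way choice, because
    share-alike only governs adaptations and ND forbids adaptations.
    """
    licences = []
    # Two options for commercial reuse x three options for adaptations.
    for non_commercial, adaptations in product(
        [False, True], ["allowed", "share-alike", "not allowed"]
    ):
        parts = ["BY"]  # attribution is part of every licence
        if non_commercial:
            parts.append("NC")
        if adaptations == "share-alike":
            parts.append("SA")
        elif adaptations == "not allowed":
            parts.append("ND")
        licences.append("CC " + "-".join(parts))
    return licences


for name in cc_licences():
    print(name)
```

Running it lists CC BY, CC BY-SA, CC BY-ND, CC BY-NC, CC BY-NC-SA and CC BY-NC-ND, matching the six licence abbreviations used by Creative Commons.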

How do Creative Commons licences support open scholarship and the needs of different communities?

There are numerous examples of how CC licences help free up research and education. CC licences applied to open access articles, conference proceedings, monographs and other scholarly works make it possible for readers of these works around the world – who may include academic researchers, lecturers, students but also health practitioners, innovators, artists and the general public – to benefit from these works and potentially create something new and innovative as a result. CC licences applied to research data enable data to be shared and reused across different organisations and countries. CC licences applied to preprints, study preregistrations and theses ensure openness in research that is not yet formally published. In the same way, particularly through allowing adaptations, CC licences support the development and success of Open Educational Resources (OERs).

Beyond traditional scholarship, CC licences help open up cultural collections and offer opportunities for publishers and the creative industries to adopt new business models that serve their audiences better.

How can I learn more about Creative Commons?

Image attribution: adapted from Martin Missfeldt https://www.bildersuche.org/ CC BY-SA

If you have made it this far in this post, you may have further questions, including how to apply a CC licence, how to discover and reuse CC materials, how CC licences work alongside copyright and how they can support commercially sensitive works. Here are a few things you can do:

  • Drop in any time between 12 pm and 2 pm on Teams on Tuesday 24 October to hear more, ask questions and tell us about your experiences with CC licences. You can join for just a few minutes to ask a question or stay longer to become a CC expert. Register for the CC licences drop-in session.
  • Take our 5-question fun personality quiz to discover which licence you are. You may not learn anything new about yourself, but you will hopefully get even more familiar with how the range of CC licences can be applied in different situations. Your responses will be anonymous.

Open Access Week activities

By Kirsty, on 13 October 2023

Open Access Week is almost upon us!

Keep your eyes open for a series of blog posts on Creative Commons, citizen science, the recent activities of UCL Press and an exciting review of a year in open access.

This year’s theme is Community over Commercialisation. Creative Commons licences sit at the heart of this discussion. To this end, we invite you to a drop-in session on Tuesday the 24th of October to address questions around creating and using Creative Commons materials. The session is on Teams and you can join at any time. Bring along your questions or just join to discuss how CC supports equitable access to a wide range of works, from scholarly publications to open and FAIR data to images and music.

We have already announced our wonderful winners of the Open Science and Scholarship awards. UCL colleagues can also join us on Wednesday to celebrate and network with the winners; tickets are still available!

We will be posting and tweeting regularly throughout the week about the services and support available to researchers and I hope that we can get some good discussions going!

See you there!

How understanding Copyright helps you open up your research

By Harry, on 14 March 2023

Guest post by Christine Daoutis, Copyright Support Officer

“Can I use this image I found free online?”

“I’m not sure how much of a book or an article it’s OK to copy”.

“This is my article; surely I can post it anywhere I want?”

These questions, and quite a few others, often come up in everyday research practice. They are all related to copyright. Whether you are reusing others’ materials (documents, figures, photos, video, software, data) or creating and sharing your own, understanding copyright ensures not only that you can respect others’ rights and stay within the law, but also that you can open up your research.

But understanding copyright is much more than a legal compliance issue. It is also more than an academic integrity issue. In short, it’s not just about following the rules, but also about understanding your own rights and using the rules flexibly. You can use your copyright knowledge as a tool to open up, rather than restrict, your research. For example, relying on copyright exceptions[1], and knowing how to find, reuse and acknowledge openly licensed materials[2], can give you much more freedom in how you can reuse others’ works. Crucially, knowing your rights as authors also allows you to share your research openly and, through licensing, determine how others may reuse it. Open Science practices – open access to publications, open data, open source software and hardware, co-creation projects – rely on an understanding of copyright.

To help you increase your knowledge and confidence around copyright, you can do any of the following:

Infographic showing key UCL copyright resources. Top three resources: UCL copyright survey, copyright essentials, training sessions. Supporting resources: copyright website, copyright blog, contact the UCL copyright support team.

  1. Complete the 3-minute UCL copyright support survey to rate your confidence and tell us what support you need. If you are not sure what you need to know, the survey gives you some ideas to choose from. Currently open until 31 March 2023.
  2. Complete the 20-minute Copyright Essentials online module. You will learn the basics at your own pace, using quizzes, short videos and academic-based scenarios.
  3. Book a training session delivered by the copyright support team. These can be in person or online, and offer you the chance to ask questions.
  4. Visit the UCL copyright website for guidance on specific copyright topics.
  5. Follow the copyright blog for topical articles and updates.
  6. Contact the UCL copyright support team if you have a specific question, or would like to arrange bespoke training.

 

[1] https://blogs.ucl.ac.uk/copyright/2023/02/24/fair-dealing-week-2023-part-2-three-fair-dealing-exceptions/

[2] https://creativecommons.org/faq/

 

Using games to engage with Open Access (and beyond!)

By Kirsty, on 18 May 2022

Guest post by Petra Zahnhausen-Stuber, Open Access Team, UCL Library (LCCOS)

In recent years, ‘Gamification’, the use of game elements in non-gaming settings to improve user experience, has been embraced by Research Support Services at Higher Education Institutes. Research Support Games cover various topics including research data management, copyright and/or open access and address an audience ranging from early career researchers and academics to support staff.

For the organisers of the Research Support Games Days (RSGD), games can be an effective tool for communicating with scholars about often complex concepts. Now in its third instalment since 2019, this event promotes the use of game-based learning among Research Support Services by presenting games, online tools and platforms that could be beneficial for training purposes. The event also highlighted that most of these games were designed to be played in person. However, the outbreak of the Covid-19 pandemic in 2020 was a catalyst for developing more virtual games as a way of continuing to engage with researchers when face-to-face training was not possible. Despite the challenges of creating digital games, their advantage of reaching a wider audience beyond the physical environment of research institutions is apparent in the following examples of Open Access-themed online games.

The Publishing Trap (UK Copyright Literacy) is a game about scholarly communication that helps researchers understand the effect of different publishing models, copyright and finances on the dissemination of their research. First launched as a board game in 2017, it was adapted into a digital version in 2020 in response to the pandemic. In both versions, participants form up to four teams representing four scholars in different career scenarios and make decisions about how best to publish their research. Retaining most of the original features, the online version uses interactive PowerPoint slides and can be played via any virtual classroom software with breakout-room functionality, so that the team discussions of the board game are replicated.

A group of people doing a jigsaw puzzle on the floor

Open Access Escape Room in action at the 2022 EARMA conference

Similarly, the role-playing Open Access Mystery game, developed in 2020 by Katrine Sundsbo, uses downloadable slides. It was also designed for online platforms (e.g. Zoom) to allow immediate verbal interaction between players, who are tasked with finding the culprit responsible for a global lockdown of all research. The Open Access Escape Room, by the same author, was originally created in 2018 as a physical game and digitally adapted in 2020 under the name The Puzzling Hunt for Open Access. Both versions follow the narrative of all research being locked away by a villain and aim to help academic staff understand the concepts of Open Access. The players have to find clues and solve various Open Access-themed puzzles in order to unlock research. Although it does not replicate the original escape room format, in which participants interact with each other in teams, the online game offers more flexibility, as its mixed-media puzzles can be completed by a single player at their own pace. As with most Research Support Games, all materials are published under a CC BY licence, and as a result both versions have been played and further adapted both in and outside the UK.

The single-player Open Axis: The Open Access Video Game (UCLA) was designed from the outset for a remote learning environment, with the aim of reaching a worldwide audience of graduates and undergraduates. Created in 2020, this “choose your own adventure” game can be played in a web browser; it is predominantly text-based but features classic 8-bit video games. The player chooses between several characters portraying scholars of various backgrounds. Following a non-linear narrative, the player’s decisions shape the course of the in-game stories around themes of open access, scholarly publishing and research practices.

Taking another approach to getting scholars interested in Open Access, the team at Robert Gordon University developed five online puzzles in 2021, including memory, crosswords and a scavenger hunt. Because puzzles can be played more quickly than games, they are suitable for bite-sized learning during icebreakers or coffee breaks.

These games are by no means an exhaustive list, and it is worth delving into the wealth of resources in the Research Support Games Day Proceedings (below), where the benefits and challenges involved in taking games online are explored further.

For more information on Research Support Games Days and Gamification:

Adaptions of the “Open Access Escape Room”:

Art History theses and copyright

By Kirsty, on 9 December 2021

Guest post by Thomas Stacey, Open Access Team, UCL Library (LCCOS)

At UCL, students studying for doctoral and research master’s degrees are required to submit an electronic copy of their thesis to the Library for inclusion in UCL Discovery, our open-access repository of UCL research outputs, in order for their degree to be awarded.  The Open Access Team encourages theses to be made openly available, either immediately after award or following the completion of an embargo period. We do, however, recognise that there are a number of reasons why access may need to be restricted, such as future publication, confidentiality, the inclusion of sensitive and/or personal information, and – in the discipline of Art History in particular – the presence of third-party copyrighted images.

I have been thinking about art history theses and whether they could be made open access more easily – and crucially with all the images included where needed.

The University of Cambridge’s ‘Unlocking Research’ blog post, written in 2019 by Dr Lorraine de la Verpillière, provides a comprehensive background on the issues facing academics in the arts: many are forced to pay to access third-party copyrighted works for private study, and then pay again later to publish the final research output. In this blog post, one academic commented “The more successful I become the poorer I get”, as obtaining permissions for images to further their career has cost them over $20,000. Even out-of-copyright artworks are affected, as galleries and museums that own the originals can create their own copyrighted reproductions and restrict others’ ability to do the same. Bridgeman Images, for example, now owns the rights to all images of artworks in Italian national museums, which can pose a huge financial challenge for many art historians.

A further obstacle for Art History students is that the principle of fair dealing within the Copyright, Designs and Patents Act 1988, which can be used to justify the inclusion of extracts of texts and figures (as part of a wider, previously-published work) in theses, cannot be applied to the reproduction of full artworks, which constitute entire copyrighted works in themselves.

An art history thesis without images understandably compromises the integrity of the work. Unless PhD students use images with Creative Commons licences or which are in the public domain due to being out-of-copyright entirely, they will either have to obtain permissions or redact the images within their thesis accordingly. When processing thesis submissions for UCL PhD students, the Open Access Team will often be required to redact images as part of routine checks prior to any thesis file being made publicly available in UCL Discovery.

There does not seem to be a straightforward solution that would enable art history theses to be made open access with all images included. Dr de la Verpillière suggests that universities could offer more support to art history students and academics regarding third-party copyright. Art institutions really need to do more in this respect. Some have started to make their image collections open access (a selection is given below), and hopefully more will follow soon. Even discounted permission fees for PhD students who need to use images, for example, would be a compromise of sorts to help new academics.

To avoid delays in making theses available in UCL Discovery post-award, or redactions being made to images of artwork that are critical to the overall integrity of the thesis, the Open Access Team also recommends including relevant licence and/or permissions information within the thesis file, as set out in the Library’s guide to copyright for research students.

Here are some art institutions with open-access image collections:

Copyright and Text & Data mining – what do I need to know?

By Kirsty, on 6 July 2021

Text and Data Mining (TDM) is a broad term used to cover any advanced techniques for computer-based analysis of large quantities of data of all kinds (numbers, text, images etc). It is a crucial tool in many areas of research, including notably Artificial Intelligence (AI). TDM can be used to reveal significant new facts, relationships and insights from the detailed analysis of vast amounts of data in ways which were not previously possible. An example would be mining medical research literature to investigate the underlying causes of health issues and the efficacy of treatments.

The importance of having copyright exceptions in place to facilitate TDM arises from the fact that the swathes of material which need to be mined are often protected by copyright. That would be true for example of “literary works” of all kinds and of images in many cases. It is frequently the case that researchers will have lawful access to the material but will be prevented from applying TDM techniques because copying the material onto the required computer platform risks legal action for infringement on the part of the copyright owners. “Copying” is of course one of the acts restricted by copyright law and in general the greater the amount and variety of material, the greater the copyright risk.

It is worth remembering that when the Government created an exception for Text and Data Mining in 2014, the UK was ahead of the game. Other countries did not generally have such an exception in their legislation at that time. Since then, other jurisdictions have caught up and, in some cases, overtaken the UK. Cutting-edge research is a highly competitive area, and researchers working in a country that benefits from a generous TDM exception will have a distinct advantage.

The existing exception is still significant from the Open Science perspective in enabling research projects where computer analysis of large quantities of copyright-protected material is required, particularly in the context of AI.

Let’s take a closer look at the UK TDM exception and what it allows us to do, before comparing it briefly with the more recent EU exceptions. The UK exception is to be found in Section 29A of the Copyright, Designs and Patents Act 1988.

What does the exception allow us to do?

Copying copyright-protected works in order to carry out “text and data analysis” (“computational analysis” in the wording of the exception). The need to copy arises because researchers must have the material to be analysed on a specific platform to carry out the analysis. The exception is needed because, without it, the researcher would require permission from the owner of copyright in each item. Without permission (or an exception), the researchers would be infringing copyright by copying a vast swathe of protected material, which in turn would often make the research impractical to carry out.

Who may do this?

Absolutely anyone: the exception says “a person”. This is wonderfully broad and one of the more favourable aspects of the UK exception. For example, you don’t need to be working for or studying at a particular type of institution to benefit from the exception.

Are there conditions?

You must have lawful access to the material. A prime example would be the text of academic journals. We have lawful access to large numbers of e-journals because UCL Library subscribes to them. The exception would allow a UCL researcher to download large amounts of content from e-journals to carry out detailed analysis using specialised tools. It is important to note that the exception cannot be overridden by contract terms. It follows that a term in an e-journal contract seeking to prevent TDM would have no force, in circumstances where the exception applies. This makes the exception a much more useful tool than it would otherwise be.

As you might expect, copies made for TDM purposes may not be used for other purposes, or shared with others, under the exception.

Significantly, the analysis must be “…for the sole purpose of research for a non commercial purpose.” This is a major restriction, which would rule out many situations where TDM might be used, for example research by a pharmaceutical company developing new drugs which will be marketed commercially. A major issue with the exception is that it can be unclear at what point “non-commercial” shades into “commercial.” A project which starts out as academic research may take on commercial significance down the line and a piece of research with no commercial aspects may be funded by commercial sponsors. It is an important constraint in the legislation which can also be difficult to be sure about in real life situations. It can stand in the way of joint projects by HEIs and commercial organisations.

Still, in situations where we can claim there is no commercial aspect to the research, the exception is potentially very useful. In addition to material that is already digital, it can cover projects where copyright-protected print material must be digitised before it can be analysed. It can also be very useful where the copyright status of the source material is unclear since, provided the exception applies, there is no need to investigate the complexities of copyright in the material any further.

The new EU TDM exception, or rather exceptions

The EU Directive on Copyright in the Digital Single Market (DSM Directive) offers two new exceptions, which EU member states are obliged to transpose into national law. They can be found in Articles 3 and 4 of the Directive.

There are important differences from the UK approach in the answer to the question: who may carry out the TDM? Article 3 provides an exception that benefits two defined categories of organisation: “Research organisations” and “Cultural heritage organisations.” Included within those groups are, for example, universities, museums and publicly funded libraries. Commercial organisations are excluded. It seems that independent researchers not associated with an organisation would also be excluded, even though their research might be “non-commercial.” As with the UK legislation, this exception cannot be overridden by contract terms and is therefore a powerful tool. The Directive addresses public-private research collaborations in its recitals (e.g. recital 11); such collaborations are not excluded from benefiting from the Article 3 exception.

Article 4 offers a separate TDM exception which is available to anyone (including commercial organisations) but which is limited in a specific way: if rights owners have explicitly reserved the right to carry out TDM on their works, those works cannot be mined under the exception. In other words, the EU DSM Directive goes one step further than the UK by offering an exception that commercial organisations (or anyone else) can use to mine lawfully accessible works, but it does not apply if the rights owner has explicitly ruled out TDM. By contrast, commercial organisations would not be able to use the UK exception unless they can claim the specific research is for a non-commercial purpose.
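To make the comparison above concrete, the Python sketch below encodes a deliberately simplified reading of the three exceptions as summarised in this post. The UK exception turns on lawful access and a non-commercial research purpose, Article 3 on lawful access and the type of organisation, and Article 4 on lawful access and the absence of an explicit rights reservation. The class and function names are hypothetical. The model ignores which jurisdiction applies and leaves out nuances discussed above, such as the restrictions on reusing the copies and the grey area between commercial and non-commercial research, so it illustrates the logic rather than offering a test to rely on.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MiningScenario:
    """Hypothetical description of a proposed TDM project (illustration only)."""
    lawful_access: bool             # e.g. via a library subscription or an open licence
    non_commercial_research: bool   # sole purpose is non-commercial research
    research_or_heritage_org: bool  # e.g. a university, museum or publicly funded library
    rights_reserved_by_owner: bool  # rights owner has explicitly reserved TDM


def exceptions_that_may_apply(s: MiningScenario) -> list[str]:
    """Simplified reading of the UK and EU TDM exceptions as summarised in this post.

    Contract terms are not modelled: per the post, they cannot override
    the UK exception or Article 3 where those exceptions apply.
    """
    if not s.lawful_access:
        return []  # all three exceptions require lawful access to the material
    applicable = []
    if s.non_commercial_research:
        applicable.append("UK s29A CDPA")   # available to "a person"
    if s.research_or_heritage_org:
        applicable.append("EU DSM Art. 3")  # limited to defined organisations
    if not s.rights_reserved_by_owner:
        applicable.append("EU DSM Art. 4")  # open to anyone unless TDM is reserved
    return applicable


university = MiningScenario(lawful_access=True, non_commercial_research=True,
                            research_or_heritage_org=True, rights_reserved_by_owner=True)
company = MiningScenario(lawful_access=True, non_commercial_research=False,
                         research_or_heritage_org=False, rights_reserved_by_owner=False)
print(exceptions_that_may_apply(university))  # ['UK s29A CDPA', 'EU DSM Art. 3']
print(exceptions_that_may_apply(company))     # ['EU DSM Art. 4']
```

The two example scenarios mirror the contrast drawn above: a university project can rely on the UK exception or Article 3 even where the rights owner has reserved TDM, while a commercial project falls back on Article 4 and only when no reservation has been made.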

Guest post by Chris Holland, UCL Copyright Support Officer. For more information or advice contact: copyright@ucl.ac.uk

Brexit and Beyond – what does copyright look like post-Brexit?

By Kirsty, on 20 May 2021

On Monday 17th May, we brought together three experts for an in-depth look at the impact that Brexit has already had on copyright in the UK and what could be coming next.

Catherine Stihler (CEO Creative Commons), Ben White (Researcher, Centre for Intellectual Property Policy and Management, Bournemouth University) and Dr Emily Hudson (Reader in Law, King’s College London) all brought their own distinct backgrounds and experiences to bear on this topic for a truly interesting discussion.


Brexit and beyond webinar in conjunction with Copyright4Knowledge

By Kirsty, on 4 May 2021

On the back of our successful #EbookSOS webinar we are doing it again – join us for another collaboration with Copyright4Knowledge, this time on the subject of the post-Brexit copyright world.

What will the copyright environment be like post-Brexit? How can we best advocate for more library- and research-friendly copyright legislation? The European Union and the European Court of Justice have long exercised a major influence on UK copyright law and the decisions of UK courts in copyright matters. What will happen post-Brexit, given that EU copyright law no longer applies directly in the UK?

Brexit poses many questions for the Library and Research communities, and we will endeavour to explore some of them in our Brexit and beyond webinar on 17th May 2021, from 11.00 to 12.30. You are invited to join our three expert speakers to discuss the copyright environment for HE and Research post-Brexit. What are the challenges post-Brexit, and does Brexit also present opportunities?

There will be an opportunity to put your questions to the panel in a final Q and A session.

The webinar is free to attend, but if you would like to join us, please register via Eventbrite.

Draft programme

  • 11.00-11.10 Welcome and introduction
  • 11.10-11.30 European digital policy and why it still matters to the UK, Catherine Stihler (CEO Creative Commons)
  • 11.30-11.50 Will the UK fall behind the EU in important areas of digital research and online access to 20th century cultural heritage? Benjamin White (Researcher, Centre for Intellectual Property Policy and Management, Bournemouth University)
  • 11.50-12.10 Some suggestions for copyright advocacy in the post-Brexit world, Dr Emily Hudson (Reader in Law, King’s College London)
  • 12.10-12.30 Q&A