Open@UCL Blog

Open educational resources and copyright: what do you need to consider?

By Rafael, on 7 November 2024

This is the last article of our Copyright and Open Science series by Christine Daoutis, UCL Copyright Support Officer, which explored important aspects of copyright and its implications for open research and scholarship.

An Open Educational Resources logo featuring an open book with pages transforming into upward-pointing hands, set against a blue background.

Image caption: Jonathasmello, CC BY 3.0, via Wikimedia Commons

In this post, we conclude our Copyright and Open Science series by focusing on open education. Broadly defined, open education is “a philosophy about how people should produce, share, and build on knowledge” (source: What is open education? Opensource.com). It refers to values, practices and resources that aim to make scholarship more accessible, equitable, sustainable, transparent and collaborative.

The UNESCO definition of OERs highlights the importance of freely accessible educational materials in advancing open education practices globally. This includes the creation and reuse of OERs—materials that are either out of copyright or licensed to allow reuse. However, open education extends beyond resources to include practices such as integrating open science into teaching, sharing educational practices, and co-creating resources with learners.

OERs include a wide range of materials, such as open textbooks, open access articles, lecture handouts, images, film, slides, lecture recordings, assessment resources, software and whole courses such as Massive Open Online Courses (MOOCs). By default, all these resources are protected by copyright. If you’re planning to create open educational resources, here’s some copyright advice.

Addressing copyright in OERs

1. Know who owns what. If you are creating or collaborating on a teaching resource, it is essential to clarify who holds the copyright. This could be you, the author; your employer, if the work was created in the course of employment; or the resource could be co-owned with others, including students or sponsors. To license a resource for reuse (for example, to make it available under a Creative Commons licence), you must own the copyright to the resource and/or agree such licensing with co-owners. ♦ Copyright ownership at UCL is addressed in the UCL IP Policy.

2. Make the resources openly available and reusable. Once you are certain that the resource is yours to license, consider making it openly available, under a licence that allows reuse. Open access repositories support the discovery and access of different types of materials, including OERs. UCL has a dedicated OER repository, which accepts materials created by its staff and students.

As for licensing: we have explained in a previous post how Creative Commons licences work; and you can read more on how CC licences support OERs on the Creative Commons wiki. Licensing under the most permissive of the licences, the Creative Commons Attribution licence (CC BY), supports the ‘five Rs’ of OERs: enabling others to “retain, revise, remix, reuse and redistribute the materials”. (David Wiley, Defining the “Open” in Open Content and Open Educational Resources, Improving Learning blog).

A cartoon of a smiling stick figure pushing a shopping trolley filled with objects labeled 'CC' (Creative Commons) and holding up a yellow 'CC'-labeled item. The figure is placing an object on a bookshelf with colorful books and 'creative' works.

Image caption: sOER Frank, CC BY 2.0, via Wikimedia Commons

3. Address any third-party materials. If the resource contains materials you don’t own the copyright to (such as third-party content), you have a few options:

  • Reuse works that are out of copyright (public domain) or openly licensed. These might include Creative Commons images and videos, open access articles, and OERs created by others. ♦ See UCL’s guidance on finding OERs and a reading list with links to many openly licensed resources.
  • Get permission from the copyright owner. If the material is not openly licensed, you might consider seeking permission to reuse it. Be clear about how the resource containing the material will be shared (i.e., as an OER). Third-party materials included in an OER should be shared under their own copyright terms (e.g., their reuse may be more restricted than the rest of the resource) and this should be communicated when sharing.
  • Rely on a copyright exception. In some cases, instead of getting permission you may decide to rely on a copyright exception, notably the quotation exception in UK copyright law. Using exceptions requires judgement. You’ll need to determine whether the use of the material is ‘fair dealing’: does the purpose justify the use? Does it affect the copyright owner’s market? Overall, is it “fair” to all parties involved? Be aware that copyright exceptions vary by country, which is important when making a resource globally available. The Code of Best Practices in Fair Use for Open Educational Resources explores these approaches further, putting forward a framework that could be applied internationally.

Putting the copyright advice into practice: examples from UCL’s copyright online tutorials.

The screenshot shows the UCL Copyright Essentials 2023-2024 module page. On the right side, there's an image of stormtroopers marching in formation. The content discusses the use and adaptation of images under Creative Commons licenses. Below the stormtroopers, there are links to additional copyright resources. The layout is clean and educational, providing information on legal considerations for using and modifying copyrighted materials with appropriate licensing. On the left side, the course menu outlines the entire module and includes links to further reading.

Screenshot from UCL’s Copyright Essentials tutorial, which includes a photo by Michael Neel from Knoxville, TN, USA, CC BY 2.0, via Wikimedia Commons.

While creating UCL’s Copyright Essentials and Copyright and your Teaching, two online tutorials introducing copyright, the UCL Copyright support team drew on its own advice. Specifically:

  • Copyright ownership and attribution were addressed. Copyright Essentials is an adaptation of an original resource, which was also openly licensed. Attribution to all original authors was included.
  • Both tutorials are publicly available online, allowing anyone to access and complete them. They are also licensed for reuse under the Creative Commons Attribution licence, permitting others to adapt and redistribute the materials with appropriate attribution.
  • Third-party materials mostly included openly licensed images and links to lawfully shared videos and documents. However, for some materials, we opted to rely on copyright exceptions, which involved a degree of interpretation and risk. This was highlighted in the tutorials, inviting learners to reflect on the use of exceptions.

It should be noted that using proprietary e-learning tools (like Articulate Rise) to develop the tutorials restricts reuse. While the shared resources can be accessed, they cannot be downloaded or edited. Authors wishing to adapt the resources have the option to recreate the materials under the licence terms or contact us for an editable copy. Ideally, these resources should be created with open-source tools, but there’s a trade-off between the advantages of user-friendly, accessible proprietary tools and these limitations.

For more advice on copyright and OERs please contact copyright@ucl.ac.uk.


Read more from the Copyright and Open Science Series:

Copyright and Open science in the age of AI: what can we all do to ensure free and open access to knowledge for all?

By Rafael, on 24 October 2024

We are approaching the end of International Open Access Week, and we have been enjoying a series of interesting insights and discussions across UCL! Earlier this week, we explored the balance between collaboration and commercialisation, and highlighted the important work of Citizen Science initiatives and the growing significance of open access textbooks.

Today, Christine Daoutis, UCL Copyright Support Officer, will build on our ongoing series about copyright and open science, focusing on how we can ensure free and open access to knowledge in the age of AI, by addressing copyright challenges, advocating for rights retention policies, and discussing secondary publication rights that benefit both researchers and the public.


Open Access Week 2024 builds on last year’s theme, Community over Commercialisation, aiming not only to continue discussions but to take meaningful action that prioritises the interests of the scholarly community and the public. This post focuses on copyright-related issues that, when addressed by both individual researchers and through institutional, funder, and legal reforms, can help create more sustainable and equitable access to knowledge.

Infographic promoting Plan S for rights retention strategy. It features an illustration of people climbing ladders towards a large key, symbolising control over open access to knowledge. The text reads: "By exercising your rights, you can share your knowledge as you wish and enable everyone to benefit from your research." The hashtag #RetainYourRights is included in the middle section.

 Rights retention infographic. Source: cOAlition-s

Retaining author rights

Broadly speaking, rights retention means that authors of scholarly publications avoid the traditional practice of signing away their rights to publishers, typically done through a copyright transfer agreement or exclusive licence. Instead, as an author, you retain at least some rights that allow you to share and reuse your own research as openly as possible. For example, you could post your work in an open access repository, share it on academic networks, reuse it in your teaching, and incorporate it into other works like your thesis.

Many funders and institutions have specific rights retention policies that address related legal issues. If such a policy applies, and publishers are informed in advance, authors typically need to retain rights and apply an open licence (usually CC BY) to the accepted manuscript at the point of submission.

Rights retention ensures that your research can be made open access without relying on unsustainable pay-to-publish models, and without facing delays or restrictions from publishers’ web posting policies. Importantly, rights retention is not limited to published research—it can be applied to preprints, data, protocols, and other outputs throughout the research process.

Secondary Publication Rights (SPRs)

Secondary publication rights (SPRs) refer to legislation that allows publicly funded research to be published in an open access repository or elsewhere, at the same time as its primary publication in academic journals. Some European countries already have SPRs, as highlighted by the Knowledge Rights 21 study conducted by LIBER, and LIBER advocates for #ZeroEmbargo on publicly funded scientific publications. There are ongoing calls to harmonise and optimise these rights across countries, ensuring that the version of record becomes immediately available upon publication, overriding contractual restrictions imposed by publishers.

SPRs can apply to different types of research output and are meant to complement rights retention policies. However, introducing SPRs depends on copyright reform, which is not an action individual researchers can take themselves, though it’s still useful to be aware of developments in this area.

The image is a digital collage featuring a blue and green silhouette of a human head composed of circuit patterns on the right. The left side of the background is filled with various tech-themed icons surrounding a prominent "MACHINE LEARNING" label. A hand reaches towards the different icons, interacting with and exploring AI concepts

Source: Computer17293866, CC BY-SA 4.0, via Wikimedia Commons

Artificial Intelligence and your rights

The rise of Generative AI (GenAI) has introduced broader issues affecting researchers, both as users and as authors of copyrighted works. These include:

  • Clauses in subscription agreements that seek to prevent researchers from using resources their institution has subscribed to for AI-related purposes.
  • Publishers forming agreements with AI companies to share content from journal articles and books for AI training purposes, often without clear communication to authors. A recent deal between Taylor & Francis and Microsoft for $10 million has raised concerns among scholars about how their research will be used by AI tools. In some cases, authors are given the option to opt in, as seen with Cambridge University Press.
  • For works already licensed for reuse, such as articles under a CC BY licence or those used under copyright exceptions, questions arise about how the work will be reused, for what purposes, and how it will be attributed.

While including published research in AI training should help improve the accuracy of models and reduce bias, researchers should have enough information to understand and decide how their work is reused. Creative Commons is exploring ‘preference signals’ for authors of CC-licensed works to address this issue.

The key issue is that transferring your copyright or exclusive rights to a publisher restricts what you can do with your own work and allows the publisher to reuse your work in ways beyond your control, including training AI models.

Using Copyright exceptions in research

UK copyright law includes exceptions (known as ‘permitted acts’) for non-commercial research, private study, criticism, review, quotation, and illustration for instruction. As a researcher, you can rely on these exceptions as long as your use qualifies as ‘fair dealing’, as previously discussed in a blog post during Fair Dealing Week. Text and data mining for non-commercial research is also covered by an exception, allowing researchers to download and analyse large amounts of data to which they have lawful access.

Relying on copyright exceptions involves evaluating your purpose and, for some exceptions, making a decision around what is ‘fair’. This also involves some assessment of risk. Understanding copyright exceptions helps you exercise your rights as users of knowledge and make confident assessments as to whether and when a copyright exception is likely to apply, and when permission is necessary. [see links for UK legislation at the end of this article]

The hands of diverse individuals hold up large, colorful letters spelling "COPYRIGHT" against a light blue background. Each letter features a different bright color, creating a vibrant and playful display.

Source: www.freepik.com

Engage with copyright at UCL

The conversations sparked during Open Access Week continue throughout the year at UCL as part of ongoing copyright support and education. To engage further with these issues, you can:

Useful Legislation

Coming Soon: Open Access Week 2024!

By Rafael, on 24 September 2024

We’re excited to announce a packed programme of events for this year’s #OAWeek at UCL! Throughout the week, we’ll be sharing daily blog posts and updates on social media that highlight the latest activities from UCL Press and the UCL Copyright team, alongside exciting news on our growing Citizen Science Community on MS Teams. This year’s theme, ‘Community over Commercialisation’, will be at the heart of our discussions, exploring how we can prioritise openness and collaboration in research to benefit the public and academic communities rather than profit-driven initiatives.

Promotional banner for International Open Access Week 2024 with the theme 'Community over Commercialization,' presented in various languages to highlight inclusivity. The illustration shows two people shaking hands, suggesting collaboration and commitment. The dates 21-27 October 2024 and the hashtag #OAWeek are included, encouraging participation and engagement on social media.

Poster of the International Open Access Week 2024

Read more about Open Access week and this year’s theme.

Tuesday 22 October (11:00am-2:00 pm) – Open Science and ARC Roadshow

As part of this year’s Open Access Week activities, we’re launching the first in a series of pilot roadshows, jointly organised by the UCL Office for Open Science & Scholarship and the Centre for Advanced Research Computing.

Come and find us outside the Academic Staff Common Room in the North Cloisters between 11:00 am and 2:00 pm, where our team will be on hand to answer all your questions about Open Access publishing, Research Data Management, Research IT, Data Stewardship, Citizen Science, and any other Open Science-related topics you’re curious about! Stop by to find out more—we might even have some goodies waiting for you!

No registration needed – find the event location on the webpage.

Tuesday 22 October (2:30-4:00 pm) – Copyright, Open Science & Creativity

One event we’re particularly excited about is happening on Tuesday, 22 October (2:30–4:00 pm). We’ll be hosting a brand-new card game designed by Christine Daoutis, titled ‘Copyright, Open Science, and Creativity’. This engaging game provides a fun and interactive way to explore key topics like equity in open science, authors’ rights, and open access publishing. You’ll have the opportunity to debate various aspects of copyright with fellow participants and explore issues such as open licences, AI in research, rights retention, and the challenges of equity in open science.

Spaces are limited, so be sure to sign up early! For more details and registration information, visit the event page.

Wednesday 23 October (2:00–3:30 pm) – Annual Open Science & Scholarship Awards

We’re also really glad to invite you to our second Annual Open Science & Scholarship Awards! Join us in celebrating the incredible contributions of colleagues and students to the future of open research and scholarship. The event will feature short talks from the winners in each category, followed by the award presentations. Afterwards, stay for drinks, nibbles, and a chance to network with peers.

Register today via our Eventbrite page!

Thursday 24 October (2:30–4:00 pm) – Drop-in session on Copyright, Licences and Open Science

Join the UCL Copyright team for an online drop-in session where they’ll be available to answer your questions about copyright, licensing, and how to share your research openly. This is a great opportunity to clarify any issues related to your research, thesis, publications, or data. Feel free to drop in on Teams between 2:30 and 3:50 pm or send your questions in advance to copyright@ucl.ac.uk.

To get you started, here are a few questions you might want to consider:

  • Why do research funders prefer CC BY licences for journal articles and monographs?
  • What copyright considerations should you keep in mind when making your data open and FAIR?
  • Can you use someone else’s copyrighted materials in your own thesis or publication that you plan to make open access?

This session offers a chance to resolve these and other copyright and licensing concerns so you can better understand the open research landscape.

Visit the event page for more information and sign up now!

Stay connected!

While we prepare for the events coming up, make sure you stay informed about new articles, events, and projects by signing up for our mailing list to receive the next issue of our Open@UCL newsletter. Also, join in the conversation during #OAWeek by checking this blog page for daily updates, and following us on LinkedIn or our newly created BlueSky account.

See you there!

 

Copyright and AI, Part 2: Perceived Challenges, Suggested Approaches and the Role of Copyright literacy

By Rafael, on 15 August 2024

Guest post by Christine Daoutis (UCL), Alex Fenlon (University of Birmingham) and Erica Levi (Coventry University).

This blog post is part of a collaborative series between the UCL Office for Open Science and Scholarship and the UCL Copyright team exploring important aspects of copyright and its implications for open research and scholarship. 

A grey square from which many colourful, wavy ribbons with segments in shades of white, blue, light green, orange and black radiate outward against a grey background.

An artist’s illustration of AI by Tim West. Photo by Google DeepMind from Pexels.

A previous post outlined copyright-related questions when creating GenAI materials—questions related to ownership, protection/originality, and infringement when using GenAI. The post discussed how answers to these questions are not straightforward, largely depend on what is at stake and for whom, and are constantly shaped by court cases as they develop.

What does this uncertainty mean for students, academics, and researchers who use GenAI and, crucially, for those in roles that support them? To what extent does GenAI create new challenges, and to what extent are these uncertainties inherent in working with copyright? How can we draw on existing expertise to support and educate on using GenAI, and what new skills do we need to develop?

In this post, we summarise a discussion we led as part of our workshop for library and research support professionals at the Research Libraries UK (RLUK) annual conference in March 2024. This year’s conference title was New Frontiers: The Expanding Scope of the Modern Research Library. Unsurprisingly, when considering the expanding scope of libraries in supporting research, GenAI is one of the first things that comes to mind.

Our 16 workshop participants came from various roles, research institutions, and backgrounds. What they had in common was an appetite to understand and support copyright in the new context of AI, and a collective body of expertise that, as we will see, is very useful when tackling copyright questions in a novel context. The workshop consisted of presentations and small group discussions built around the key themes outlined below.

Perceived Challenges and Opportunities
Does the research library community overall welcome GenAI? It is undoubtedly viewed as a way to make scholarship easier and faster, offering practical solutions—for example, supporting literature reviews or facilitating draft writing by non-English speakers. Beyond that, several participants see an opportunity to experiment, perhaps becoming less risk-averse, and welcome new tools that can make research more efficient in new and unpredictable ways.

However, concerns outweigh the perceived benefits. It was repeatedly mentioned that there is a need for more transparent, reliable, sustainable, and equitable tools before adopting them in research. Crucially, users need to ask themselves what exactly they are doing when using GenAI, their intention, what sources are being used, and how reliable the outputs are.

Concerns about GenAI and copyright were seen as an opportunity to place copyright literacy at the forefront. The need for new guidance is evident, particularly around the use of different tools with varying terms and conditions, and it is also perceived as an opportunity to revive and communicate existing copyright principles in a new light.

Suggested Solutions
One of the main aims of the workshop was to address challenges posed by GenAI. Participants were very active in putting forward ideas but also expressed concerns and frustration. For example, they questioned the feasibility of shaping policy and processes when the tools themselves constantly evolve, when there is very little transparency around the sources used, and when it is challenging to reach agreement even on essential concepts. Debates on whether ‘copying’ is taking place, whether an output is a derivative of a copyrighted work, and even whether an output is protected are bound to limit the guidance we develop.

Drawing from Existing Skills and Expertise
At the same time, it was acknowledged that copyright practitioners already have expertise, guidance, and educational resources relevant to questions about GenAI and copyright. While new guidance and training are necessary, the community can draw from a wealth of resources to tackle questions that arise while using GenAI. Information literacy principles should still apply to GenAI. Perhaps the copyright knowledge and support are already available; what is missing is a thorough understanding of the new technologies, and of their strengths and limitations, so that existing knowledge can be applied to new scenarios. This is where the need for collaboration arises.

Working Together
To ensure that GenAI is used ethically and creatively, the community needs to work collaboratively—with providers, creators, and users of those tools. Sharing everyday practices will inform and shape decisions, guidance, and processes. It is also important to acknowledge that the onus is not just on the copyright practitioners to understand the tools but also on the developers to make them transparent and reliable. Once the models become more transparent, it should be possible to support researchers better. This is even more crucial in supporting text and data mining (TDM) practices—critical in many research areas—to limit further restrictions following the implementation of AI models.

Magic Changes
With so much excitement around AI, we felt we should ask the group to identify the one magic change that would help remove most of the concerns. Interestingly, the consensus was that clarity around the sources and processes used by GenAI models is essential. How do the models come up with their answers and outputs? Is it possible to have clearer information about the sources’ provenance and the way the models are trained, and can this inform how authorship is established? And what criteria should be put in place to ensure the models are controlled and reliable?

This brings the matter back to the need for GenAI models to be regulated—a challenging but necessary magic change that would help us develop our processes and guidance with much more confidence.

Concluding Remarks
While the community of practitioners waits for decisions and regulations that will frame their approach, it is within their power to continue to support copyright literacy, referring to new and exciting GenAI cases. Not only do those add interest, but they also highlight an old truth about copyright, namely, that copyright-related decisions always come with a degree of uncertainty, risk, and awareness of conflicting interests.

About the authors 

Christine Daoutis is the Copyright Support Officer at UCL. Christine provides support, advice and training on copyright as it applies to learning, teaching and research activities, with a focus on open science practices. Resources created by Christine include the UCL Copyright Essentials tutorial and the UCL Copyright and Your Teaching online tutorial.

Alex Fenlon is the Head of Copyright and Licensing within Libraries and Learning Resources at the University of Birmingham. Alex and his team provide advice and guidance on copyright matters, including text and data mining and AI, to ensure that all law and practice are understood by all.

Erica Levi is the Digital Repository and Copyright Lead at Coventry University. Erica has created various resources to increase awareness of copyright law and open access through gamification. Her resources are available on her website.

Get involved!

The UCL Office for Open Science and Scholarship invites you to contribute to the open science and scholarship movement. Join our mailing list, and follow us on X (formerly Twitter) and LinkedIn, to be part of the conversation and stay connected for updates, events, and opportunities.

 

 

 

Copyright and AI, Part 1: How Does Copyright Apply to AI-Generated Works?

By Rafael, on 21 June 2024

Guest post by Christine Daoutis, UCL Copyright Support Officer. 

This is the third blog post of the collaborative series between the UCL Office for Open Science and Scholarship and the UCL Copyright team. Here, we continue our exploration of important aspects of copyright and its implications for open research and scholarship.

An artist’s illustration of artificial intelligence (AI). This illustration depicts language models which generate text. It shows distorted text on a screen seen through a glass container. The visible text at the top reads, "How do large language models work?" The rest is partially obscured, but includes mentions of "neural networks" and "machine learning."

Photo by Google DeepMind.

In a previous post we introduced questions that arise when using and creating materials protected by copyright. What options are available to you if you want to reuse others’ work (e.g. articles, theses, images, film, code) in your research? And what do you need to consider before you share your own research with others? Issues around copyright protection, permissions, exceptions, licences, and ownership need to be examined when creating new works and including others’ materials. These questions are also relevant when we think about works that are created with the use of GenAI tools, such as ChatGPT. However, with the use of these technologies still being relatively new and the legal aspects being shaped as we speak, answers are not always straightforward.

GenAI Training Data: GenAI models are trained on a large number of materials, usually protected by copyright (unless copyright has expired or been waived). Does this mean AI companies are infringing copyright by using these materials? How would copyright exceptions and fair dealing/fair use apply in different countries? How would licence terms – including the terms of open licences – be respected? Answers will come both from legislation and codes of practice introduced by governments and regulatory bodies (such as the EU AI Act) and from the outcomes of court cases (see, for example, Getty Images vs Stability AI and the Authors’ Guild against OpenAI and Microsoft).

User Prompts: The prompts a user provides to the model (instructions, text, images) may also be protected. You should also consider whether the prompts you enter include any confidential/commercially sensitive information that should not be shared. Please see UCL’s IP policy for guidance on this.

A digital illustration depicts a serene-looking young woman with glowing skin and braids that resemble threads. Text overlay reads "Zarya of the Dawn," The background has shades of green, black and blue forming an ethereal environment.

Image Credit: Kris Kashtanova using Midjourney AI, Public domain, via Wikimedia Commons.

AI-Generated Work: Is the AI-generated work an original work protected by copyright? Is it a derivative of other original works, and therefore, possibly infringing? If it is protected, who owns the copyright? The answer to this will vary by case and jurisdiction. In the US, a court ruled that AI-generated images in a comic book were not protected, although the whole comic book and story were. In China, it was ruled that images generated with the use of GenAI tools would be protected, with the owner being the person who provided the prompts. Section 9(3) of the UK’s CDPA states that ‘in the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken’.

In short, GenAI raises questions about what constitutes an original work, what constitutes infringement, how copyright exceptions and fair dealing/fair use are applied, and how authorship is established. While these questions are still being shaped, here are three things you can do:

  1. Consider any limitations in using GenAI besides copyright (e.g., confidentiality, biases, publishers’ policies). See UCL’s Generative AI hub for guidance.
  2. Be transparent about how you use GenAI. See UCL Library guidance on acknowledging the use of AI and referencing AI.
  3. If you have any copyright-related questions on the use of GenAI, contact the copyright support service.

 While GenAI has opened up more questions than answers around copyright, it also offers an opportunity to think about copyright critically. Stay connected with us for Part 2 of this blog post, which will discuss how new technologies, including GenAI, are changing our understanding of copyright. We look forward to continuing this important conversation with you.


 

 

Text and Data Mining (TDM) and Your Research: Copyright Implications and New Website Guidance

By Rafael, on 13 May 2024

This is the second blog post of our collaborative series between the UCL Office for Open Science and Scholarship and the UCL Copyright team. Here, we continue our exploration of important aspects of copyright and its implications for open research and scholarship. In this instalment, we examine Text and Data Mining (TDM) and its impact on research along with the associated copyright considerations.

Data processing concept illustration

Image by storyset on Freepik.

The development of advanced computational tools and techniques for analysing large amounts of data has opened up new possibilities for researchers. Text and Data Mining (TDM) is a broad term referring to a range of ‘automated analytical techniques to analyse text and data for patterns, trends, and useful information’ (Intellectual Property Office definition). TDM has many applications in academic research across disciplines.

In an academic context, the most common sources of data for TDM include journal articles, books, datasets, images, and websites. TDM involves accessing, analysing, and often reusing (parts of) these materials. As these materials are, by default, protected by copyright, there are limitations around what you can do as part of TDM. In the UK, you may rely on section 29A of the Copyright, Designs and Patents Act, a copyright exception for making copies for text and data analysis for non-commercial research. You must have lawful access to the materials (for example via a UCL subscription or via an open licence). However, there are often technological barriers imposed by publishers preventing you from copying large amounts of materials for TDM purposes – measures that you must not try to circumvent. Understanding what you can do with copyright materials, what may be more problematic and where to get support if in doubt, should help you manage these barriers when you use TDM in your research.
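To make the idea of ‘automated analytical techniques’ a little more concrete, here is a minimal Python sketch of one very simple form of TDM: counting how often terms appear across a small set of texts. The sample texts, the stop-word list and the function names (`tokenize`, `term_frequencies`) are illustrative assumptions only and are not taken from any UCL guidance or tool. A real project would load materials you have lawful access to and would usually rely on more sophisticated methods.

```python
import re
from collections import Counter

# Hypothetical stand-in corpus. In practice these would be texts you have
# lawful access to, for example via a subscription or an open licence.
DOCUMENTS = {
    "article_a": "Open access supports open research and open education.",
    "article_b": "Copyright exceptions support research and private study.",
}

# A tiny, illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"and", "the", "of", "a", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOPWORDS]


def term_frequencies(documents):
    """Count how often each term appears across all documents."""
    counts = Counter()
    for text in documents.values():
        counts.update(tokenize(text))
    return counts


if __name__ == "__main__":
    # Print the three most frequent terms in the sample corpus.
    for term, count in term_frequencies(DOCUMENTS).most_common(3):
        print(term, count)
```

Note that this sketch works on copies of the texts held in memory; it is the making of such copies for analysis that the section 29A exception is concerned with, which is why lawful access to the source materials matters.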

The copyright support team works with e-resources, the Library Skills librarians, and the Office for Open Science and Scholarship to support the TDM activities of UCL staff and students. New guidance is available on the copyright website and the TDM libguide, and addresses questions that often arise during TDM, including:

  • Can you copy journal articles, books, images, and other materials? What conditions apply?
  • What do you need to consider when sharing the outcomes of a TDM analysis?
  • What do publishers and other suppliers of the TDM sources expect you to do?

To learn more about copyright (including how it applies to TDM):
