Text and Data Mining (TDM) | UCL Open@UCL Blog

Archive for the 'Text and Data Mining (TDM)' Category

Copyright and Open science in the age of AI: what can we all do to ensure free and open access to knowledge for all?

By Rafael, on 24 October 2024

We are approaching the end of International Open Access Week, and we have been enjoying a series of interesting insights and discussions across UCL! Earlier this week, we explored the balance between collaboration and commercialisation, highlighted the important work of Citizen Science initiatives and the growing significance of open access textbooks.

Today, Christine Daoutis, UCL Copyright Support Officer, will build on our ongoing series about copyright and open science, focusing on how we can ensure free and open access to knowledge in the age of AI, by addressing copyright challenges, advocating for rights retention policies, and discussing secondary publication rights that benefit both researchers and the public.

Open Access Week 2024 builds on last year’s theme, Community over Commercialisation, aiming not only to continue discussions but to take meaningful action that prioritises the interests of the scholarly community and the public. This post focuses on copyright-related issues that, when addressed by both individual researchers and through institutional, funder, and legal reforms, can help create more sustainable and equitable access to knowledge.

Infographic promoting Plan S for rights retention strategy. It features an illustration of people climbing ladders towards a large key, symbolising control over open access to knowledge. The text reads: "By exercising your rights, you can share your knowledge as you wish and enable everyone to benefit from your research." The hashtag #RetainYourRights is included in the middle section.

Rights retention infographic. Source: cOAlition-s

Retaining author rights

Broadly speaking, rights retention means that authors of scholarly publications avoid the traditional practice of signing away their rights to publishers, typically done through a copyright transfer agreement or exclusive licence. Instead, as an author, you retain at least some rights that allow you to share and reuse your own research as openly as possible. For example, you could post your work in an open access repository, share it on academic networks, reuse it in your teaching, and incorporate it into other works like your thesis.

Many funders and institutions have specific rights retention policies that address related legal issues. If such a policy applies, and publishers are informed in advance, authors typically need to retain rights and apply an open licence (usually CC BY) to the accepted manuscript at the point of submission.

Rights retention ensures that your research can be made open access without relying on unsustainable pay-to-publish models, and without facing delays or restrictions from publishers’ web posting policies. Importantly, rights retention is not limited to published research—it can be applied to preprints, data, protocols, and other outputs throughout the research process.

Secondary Publication Rights (SPRs)

Secondary publication rights (SPRs) refer to legislation that allows publicly funded research to be published in an open access repository or elsewhere, at the same time as its primary publication in academic journals. Some European countries already have SPRs, as highlighted by the Knowledge Rights 21 study conducted by LIBER, and LIBER advocates for #ZeroEmbargo on publicly funded scientific publications. There are ongoing calls to harmonise and optimise these rights across countries, ensuring that the version of record becomes immediately available upon publication, overriding contractual restrictions imposed by publishers.

SPRs can apply to different types of research output and are meant to complement rights retention policies. However, introducing SPRs depends on copyright reform, which is not an action individual researchers can take themselves, though it’s still useful to be aware of developments in this area.

The image is a digital collage featuring a blue and green silhouette of a human head composed of circuit patterns on the right. The left side of the background is filled with various tech-themed icons surrounding a prominent "MACHINE LEARNING" label. A hand reaches towards the different icons, interacting with and exploring AI concepts

Source: Computer17293866, CC BY-SA 4.0, via Wikimedia Commons

Artificial Intelligence and your rights

The rise of Generative AI (GenAI) has introduced broader issues affecting researchers, both as users and as authors of copyrighted works. These include:

Clauses in subscription agreements that seek to prevent researchers from using resources their institution has subscribed to for AI-related purposes.
Publishers forming agreements with AI companies to share content from journal articles and books for AI training purposes, often without clear communication to authors. A recent $10 million deal between Taylor & Francis and Microsoft has raised concerns among scholars about how their research will be used by AI tools. In some cases, authors are given the option to opt in, as seen with Cambridge University Press.
For works already licensed for reuse, such as articles under a CC BY licence or those used under copyright exceptions, questions arise about how the work will be reused, for what purposes, and how it will be attributed.

While including published research in AI training should help improve the accuracy of models and reduce bias, researchers should have enough information to understand and decide how their work is reused. Creative Commons is exploring ‘preference signals’ for authors of CC-licensed works to address this issue.

The key issue is that transferring your copyright or exclusive rights to a publisher restricts what you can do with your own work and allows the publisher to reuse your work in ways beyond your control, including training AI models.

Using Copyright exceptions in research

UK copyright law includes exceptions (known as ‘permitted acts’) for non-commercial research, private study, criticism, review, quotation, and illustration for instruction. As a researcher, you can rely on these exceptions as long as your use qualifies as ‘fair dealing’, as previously discussed in a blog post during Fair Dealing Week. Text and data mining for non-commercial research is also covered by an exception, allowing researchers to download and analyse large amounts of data to which they have lawful access.

Relying on copyright exceptions involves evaluating your purpose and, for some exceptions, making a decision around what is ‘fair’. This also involves some assessment of risk. Understanding copyright exceptions helps you exercise your rights as users of knowledge and make confident assessments as to whether and when a copyright exception is likely to apply, and when permission is necessary. [see links for UK legislation at the end of this article]

The hands of diverse individuals hold up large, colorful letters spelling "COPYRIGHT" against a light blue background. Each letter features a different bright color, creating a vibrant and playful display.

Source: www.freepik.com

Engage with copyright at UCL

The conversations sparked during Open Access Week continue throughout the year at UCL as part of ongoing copyright support and education. To engage further with these issues, you can:

Add your voice to how copyright literacy is shaped at UCL. Provide feedback on the draft UCL copyright literacy strategy and consider joining the new UCL copyright literacy community.
Attend a copyright training session or email copyright@ucl.ac.uk to arrange a bespoke session. The newly introduced ‘Copyright, Open Science and Creativity’ game is a fun way of engaging with current debates and learning about recent developments. More dates for this will be released soon.
Follow copyright news on the Open@UCL blog and the Copyright blog.

Useful Legislation

Filed under Advocacy, Copyright, Guest post, Open Access, Open Access Week 2024, Plan S, Text and Data Mining (TDM)

Tags: authorship, Copyright and AI, copyright and open science, copyright exceptions, copyright literacy, fair dealing, Open Access Week, Open Access Week 2024, Open Access Week blog series, Plan S, Rights retention, Secondary Publication Rights


Text and Data Mining (TDM) and Your Research: Copyright Implications and New Website Guidance

By Rafael, on 13 May 2024

This is the second blog post of our collaborative series between the UCL Office for Open Science and Scholarship and the UCL Copyright team. Here, we continue our exploration of important aspects of copyright and its implications for open research and scholarship. In this instalment, we examine Text and Data Mining (TDM) and its impact on research along with the associated copyright considerations.

Image by storyset on Freepik.

The development of advanced computational tools and techniques for analysing large amounts of data has opened up new possibilities for researchers. Text and Data Mining (TDM) is a broad term referring to a range of ‘automated analytical techniques to analyse text and data for patterns, trends, and useful information’ (Intellectual Property Office definition). TDM has many applications in academic research across disciplines.
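To make the definition above more concrete, here is a minimal sketch of one very simple TDM technique: counting how often terms appear across a small collection of texts. The function name `term_frequencies`, the `min_length` parameter, and the sample sentences are all hypothetical and chosen purely for illustration; real projects typically work with far larger corpora and more sophisticated methods.

```python
import re
from collections import Counter


def term_frequencies(documents, min_length=4):
    """Count how often each word appears across a collection of texts.

    Words shorter than `min_length` are skipped as a crude way of
    ignoring very common short words (an illustrative assumption).
    """
    counts = Counter()
    for text in documents:
        # Lowercase the text and keep runs of letters only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if len(word) >= min_length)
    return counts


# Hypothetical sample texts, standing in for materials to which
# you have lawful access.
sample_documents = [
    "Open access supports open science.",
    "Copyright exceptions support text and data mining for research.",
    "Text mining finds patterns in research articles.",
]

counts = term_frequencies(sample_documents)
for term, count in counts.most_common(5):
    print(term, count)
```

Even a simple count like this involves making copies of the source texts in order to analyse them, which is why the copyright considerations discussed in this post apply.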

In an academic context, the most common sources of data for TDM include journal articles, books, datasets, images, and websites. TDM involves accessing, analysing, and often reusing (parts of) these materials. As these materials are, by default, protected by copyright, there are limitations around what you can do as part of TDM. In the UK, you may rely on section 29A of the Copyright, Designs and Patents Act 1988, a copyright exception for making copies for text and data analysis for non-commercial research. You must have lawful access to the materials (for example via a UCL subscription or via an open licence). However, publishers often impose technological barriers that prevent you from copying large amounts of material for TDM purposes, and you must not try to circumvent these measures. Understanding what you can do with copyright materials, what may be more problematic, and where to get support if in doubt should help you manage these barriers when you use TDM in your research.

The copyright support team works with e-resources, the Library Skills librarians, and the Office for Open Science and Scholarship to support the TDM activities of UCL staff and students. New guidance, available on the copyright website and the TDM libguide, addresses questions that often arise during TDM, including:

Can you copy journal articles, books, images, and other materials? What conditions apply?
What do you need to consider when sharing the outcomes of a TDM analysis?
What do publishers and other suppliers of the TDM sources expect you to do?

To learn more about copyright (including how it applies to TDM):

Visit the new TDM guidance.
Register for one of our copyright sessions.
Contact the copyright team if you have specific questions or would like to organise a session for your department.
The new UCL Copyright Literacy Community aims to bring together staff and students from across UCL to identify and collaborate on areas where an understanding of copyright should be strengthened. If you are interested in joining, please contact the copyright team.

Get involved!

The UCL Office for Open Science and Scholarship invites you to contribute to the open science and scholarship movement. Stay connected for updates, events, and opportunities. Follow us on X (formerly Twitter) and LinkedIn, and join our mailing list to be part of the conversation!

Filed under Copyright, Guest post, Text and Data Mining (TDM)

Tags: copyright, copyright and open science, intellectual property, new resource, open science, Text and Data Mining (TDM)

