Using AI – Challenge 3 – Copyright and fair dealing • Curo

31 July 2025

Artificial intelligence (AI), particularly in the form of generative AI large language models such as ChatGPT, Gemini and Copilot, is rarely out of the news.

These systems can produce effective text, answer complex questions and generate stunning images. But the software also presents challenges for safe business use. In three posts, we are exploring how to tackle three of them: errors and hallucinations, environmental impact and copyright.

As far as large language models go, artificial intelligence is something of a misnomer. Consider the text-producing versions. These complex and sophisticated pieces of software take the output of human intelligence, in the form of the electronic documents used to train them, and make a mash-up of their contents. Broadly the same applies to image generators and the files used to train them, though the technique is rather different.

This process does not breach copyright in the conventional sense of directly reproducing someone else’s work.

Yet the fact is that without the effort of those who generated the training material, the AI output would not be possible. And, not unreasonably, the people who produced the original text or images used to train the software might expect some reward for their copyright material being used – or alternatively the opportunity to exclude their work from becoming grist to the AI mill.

A good example of the potential problems comes from the response to Meta (parent company of Facebook, WhatsApp and Instagram) planning to use a collection of pirated electronic books called Library Genesis (LibGen) to train its Llama 3 AI model. LibGen contains more than 7.5 million books and 81 million papers and articles, obtained without the permission of the copyright holders. Meta rightly thought that text taken from books and edited articles would be of higher quality than text scraped from the internet. But in using this source, the company has had to face up to the reality of misusing copyright material.

The Atlantic magazine published a database of the works in LibGen that can be searched by author name. When I searched for my own name, I found 144 results: most of my books published since 2000 and many magazine and newspaper articles. All are covered by copyright, and I had not been asked for permission to use any of this material. I am not aware of a similar list for images – it's harder to establish a catalogue of the training data – but there is no doubt that theft will have occurred on a similar scale. In principle, individuals can opt out, but this requires filling in a detailed form for each item, with no mechanism to check that their material has been excluded. Content creators should be given the chance to opt in, rather than having to discover what has happened and then opt out.

There are two aspects to this challenge, facing both businesses and individuals.

Firstly, when we use large language models, we are complicit in this illegal practice. The genie is out of the bottle. I don’t think we can expect anyone to say ‘I won’t use AI systems’ because they are indubitably useful (as long as we take into account, for instance, the hallucinations mentioned in the first challenge). But it is reasonable to ask users to complain to both governments and the AI developers, demanding that they stop illegal use of copyright training material without permission.

Secondly, businesses need to protect their own copyright material as much as possible, putting pressure on AI developers not to undertake this practice if they wish to continue receiving lucrative contracts to incorporate their systems into those of client businesses.

As someone who has been a professional writer since before the internet was widely used by ordinary people, I have long accepted that my books will be illegally copied and shared by some individuals. Trying to prevent this would take far too much time and effort for such limited uses. But when material I have written is repurposed on an industrial scale, it does not seem unreasonable to expect to be asked for my permission, and to receive reasonable recompense.

Brian Clegg is an award-winning science writer with over 50 books in print and articles in a wide range of newspapers and magazines (www.brianclegg.net).
