Learning Not Copying

It might appear as if AI is ingesting copyrighted material - text, images, videos, etc. - and regurgitating it in response to prompts. But if you peel back the layers, the underlying process looks a lot more like learning than copying.
This is a link-enhanced version of an article that first appeared in the Mint. You can read the original here. If you would like to receive these articles in your inbox every week, please consider subscribing by clicking on this link.
On 11 June 2025, Disney and Universal filed a lawsuit against Midjourney, claiming that the AI image-generation platform was creating “recognisable” images of characters over which they held exclusive rights. This is the latest in a series of complaints lodged against AI companies like OpenAI and Anthropic, alleging that this revolutionary new technology conflicts with the way intellectual property law has operated for centuries.
At the heart of all these cases lies the prohibition under copyright law against the reproduction of literary and artistic works without the owner’s permission. AI companies admittedly ‘train’ their models on text, audio and video material scraped from the internet. Given that much of the output they generate contains similar content, there is a presumption that they have somehow ‘copied’ these works without the permission of copyright holders.
Copyright in the Information Age
Copyright law was established in response to the very first innovation of the information age—the printing press. When publishers realized that the works they had commissioned were being sold in the market at a fraction of the price they charged, they asked for legal protection—not only for the physical books they had printed, but also for the ideas contained within them.
This necessitated a new form of legal protection, one that broadened the concept of ownership beyond just the tangible forms in which content is sold (like books, paintings and vinyl records) to include the intangible ideas they hold. As new technologies emerged, these protections evolved to encompass them; this is how we prevent pirates from selling bootleg DVDs, counterfeiters from duplicating merchandise and websites from displaying images without permission. Now, with AI, copyright law is also being adapted to accommodate it.
AI companies process large volumes of content—text and images—to develop their models. They first break text down into smaller units called tokens and convert images into discrete pixel values. Transformer architectures then process these text tokens to learn the relationships between them, while diffusion networks learn to remove ‘noise’ from random pixel values until coherent images start to form.
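The tokenisation step described above can be sketched in a few lines. This is a toy word-level illustration, not any real production tokeniser (which would use a subword scheme such as byte-pair encoding); the sample corpus is an assumption for demonstration only:

```python
# Toy word-level tokeniser: assigns each distinct word an integer ID,
# then maps text to a sequence of those IDs. Purely illustrative.
def build_vocab(corpus):
    vocab = {}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    # Unknown words are simply skipped in this sketch.
    return [vocab[w] for w in text.lower().split() if w in vocab]

corpus = "the cat sat on the mat"
vocab = build_vocab(corpus)
print(tokenize("the mat", vocab))  # prints [0, 4]
```

The model never sees the words themselves again; everything downstream operates on these integer sequences.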
As a result, AI models do not store the content they process for retrieval on demand. Instead, they identify statistical patterns within the data so that, when prompted, they can apply that knowledge to generate content that most appropriately responds to the requests made. In the case of large language models (LLMs), this involves predicting the next word, sentence or paragraph. In the case of diffusion models, it entails progressively eliminating noise until an image appears. So, even though Midjourney may have been trained on millions of images, it hasn’t ‘copied’ them into its memory. All it has done is derive statistical patterns from that visual information, encoding general principles of composition, colour and form as mathematical weights.
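The distinction between storing patterns and storing copies can be made concrete with a minimal next-word predictor. This bigram sketch is a drastically simplified stand-in for an LLM (real models learn billions of weights, not word-pair counts), and the training text is an illustrative assumption; the point is that what the model retains after training is frequency statistics, not the text itself:

```python
from collections import Counter, defaultdict

# Minimal bigram 'model': counts how often each word follows another,
# then predicts the most frequent follower. The trained object holds
# only these counts - a statistical pattern, not a copy of the text.
def train(text):
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, word):
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

model = train("the cat sat on the mat and the cat ran")
print(predict_next(model, "the"))  # prints 'cat' ('cat' follows 'the' twice, 'mat' once)
```

Note that `model` cannot reproduce the training sentence; it can only say which word is statistically most likely to come next.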
Learning
This process is remarkably similar to human learning. When we read, our eyes scan the words in a book, but our brains don’t store an exact facsimile of that text; instead, they merely retain the ideas and concepts it embodies. When art students study the Great Masters, they do so not to replicate them perfectly, but to absorb techniques, principles of composition and visual approaches, allowing this enhanced understanding to improve their own artistic skills.
This is the essence of human creativity. Exposure to existing works has always been essential for new expression. Shakespeare borrowed his plots. Picasso learnt from African masks. Jazz musicians transformed classical forms. Great artists are also great students, building their own individual styles upon those of masters who came before them. If the fundamental essence of human learning does not violate intellectual property laws, should we not apply the same logic to AI as well?
There is no doubt that this will affect creative industries. But we have been here before. Every wave of technological evolution has been disruptive. The power loom displaced handloom weavers; recorded music rendered live musicians redundant; and the film industry disrupted live theatre performances, just as OTT streaming content has led to fewer people going to theatres. In much the same way, AI is going to disrupt artists of all sorts—graphic designers, musicians, authors, actors and film producers.
Disrupt Thyself
Incumbents have always resisted change. But early victories have often been pyrrhic. Although the music industry succeeded in shutting down Napster’s free file-sharing service, within a decade music had become entirely digital, distributed over online platforms using the very model it had fought so hard to quash.
Creative enterprises that succeed in the long run are those that embrace change.
The London-based advertising agency group WPP recently created a Super Bowl advertisement entirely with AI—no sets, no actors and no crew. It cost far less than it would otherwise have and was completed in a fraction of the time. The ad agency group also offers an AI platform called WPP Open that can take simple text prompts and turn them into social media ads in a matter of minutes—a service that over 50,000 people are already using.
The only way to prevent yourself from being disrupted is to disrupt yourself.