Topical Update

UK Copyright & AI Training: What’s at Stake?

8 minute read

Following the ongoing debate regularly featured on the BBC's Today programme: what are the issues here?

The UK Government (Dec 2024–2025) is leaning toward a text-and-data-mining (TDM) exception with a rights-reservation "opt-out", plus mandatory transparency—broadly in line with the EU DSM Directive Article 4 approach. Rights holders could block use (reserve rights), and licensing is encouraged/expected. (Source: GOV.UK)

The Writers' Guild of Great Britain (WGGB) rejects opt-out. It wants permission first (opt-in), clear disclosure of training sources, labelling of AI outputs, and payment for use of writers' works. (Source: The Writers' Guild of Great Britain)

Context: Creative industries contribute ~5% of UK GVA (≈£124bn, 2023). Any policy shift that misprices (or gives free access to) creative inputs risks undercutting a major growth sector.

Market signal: Even while litigation continues, OpenAI (ChatGPT's creator) has struck paid licensing deals with major publishers (News Corp, the FT, Axel Springer, AP), an acknowledgement at scale that copyrighted archives hold value and that royalties/licences are a viable route.


The UK Government’s Current Direction

Proposed package (Consultation opened 17 Dec 2024):

  1. Introduce a text and data mining (TDM) exception for any purpose, including commercial use, provided the user has lawful access and the rightsholder has not reserved rights.
  2. Couple this with transparency duties on AI developers, requiring them to disclose training sources and datasets and crawler details, and to maintain records.
  3. Enable and encourage licensing, including collective licensing, where rights are reserved and access is needed.
  4. Work towards interoperability with international regimes, explicitly referencing the EU model.

The Government argues that unclear UK rules push model training offshore and depress domestic AI investment; its "opt-out + transparency" approach aims to unlock lawful training in the UK while preserving avenues for remuneration via licensing.

But earlier efforts to broker a non-statutory code of practice between platforms and rightsholders collapsed in early 2024, which helped precipitate this legislative tack. 

The Writers’ Guild Position (WGGB)

WGGB's policy line is clear:

  1. Permission before use (opt-in): no commercial text and data mining (TDM) exception that allows developers to scrape works unless the author actively blocks it.
  2. Transparency: public logs so writers can see whether their works were used.
  3. Remuneration: ongoing payment when works are ingested.
  4. Labelling of AI-generated outputs.

The rationale: unlicensed training suppresses pay, reduces jobs, and risks hollowing out a sector that is strategically important to UK identity and exports.

Why the “5% of the UK economy” matters

The creative industries contributed ~5% of UK GVA in 2023 (~£124bn), with high-value spillovers (exports, soft power). Policy that defaults to broad uncompensated access (or puts heavy burden on rightsholders to police opt-outs in practice) risks under-remunerating the inputs that feed both culture and AI. In short: get the price of data wrong, and you tax a growth sector.

Signals from Deals & Cases

Whatever the ultimate court outcomes on "fair use/fair dealing", the market is voting with chequebooks.

Licensing at scale:
  - News Corp–OpenAI (2024): reportedly worth over $250 million over five years for archive access and product integrations.
  - Financial Times–OpenAI (2024): FT archive licensed for training and attributed use in ChatGPT.
  - Axel Springer–OpenAI (2023) and AP–OpenAI (2023): content licensing for model training.

Litigation continues: publishers such as Ziff Davis have sued OpenAI, with parallel actions targeting other AI companies.

Settlements are shaping expectations: Anthropic's recently proposed $1.5 billion settlement with authors over pirated book datasets underscores the liability exposure when acquisition is unlawful, even where some training uses may be defensible.

The remedy includes dataset destruction and per-work payments, informing UK debates on transparency, lawful access, and rights reservation.

Takeaway: The combination of paid licences and expensive settlements is normalising royalty-bearing access to large, high-quality sets of copyright works, precisely what the Government says it wants to enable through a reserve-rights TDM regime plus transparency.

Summary of Where the Positions Diverge (and Overlap)

| Issue | UK Government (consultation) | Writers' Guild (WGGB) |
|---|---|---|
| Default training access | Opt-out TDM exception if rights not reserved, with lawful access + transparency preconditions | Opt-in (permission first); no commercial TDM exception allowing scraping without prior consent |
| Remuneration | Envisions licensing where rights are reserved; explores collective licensing | Payment as the norm for any training use; structures that pay individual writers |
| Transparency | Mandated disclosures/records; crawler IDs; dataset source reporting | Robust transparency, publicly checkable logs |
| Labelling of outputs | Supportive of AI-output labelling standards | Requires clear labelling and credit where appropriate |
| Policy driver | Balance UK AI competitiveness with creator control; align with EU-style regime | Protect incomes, jobs, and cultural value; prevent uncompensated extraction |

Practical Implications for UK Creators & AI Developers

For creators & publishers: Start reserving rights (machine-readable where possible) on web properties and within contracts; prepare to participate in collective licensing if adopted. Audit archives to identify licensable datasets; create tariff frameworks pegged to usage class/volume.
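As an illustration only, the simplest machine-readable reservation in use today is a robots.txt file naming known AI crawlers. GPTBot (OpenAI), Google-Extended (Google AI training) and CCBot (Common Crawl) are real user-agent tokens, but the list below is not exhaustive, and formal standards for rights reservation are still being settled:

```text
# Illustrative robots.txt fragment reserving rights against AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

A robots.txt directive is only a request, not an enforcement mechanism, which is why the consultation pairs reservation with transparency duties and record-keeping.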

For AI developers: Build dataset governance: provenance records, crawler disclosures, and processes to honour reservations at scale. Assume paid access for premium archives; "free scraping" risks both supply shutdown (opt-outs) and liability (as the Anthropic settlement shows where acquisition is unlawful).
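A minimal sketch of what "honouring reservations at scale" could look like in practice, using Python's standard-library robots.txt parser. The robots.txt content and URLs here are hypothetical; GPTBot and CCBot are real crawler user-agent tokens:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. GPTBot and CCBot are
# real crawler user-agent tokens; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /archive/
"""

def may_crawl(user_agent: str, url: str) -> bool:
    """Check a crawler's user agent against the site's robots.txt rules.

    A real governance pipeline would fetch each host's live robots.txt,
    log the decision for provenance records, and also honour any other
    machine-readable rights-reservation signals that become standard.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_crawl("GPTBot", "https://example.com/article"))      # site fully reserved
print(may_crawl("CCBot", "https://example.com/archive/2024"))  # archive reserved
print(may_crawl("CCBot", "https://example.com/news"))          # not reserved
```

The design point is that the reservation check sits in front of acquisition, so provenance records can show that every ingested work was either unreserved or licensed.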

The Bottom Line

The Government’s “opt-out + transparency + licensing” model is pragmatic and internationally interoperable, but lives or dies on implementation details: usability of rights-reservation tech, enforceable transparency, and efficient licensing rails.

However, the Writers' Guild's "opt-in + pay" stance protects individual writers more robustly and reduces free-riding, but could raise transaction costs unless paired with collective or blanket licensing mechanisms.

Given that creative industries are ~5% of the economy, policy should bias toward proven remuneration channels—which the market is already converging on through licensing deals—while ensuring genuine research and innovation are not chilled.

Possible UK Policy Tweaks (to bridge the gap)

  1. Statutory transparency with audit rights (disclosure of dataset sources & crawler identities; records retained for a defined number of years). 
  2. Rights-reservation that works in practice (machine-readable standards; default honouring by major crawlers; ICO/IPO guidance).
  3. Collective licensing pathways (e.g., extended collective licensing for training uses) so individual writers actually get paid without friction.
  4. Output labelling requirements for certain commercial contexts to support attribution and market transparency. 
  5. Enforcement with teeth for non-compliance (including suspension of UK processing where transparency/rights-reservation is ignored).
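To make the collective-licensing pathway concrete, here is a purely illustrative sketch of how a blanket licence fee might be split pro rata among writers, assuming ingestion counts are available from the transparency logs the consultation proposes. All names, figures, and the 12% admin rate are hypothetical (chosen within the ~10–15% range typical of existing collecting societies):

```python
def distribute(licence_fee: float, usage: dict[str, int], admin_rate: float = 0.12) -> dict[str, float]:
    """Split a licence fee pro rata by ingested-work counts, after admin costs.

    Hypothetical model: `usage` maps each writer to the number of their
    works ingested, as recorded in (proposed) transparency logs.
    """
    pool = licence_fee * (1 - admin_rate)  # net distributable pool after admin
    total_works = sum(usage.values())
    return {writer: round(pool * count / total_works, 2) for writer, count in usage.items()}

# Hypothetical year: £100,000 blanket licence, 1,000 works ingested in total.
payouts = distribute(100_000.0, {"Writer A": 600, "Writer B": 300, "Writer C": 100})
print(payouts)  # {'Writer A': 52800.0, 'Writer B': 26400.0, 'Writer C': 8800.0}
```

The mechanism is deliberately simple: accurate per-work usage data is the hard part, which is why statutory transparency (point 1) is a precondition for frictionless payment (point 3).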

Finally, to compare with an existing collecting society: PRS for Music today vs. a possible AI collective tomorrow.

| Feature | PRS for Music (today) | Possible AI collective (future) |
|---|---|---|
| Scope | Songwriters, composers, music publishers | Authors, journalists, screenwriters, visual artists, etc. |
| Licensing model | Blanket licences for radio, TV, streaming, venues | Blanket/collective licences for AI training datasets |
| Rights covered | Performance & communication to the public | Text & data mining, reproduction for model training |
| Value exchange | Royalties paid by broadcasters, platforms, venues | Royalties/licences paid by AI developers |
| Administration | Monitoring, collection, distribution; admin costs ~10–15% | Dataset registries, opt-out logs |

Let’s Stay Connected

Protect your innovation: contact Barnfather today.