Here’s the newest legal plot twist in AI land. On Friday, Encyclopaedia Britannica and dictionary publisher Merriam-Webster filed a lawsuit against OpenAI. The publishers say OpenAI used their copyrighted material to train its models and that the result is a chatbot that sometimes repeats their text almost word for word.
What the publishers are claiming
The core allegation is blunt. Britannica says that GPT-4 has "memorized" large parts of its copyrighted content and will output near-verbatim copies when prompted. The complaint includes side-by-side examples where passages from the model appear to match Britannica text closely.
Britannica also argues that OpenAI's answers can substitute for visits to its website. Instead of pointing users to Britannica the way a search engine might, the model delivers the information directly. The complaint calls this "cannibalizing" Britannica's web traffic and says the chatbot competes directly with the publishers' own content.
How the lawsuit presents evidence
- The complaint presents model outputs side by side with the Britannica passages they closely resemble.
- Britannica describes those model outputs as unauthorized copies produced after the model was trained on its material.
- The publishers argue the model not only copied text but also harmed their ability to attract visitors.
Where this fits in the larger picture
This suit is part of a wider wave of legal challenges from publishers and authors. The New York Times has a long-running lawsuit against OpenAI making similar claims about unauthorized copying. And in a separate case, Anthropic agreed to pay roughly $1.5 billion to settle a class action brought by authors over the use of copyrighted books to train its AI models.
Why this matters
If courts accept the publishers’ arguments, it could change how AI companies gather and use text from the web. That may affect what material is used to train future models and how AI systems handle direct answers versus directing users to source websites.
For now, the case adds another chapter to ongoing debates about copyright, training data, and the balance between building helpful AI and respecting creators. Expect more legal filings and, likely, more dramatic headlines.