ChatGPT and AI: Navigating uncharted copyright territory
5 June 2023
by Ridwaan Boda and Alexander Powell
ChatGPT was launched in November 2022 and it’s done a great job of hogging the headlines ever since. GPT is an acronym for “generative pre-trained transformer”, and ChatGPT is an artificial intelligence (“AI”) language model that’s trained on a large body of data. ChatGPT is already generating content for many professionals including journalists and lawyers. But how does this fast-changing field affect the world of IP?
ChatGPT is trained to generate and understand text. If you’re a stats person you might be interested to know that ChatGPT has been trained on some 45 terabytes of data, and it has been fed some 300 billion words from books, Wikipedia, other writing, and back-and-forth human conversation.
A lawless environment
Currently, there are no laws specifically regulating generative tools like ChatGPT. In October 2022, the White House Office of Science and Technology Policy (OSTP) published a non-binding Blueprint for an AI Bill of Rights. Additionally, the EU’s Artificial Intelligence Act is due to come into effect in 2024, and this will create a common regulatory and legal framework.
IP issues: Copyright
But what are the IP issues raised by ChatGPT? The particular area of IP law involved here is copyright. The concern that copyright owners have is this: generative AI tools like ChatGPT are built on the back of works that enjoy copyright protection, yet these works are used to train generative AI models without authority or payment. The works concerned can comprise text, art, film and sounds, among others.
Inputs and model training: text and data mining
The terms “inputs” and “model training” are commonly used in relation to the (often unauthorised) use of copyrighted material in the creation of generative AI tools like ChatGPT.
The inputs or model-training stage of generative AI tools like ChatGPT involves heavy-duty text and data mining (“TDM”) of copyrighted works. In the EU, there is legislation governing TDM: the 2019 Copyright in the Digital Single Market Directive contains exceptions covering TDM for scientific research purposes (Article 3), as well as TDM for other purposes, including commercial ones, which rights holders can opt out of (Article 4).
In the USA, on the other hand, there is no specific exception for TDM, so the issue is whether it constitutes fair use. Case law suggests that US law is very permissive compared to EU law when it comes to TDM, making the USA a good jurisdiction in which to develop generative AI tools. South Africa currently lacks a law that specifically deals with TDM.
Outputs of tools like ChatGPT
The term “output” refers to the content that a generative AI tool like ChatGPT produces. Many questions arise, for example:
- Is the output protected by copyright?
- Does the output infringe earlier copyrighted works, especially those used during the model training stage of the AI system?
- Is the output a “derivative work” of the copyrighted work?
- Do copyright exceptions apply to outputs that might otherwise infringe copyright?
[Image: Stable Diffusion AI’s idea of a copyright lawyer.]
Litigation: human authorship required
Some of the issues listed above are the subject of litigation in the USA and UK. The US Copyright Office has issued a Copyright Registration Guidance emphasising the application of the human authorship requirement for copyright. It said this:
“When an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the traditional elements of authorship are determined by the technology, not the human user… when an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result that material is not protected by copyright and must be disclaimed in a registration application.”
Recently, a US lawyer also admitted to using ChatGPT to do legal research, and it was eventually revealed that the output included references to completely fabricated court cases. Many argue that this could lead to the lawyer being disbarred.
In both the EU and the USA, introducing prompts into a generative tool does not necessarily grant the prompter copyright over the output.
A South African perspective
It’s very early days but here are some insights. OpenAI’s rules state that the inputter owns the input and that OpenAI assigns its rights to the output to the inputter. But in our view, if OpenAI is simply copying the answer from elsewhere, it would not be able to assign the copyright.
On some occasions, an adaptation that is infringing may create a new copyright, which can be assigned. All outputs would, in our view, be computer-generated, so OpenAI is probably the author of the output under South African law.
Some final words to create some balance
ChatGPT is a hugely exciting and somewhat scary development. But it’s worth bearing in mind that it really is early days, and despite the opportunities, it still does have a tendency to generate errors. Only time will tell how these copyright questions are resolved.
Reviewed by Rowan Forster, an Executive in ENSafrica’s IP department.
Dr Bernard Dippenaar