Scarinci Hollenbeck, LLC
The Firm
201-896-4100 | info@sh-law.com
Firm Insights
Author: Scarinci Hollenbeck, LLC
Date: September 11, 2023
Questions surrounding artificial intelligence (AI) and copyright are evolving quickly. Two of the most pressing issues involve content produced by “generative AI” computer programs (discussed below). The first is whether that content is entitled to copyright protection. The second is how training and using these programs may infringe existing copyrights.
Stand-up comedian Sarah Silverman is one of many content creators who have filed lawsuits alleging that AI platforms were trained on their copyrighted works without authorization or license from the rights holders. Silverman, along with authors Christopher Golden and Richard Kadrey, contends that defendants OpenAI and Meta Platforms copied the authors’ published books to train their AI products ChatGPT and LLaMA “without consent, without credit, and without compensation.”
OpenAI and Meta Platforms both offer AI software products known as large language models (LLMs). Rather than being explicitly programmed by software engineers, LLMs are “trained” by copying massive amounts of text and extracting expressive information from that text. As the U.S. Patent and Trademark Office (USPTO) has described, this process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” OpenAI, for example, acknowledges that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “necessarily involves first making copies of the data to be analyzed.”
Once properly “trained,” platforms like ChatGPT and LLaMA allow users to enter text prompts. The platforms then attempt to produce coherent, fluent responses that closely mimic human language. To generate text, LLMs rely on information extracted from their training datasets, along with patterns and connections drawn from that data.

For example, if an LLM is prompted to write in the style of a certain author, it generates content based on patterns and connections it learned from analyzing that author’s work in its training data. Importantly, a user can also ask ChatGPT or LLaMA to summarize a copyrighted book, and the programs will do so based on their training data.
In the lawsuits, Plaintiffs Silverman, Golden, and Kadrey maintain that they did not consent to the use of their copyrighted books as training material for ChatGPT or LLaMA. They further allege that the LLMs are themselves infringing derivative works, made without the plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.
According to their complaint, ChatGPT provided accurate summaries of the plaintiffs’ books when prompted. The plaintiffs contend that this demonstrates the program was trained using their copyrighted works. “Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works,” their complaint against OpenAI states. The suit further alleges that “at no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published works.”
Both suits were filed in federal district court in California and seek class-action status. They assert claims of copyright infringement and violations of section 1202(b) of the Digital Millennium Copyright Act (DMCA). They also bring state-law claims of unjust enrichment, unfair competition, and negligence. For example, the lawsuit against Meta alleges that the company “breached its duties by negligently, carelessly, and recklessly collecting, maintaining and controlling [theirs] and [others’] infringed works and engineering, designing, maintaining and controlling systems – including LLaMA – which are trained on [theirs] and [others’] infringed Works without their authorization.”
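While OpenAI and Meta Platforms have not yet formally responded to the lawsuits, they will likely raise a fair use defense. As discussed in prior articles, fair use is determined on a case-by-case basis. It requires evaluation of the following four factors under Section 107 of the Copyright Act:
1. The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. The nature of the copyrighted work;
3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. The effect of the use upon the potential market for or value of the copyrighted work.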
In a recent report, the Congressional Research Service noted that AI companies have previously argued that their training processes constitute fair use and are therefore non-infringing, writing:
Some stakeholders argue that the use of copyrighted works to train AI programs should be considered a fair use under these factors. Regarding the first factor, OpenAI argues its purpose is “transformative” as opposed to “expressive” because the training process creates “a useful generative AI system.” OpenAI also contends that the third factor supports fair use because the copies are not made available to the public but are used only to train the program. For support, OpenAI cites The Authors Guild, Inc. v. Google, Inc., in which the U.S. Court of Appeals for the Second Circuit held that Google’s copying of entire books to create a searchable database that displayed excerpts of those books constituted fair use.
Of course, fair use analysis requires courts to weigh all four factors, and the plaintiffs will likely contend that several factors tip the scale in their favor. For example, they may argue that ChatGPT and LLaMA are commercial products, which weighs against fair use under the first statutory factor. They may also argue that, by providing summaries of the books, the programs undermine the market for the original works. That argument would weigh against fair use under the fourth factor.
Artificial intelligence, particularly generative AI, raises novel and complex copyright issues. Beyond the question of whether generative AI programs infringe copyrights in existing works, the availability of copyright protection for AI-generated works also remains unsettled. Because cases involving generative AI are in their infancy, answers to many of these copyright questions are unlikely in the short term. In the meantime, content owners, AI platform creators, and AI users should all monitor this area of copyright law closely. Scarinci Hollenbeck remains at the forefront of this issue.
If you have any questions or if you would like to discuss the matter further, please contact me, Albert J. Soler, or the Scarinci Hollenbeck attorney with whom you work, at 201-896-4100.
No aspect of the advertisement has been approved by the Supreme Court. Results may vary depending on your particular facts and legal circumstances.