LET'S GET STRATEGIC
Generative AI and Copyright Issues: What You Need to Know
by Linda Pophal
Generative AI (ChatGPT is one popular example) holds a lot of promise for researchers and content creators of all kinds, but it may also pose risks that have not yet been fully explored, such as copyright infringement. As an AI tool crawls the internet and digital sources for information to respond to users’ queries, the information that’s pulled often belongs to other content creators. What risks does that pose for those relying on this information—sometimes verbatim—especially when it might be inaccurate? In this column, let’s take a look at written content generated by AI and the implications content creators need to be aware of.
COPYRIGHT: NAVIGATING NEW TERRAIN
Daniel Restrepo, an associate at the Fennemore law firm’s Phoenix office, working in its business and finance practice group, says that there are two conflicting interests when it comes to the regulation of intellectual property (IP) and generative AI. He notes that “copyright has historically been reserved to content created by human beings with the policy goal of encouraging the sharing of new and innovative ideas with the public and culture.” But “there is an enormous interest in promoting and rewarding the development of AI and machine learning.” Restrepo adds, “Beyond the novelty of ChatGPT, AI presents significant value to businesses, government administration, and national security.”
The quandary? Restrepo says, “If we do not provide IP rights for content generated by AI, particularly to the designers of AI, then there is much less incentive to create such software if the result is content that immediately enters the public domain.” However, there are some other sticky concerns, as Kennington Groff, an IP, entertainment, and business attorney with Founders Legal in Atlanta, points out. “Based on the recent guidelines provided by the U.S. Copyright Office, there is a potential risk of copyright infringement when AI-generated content is derived from copyrighted material without proper authorization,” Groff says. “As AI systems crawl the internet and digital sources to gather information and respond to user queries, they may inadvertently use copyrighted content belonging to other creators. This raises concerns about infringement for both AI developers and users who rely on the AI-generated information, sometimes even reproducing it verbatim.”
In addition, as Aaron C. Rice, chair of Founders Legal’s entertainment group and managing attorney of the firm’s Nashville location, mentions in an April 2023 blog post (“U.S. Copyright Guidelines for Works Containing AI-Generated Material”), copyright registration includes the following disclosure requirement: “When registering a work that contains AI-generated material, creators must disclose the use of AI in the registration application. This disclosure helps the Copyright Office in assessing the human author’s contribution to the work.”
Registering one’s work is not, of course, a requirement for it to be copyrighted—that occurs automatically upon its creation (although without registration, the creator will not be eligible for statutory damages or attorney’s fees if the work is infringed upon). However, as Arle Lommel, director of data services at CSA Research in Massachusetts, points out, generative AI doesn’t really work in the way that many believe it does.
UNDERSTANDING HOW GENERATIVE AI ACTUALLY WORKS
“Unfortunately, there is a lot of misunderstanding about how generative AI works right now,” says Lommel. “Many people assume it acts like a giant search engine, retrieving and reproducing content that it has stored somewhere.” But that’s not the process, he states. “First, generative AI (aka GenAI) systems do not store vast troves of training data. Instead, they store statistical representations of that data. This means they cannot simply reproduce something they were trained on, but instead have to generate something based on it.”
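For readers who want to see the distinction concretely, here is a toy illustration—nothing like the scale or architecture of systems such as ChatGPT, and purely a sketch—of a bigram word model. Notice that after training, the model retains only word-transition counts (a statistical representation), not the sentences it was trained on, and its output is generated by sampling from those statistics rather than retrieved from a stored document:

```python
import random
from collections import defaultdict

def train_bigram_model(corpus):
    """Store only word-transition counts (a statistical representation),
    not the training sentences themselves."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = sentence.split()
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, start, length=5, seed=0):
    """Produce new text by sampling from the stored statistics,
    weighted by how often each transition appeared in training."""
    rng = random.Random(seed)
    token, output = start, [start]
    for _ in range(length):
        nexts = counts.get(token)
        if not nexts:
            break  # no known continuation; stop generating
        words = list(nexts)
        weights = [nexts[w] for w in words]
        token = rng.choices(words, weights=weights)[0]
        output.append(token)
    return " ".join(output)
```

Trained on a handful of sentences, such a model can emit word sequences that appeared in no single training sentence—generated from the aggregate statistics, which is Lommel’s point in miniature.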
Lommel likens the process to an undergraduate student asked to write a summary paper based on several pieces of research, who then puts that summary into their own words and synthesizes some of the knowledge in those sources to reflect their own understanding. That, he says, “differs from a student who buys a paper online or copies a Wikipedia article, which clearly constitutes plagiarism.” Addressing plagiarism and content ownership of AI-generated content, Lommel states, will be extremely difficult given the way these systems work. “Because they are generating—rather than copying—content, the bar will be high to prove that any output infringes on the rights of others.”
Lommel acknowledges that “theoretically, the output is a derivative work.” But, he says, “it is derivative of many, many works, all of which contribute in an infinitesimal degree to the output. This is not to say that a clever legal strategy might not succeed in finding some infringing use, but I believe the risk to be quite low.” For written content creators, of course, there are tools such as Grammarly’s Plagiarism Checker or Turnitin (used by teachers) that can be used to identify plagiarism. There are also tools such as OpenAI’s AI Text Classifier, which lets users paste in text to analyze the likelihood that it was created by a human or by AI.
ACCURATE OR NOT?
Still, for those using generative AI to create written content, a bigger risk than plagiarism exists—the risk of inaccuracy. Lommel explains the inaccuracy problem of AI as follows:
Generative AI carries a real risk because of how fluent it is: The output resembles something a competent human might say or create, but this makes it easy to miss subtle problems of meaning. For example, if you use GenAI to translate an instruction for how a patient should treat a disease and it gets details wrong and nobody detects it because the output sounds right, who is liable if that results in harm? The terms of service for all of the current GenAI tools explicitly disclaim any warranty of fitness in their output, which leaves whoever employed them with the full liability. And simply saying, ‘I thought it was right,’ will not remove that liability. We are already seeing lawsuits over incorrect factual content, but it is only a matter of time before an organization is sued over content that it produced using GenAI with insufficient oversight.
In addition, Lommel says, the static nature of these systems also poses the risk of inaccuracy because they are trained on datasets that are time-limited: For ChatGPT, it’s 2021. “GPT-4 warns users of this, but it becomes problematic when systems make statements based on past knowledge that is now known not to be true,” he states. “Imagine a case where a system describes someone as a convicted murderer, but that individual was completely exonerated at a point after it was trained.” That risk exists even absent AI-generated content, because humans may miss certain facts while researching. But, Lommel suggests, generative AI “is likely to exacerbate this problem.”
Here are three important must-dos for written content creators as they experiment with this technology: Consider generative AI as a tool that can inform your writing process, not replace it; fact-check carefully; and run any content that relies on AI-generated material—even to a small degree—through tools that can minimize the risk of inadvertent copyright infringement.