Common Misconceptions About Generative AI and Copyright
Disruptive Competition Project
However, the fact that existing copyright acquis provisions likely do not apply directly to generative AI models may cast doubt on the need to consider them when developing filtering tools against infringing output. In the leaked version, a distinction is drawn between “general purpose AI system” (GPAI) and “foundation models”. Importantly, “generative AI” is defined as a type of foundation model “specifically intended to be used in AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video” (Article 28b).
Judge Beryl A. Howell of the US District Court for the District of Columbia agreed with the US Copyright Office’s decision to deny copyright protection to an artwork created by computer scientist Stephen Thaler using “Creativity Machine,” an AI system of his own design. Howell wrote in her opinion that “courts have uniformly declined to recognize copyright in works created absent any human involvement.”
It is not yet clear whether a generative AI system may use input data that is protected by copyright law. On one hand, new works are being created from the input data with minimal display of any copyrighted works.
Humans creating the training data as copyright holders
Columbia Law School professor Shyamkrishna Balganesh echoes the belief that the American legal system’s eventual determination of whether or not the training of an LLM constitutes copyright infringement will almost certainly not be cut-and-dry. Last week, the Authors Guild sent an open letter to the leaders of some of the world’s biggest generative AI companies. However, it seems that copyright may be possible in cases where the creator can prove there was substantial human input. It depends — generative AI may violate copyright laws when the program has access to a copyright owner’s works and is generating outputs that are “substantially similar” to the copyright owner’s existing works, according to the Congressional Research Service. If the human and machine’s contributions are more intertwined, a work’s eligibility for copyright depends on how much control or influence the human author had on the machine’s outputs.
The ruling might also make Hollywood studios more cautious in their current negotiations with striking writers and actors. One point of contention in that dispute concerns whether studios will be able to use AI for various purposes — from using “digital replicas” of performers to having generative AI write scripts that writers would then adapt for a movie. However, if such works are to be given little or no copyright protection (as appears to be the case), the studios may be more willing to give up AI capabilities as a bargaining chip in negotiations. While this case did not test the Office’s positions, it is likely that such a case, forcing a court to weigh human direction against AI generation, is just around the corner.
The office recently issued guidance on the copyrightability of works created with the assistance of AI, which attorneys said introduced additional murkiness in the AI authorship debate. “When registering a work containing AI-generated material, creators must clarify the use of AI in the registration request. This disclosure helps the Copyright Office to assess the human author’s contribution to the work.” Arguably, the type of transparency that is useful is one that allows copyright holders to access datasets in order to exercise their opt-outs. It is unclear how the present text would enable that, since it imposes a requirement that cannot be met in practice. Furthermore, generative AI providers should be incentivized to collaborate with copyright holders in this process, e.g. for the development of workable standards to make the reservation of rights effective. From that perspective, it could be useful to frame the newly proposed obligation as one of good faith or best efforts to document and provide information on how the provider deals with copyrighted training data.
The fact is that generative AI tools such as ChatGPT, Stable Diffusion, Midjourney, or DALL-E must have access to millions (or even billions) of texts and images in order for them to learn and generate content simply through a text-based prompt. “The Office’s conclusion that copyright law does not protect non-human creators was a sound and reasoned interpretation of the applicable law,” the agency wrote in its cross-motion for summary judgment, which Howell granted Friday. Register of Copyrights Shira Perlmutter, who leads the office, defended the office’s decision to reject Thaler’s application on the grounds that the work wasn’t created by a human.
In August 2023, a judge in the US District Court for the District of Columbia sided with the agency against computer scientist Stephen Thaler, who was seeking copyright protection for an image created by AI software. At the time, Thaler’s attorney told Bloomberg Law that they intended to appeal the case. While copyright law tends to favor an all-or-nothing approach, scholars at Harvard Law School have proposed new models of joint ownership that allow artists to gain some rights in outputs that resemble their works. Vendor and customer contracts can include AI-related language in confidentiality provisions to bar receiving parties from entering the disclosing parties’ confidential information into the text prompts of AI tools. Businesses should evaluate their transaction terms to write protections into contracts. As a starting point, they should demand terms of service from generative AI platforms that confirm proper licensure of the training data that feeds their AI.
- Any works produced from unauthorized copying constitute copyright infringement and should be considered derivative works (as defined by the Copyright Act).
- Indeed, training datasets for generative AI are so vast that there’s a good chance you’re already in one (there’s even a website where you can check by uploading a picture or searching some text).
- The suit claims that these artists’ work was wrongfully used to train Stable Diffusion, and that the images generated in the style of those authors directly compete with their own work — an important point in the matter of fair use.
- So, while all these pending lawsuits and others continue to mount, the fair use doctrine’s place in the ongoing saga of the artificial intelligence industry is still very much up in the air.
Artists objected to the invention of photography, arguing that it would render the paintbrush obsolete. Orchestra conductors objected to the advent of recorded music, arguing that it would diminish the demand for live performances. Film studios objected to the development of home video technology on the grounds that it challenged their existing business model. Recorded music and home video unlocked a massive new source of revenue for musicians and filmmakers, and paved the way for today’s streaming economy. However, for publicly available web-based content, such as news feeds, copyright is a consideration, and there are long-standing best practices to address it that can be applied to generative AI.
The Office will use the record it assembles to advise Congress; inform its regulatory work; and offer information and resources to the public, courts, and other government entities considering these issues. Second, and related to copyright, providers of generative AI models shall “document and make publicly available a summary of the use of training data protected under copyright law” (Article 28b–5a). This is the provision that most clearly aims at enabling opt-out under Article 4 CDSM Directive. The fourth approach is an offshoot of fair use maximalism that distinguishes between Input Works protected by copyright but not subject to license restrictions and Input Works protected by copyright and subject to license restrictions. Under this approach, GAIs would need to comply with the license restrictions of applicable works protected by copyright before using them as Input Works. For example, Wikipedia licenses the majority of its text to the public under two open-source license schemes.
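In practice, the license-restriction approach described above implies screening candidate training material by its declared license before ingestion. The sketch below illustrates this idea; the field names (`license`, `text`) and the allow-list are illustrative assumptions, not any real dataset schema:

```python
# Hypothetical sketch: filter a training corpus by declared license
# metadata before it is used as Input Works. The schema and the
# allow-list here are illustrative assumptions only.

ALLOWED_LICENSES = {"CC-BY-SA-3.0", "CC-BY-4.0", "CC0-1.0", "public-domain"}

def filter_by_license(documents, allowed=ALLOWED_LICENSES):
    """Keep only documents whose declared license permits reuse."""
    return [doc for doc in documents if doc.get("license") in allowed]

corpus = [
    {"text": "An encyclopedia article...", "license": "CC-BY-SA-3.0"},
    {"text": "A news article...", "license": "all-rights-reserved"},
    {"text": "A government report...", "license": "public-domain"},
]

usable = filter_by_license(corpus)
# Only the CC-BY-SA and public-domain entries remain.
```

Note that even permissive licenses such as CC BY-SA carry attribution and share-alike conditions, so a real pipeline would need to track those obligations, not merely filter on them.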
The federal agency said the application lacked the necessary human authorship to qualify for copyright protection. A number of artists, for example, have publicly claimed that image-generating platforms like Midjourney and Stable Diffusion are plagiarizing their work. And in June, two lawsuits filed on behalf of a total of five authors – including the comedian Sarah Silverman – accused OpenAI and Meta of illegally using copyrighted book material to train LLMs. For most experts, the biggest questions concerning AI and copyright relate to the data used to train these models. Most systems are trained on huge amounts of content scraped from the web; be that text, code, or imagery.
The EU should enact a robust general transparency requirement for developers of generative AI models. Creators need to be able to understand whether their works are being used as training data and how, so that they can make an informed choice about whether to reserve the right for TDM or not. Broadly speaking, in light of the fact that these are legal questions that are just beginning to be debated, the best thing that marketers can do at the moment is to pay attention to the relevant cases. “For now, marketers working with [generative AI] would be foolish not to keep their ears to the ground on legal challenges,” says Mark Penn, president and managing partner of the Stagwell Group.
As for consent, some have called for an opt-out system where creators could have their works removed from the training data, or for the deployment of a “do not train” tag similar to the robots.txt “do not crawl” tag. As we explain above, under the view that training on such data is generally a fair use, this is not required by copyright law. But the view, which many hold, that using copyrighted training data without some form of recognition of the original creator is unfair may support arguments for other regulatory or technical approaches that would encourage attribution and pathways for distributing new revenue streams to creators.
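A rudimentary version of this opt-out already exists: OpenAI publicly documents a `GPTBot` user-agent token that site owners can disallow in robots.txt, which keeps that crawler from collecting a site’s content. A minimal example, using the standard robots.txt convention the text compares against:

```
# robots.txt at the site root: asks OpenAI's documented GPTBot
# crawler not to fetch any page on this site.
User-agent: GPTBot
Disallow: /
```

This addresses future crawling only; it does not remove works already present in existing training datasets, which is why calls for a true “do not train” mechanism go further than robots.txt.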
There are several approaches to navigating potential copyright infringement by Output Works, which lie on a spectrum of what should be considered “fair use”. In addition, GAI providers/defendants may argue that there was no copying and that the AI system only “used” the Input Work. Because the use of an underlying work is not an exclusive right of a copyright holder, one may argue that no infringement by the GAI or the user occurred (assuming that no copying of Input Works, among other potential infringement, occurred).