With a class action suit filed against GitHub over Copilot, its code-writing AI trained on open-source code, the question of whether machine learning on copyright-protected data constitutes infringement or falls under the fair use doctrine is as current as ever.
Machine learning algorithms derive patterns from sample input data to make predictions or decisions (almost) independently of a human programmer. They must process enormous amounts of training data to produce an accurate AI model. For example, LAION-5B, one of the most popular machine learning datasets, consists of 5.85 billion image-text pairs (used by Stable Diffusion and Lensa App). The models themselves are similarly vast: GPT-3, an autoregressive language model (not a dataset) with 175 billion parameters, is used by the AI chatbot ChatGPT.
Often, those massive datasets contain copyrighted materials – photos, paintings, books, or lines of code. Even more often, copyright owners have no idea about (let alone consent to) their material being used in machine learning.
Copying copyrighted works into machine-learning datasets implicates the reproduction right, and creating works based on the processed datasets implicates the right to create derivative works. Unless there is an applicable exception (such as the “fair use” doctrine), those are acts of copyright infringement.
No US case law yet directly addresses the use of copyrighted materials in machine learning.
The plaintiffs in the recently filed GitHub Copilot lawsuit claim that using datasets consisting of open-source code for machine learning violates open-source licenses. The terms of those licenses often require that anyone who uses the protected code in their own code must credit the author of the underlying code and share the resulting code in a public repository for free. These principles are essential for the open-source community and software development, but let us get back to the topic.
Although GitHub has yet to file a response, it has previously argued that the use of open-source code in machine learning falls under the fair use doctrine. The open-source component of the lawsuit adds a layer of complexity, but the core question we now face is: “Does machine learning have to respect copyright, or is it fair use?”
The purpose of the fair use doctrine is to balance the protections that copyright grants to its owners with the “greater social good” and to promote creativity, education, free speech, and research. It is an exception to copyright that allows the use of copyrighted materials without the owner’s consent for purposes such as criticism, comment, news reporting, teaching, scholarship, or research (see §107 of the Copyright Act).
Fair use is a mixed question of law and fact, which means that the finding of whether something constitutes fair use is case-specific. There are no categories of presumptively fair use (see Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994)).
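In deciding fair use cases, courts must consider the following factors having equal weight (see §107 of the Copyright Act): (1) the purpose and character of the use, including whether it is commercial or for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use on the potential market for or value of the copyrighted work.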
The finding of fair use is less likely if the use is commercial as opposed to not-for-profit. The use tends to be commercial if the purported infringer profits from exploiting the copyrighted material without paying the customary price to copyright owners. 
On the other hand, fair use is likely to apply if the use of copyrighted material: 
For example, copying computer code to determine technical compatibility and not running it for its functional purposes was found to be fair use (see Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992)).
An intriguing factor in the finding of fair use is the effect of the use on (1) the potential market for or (2) the value of the copyrighted work. With some companies already outsourcing their digital art needs to AI, it is no stretch to expect that art-generating AI will compete with human artists. Given that competition, AI art may reduce the value of the very copyrighted art on which it was trained. In the more distant future, one could draw parallels in other industries, such as writing code or novels.
To conclude, AI projects are unlikely to fall under the fair use exception from copyright if: 
However, a court would have to consider all the factors of a given case to give any definitive answer. We cannot help but end on a good old “it depends” that lawyers are notorious for, but c’est la vie! 
Another issue to explore is copyright infringement liability attributed to a commercial project using machine learning datasets created by not-for-profit organizations for educational and research purposes. Come back soon for more articles!
© Copyright 2006 – 2022 Law Business Research