US Copyright Office: Generative AI Training [pdf]

copyright.gov

59 points by dave1629 3 days ago


dave1629 - 3 days ago

From the Conclusion: "In applying current law, we conclude that several stages in the development of generative AI involve using copyrighted works in ways that implicate the owners’ exclusive rights. The key question, as most commenters agreed, is whether those acts of prima facie infringement can be excused as fair use. ... But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries. ... These groundbreaking technologies should benefit both the innovators who design them and the creators whose content fuels them, as well as the general public."

momothereal - 2 days ago

The head of the US Copyright Office has since been fired: https://www.cbsnews.com/news/trump-fires-director-of-u-s-cop...

kelseyfrog - 2 days ago

Footnote one is where the whole thing goes off the rails. The Copyright Office asserts that the works in question are not merely "data" in the ordinary sense, but somehow "embody creative expression" in a way that constitutes protected authorship.

This is metaphysics, not law or computer science.

They're smuggling in a kind of authorial transubstantiation, as if creative essence somehow imbues the bits themselves, rendering them qualitatively different from any other arrangement of bytes. The implication is that once a work has passed through the sacrament of human intention, it permanently carries a kind of spiritual copyright residue, regardless of its subsequent transformation or use.

But that's not how data works. A copy of a copyrighted work in a training corpus is still just data. It doesn't emit rights. It's not radioactive. There's no Platonic form of "authorship" that permeates the latent space. What matters, legally and practically, is what the system does with that data, not some mystical essence the data supposedly contains.

This is authorial essentialism dressed up as policy. And it doesn’t hold up under inspection.

adt - 2 days ago

Part 1 (replicas) https://copyright.gov/ai/Copyright-and-Artificial-Intelligen...

Part 2 (copyrightability) https://copyright.gov/ai/Copyright-and-Artificial-Intelligen...

Part 3 (GenAI training) https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

Analysis in previous and upcoming editions of The Memo: https://lifearchitect.ai/memo/

michael-sumner - 2 days ago

We wrote a summary of it here for busy folks: https://x.com/scoredetect/status/1921883329772548365
