The Copyright Office’s recent pre-publication report on generative AI training has sparked a flurry of headlines.1 Yet many of these reactions miss the forest for the trees.2 The true and alarming significance of the report’s backward-looking interpretation of copyright law — which effectively forecloses the use of vast swathes of copyrighted material for AI model training — is that it signals a dangerous trajectory absent drastic action to realign copyright with its constitutional foundation. If the United States is to achieve and maintain AI dominance, it must first establish data dominance.3 The status quo will not only stifle American innovation; it will also cede critical advantages to our strategic adversaries in the global race to develop and deploy leading AI models.
Absent intervention, this report will be remembered as a moment when the legal system began to actively undermine AI progress by restricting access to the high-quality training data it needs.4 One of the American legal system’s greatest strengths, its commitment to incrementalism and precedent, also harbors its greatest weakness in times of rapid technological upheaval. A “bend, but don’t break” philosophy is typically conducive to the stability that aligns with the rule of law. However, when a body of law has been bending for centuries, as is the case with copyright, it eventually stretches to its breaking point, becoming dangerously disconnected from its core constitutional mandate.5
The U.S. Constitution, in the Intellectual Property Clause (Article I, Section 8, Clause 8), grants Congress the power “to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”6 This power is not absolute; it is a means to the end of scientific and artistic progress. As recognized by the House Report accompanying the Copyright Act of 1909, Congress exceeds its constitutional authority when its chosen mechanisms for promoting progress cease to serve that purpose.7 “The enactment of copyright legislation by Congress . . . [is] based upon the ground that the welfare of the public will be served and progress of science and useful arts will be promoted by securing to authors for limited periods the exclusive rights to their writings.”8 The report further clarified that copyright is “[n]ot primarily for the benefit of the author, but primarily for the benefit of the public.”9 We find ourselves at precisely such a juncture today, where an ossified copyright regime threatens to impede, rather than catalyze, a technological revolution.
Should the Copyright Office’s restrictive guidance on AI training data find its way into judicial precedent through one or more of the pending high-stakes lawsuits, such as The New York Times v. OpenAI, advocates for American AI dominance will face an uphill battle for the foreseeable future.10 A handful of judges, perhaps swayed by a century of expanding copyright protectionism, could impose severe limitations on the ability of leading American AI labs to continue their crucial work, effectively kneecapping their efforts to keep pace with strategic competitors like China. This is not merely about abstract legal theory; this is about maintaining our technological edge in areas critical to national security, from advanced autonomous defense systems to AI-driven intelligence analysis and cybersecurity.
An alternative path is available, but its viability demands a return to constitutional first principles and an embrace of national data dominance. Fidelity to the Constitution’s directive that intellectual property legislation must advance “progress of science and useful arts” requires far more than passively hoping a few maverick judges will ignore the Copyright Office’s misguided report and years of calcified precedent. The stakes are too high, the timeline too compressed. The Founders, in their wisdom, intentionally crafted the IP clause with the explicit aim of encouraging innovation and the wide dissemination of knowledge, not to erect barriers that would impede technological advances.11
It is time for a frank admission: copyright law, in its current hypertrophied state, is broken.12 It has evolved into a system that predominantly furthers the entrenched interests of entertainment conglomerates, a handful of superstar authors, and a narrow band of legacy rights holders, often at the expense of broader innovative potential and the public good.13 More critically, strict adherence to common, expansive interpretations of its provisions means that the “data scarcity” already lamented by leading AI labs like OpenAI as well as smaller startups and research institutions will only become more acute.14 This is unacceptable. In the global contest to develop the most advanced AI, second place is not an option. Second place means diminished economic competitiveness, a weakened national security posture, and a future where we are reactive rather than proactive.
In the extreme, coming in second place means losing wars. It means a perpetually struggling economy and the sort of resource scarcity that historically fosters political instability and chaos.
The United States will not achieve the necessary data dominance under a copyright regime defined by ambiguity or prone to interpretations that favor stasis.15 The gray area that enshrouds interpretations of fair use practically guarantees a protracted, years-long legal quagmire as labs and rights-holders battle to clarify the circumstances under which copyrighted data can be used for AI training.16 This is a luxury we cannot afford when competitors are moving at lightning speed.
It is a stark reality that China does not operate under the same sort of self-imposed legal constraints that we have allowed to encumber our own innovators.17 While this is far from an endorsement of the Chinese Communist Party’s often invasive and ethically dubious data collection schemes, it serves as a pragmatic reminder that Beijing will not permit its AI ambitions to be starved of their necessary data. Its national champions are fueled by data, and they are unencumbered by a legal system that, as ours currently does, prioritizes legacy IP claims over strategic technological advancement.18
Some legal scholars will recognize this reality, but they will propose what they see as less disruptive solutions. It is worthwhile to examine a few alternative proposals.
Some argue that protecting copyright should mean mandatory licensing of all AI training data from major data providers. This approach, while seemingly orderly, would be the death knell for a vibrant AI startup ecosystem. It would concentrate power in the hands of a few large data custodians, likely the same entities that most benefit from the current restrictive copyright paradigm.19 Operating on lean budgets and tight timelines, startups cannot afford to navigate complex, expensive licensing negotiations for the petabytes of data needed to train models.
These startups may well hold the key to the next AI breakthrough. They are the ones working on narrow AI systems with direct applications for our military, the ones that could unlock AI-driven solutions for personalized medicine, critical infrastructure resilience, or materials science. A licensing-only regime risks creating barriers to entry, stifling competition and innovation before it can even begin.
Others may advocate simply tinkering around the edges of existing copyright law, for example, by adding a fifth factor to the fair use exception specifically related to AI training. While less radical, this kind of approach is insufficient to meet the urgency of the challenge. Adding another factor to the already unpredictable fair use test might marginally reduce litigation in some specific instances, but it would hardly eliminate the legal ambiguity.
Instead, Congress should move swiftly to establish a clear, broad, and unambiguous safe harbor for the use of copyrighted materials for the specific, non-consumptive purpose of training AI models. This is not about permitting AI systems to generate and disseminate copies of protected works; robust protections against such outputs must remain. This is about recognizing that the act of learning from data — the process of identifying patterns, correlations, and statistical relationships within datasets — is a transformative use that does not, and should not, typically implicate the core exclusive rights that copyright law is meant to protect.
To combat data scarcity and ensure American researchers and developers have access to the scale and diversity of data required for ongoing AI progress, the federal government should spearhead a “Data for AI” initiative. This would be akin to a national infrastructure project for the digital age. It could involve several components: curating and making accessible vast amounts of publicly funded research data; developing robust frameworks for anonymizing and aggregating sensitive government datasets for AI training in a privacy-preserving manner; and potentially creating incentives for private sector entities to pool or contribute non-strategic data for broader public benefit and pre-competitive research.
The path to AI dominance is paved with data. By embracing these kinds of bold, decisive reforms, we can shed the shackles of an outdated copyright paradigm, unleash the full innovative potential of American ingenuity, and secure our leadership in the defining technology of the 21st century. The time for incremental half-measures is over. The moment for visionary action is now.
Our national security and future prosperity depend on it.
1. “Copyright and Artificial Intelligence Part 3: Generative AI Training,” United States Copyright Office, May 2025.
2. Ina Fried, “U.S. Copyright Office’s AI Report Sparks New Fight,” Axios, May 13, 2025.
3. Alexander Wang, “Converting Energy into Intelligence: The Future of AI Technology, Human Discovery, and American Global Competitiveness,” hearing, U.S. House of Representatives Energy and Commerce Committee, April 9, 2025.
4. Joe McKendrick, “If AI is the ‘gas Guzzler’ of data, how do we get better mileage?” ZDNET, July 3, 2024.
5. Sean M. O’Connor, “The Overlooked French Influence on the Intellectual Property Clause,” University of Chicago Law Review 82, no. 2 (2015); Dotan Oliar, “Making Sense of the Intellectual Property Clause: Promotion of Progress as a Limitation on Congress’s Intellectual Property Power,” Georgetown Law Journal 94 (2006): 1771.
6. U.S. Constitution, art. I, sec. 8, cl. 8.
7. “The House Report 1 on the Copyright Act of 1909,” U.S. House of Representatives, 1909.
8. Ibid.
9. Ibid.
10. New York Times v. OpenAI, no. 1:23-cv-11195 (S.D.N.Y. 2025).
11. Kevin Frazier, “Progress Interrupted: The Constitutional Crisis in Copyright Law,” Harvard Journal of Law and Technology, March 13, 2025.
12. Bradley E. Abruzzi, “Copyright and the Vagueness Doctrine,” University of Michigan Journal of Legal Reform no. 45 (2012).
13. Cory Doctorow, “In Serving Big Company Interests, Copyright Is in Crisis,” Electronic Frontier Foundation, January 21, 2020.
14. Christopher Lehane, “Open Letter to the Office of Science and Technology Policy: Request for Information on AI Action Plan,” OpenAI, March 13, 2025; Isabelle Bousquette, “AI Startups Have Tons of Cash, but Not Enough Data. That’s a Problem,” Wall Street Journal, June 15, 2023.
15. Stephen Yelderman, “The Supreme Court’s Fragile Copyright Law,” 50 Fla. St. U. L. Rev. 335 (2023); Joshua Levine, “Don’t Let Copyright Kill American AI,” The Republic, November 25, 2024.
16. William W. Fisher III, “Reconstructing the Fair Use Doctrine,” 101 Harvard Law Review (1988); R. Polk Wagner, “The Perfect Storm: Intellectual Property and Public Values,” 74 Fordham L. Rev. (2005).
17. Gordan Gao and Yao Xiaoyi, “Navigating Copyright Challenges in AI Model Training: A Cross-Border Perspective,” King & Wood Mallesons, March 24, 2025.
18. Lehane, “Request for Information on AI Action Plan.”
19. Bill Rosenblatt, “The Media Industry’s Race To License Content For AI,” Forbes, July 18, 2024.

