21 Million Copyrighted Songs Are Being Shared Among AI Developers, and Artists Are Demanding Answers
More than 21 million copyrighted music recordings are being shared among artificial intelligence developers through public datasets, according to a recent investigation by The Atlantic, sparking a broader reckoning over who controls artists' work in the AI era. The collections include music from Taylor Swift, Bad Bunny, Billie Eilish, Nirvana, Pearl Jam, and the Beatles, alongside countless independent producers. This discovery arrives as major artists, including Grammy-winning singer SZA, are publicly calling out AI music companies for training models on their songs without permission or compensation.
How Are These Datasets Being Used to Train AI Music Models?
Four major datasets containing the copyrighted recordings are being actively shared among AI developers. Two of the datasets hold more than 100,000 recordings each, while the remaining two are considerably larger, containing roughly 9 million and 12 million tracks respectively. The largest collection, called LAION-DISCO-12M, was released in November 2024 by LAION, a German nonprofit that assembles open datasets for AI research. LAION explicitly warns against deploying the dataset commercially or using it in its original form to create finished products. The organization also says it does not distribute the music directly; instead, it provides links to publicly available YouTube tracks and their associated metadata.
Google and Stability AI have used tracks from the Free Music Archive, one of the smaller collections, according to the investigation published by The Atlantic. Each dataset has been downloaded several thousand times, but because the AI industry generally keeps its training data confidential, it remains largely unknown which companies have relied on which specific collections.
Why Are Artists Like SZA Speaking Out Now?
SZA recently discovered that an AI music database listed 238 of her songs as material used to train AI models, including some she believes may be unreleased tracks. The "Snooze" singer called out musicians who support the practice and made clear she does not see this as innovation. For SZA, the issue extends far beyond technology; it centers on consent, ownership, and who profits when an artist's sound, style, and emotional fingerprint are fed into a machine.
SZA's frustration reflects a deeper historical concern for Black artists and creators. Black music has long powered global culture, from blues, jazz, and gospel to hip-hop, R&B, house, and Afrobeats, yet the people who create that sound have not always been protected, credited, or paid. Other artists are drawing similar lines. Kehlani has also spoken out against AI-generated R&B, while some producers, like Kato On The Track, have discovered that their work was swept up without consent or compensation. Kato reported that 54 of his songs were used to train and sell generative AI models without permission, not counting the 1,541 songs by other artists on which he is credited as a producer.
What Legal Battles Are Unfolding Over AI Music Training?
The music industry is already fighting over AI in court. Major labels have sued AI music companies Suno and Udio, alleging that copyrighted songs were used without permission to train music-generating systems. Suno and Udio are now contending with at least 12 lawsuits, according to The Atlantic. The litigation began in June 2024, when the Recording Industry Association of America, representing Sony, Warner, and Universal Music Group (UMG), sued both companies for what it described as mass copyright infringement.
Since then, the three major labels have pursued divergent strategies. UMG settled with Udio in October 2025, announcing a compensatory legal settlement alongside new recorded music and publishing licenses for a jointly developed AI platform expected to launch in 2026. Under that arrangement, Udio's service will operate within what Universal described as a "walled garden" with audio fingerprinting and content filtering in place. Warner reached its own settlement and licensing deal with Udio in November 2025 and, within days, became the first major label to reach a settlement with Suno as well. The Warner-Suno agreement also included Suno's acquisition of Songkick, the concert-discovery platform, from Warner. Sony, by contrast, has remained in active litigation against both companies.
What Are the Broader Implications for Artists and Creators?
The stakes extend beyond music generation itself. While artists are demanding consent and compensation on the creative side, media companies are simultaneously racing to embed AI into their advertising, streaming, and audience-growth strategies. Warner Bros. Discovery, for example, announced that it is developing agentic AI-powered advertising technology with Amazon Web Services, its preferred cloud provider. The company says the technology will help advertisers plan, activate, optimize, and measure campaigns across both linear and digital platforms.
These two developments are not separate. They meet at the same place: profit. If AI can imitate the artists people love, generate the content people consume, and optimize the ads people see, then the next fight is not just about whether a fake song sounds real. It is about whether real people, especially Black creators, can maintain control over the culture they built.
Steps Artists and Advocates Are Taking to Protect Creative Rights
- Legal Action: Major record labels are pursuing litigation against AI music companies, with Sony continuing active lawsuits against both Suno and Udio while Warner and UMG have negotiated settlements that include licensing agreements and content filtering mechanisms.
- Public Advocacy: Artists such as SZA and Kehlani, and producers such as Kato On The Track, are publicly calling out unauthorized use of their work, raising awareness about consent and compensation issues in the AI era.
- Licensing Negotiations: Some companies are moving toward "walled garden" models with audio fingerprinting and content filtering, creating frameworks where AI platforms operate under agreed-upon terms with rights holders.
- Dataset Transparency Demands: Creators and advocates are pushing for clearer disclosure about which songs are included in training datasets and how those datasets are being used commercially.
SZA's position is clear: if her music helped train a machine, she wants no part of celebrating it. The industry is now facing two AI conversations at once. On the creative side, artists are asking for consent, credit, and compensation. On the corporate side, media companies are racing to make AI part of their advertising, streaming, and audience-growth strategies. The technology may be new, but the concern is very old: whether Black creativity will be protected or extracted again.