
As a forensic musicologist, I’m asked about AI, of course: “How do artificial intelligence-based tools affect my process?” and “What do I think about training models on copyrighted musical works?” To the first question: yes, I’m using AI more and more. Certainly not for assessing substantial similarity, but for cool stuff like stem separation, for example, which is very helpful for analysis and demonstration.
The second question, bigger by far, is about the ethics and legality of training the tools, and to a great extent it’s too complex and varied a question for a general response. For the most part, I’ve concentrated on copyright infringement as a function of outputs, not inputs, and in the case of AI models, that goes to fair use. Musicologize, of course, is a music expert service, not a legal one, but forensic musicology is adjacent to legal matters, so with that caveat, I’m happy to get into it. It’s fascinating. With virtually every well-known AI company involved in litigation right now, the legality of the training itself is obviously very much in question. Maybe copyright is a function of the inputs too. A couple of recent developments suggest my baseline position may change.
So let’s consider, first, a recent indication in the ongoing Thomson Reuters v. Ross Intelligence case. Somewhat ironically, the subject is the Westlaw platform, which serves courts and lawyers by providing legal resources like case law, legal journals, and citation analysis, some of which use AI. Thomson Reuters owns Westlaw and asserts that Ross, which was (at least for a time) an AI-powered legal research platform, trained on and infringed the Westlaw copyrights. Westlaw doesn’t just catalog the case law, which is mostly public material; its entries contain “headnotes,” summaries that someone had to formulate and author, and which are therefore protectable to some extent. Copying was inferred from the similarity of Ross’s memos to Westlaw’s headnotes. Copyright doesn’t protect facts, and the cases are what they are, so Ross argued scènes à faire, but that didn’t work. Westlaw’s headnotes are not just facts, they’re expressions of facts. And expression is exactly what copyright protects. Thus we have the first fair use decision in what is bound to be a very long series of fair use debates across the many (forty or so, I think?) cases in litigation as I’m typing.
Fair use is a defense to infringement. Yeah, you used the material, but the law is okay with it.
Remember, the four pillars (or factors) of fair use in U.S. copyright law are:
- Purpose and Character of the Use
- Nature of the Copyrighted Work
- Amount and Substantiality of the Portion Used
- Effect on the Market Value of the Original Work
Big picture, looking across the many current cases, I’d say the AI companies are hanging their hats on the first one: “Is the training of the models transformative?” What do you think? An AI model isn’t a poem, script, book, musical work, etc., and unless it outputs something that looks too much like someone’s poem (or script, or song), the model itself is transformative. But, according to this decision, since Westlaw and Ross have similar purposes, and Ross used the Westlaw headnotes “as AI data to create a legal research tool to compete with Westlaw,” they’re competing. The AI model makers in all of these cases can be expected to explain that the training data converts to something granular and micro, usable only by the system, but here the court rejected that, distinguishing this from a case like Sega v. Accolade, where copied computer code was an example of “intermediate copying” (“yes, we made a copy of it for purposes of learning, but then went on to make our own thing”) and was considered a functional necessity, as opposed to someone’s editorializing, and thus not protectable, or at least not infringed upon. And again, copyright doesn’t protect facts, but in Thomson Reuters v. Ross Intelligence, the material was at least somewhat protectable and wasn’t functionally essential to building Ross’s platform.

On the other hand, the “nature” of those summary headnotes was not very far removed from the underlying factual and public case law itself and possessed only a little creativity. And as for pillar #3, amount and substantiality, first we might think, “didn’t they use all of it?” We might presume, say, that ChatGPT has by now used the entire internet. They possibly did, to train their model, but their “use” doesn’t involve presenting much of it in their own product. The “amount and substantiality” of what actually appears to their users isn’t very much.

But that fourth pillar, “Effect on the Market Value of the Original Work”? Duh! They were going to affect Thomson Reuters’ market; they’re in the same business. And the decision makes clear that the court believes this factor, “the single most important element of fair use,” is paramount, not the one I expect to lead the AI defendants’ defenses: the transformative first pillar.
I’m still not persuaded, and this ain’t over, but I’m uncomfortable with how it’s going. Decisions lean on precedent, and here the applicable precedent is somehow overcome. This court is apparently persuaded that the training might be infringing because it appropriates protectable content for a competing commercial product. Yeah, of course it does. But in the precedential tech cases I mentioned, we had situations arguably akin to “yes, we copied your code so we could figure out what you were doing, and once we learned, we did our own thing.” The possible substantial similarity within headnotes aside, more generally speaking, as a musician, that’s how we learn! I’ve transcribed elements from works I admired so I could learn the musical devices, precisely intending to apply them to my own work. There’s no better way. When I’m in museums, I see artists at easels copying the hanging pieces. Setting aside, for now, questions of whether the art is in the public domain: they aren’t stealing, they’re learning. That’s what the AI developers are going to say the models are doing. That’s more “intermediate copying.” And it didn’t occur to me, but Aaron Moss, who writes an excellent blog called Copyright Lately, thoughtfully tied in Chapman v. Minaj as an example of intermediate copying in music being fair use, in his more timely response to Thomson Reuters v. Ross Intelligence.
That last pillar, not the first, may be the rudder that steers the boat. The high-profile 2023 Warhol v. Goldsmith case belongs in this discussion, putting a ding in the first pillar just in time to be a pain in the AI companies’ collective butts, and doing Ross no favors. “Transformativeness,” it argues, shouldn’t save you if your product is in the same market and substitutes for the original, or exploits the same commercial opportunities. That court explained that a new work’s differences or new insights don’t entitle it to compete in the same licensing market as the original.
“Although new expression, meaning, or message is relevant to the first factor, that inquiry does not focus on the intent behind or the meaning of the copied work, nor does it override the other statutory factors, including whether the use is of a commercial nature and supersedes the market for the original.”
— Warhol v. Goldsmith, 598 U.S. ___, slip op. at 22 (2023).
Consequently, Warhol underscored that a work’s transformative character may be outweighed if the new work occupies the same market space as the original, effectively “usurping” the original’s economic opportunity. This means that even a “highly creative” adaptation can fail as a fair use if it intrudes too deeply into the original author’s ability to monetize their work. Applied here: even if the Westlaw headnotes are not substantially replicated, the court cares more that Ross’s product could usurp Westlaw’s market.
If I train my AI on Bruno Mars’ tracks with the goal of turning his fans to my AI’s outputs instead of his, yes, that might be wrong. Is it less wrong if you’re just building a genre-centric music-writing platform? That would be much closer to the tech case examples. Do you then turn the market toward your output instead of that of countless future human artists? If an AI platform replaces a composer or session musician, that might create “market displacement,” but I’ve been using virtual musicians for decades. Is that wrong? There’s a distinction between morally “wrong” and “infringing,” of course.
Things seem to be moving awfully quickly these days. Warhol and now Thomson Reuters v. Ross give fresh judicial weight to that fourth market-harm factor. If you satisfy pillar #1, should pillar #4 still be able to trip you up? Transformative technology is progress, capitalism, apple pie and such, isn’t it?