

I agree with the ethical standpoint of banning Generative AI on the grounds that it’s trained on stolen artist data, but I’m not sure how tenable “trained on stolen artist data” is as a technical definition of what is not acceptable.
For example, if a model were trained exclusively on licensed works and data, would this be permissible? Intuitively, I’d still consider that to be Generative AI (though this might be a moot point, because the one thing I agree with the tech giants on is that it’s impractical to train Generative AI systems on licensed data, given the gargantuan amounts of training data required).
Perhaps it’s foolish of me to even attempt to pin down definitions in this way, but given how tech oligarchs often use terms in slippery and misleading ways, I’ve found it useful to try to pin them down where possible.




I see your point, but as you say, there would still be the tradeoff of missing more recent stuff. That might only amount to a couple of years’ worth now, but AI isn’t going away any time soon, so it would mean an ever-growing amount of human-made music going unarchived. One of the things I like about Anna’s Archive is that they seem to approach this problem in a long-term, informational-infrastructure kind of way, so I imagine they wouldn’t be keen on stopping the archive at 2023.
It seems they’ve opted for a different tradeoff instead: lower-popularity songs are archived at a lower bitrate, and even the higher-popularity stuff has some compression. Some archives go for quality, and thus prioritise high-quality FLACs, so Anna’s Archive is aiming to fill a different niche. I can respect that.