Lots of people on Lemmy really dislike AI’s current implementations and use cases.
I’m trying to understand what people would actually want to happen right now.
Destroy gen AI? Implement laws? Hoping all companies use it for altruistic purposes to help all of mankind?
Thanks for the discourse. Please keep it civil, but happy to be your punching bag.


They have to pay for every copyrighted work used in the entire model whenever the AI is queried.
They are only allowed to use data that people opt into providing.
There’s no way that’s even feasible. Instead, AI models trained on publicly available data should be considered part of the public domain. So any images that anyone can go and look at without a barrier in the way would be fair game, but the model would be owned by the public.
It’s totally feasible, just very expensive.
Either copyright doesn’t exist in its current form or AI companies don’t.
It’s only not feasible because it would kill AIs.
Large models have to steal everything from everyone to be baseline viable.
No, it’s not feasible because the models are already out there. The data has already been ingested and at this point it can’t be undone.
And you can’t exactly steal something that is infinitely reproducible and doesn’t destroy the original. I have a hard time condemning model creators for training their models on images of Mickey Mouse while I have a Plex server with the latest episodes of Andor on it. Once something is put on display in public, the creator of it should just accept that they have given up their total control of it.
Ah yes, the “it’s better to beg forgiveness than to ask permission” argument.
Oh no… Anyway
Public Domain does not mean being able to see something without a barrier in the way. The vast majority of text and media you can consume for free on the Internet is not in the Public Domain.
Instead, “Public Domain” means that 1) the creator has explicitly released it into the Public Domain, or 2) the work’s copyright has expired, which in turn then means that anyone is from that point on entitled to use that work for any purpose.
All the major AI models scarfed up works without concern for copyrights, licenses, permissions, etc. For great profit. In at least some cases, such as Meta’s, they knowingly used collections of pirated works to do so.
I am aware and I don’t expect that everything on the internet is public domain… I think the models built off of works displayed to the public should be automatically part of the public domain.
The models are not creating copies of the works they are trained on any more than I am creating a copy of a sculpture I see in a park when I study it. You can’t open the model up and pull out images of everything that it was trained on. The models aren’t ‘stealing’ the works that they use for training data, and you are correct that the works were used without concern for copyright (because the works aren’t being copied through training), licenses (because a provision such as ‘you can’t use this work to influence your ability to create something with any similar elements’ isn’t really an enforceable provision in a license), or permission (because when you put something out for the public to view it’s hard to argue that people need permission to view it).
Using illegal sources is illegal, and I’m sure if it can be proven in court then Meta will gladly accept a few hundred thousand dollar fine… before they appeal it.
Putting massive restrictions on AI model creation is only going to make it so that the most wealthy and powerful corporations will have AI models. The best we can do is to fight to keep AI models in the public domain by default. The salt has already been spilled and wishing that it hadn’t isn’t going to change things.
I don’t have much technical knowledge of AI since I avoid it as much as I can, but I imagined that it would make sense to store the training data. It seems that it is beneficial to do so after all, so I presume that it’s done frequently: https://ai.stackexchange.com/questions/7739/what-happens-to-the-training-data-after-your-machine-learning-model-has-been-tra
My understanding is also that generative AI often produces plagiarized material. Here’s one academic study demonstrating this: https://www.psu.edu/news/research/story/beyond-memorization-text-generators-may-plagiarize-beyond-copy-and-paste
Finally, I think that whether putting massive restrictions on AI model creation would benefit wealthy corporations is very debatable. Generative AI is causing untold damage to many aspects of life, so it certainly deserves to be tightly controlled. However, I realize that it won’t happen. Just like climate change, it’s a collective action problem, meaning that nothing that would cause significant impact will be done until it’s way too late.
What about models folks run at home?
Careful, that might require a nuanced discussion that reveals the inherent evil of capitalism and neoliberalism. Better off just ensuring that wealthy corporations can monopolize the technology and abuse artists by paying them next-to-nothing for their stolen work rather than nothing at all.
I think if you’re not making money off the model and its content, then you’re good.
I would make a case for the creation of datasets by an international institution like UNESCO. The data used would be representative of world culture, and creation of the datasets would have to be sponsored by whoever wants to create models out of it, so that licensing fees can be paid to creators. If you wanted to make your mark on global culture, you would have an incentive to offer training data to UNESCO.
I know, that would be idealistic and fair to everyone. No way this would fly in our age.
This definitely relates to moral concerns. Are there other examples like this of a company that is allowed to profit off of other people’s content without paying or citing them?
Hollywood, from the very start. It’s why it is across the US from New York: to get outside of the legal reach of the Broadway show companies they stole from.
Stack Overflow.
Reddit.
Google.
So likely nothing gonna happen it seems. Business as usual.