Posted On 25 April 2023
I keep thinking about the arguments around content being used for AI data sets and the arguments around content being archived and offered by sites like the Internet Archive.
They don’t seem consistent, on either side. Corporations are happy to use data sets scraped from copyrighted content, but they surely don’t want their copyrighted content slurped up into data sets without compensation.
On the flip side, a lot of the folks who (IMO rightly) support Internet Archive argue that, because IA is a public good, corporations shouldn’t flex copyright against it.
I’m not sure how you get one without the other, though. Either it’s OK for IA and corporations, and we need to re-think copyright rules in light of technology as it exists today – or it’s not OK for either of them to operate without explicit consent.
For the record: I’m on the side of re-thinking the rules, but I’m optimistic about neither that happening nor the outcome, given the entertainment industry’s regulatory capture of government… At a minimum, we need shorter copyright terms, better rules around abandoned works, and clarity around fair use and technological shift of content that benefits consumers and individual creators.