It may be beyond the scope of copyright law to address the harms being done to authors by generative AI, and the point remains that AI-training practices are secretive and fundamentally nonconsensual. Very few people understand exactly how these programs are developed, even as such initiatives threaten to upend the world as we know it.
Books are stored in Books3 as large, unlabeled blocks of text. To identify their authors and titles, I extracted ISBNs from these blocks of text and looked them up in a book database. Of the 191,000 titles I identified, 183,000 have associated author information.
Reisner, A. (2023, September 25). These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech. The Atlantic. https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/