Although OpenAI now acknowledges that it scrapes the internet to train its large language models, such as GPT-4, this still looks like a half-baked approach to the ethical dilemmas around copying data from other people’s websites.
People on Hacker News have been debating the ethics of releasing this web crawler to train AI models. “OpenAI isn’t even citing in moderation. It’s making a derivative work without citing, thus obscuring it,” said one user. Moreover, OpenAI does not disclose which websites it has already used to build its models.
Read more:
Pandey, M. (2023, August 7). OpenAI Now Crawls the Internet with GPTBot. Analytics India Magazine. https://analyticsindiamag.com/openai-now-crawls-the-internet-with-gptbot/