Perplexity has been ignoring the robots.txt code that explicitly asks web crawlers not to scrape the page. Srinivas responded in Fast Company that actually, Perplexity wasn’t ignoring robots.txt; it was just using third-party scrapers that ignored it. Srinivas declined to name the third-party scraper and didn’t commit to asking that crawler to stop violating robots.txt.
…To add insult to injury, Perplexity plagiarized Wired’s article about it — even though Wired explicitly blocks Perplexity in its text file. The bulk of Wired’s article about the plagiarism is about legal remedies, but I’m interested in what’s going on here with robots.txt. It’s a good-faith agreement that has held up for decades now, and it’s falling apart thanks to unscrupulous AI companies — that’s right, Perplexity isn’t the only one — hoovering up just about anything that’s available in order to train their bullshit models. And remember how Srinivas said he was committed to “factfulness”? I’m not sure that’s true, either: Perplexity is now surfacing AI-generated results and actual misinformation, Forbes reports.
Read more:
Lopatto, E. (2024, June 27). Perplexity’s grand theft AI. The Verge. https://www.theverge.com/2024/6/27/24187405/perplexity-ai-twitter-lie-plagiarism