These large language models’ power comes from troves of publicly available human-created text that has been hoovered from the internet. It got me thinking: What data do these models have on me? And how could it be misused?…
Large language models (LLMs), such as OpenAI’s GPT-3, Google’s LaMDA, and Meta’s OPT-175B, are red hot in AI research, and they are becoming an increasingly integral part of the internet’s plumbing. LLMs are being used to power chatbots that help with customer service, to create more powerful online search, and to help software developers write code.
If you’ve posted anything even remotely personal in English on the internet, chances are your data might be part of some of the world’s most popular LLMs.
Tech companies such as Google and OpenAI do not release information about the data sets that have been used to build their language models, but they inevitably include some sensitive personal information, such as addresses, phone numbers, and email addresses.
Read more:
Heikkilä, M. (2022, August 31). What does GPT-3 “know” about me? MIT Technology Review. https://www.technologyreview.com/2022/08/31/1058800/what-does-gpt-3-know-about-me/