According to an article published on BBC Science Focus, the model was trained on roughly 570 GB of data drawn from books, Wikipedia, research articles, web text, websites and other forms of writing on the internet. In total, approximately 300 billion words were fed into the system.
Being a large language model, it works on probability: given the text so far, it predicts the most likely next word in a sequence. This behaviour was then refined through a supervised fine-tuning phase.
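To make that idea concrete, here is a minimal sketch of next-word prediction using a toy bigram model. This is purely an illustration of "predicting the next word from probability"; ChatGPT itself uses a transformer neural network trained on vastly more data, not a word-counting table like this.

```python
import random
from collections import Counter, defaultdict

# A tiny toy corpus, standing in for the ~300 billion words of training text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word`."""
    counts = following[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# After "the", the model is more likely to predict "cat" (seen twice)
# than "mat" or "fish" (seen once each).
print(predict_next("the"))
```

The key point carries over to the real model: generation is repeated sampling from a probability distribution over possible next words, with the distribution learned from the training data.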
Read more:
Iyer, A. (2022, December 15). Behind ChatGPT’s Wisdom: 300 Bn Words, 570 GB Data. Analytics India Magazine. https://analyticsindiamag.com/behind-chatgpts-wisdom-300-bn-words-570-gb-data/