When OpenAI released GPT-3 in July 2020, it offered a glimpse of the data used to train the large language model. Millions of pages scraped from the web, Reddit posts, books, and more are used to create the generative text system, according to a technical paper. Scooped up in this data is some of the personal information you share about yourself online. This data is now getting OpenAI into trouble.