How much personal data ends up in AI conversations
Two large analyses of real chat logs found that hundreds of millions of people are sending AI chat assistants far more sensitive information than they realize, often inside requests that look harmless.
The volume is large and growing
OpenAI’s September 2025 economic research paper reports 700 million weekly users on ChatGPT, roughly one in ten adults on the planet. Together they send about 18 billion messages a week, around 29,000 every second. Personal messages have grown faster than work messages and now account for more than 70% of consumer ChatGPT traffic. [1]
Anthropic’s Economic Index, the most recent edition of which sampled one million anonymized Claude conversations from November 2025, paints a similar picture: roughly half of Claude.ai conversations are work, a third are personal, and the rest are coursework. [2]
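To get a feel for the scale, the rounded ChatGPT figures above can be turned into rates with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the inputs are the rounded numbers quoted in this section, and the per-user averages are derived here rather than reported by the paper.

```python
# Back-of-the-envelope check using the rounded figures quoted above.
WEEKLY_MESSAGES = 18_000_000_000   # "about 18 billion messages a week"
WEEKLY_USERS = 700_000_000         # "700 million weekly users"
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

messages_per_second = WEEKLY_MESSAGES / SECONDS_PER_WEEK

# Derived averages (an assumption of this sketch, not a figure from the paper).
messages_per_user_per_week = WEEKLY_MESSAGES / WEEKLY_USERS
messages_per_user_per_day = messages_per_user_per_week / 7

print(f"{messages_per_second:,.0f} messages per second")
print(f"{messages_per_user_per_week:.1f} messages per user per week")
print(f"{messages_per_user_per_day:.1f} messages per user per day")
```

With these rounded inputs the rate comes out just under 30,000 messages a second, in line with the figure cited above, and the average weekly user sends roughly 26 messages a week.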
What is in those messages
The clearest published number on this comes from the Trust No Bot study, presented at COLM 2024 by researchers from the University of Washington, the Allen Institute for AI, and Mila. They examined two large datasets of real conversations between people and chatbots:
“Over 70% of queries contain some kind of detected PII, and almost 15% mention a non-PII sensitive topic.” Mireshghallah et al., Trust No Bot, 2024 [3]
The same paper found that people’s names and organization names dominate. The first surprise is how much a name alone can reveal. The second surprise is where the names show up:
“PII appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively).” Mireshghallah et al., Trust No Bot, 2024 [3]
Almost half of translation requests in their corpus carried personally identifiable information. One in six code edits did. Most users probably do not think of “translate this email” or “clean up this script” as a privacy event.
Story prompts and role play are a special case
Casual prompts like “write a story about a girl named Anna who works at...” look identical to real disclosures, both to humans and to automated detectors. The same study notes that storytelling and roleplay prompts contain large numbers of names, ages, locations, and family details, often plausibly real. [3]
Their conclusion is blunt: pattern-matching detectors are the wrong tool for that category of risk. They recommend lighter, local detection paired with user judgement, rather than relying on a server-side filter to decide what counts as sensitive.
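As a rough illustration of what "local detection paired with user judgement" could look like, here is a minimal sketch of a check that runs on the user's own machine before a prompt is sent. It is not the method from the paper. The patterns, the function names, and the confirm callback are all hypothetical, and the check deliberately covers only a couple of structured identifiers. It flags candidates and leaves the decision to the person.

```python
import re

# Hypothetical patterns for a few structured identifiers. Names, ages, and
# family details in free text (the story and roleplay case) are NOT covered.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def flag_candidates(prompt: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for the user to review."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(prompt):
            found.append((kind, match.group()))
    return found


def review_before_sending(prompt: str, confirm) -> bool:
    """Send freely if nothing is flagged; otherwise let the user decide.

    `confirm` is any callable that shows the flagged spans to the user
    and returns True (send anyway) or False (go back and edit).
    """
    candidates = flag_candidates(prompt)
    if not candidates:
        return True
    return bool(confirm(candidates))


if __name__ == "__main__":
    print(flag_candidates("Translate this: contact anna@example.com or +1 206 555 0100"))
    print(flag_candidates("Write a story about a girl named Anna"))
```

The second call returns an empty list, which is exactly the limitation described above: a simple pattern cannot tell an invented Anna from a real one, so the final call has to rest with the user.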
The providers know
You can read the providers’ own pipelines as a sanity check. OpenAI states that before any researcher inside the company can analyze chat samples, the messages run through what it calls a “Privacy Filter”, an internal LLM tool that scrubs PII. [1] Anthropic’s Clio system, which produces the Economic Index, sets a minimum cluster size of 15 conversations and 5 unique users for any reported pattern, so that low-frequency topics specific to individuals stay buried. [4]
Both safeguards exist because raw conversations carry more identifying detail than the companies are willing to expose, even to their own staff. The same conversations are stored, processed, and in some cases used as training data.
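The minimum cluster size rule described above is a form of aggregation threshold, and the idea fits in a few lines. The sketch below is not Anthropic's implementation. The record layout and the function name are assumptions made for illustration, and only the two thresholds come from the text.

```python
from collections import defaultdict

# Thresholds quoted above; everything else in this sketch is hypothetical.
MIN_CONVERSATIONS = 15
MIN_UNIQUE_USERS = 5


def reportable_clusters(records):
    """Keep only clusters that meet both thresholds.

    `records` is an iterable of (cluster, user_id, conversation_id) tuples.
    Returns a dict mapping each surviving cluster to its conversation count.
    """
    conversations = defaultdict(set)
    users = defaultdict(set)
    for cluster, user_id, conversation_id in records:
        conversations[cluster].add(conversation_id)
        users[cluster].add(user_id)
    return {
        cluster: len(convs)
        for cluster, convs in conversations.items()
        if len(convs) >= MIN_CONVERSATIONS
        and len(users[cluster]) >= MIN_UNIQUE_USERS
    }


if __name__ == "__main__":
    # 20 conversations spread over 6 users: passes both thresholds.
    common = [("translate email", f"u{i % 6}", f"c{i}") for i in range(20)]
    # 30 conversations from a single user: fails the unique-user threshold.
    rare = [("rare topic", "u1", f"r{i}") for i in range(30)]
    print(reportable_clusters(common + rare))
```

In this toy data the second cluster has plenty of conversations but only one user behind them, so it is dropped. That is the point of pairing the two thresholds: a single prolific person cannot surface as a reported pattern.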
Why this matters even if you have nothing to hide
The question is not whether any single message would embarrass you. The question is what the accumulated record of two years of casual prompts looks like, and who, today or in the future, can read it.
If you have used a chat assistant casually since 2023, your account history likely contains your full name in many forms, the names of your spouse, children, parents, and coworkers, your employer, your home city, your medical concerns, your finances, your travel dates, and the contents of work documents you pasted in for help. None of that disappears when you close the tab. Some of it is plausibly retained for years.
Sources
- [1] Chatterji, Cunningham, Deming, Hitzig, Ong, Shan, Wadman. How People Use ChatGPT. NBER Working Paper w34255, September 2025.
- [2] Anthropic. Anthropic Economic Index, January 2026 Report. anthropic.com.
- [3] Mireshghallah, Antoniak, More, Choi, Farnadi. Trust No Bot: Discovering Personal Disclosures in Human LLM Conversations in the Wild. COLM 2024. arXiv:2407.11438.
- [4] Anthropic. Clio: Privacy preserving insights into real world AI use, December 2024. anthropic.com/research/clio.