Psst! Your AI chatbot is feeding on your conversations
Anthropic is the latest AI company to begin training its models on users' conversations, but turning such sensitive personal information into AI development fuel raises big concerns
Many AI chatbots use the conversations you have with them as training data to keep developing their underlying generative AI models – but individual users may not realise their discussions with the tools aren’t one-to-one private chats by default.
Last week, AI company Anthropic announced it would start training several versions of its AI model – Claude – on users’ conversations with the chatbot, unless they opt out. Anthropic’s stated aim is to improve the model, including making it better at detecting illegal or harmful content.
But while Anthropic’s move generated headlines, the practice is already widespread among some of the most popular AI chatbots.
Social media giant Meta, now a major AI developer, announced back in April that it would start training its AI on users’ interactions. But that wasn’t all: Meta also said it would feed its AI with the publicly available personal information of Facebook and Instagram users, unless they opted out.
The company’s data grab faced huge criticism over privacy implications – after all, Meta is an advertising giant that makes money by micro-targeting ads to capture individuals’ attention. Yet the Irish data regulator, which oversees Meta in the EU, cleared the update in May.
Meanwhile, OpenAI’s ChatGPT, perhaps the most widely used chatbot, also only lets users opt out of their chats being used as training fodder. By default, it feeds conversations back into further training of the model.
Google’s Gemini is no different – promising more personalised replies by letting its AI mine previous discussions. Here again, users can refuse, but data-for-training is switched on by default.
Concern over access to intimate and sensitive chats
One of the main concerns about turning people’s chats into AI training data is that many individuals share a lot of personal and even sensitive information with chatbots. Some users even become emotionally attached to chatbots, as we’ve written before.
Many people also use AI chatbots as therapists. OpenAI CEO Sam Altman has raised concerns about this, reminding people that their chats with the company’s bot are not protected by patient-doctor confidentiality (as would be the case with a human psychologist).
Beyond safety concerns over people’s sensitive personal information leaking out (or being otherwise misused), the mining of human-to-bot chats for AI training raises legal privacy questions in Europe.
Under EU law, any company that processes personal data must have a valid legal basis to do so, and generative AI companies are no exception. Most AI chatbot developers now claim a “legitimate interest” as their legal basis for using personal data, arguing that AI models need vast amounts of data to function and to improve in quality.
While Meta’s AI training on personal data culled from its social media empire has been widely criticised, attempts to stop the tech giant on EU privacy law grounds have so far been unsuccessful.
AI is also expected to become increasingly embedded in our daily lives, as leading companies look to develop “AI agents” – an apparently even more advanced flavour of GenAI that’s designed to offer personalised assistance at the touch of a button.
The catch? AI agents will need access to vast troves of personal data too.
(nl, aw)