FrontierNews.ai

OpenAI's ChatGPT Faces Canadian Privacy Crackdown: What the Regulators Found

Canadian privacy regulators have determined that OpenAI violated data protection laws by collecting personal information from public websites and user conversations without proper consent to train ChatGPT's GPT-3.5 and GPT-4 models. A joint investigation by Canada's federal and provincial privacy commissioners found that the company's data practices did not comply with applicable private-sector privacy laws, marking a significant regulatory challenge for one of the world's most widely used AI systems.

What Exactly Did Canadian Regulators Find Non-Compliant?

The investigation, conducted by the Office of the Privacy Commissioner of Canada, the Commission d'accès à l'information du Québec, and the privacy commissioners of British Columbia and Alberta, examined how OpenAI handled personal information across three main areas: data collected from publicly accessible websites, licensed third-party datasets used for training, and user interactions with ChatGPT.

The regulators accepted that OpenAI's overall purpose for developing and deploying ChatGPT was legitimate. However, they identified several critical compliance failures. The most significant finding concerned the initial collection of personal information from publicly accessible websites and licensed third-party sources. Regulators determined this collection was overbroad: given the scale, sensitivity, and potential inaccuracy of the data involved, and the limited safeguards in place at the time, OpenAI gathered far more personal information than necessary.

How Did OpenAI Fail to Obtain Proper Consent?

One of the core violations centered on consent. OpenAI relied on implied consent, assuming that people who posted information online implicitly agreed to have it scraped and used for AI model training. Canadian regulators rejected this approach entirely.

The regulators explained that implied consent was insufficient because the data could include sensitive personal information, and individuals would not reasonably have expected their publicly posted information to be scraped and used for AI model training. This reasoning applies to both the initial training data collection and the use of user conversations with ChatGPT. For user interactions specifically, regulators concluded that express consent, meaning explicit, informed agreement, should have been obtained before using chat data to improve the models.

The findings highlight a fundamental mismatch between how AI companies have historically approached data collection and what privacy regulators now expect. When users chat with ChatGPT, many do not understand that their conversations could be used to train future versions of the model or reviewed by human trainers. Regulators found that OpenAI's safeguards at the time were not strong enough to prevent sensitive personal information from being included in training data.

Steps OpenAI Should Take to Comply With Privacy Laws

  • Obtain Express Consent: Collect explicit, informed permission from users before using their chat data for model training or having human trainers review conversations.
  • Strengthen Data Safeguards: Implement more robust technical and procedural measures to prevent sensitive personal information from being included in training datasets, covering broader categories of protected information.
  • Disclose Information Practices Clearly: Provide transparent explanations of how personal information will be collected, used, and disclosed, especially when outputs from ChatGPT might reveal sensitive information about individuals.
  • Limit Data Collection Scope: Reduce the breadth of personal information collected from public sources to only what is necessary for legitimate model development purposes.
  • Establish User Controls: Allow users to opt out of having their conversations used for model improvement and provide mechanisms to access or correct personal information.
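The express-consent and opt-out requirements above amount to an opt-in gate in front of any training pipeline: a conversation enters the training corpus only if its author has an affirmative consent record, and revoking consent removes eligibility. The sketch below is illustrative only; the names (`ConsentRegistry`, `eligible_for_training`) are hypothetical and do not reflect OpenAI's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    user_id: str
    text: str


@dataclass
class ConsentRegistry:
    # Hypothetical store of users who gave express (opt-in) consent.
    # Default is empty: no one is included unless they explicitly opt in.
    opted_in: set = field(default_factory=set)

    def grant(self, user_id: str) -> None:
        self.opted_in.add(user_id)

    def revoke(self, user_id: str) -> None:
        # Opting out removes future training eligibility.
        self.opted_in.discard(user_id)

    def has_express_consent(self, user_id: str) -> bool:
        return user_id in self.opted_in


def eligible_for_training(conv: Conversation, registry: ConsentRegistry) -> bool:
    """Only conversations from users with a recorded express consent
    may enter the training corpus (opt-in, never opt-out by default)."""
    return registry.has_express_consent(conv.user_id)


registry = ConsentRegistry()
registry.grant("alice")  # alice explicitly opts in; bob never does

conversations = [
    Conversation("alice", "hello"),
    Conversation("bob", "hi"),
]
training_set = [c for c in conversations if eligible_for_training(c, registry)]
# training_set contains only alice's conversation
```

The key design choice, mirroring the regulators' reasoning, is that the default state is exclusion: absent an explicit consent record, a conversation never reaches the training set, rather than being included until the user objects.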

The regulators also found that OpenAI should have obtained express consent for certain disclosures of personal information through ChatGPT outputs, particularly when the information was sensitive or fell outside what individuals would reasonably expect. While OpenAI had introduced measures to reduce the risk of sensitive disclosures, those measures covered a narrower set of information than the broader categories of personal information protected under Canadian privacy laws.

This investigation represents one of the most detailed regulatory examinations of how large language models are trained and deployed. The findings suggest that the approach many AI companies have taken (treating publicly available data as fair game for training without explicit consent) no longer aligns with how privacy regulators interpret data protection laws. The case also underscores the tension between the data requirements of modern AI development and individual privacy rights in an era when personal information is routinely shared online.

OpenAI's situation in Canada may foreshadow similar regulatory actions in other jurisdictions. Privacy regulators in the European Union, the United Kingdom, and other regions are increasingly scrutinizing how AI companies collect and use personal data. The Canadian findings provide a roadmap for what regulators consider non-compliant practices and what companies must do to bring their operations into alignment with privacy laws.