
Canada Health Data De-identification Framework

PIPEDA and Provincial Health Privacy Legislation

Overview

Canada has a complex, multi-layered approach to health data privacy and de-identification that combines federal legislation, provincial laws, and sector-specific guidelines. This creates a comprehensive but sometimes fragmented framework for protecting health information while allowing for its use in research and analysis.

Unlike countries with a single national health data privacy law, Canada's framework reflects its constitutional division of powers, with healthcare primarily falling under provincial jurisdiction while the federal government maintains certain regulatory roles in privacy and data protection.

The Office of the Privacy Commissioner of Canada (OPC) provides oversight at the federal level, while provincial privacy commissioners or specialized health information privacy offices oversee provincial frameworks.

Key Regulatory Bodies in Canada

  • Office of the Privacy Commissioner of Canada (OPC) - Federal oversight for PIPEDA and the Privacy Act
  • Canadian Institute for Health Information (CIHI) - National, independent organization that provides health information standards and collects healthcare data
  • Provincial Information and Privacy Commissioners - Each province has its own commissioner or equivalent office
  • Health Canada - Federal department responsible for national health policy
  • Statistics Canada - National statistical office that collects and analyzes health data
  • Pan-Canadian Health Data Strategy Expert Advisory Group - Established in 2021 to advise on modernizing health data collection and use

"The health information custodian shall de-identify the personal health information before disclosing it... Information is de-identified if it does not identify an individual, and it is not reasonably foreseeable in the circumstances that the information could be utilized, either alone or with other information, to identify an individual."
- Personal Health Information Protection Act (Ontario)

Legal Framework

Canada's health data privacy framework operates at multiple levels:

Federal Level

Example: PIPEDA's 10 Fair Information Principles

PIPEDA is built around 10 fair information principles that apply to health data:

  1. Accountability: Organizations are responsible for personal information under their control
  2. Identifying Purposes: Purposes for collection must be identified at or before collection
  3. Consent: Knowledge and consent are required for collection, use, or disclosure
  4. Limiting Collection: Collection must be limited to what's necessary for identified purposes
  5. Limiting Use, Disclosure, and Retention: Information should not be used for purposes other than those for which it was collected
  6. Accuracy: Personal information must be accurate, complete, and up-to-date
  7. Safeguards: Personal information must be protected by appropriate security safeguards
  8. Openness: Organizations must make policies and practices relating to personal information readily available
  9. Individual Access: Individuals have the right to access their personal information
  10. Challenging Compliance: Individuals can challenge an organization's compliance with these principles

These principles inform how health data must be de-identified and managed in commercial contexts across Canada.

"De-identification is not a single technique, but a collection of approaches, tools, and methods that can be applied to data to ensure that the risk of re-identification is very low... Whether information is de-identified or not depends on context."
- Office of the Privacy Commissioner of Canada

Provincial Level

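Each province and territory has its own health information privacy legislation, including Ontario's Personal Health Information Protection Act (PHIPA) and Alberta's Health Information Act (HIA), both discussed under Provincial Approaches.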

Case Study: Quebec's Bill 64

In September 2021, Quebec passed Bill 64 (An Act to modernize legislative provisions as regards the protection of personal information, now commonly known as Law 25), which introduced significant changes to Quebec's privacy regime, including:

  • Explicit definitions for de-identified and anonymized information
  • Requirements for privacy impact assessments before using de-identified information
  • Mandatory breach notification requirements
  • Significant penalties: administrative monetary penalties of up to $10 million or 2% of worldwide turnover, and penal fines of up to $25 million or 4% of worldwide turnover
  • New consent requirements for secondary use of personal information
  • Requirements to implement privacy by design principles
  • New data portability rights

These changes, which came into effect in phases between 2022 and 2024, are among the most significant updates to provincial privacy law in Canada and bring Quebec's framework closer to the GDPR model.

Jurisdictional Complexity

The multi-jurisdictional nature of Canada's privacy framework creates significant complexity for organizations handling health data across provinces. For example:

  • A national telehealth provider may need to comply with PIPEDA and with health privacy laws in up to 13 provinces and territories
  • De-identification standards may vary between provinces, requiring different approaches in different jurisdictions
  • Data sharing between provinces may trigger multiple compliance obligations
  • Organizations must track ongoing legislative changes across multiple jurisdictions

This complexity has led to calls for greater harmonization of health data privacy standards across Canada.

De-identification Concepts and Approaches

Canadian frameworks generally distinguish between different levels of de-identification:

Concept | Description | Legal Status
De-identification | The general process of removing identifying information, which may result in either anonymized or coded information depending on the extent of removal | Umbrella term that encompasses various techniques and levels of data transformation
Anonymization | Information that cannot be used to identify an individual, either directly or indirectly | Generally falls outside the scope of privacy legislation
Coded Information | Information where direct identifiers are removed and replaced with a code, but the code could be used to re-identify (similar to pseudonymization) | Still considered personal information under most Canadian privacy laws
Aggregate Information | Statistical information that has been compiled from individual records but does not contain individual-level data | Generally not considered personal information if aggregation is sufficient to prevent re-identification
Direct Identifiers | Information that directly identifies an individual (e.g., name, health card number) | Must be removed or transformed in any de-identification process
Indirect Identifiers | Information that could identify an individual when combined with other information (e.g., date of birth, postal code) | Must be assessed for re-identification risk and appropriately modified

Example: De-identification in PHIPA (Ontario)

Under Ontario's Personal Health Information Protection Act, information is considered "de-identified" if:

  • All direct identifiers have been removed
  • It is not reasonably foreseeable in the circumstances that the information could be utilized, either alone or with other information, to identify the individual

For example, a hospital record might be transformed as follows:

  • Original data: "Sarah Johnson, health card number 1234-567-890, DOB: 05/12/1978, admitted to Toronto General Hospital on 06/15/2023 with diagnosis code J45.901 (Asthma)"
  • De-identified data: "Patient ID: 45678, Age range: 40-45, Hospital region: Greater Toronto Area, Admission: Q2 2023, Diagnosis category: Respiratory"

This would generally meet PHIPA's de-identification standard, though the hospital would still need to assess whether the combination of remaining attributes could allow re-identification in its specific context.
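The transformation in this example can be sketched in a few lines of Python. This is a minimal illustration, not a PHIPA-endorsed procedure: the field names, the `DIAGNOSIS_CATEGORIES` mapping, and the non-overlapping five-year age bands are all assumptions made for the sketch (so the band label differs from the "40-45" style used above), and the hospital region is assumed to have been looked up already.

```python
from datetime import date

# Hypothetical mapping from the first letter of a diagnosis code to a broad
# category. Only the entries needed for this illustration are included.
DIAGNOSIS_CATEGORIES = {"J": "Respiratory", "I": "Circulatory"}


def age_band(birth: date, on: date, width: int = 5) -> str:
    """Return a non-overlapping band such as '45-49' for the age on `on`."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    age = on.year - birth.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def quarter(d: date) -> str:
    """Return a label such as 'Q2 2023'."""
    return f"Q{(d.month - 1) // 3 + 1} {d.year}"


def generalize(record: dict, study_id: int) -> dict:
    """Build a new record containing only generalized attributes.

    Direct identifiers (name, health card number) are never copied.
    """
    code = record["diagnosis_code"]
    return {
        "patient_id": study_id,
        "age_range": age_band(record["dob"], record["admitted"]),
        "hospital_region": record["region"],
        "admission": quarter(record["admitted"]),
        "diagnosis_category": DIAGNOSIS_CATEGORIES.get(code[0], "Other"),
    }
```

Building the output from an explicit list of generalized fields, rather than deleting fields from a copy of the input, means a newly added identifying column cannot leak through by default.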

Case Study: Re-identification Risks in Canadian Health Data

In 2011, researchers from the University of Ottawa demonstrated that supposedly de-identified prescription records could be re-identified by linking them with publicly available information:

  • They analyzed a dataset of prescription records that had been stripped of direct identifiers
  • By using publicly available information about local politicians (date of birth, postal code), they were able to uniquely identify individuals in the dataset
  • This research highlighted the risks of relying solely on removing direct identifiers
  • It led to improved de-identification practices that consider the mosaic effect (combining multiple data sources)

This case influenced the development of more sophisticated risk-based approaches to de-identification in Canadian health privacy frameworks.
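The linkage idea behind this kind of attack can be illustrated with a short Python sketch. It is not a reconstruction of the researchers' method: the field names, the choice of quasi-identifiers, and the rule "link only when exactly one record matches" are assumptions made for illustration.

```python
from collections import Counter

# Assumed quasi-identifiers shared by both datasets.
QUASI_IDENTIFIERS = ("date_of_birth", "postal_code")


def link(public_rows, deidentified_rows, keys=QUASI_IDENTIFIERS):
    """Pair each public row with the single de-identified row that shares
    its quasi-identifier values. Ambiguous matches (two or more candidate
    records) are skipped, because they do not single anyone out."""

    def key_of(row):
        return tuple(row[k] for k in keys)

    counts = Counter(key_of(r) for r in deidentified_rows)
    unique = {
        key_of(r): r for r in deidentified_rows if counts[key_of(r)] == 1
    }
    return [
        (p, unique[key_of(p)]) for p in public_rows if key_of(p) in unique
    ]
```

Any pair returned by `link` is a record that was stripped of direct identifiers but can still be tied to a named person, which is why indirect identifiers need to be generalized or suppressed as well.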

Canadian Institute for Health Information (CIHI) Standards

CIHI has developed comprehensive standards for de-identifying health data, which are widely used across Canada. The CIHI approach includes:

1. CIHI's De-identification Guidelines

CIHI's guidelines define three levels of de-identification, which the following example illustrates.

Example: CIHI's Three-Level Approach

Consider a patient record in a Canadian hospital database:

  • Original data: "Robert Thompson, male, HCN: 123-456-789, DOB: 11/03/1962, 742 Evergreen Terrace, Winnipeg, Manitoba, R3T 2N2, admitted on 07/12/2024 for hip replacement, physician: Dr. Williams"
  • Level 1 de-identification: "Patient #89725, male, DOB: 11/03/1962, Winnipeg, Manitoba, R3T 2N2, admitted on 07/12/2024 for hip replacement"
  • Level 2 de-identification: "Patient #89725, male, year of birth: 1962, Winnipeg, postal code area: R3T, admitted in July 2024 for hip replacement"
  • Level 3 de-identification: "Patient #89725, male, age range: 60-65, Manitoba urban area, admitted in Q3 2024 for orthopedic procedure"

Each level progressively reduces the identifiability of the data while preserving analytical utility for different use cases.
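As a rough sketch, the three levels in this example could be expressed as a single function with a `level` parameter. The field names and the `PROCEDURE_CATEGORIES` lookup are hypothetical, the five-year age bands are non-overlapping (so they read "60-64" rather than "60-65"), and the urban/rural classification from the Level 3 example is omitted. None of this is taken from CIHI's own tooling.

```python
from datetime import date

# Hypothetical lookup from a specific procedure to a broader category.
PROCEDURE_CATEGORIES = {"hip replacement": "orthopedic procedure"}


def deidentify(record: dict, level: int, study_id: int) -> dict:
    """Return a de-identified copy of `record` at level 1, 2 or 3."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    dob, admitted = record["dob"], record["admitted"]
    out = {"patient_id": study_id, "sex": record["sex"]}
    if level == 1:
        # Direct identifiers dropped; everything else kept at full detail.
        out.update(
            dob=dob.isoformat(),
            city=record["city"],
            province=record["province"],
            postal_code=record["postal_code"],
            admitted=admitted.isoformat(),
            procedure=record["procedure"],
        )
    elif level == 2:
        # Dates and postal code reduced in precision.
        out.update(
            year_of_birth=dob.year,
            city=record["city"],
            postal_area=record["postal_code"][:3],
            admitted=f"{admitted.year}-{admitted.month:02d}",
            procedure=record["procedure"],
        )
    else:
        # Broad bands and categories only.
        had_birthday = (admitted.month, admitted.day) >= (dob.month, dob.day)
        age = admitted.year - dob.year - (0 if had_birthday else 1)
        low = age // 5 * 5
        out.update(
            age_range=f"{low}-{low + 4}",
            region=record["province"],
            admitted=f"Q{(admitted.month - 1) // 3 + 1} {admitted.year}",
            procedure_category=PROCEDURE_CATEGORIES.get(
                record["procedure"], "other procedure"
            ),
        )
    return out
```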

2. CIHI's Privacy Impact Assessment Framework

CIHI has developed a comprehensive Privacy Impact Assessment (PIA) Framework that includes specific assessment criteria for de-identification methods and residual re-identification risk.

CIHI recommends conducting a re-identification risk assessment that considers:

3. CIHI's Information Life Cycle

CIHI applies privacy and security considerations throughout the information life cycle:

  1. Collection: Ensuring appropriate authority and limiting collection
  2. Use: Limiting use to authorized purposes
  3. Disclosure: Applying appropriate de-identification before disclosure
  4. Retention: Maintaining information only as long as necessary
  5. Disposal: Secure destruction when no longer needed

Case Study: CIHI's Discharge Abstract Database (DAD)

CIHI's Discharge Abstract Database contains clinical, demographic and administrative data on hospital discharges across Canada. CIHI applies multi-level de-identification to this database:

  • Patient identifiers are replaced with encrypted identifiers
  • Dates are modified to maintain relative time intervals while obscuring exact dates
  • Geographic information is generalized to health regions rather than specific locations
  • Rare diagnoses and procedures are grouped into broader categories
  • Different levels of access are provided based on user needs and authorization
  • Public reports use only aggregate statistics with small cell suppression
  • Researchers can access more detailed data through a controlled process

This approach has enabled valuable health system research while protecting patient privacy.
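Small cell suppression, mentioned in the list above for public reports, can be sketched as follows. The threshold of 5 and the asterisk marker are assumptions chosen for illustration; they are not confirmed CIHI parameters.

```python
from collections import Counter


def suppressed_counts(values, threshold=5, marker="*"):
    """Count occurrences of each value and replace any count below
    `threshold` with `marker`, so rare categories are not published."""
    counts = Counter(values)
    return {
        category: (marker if n < threshold else n)
        for category, n in counts.items()
    }
```

If row or column totals are published alongside the table, a single suppressed cell can sometimes be recovered by subtraction, so real disclosure control usually needs complementary suppression as well. This sketch does not attempt that.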

Provincial Approaches

Each province has specific approaches to de-identification:

Ontario (PHIPA)

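Under PHIPA, information is considered de-identified if it does not identify an individual and it is not reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify an individual.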

The Information and Privacy Commissioner of Ontario (IPC) provides specific guidelines for de-identification and has endorsed a risk-based approach.

Example: Ontario IPC's De-identification Guidelines

The IPC recommends a modified version of the "Five Safes" framework:

  • Safe data: Applying statistical methods to protect confidentiality
  • Safe projects: Ensuring the data use is appropriate and ethical
  • Safe people: Ensuring users are trained and trustworthy
  • Safe settings: Implementing technical and physical controls
  • Safe outputs: Ensuring research results don't disclose sensitive information

In practice, this means Ontario healthcare organizations must consider both the de-identification techniques and the broader context of data use.

The IPC also provides specific guidance on:

  • Risk-based de-identification for structured data
  • De-identification of free-text clinical notes
  • Re-identification risk thresholds (generally recommending less than 5% risk)
  • Data sharing agreements for de-identified information
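One common way to put a number on re-identification risk is to group records by their quasi-identifier values and treat each record's risk as one divided by the size of its group. The sketch below uses that model and compares the worst case against a 5% threshold. Both the 1/k risk model and the function names are assumptions made for illustration, not a statement of how the IPC requires risk to be measured.

```python
from collections import Counter


def reidentification_risk(rows, quasi_identifiers):
    """Return the maximum and average per-record risk, where a record's
    risk is 1 / (number of records sharing its quasi-identifier values)."""
    if not rows:
        raise ValueError("rows must not be empty")
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    sizes = Counter(keys)
    per_record = [1 / sizes[k] for k in keys]
    return {
        "max_risk": max(per_record),
        "avg_risk": sum(per_record) / len(per_record),
        "smallest_group": min(sizes.values()),
    }


def meets_threshold(rows, quasi_identifiers, threshold=0.05):
    """True if even the most exposed record has risk below `threshold`."""
    return reidentification_risk(rows, quasi_identifiers)["max_risk"] < threshold
```

Under this model a maximum risk below 5% means every combination of quasi-identifier values is shared by at least 21 records. An appropriate threshold in practice depends on the release context, such as public release versus a controlled research environment.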

Alberta (HIA)

The HIA defines "non-identifying" health information and provides guidelines for anonymization, focusing on both direct and indirect identifiers.

Alberta's approach emphasizes:

Example: Alberta's Data Matching Requirements

Under Alberta's HIA, data matching (combining information from different sources) requires:

  • A Privacy Impact Assessment submitted to the Privacy Commissioner
  • Public notice of the data matching program
  • Justification of why the matching is necessary
  • Description of security safeguards
  • Information about how the matched data will be used or disclosed

These requirements apply even when using de-identified information if there's a possibility of re-identification through the matching process.

British Columbia

The E-Health Act defines "de-identified" as data that has been modified so that the identity of the individual cannot be determined by using a single identifier or by combining identifiers.

BC's approach includes:

Quebec

Quebec has recently updated its privacy legislation with the passage of Bill 64 (2021), which includes more explicit provisions about de-identification and anonymization. The main changes are summarized in the Bill 64 case study above.

Case Study: Ontario's Electronic Health Record Initiative

Ontario's ConnectingOntario program illustrates the provincial approach to balancing data sharing with privacy protection:

  • Creates a secure provincial electronic health record system
  • Implements role-based access controls to limit data access
  • Uses audit logs to track all access to personal health information
  • Allows patients to implement consent directives to mask certain information
  • Applies de-identification for secondary uses like health system planning
  • Requires Privacy Impact Assessments for system changes
  • Includes mandatory privacy training for all users

This approach demonstrates how technical safeguards, governance controls, and de-identification work together in provincial health information systems.

Technical Approaches

Canadian frameworks generally recommend several technical approaches to de-identification:

Technique | Application | Example
Suppression | Removing variables that can directly identify individuals | Removing names, health card numbers, and medical record numbers
Generalization | Reducing the precision of variables | Using age ranges (30-35) instead of exact ages (32)
Randomization | Adding statistical noise to data | Adding random variations to lab test results while preserving clinical significance
Sub-sampling | Using only a portion of the original dataset | Releasing only 10% of records from a rare disease registry
Synthetic data generation | Creating artificial data that preserves statistical properties | Creating simulated patient records that match population statistics
Cell suppression | Hiding small counts in tabular data | Replacing counts less than 5 with an asterisk (*) in public health reports
Date shifting | Adjusting dates while preserving time intervals | Shifting all dates for a patient by a random number of days (consistent within each patient)
Masking | Replacing portions of identifiers | Replacing the last 3 characters of postal codes with XXX (e.g., M5S XXX)
Pseudonymization | Replacing identifiers with codes | Replacing patient names with randomly generated identifiers that allow linking records
Top/bottom coding | Grouping extreme values | Reporting ages as "90+" for anyone aged 90 or older
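Four of the techniques in this table (pseudonymization, date shifting, masking, and top coding) are simple enough to sketch in Python. The function names and the `SECRET_KEY` placeholder are hypothetical. Note one deliberate substitution: instead of storing a random code or a random offset per patient, this sketch derives both from a keyed hash (HMAC), which gives the same per-patient consistency without a lookup table. A stored random value per patient would serve the same purpose.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Placeholder only. A real key would be generated securely and stored
# separately from the data.
SECRET_KEY = b"replace-with-a-protected-key"


def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Stable code for a patient; the same input always gives the same code."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def shift_date(d: date, patient_id: str, max_days: int = 30,
               key: bytes = SECRET_KEY) -> date:
    """Shift a date by a per-patient offset in [-max_days, +max_days].

    The offset depends only on the patient, so intervals between two dates
    for the same patient are preserved. The offset can be zero.
    """
    digest = hmac.new(key, b"shift:" + patient_id.encode("utf-8"),
                      hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


def mask_postal_code(code: str) -> str:
    """Keep the first three characters and mask the rest, e.g. 'M5S XXX'."""
    return f"{code.replace(' ', '').upper()[:3]} XXX"


def top_code_age(age: int, cap: int = 90) -> str:
    """Report ages at or above `cap` as a single open-ended group."""
    return f"{cap}+" if age >= cap else str(age)
```

Because the pseudonym and the date offset can be recomputed by anyone holding the key, output of this kind is coded information rather than anonymized information in the terms used earlier in this document.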

Example: Statistics Canada Approach to Health Data

Statistics Canada applies specific disclosure control methods to health survey data:

  • Record swapping: Exchanging records between similar respondents to mask unique combinations of attributes
  • Top and bottom coding: Grouping extreme values (e.g., age 90+ instead of exact ages)
  • Controlled rounding: Rounding small counts in tables while maintaining totals
  • Geographic aggregation: Using health regions rather than postal codes
  • Removal of outliers: Excluding unusual cases from public use microdata files
  • Sampling fraction reduction: Releasing only a sample of the collected data
  • Global recoding: Reducing detail in variables (e.g., collapsing occupation codes)
  • Local suppression: Suppressing specific values that create unique combinations

These techniques allow Statistics Canada to release valuable health data while protecting privacy.
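Local suppression, the last item in the list above, can be illustrated with a short sketch: any record whose combination of quasi-identifier values is rarer than a chosen minimum has one of those values blanked out. The function name, the minimum group size `k`, and the choice of which field to suppress are assumptions for illustration and do not describe Statistics Canada's actual procedures.

```python
from collections import Counter


def locally_suppress(rows, quasi_identifiers, suppress_field, k=2):
    """Return copies of `rows` in which `suppress_field` is set to None for
    every record whose quasi-identifier combination occurs fewer than
    `k` times."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    sizes = Counter(keys)
    result = []
    for row, key in zip(rows, keys):
        row = dict(row)  # copy so the caller's data is not modified
        if sizes[key] < k:
            row[suppress_field] = None
        result.append(row)
    return result
```

A single pass is not always enough, because a record can remain unusual on its remaining values. In practice the uniqueness check would be repeated after suppression.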

Case Study: Health Data Research Network Canada

The Health Data Research Network Canada (HDRN) has developed standardized approaches to working with sensitive health data:

  • Implemented the HDRN Distributed Analytics approach that allows analysis without centralizing sensitive data
  • Developed common data quality assessment frameworks
  • Created standardized processes for researcher access to health data
  • Established the Data Access Support Hub (DASH) to coordinate multi-jurisdictional research
  • Applied consistent privacy and security standards across provincial data centers
  • Implemented output checking protocols to ensure no identifiable information is released

This network has enabled national health research while respecting provincial privacy frameworks and maintaining appropriate de-identification standards.

Risk-Based Approach

Canadian frameworks generally emphasize a risk-based approach to de-identification that considers:

This risk-based approach recognizes that:

Example: Governance Controls in Canadian Research Networks

The Canadian Primary Care Sentinel Surveillance Network (CPCSSN) uses a combination of technical de-identification and governance controls:

  • Patient identifiers are replaced with randomly generated IDs
  • Dates are shifted by a random number of days (consistent within each patient)
  • Free text notes are processed to remove names and other identifiers
  • Postal codes are truncated to the first three characters
  • Researchers must apply for data access with a specific research protocol
  • A data access committee reviews all requests
  • Data use agreements prohibit re-identification attempts
  • Results are reviewed before publication to ensure no identifiable information
  • Secure computing environments control how data is accessed
  • Regular privacy audits are conducted

This multi-layered approach allows for valuable health research while maintaining privacy protections.

Case Study: Population Data BC

Population Data BC provides a comprehensive example of the risk-based approach to health data access:

  • Implements a secure research environment for accessing linked health data
  • Uses a five-stage approval process for data access requests
  • Requires ethics approval for all research projects
  • Applies different levels of de-identification based on research needs and context
  • Uses a separation principle where identifying information is kept separate from content data
  • Employs privacy officers to review all outputs before release
  • Requires researchers to complete privacy training
  • Implements technical controls including virtual secure research environments
  • Conducts regular privacy audits and compliance monitoring

This approach has enabled British Columbia to become a leader in population health research while maintaining strong privacy protections.

Implementation Considerations

Organizations implementing de-identification in the Canadian context should consider:

Example: Indigenous Data Governance Considerations

The First Nations principles of OCAP® (Ownership, Control, Access, and Possession) have important implications for health data de-identification:

  • Ownership: First Nations communities own their cultural knowledge and health information
  • Control: First Nations must control how their information is collected, used, and disclosed
  • Access: First Nations must have access to information about themselves
  • Possession: Physical control of the data should be maintained by First Nations institutions

Organizations working with Indigenous health data must consider these principles alongside technical de-identification approaches, often requiring community engagement and specific data governance agreements.

Case Study: Cross-Border Health Data Transfers

A Canadian healthcare organization partnering with a U.S.-based cloud provider for health analytics faced complex de-identification requirements:

  • Needed to comply with provincial health privacy law for data collection
  • Required to meet PIPEDA standards for cross-border transfers
  • Had to consider potential U.S. law enforcement access under the CLOUD Act
  • Implemented a multi-layered approach:
    • Strong de-identification before data left Canada
    • Contractual safeguards with the U.S. provider
    • Technical controls including encryption
    • Data residency requirements for certain sensitive elements
    • Privacy Impact Assessment reviewed by provincial commissioner

This illustrates how Canadian organizations must navigate multiple jurisdictional requirements when implementing de-identification for international data sharing.

Proposed Legislative Changes

Bill C-27 (The Digital Charter Implementation Act, 2022) proposes significant changes to Canada's privacy framework, including:

These proposed changes would bring Canada's approach closer to the GDPR model, though with distinct Canadian elements.

Example: Bill C-27 Definitions

Bill C-27 proposes the following definitions:

  • De-identified information: "Information that has been modified so that an individual cannot be directly identified from it, though a risk of the individual being identified remains"
  • Anonymized information: "Information that has been irreversibly and permanently modified, in accordance with generally accepted best practices, to ensure that no individual can be identified from the information, whether directly or indirectly, by any means"

These definitions would create clearer distinctions between different levels of de-identification and their legal status.

Pan-Canadian Health Data Strategy

In 2021, the Pan-Canadian Health Data Strategy Expert Advisory Group was established to address fragmentation in health data systems. Their recommendations include:

  • Creating common data standards and interoperability requirements across provinces
  • Developing harmonized approaches to de-identification and data sharing
  • Establishing clear governance frameworks for health data
  • Enhancing digital health literacy among healthcare providers and the public
  • Building public trust through transparency and meaningful engagement
  • Addressing legislative and policy barriers to appropriate health data use
  • Ensuring Indigenous data sovereignty principles are respected

If implemented, these recommendations could significantly streamline health data de-identification practices across Canada while maintaining strong privacy protections.

Case Study: The Digital Health Immunization Repository

The COVID-19 pandemic highlighted both the potential and challenges of pan-Canadian health data sharing. The development of digital vaccine credentials required:

  • Coordination between federal, provincial, and territorial governments
  • Balancing privacy concerns with public health needs
  • Creating interoperable systems that could work across jurisdictions
  • Implementing appropriate de-identification for aggregate reporting
  • Developing privacy-preserving verification mechanisms
  • Addressing public concerns about surveillance and tracking

This experience has informed ongoing discussions about modernizing Canada's health data framework, including approaches to de-identification that can better support public health responses while protecting individual privacy.

How It Compares to Other Frameworks

Canada's approach differs from HIPAA Safe Harbor in several ways:

Compared to the EU's GDPR:

Example: Comparative Approach to Dates

The treatment of dates illustrates the differences between frameworks:

  • HIPAA Safe Harbor: Requires removal of all elements of dates directly related to an individual, except the year
  • Canadian Approach: Varies by context and risk assessment; might allow month/year in low-risk scenarios but require broader date ranges in higher-risk contexts
  • GDPR: Considers dates as personal data that may require pseudonymization or anonymization depending on context and purpose

For example, a dataset containing admission dates for patients with rare conditions might be handled as follows:

  • Under HIPAA Safe Harbor: All dates would be limited to year only (e.g., "2024")
  • Under Canadian frameworks: Dates might be generalized to quarters or months based on a risk assessment of the specific dataset and its intended use
  • Under GDPR: Dates might be pseudonymized with technical and organizational measures to prevent re-identification
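The practical difference between these approaches is often just the granularity to which a date is reduced. A minimal sketch, assuming a hypothetical `generalize_date` helper and three granularity labels chosen for this illustration:

```python
from datetime import date


def generalize_date(d: date, granularity: str) -> str:
    """Reduce a date to 'year', 'quarter' or 'month' precision."""
    if granularity == "year":
        return str(d.year)
    if granularity == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Under HIPAA Safe Harbor the caller would always pass `"year"`. Under a Canadian risk-based approach the granularity would be chosen per dataset after a risk assessment, for example `"quarter"` for a rare-condition dataset and `"month"` for a larger, lower-risk one.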

Case Study: Multi-jurisdictional Research Project

A research project involving health data from Canada, the US, and the EU had to navigate different de-identification requirements:

  • Canadian data: Required provincial research ethics board approvals and compliance with provincial health privacy laws
  • US data: Required HIPAA compliance with either Safe Harbor or Expert Determination
  • EU data: Required GDPR compliance with appropriate safeguards for pseudonymized data

The solution involved:

  • Creating jurisdiction-specific de-identification protocols
  • Implementing the most stringent requirements across all datasets to ensure compliance
  • Using data sharing agreements with specific provisions for each jurisdiction
  • Conducting regular compliance reviews
  • Implementing a federated analysis approach that minimized cross-border data transfers

This illustrates how organizations operating across multiple jurisdictions must navigate complex and sometimes conflicting de-identification requirements.
