Australia Health Data De-identification Framework

Overview

Australia has developed a comprehensive approach to health data de-identification that combines national legislation, sector-specific guidelines, and technical standards. The Australian framework places strong emphasis on risk-based assessment and recognizes the contextual nature of de-identification.

The Australian approach acknowledges that de-identification is not a binary concept but exists on a spectrum of risk. This risk-based approach is central to Australia's regulatory framework, which focuses on whether information is "reasonably identifiable" in a given context rather than prescribing specific technical methods.

"De-identification of personal information is not a fixed or single process but depends on context. It involves removing or altering information that identifies an individual or is reasonably likely to do so."
- Office of the Australian Information Commissioner (OAIC)

Key Regulatory Bodies:

Office of the Australian Information Commissioner (OAIC) - Australia's independent national privacy regulator, responsible for privacy functions established by the Privacy Act 1988
Australian Digital Health Agency - Leads Australia's digital health strategy and operates the My Health Record system
Australian Institute of Health and Welfare (AIHW) - Australia's national agency for health and welfare information and statistics
National Data Commissioner - Oversees the Data Availability and Transparency Act (DATA) Scheme established in 2022 to enable controlled access to public sector data
Therapeutic Goods Administration (TGA) - Regulates medical devices and pharmaceuticals, including health data used in clinical trials

References:

OAIC: De-identification and the Privacy Act
Australian Digital Health Agency
AIHW Data Governance
National Data Commissioner
TGA Clinical Trials Guidelines

Legal Framework

Australia's health data de-identification framework is built upon several key pieces of legislation and regulation:

Primary Legislation

Privacy Act 1988: The cornerstone of privacy regulation in Australia, which includes the Australian Privacy Principles (APPs). The Act was significantly amended in December 2022 to strengthen enforcement, including a much higher maximum penalty for serious or repeated privacy breaches (the greater of $50 million or amounts calculated from the benefit obtained or the company's turnover).
My Health Records Act 2012: Specific legislation governing Australia's national electronic health record system, with strict controls on data use and disclosure. The system became opt-out in 2019, meaning all Australians have a record unless they choose not to.
Healthcare Identifiers Act 2010: Legislation establishing a national system for uniquely identifying healthcare providers and recipients.
Data Availability and Transparency Act 2022: Establishes a scheme for controlled sharing of public sector data, including health data, with appropriate safeguards.
Privacy Legislation Amendment (Enforcement and Other Measures) Act 2022: Significantly increased penalties for privacy breaches and enhanced regulatory powers.

Example: Privacy Act Enforcement Amendments (2022)

The Privacy Legislation Amendment (Enforcement and Other Measures) Act 2022 commenced in December 2022 and introduced significant changes, including:

Increased maximum penalties for serious or repeated privacy breaches by bodies corporate to the greater of $50 million, three times the value of any benefit obtained through the misuse of information, or, where that value cannot be determined, 30% of the company's adjusted turnover in the relevant period
Enhanced powers for the OAIC to resolve privacy breaches
Strengthened notification requirements for data breaches
New information sharing powers to assist the OAIC in investigations
Extraterritorial application to organizations doing business in Australia, even if not physically present

These amendments significantly increase the potential consequences for organizations that fail to properly de-identify health data.

State and Territory Legislation

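Australian states and territories regulate health information through a mix of dedicated health records legislation, general privacy legislation, and health services legislation, creating a complex regulatory landscape: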

New South Wales: Health Records and Information Privacy Act 2002 - Contains 15 Health Privacy Principles
Victoria: Health Records Act 2001 - Contains 11 Health Privacy Principles
ACT: Health Records (Privacy and Access) Act 1997 - Contains 14 Privacy Principles
Queensland: Information Privacy Act 2009 and Hospital and Health Boards Act 2011
South Australia: Health Care Act 2008
Western Australia: Health Services Act 2016
Tasmania: Personal Information Protection Act 2004
Northern Territory: Information Act 2002

Example: Jurisdictional Complexity

A healthcare provider operating across multiple Australian states must comply with both the federal Privacy Act and the relevant state/territory legislation in each jurisdiction where they operate. For instance, a telehealth service operating in both NSW and Victoria would need to comply with:

Federal Privacy Act 1988 and APPs
NSW Health Records and Information Privacy Act
Victorian Health Records Act

This creates a complex compliance environment where de-identification practices may need to satisfy multiple regulatory frameworks.

Case Study: My Health Record Secondary Use Framework

The Framework to Guide the Secondary Use of My Health Record System Data, released in 2018 and updated in 2023, provides a comprehensive approach to de-identification of Australia's national electronic health record data:

Establishes a dedicated Secondary Use of Data Governance Board
Requires data to be de-identified before release except in specific circumstances
Prohibits certain uses including commercial and non-health-related purposes
Implements a multi-layered approval process for data access
Requires ethics committee approval for research projects
Establishes a secure data access environment
Mandates public benefit testing for all data uses
Allows individuals to opt out of having their data used for secondary purposes

This framework demonstrates Australia's comprehensive approach to balancing privacy protection with enabling beneficial uses of health data.

References:

Privacy Act 1988 (Commonwealth)
My Health Records Act 2012 (Commonwealth)
Healthcare Identifiers Act 2010 (Commonwealth)
Data Availability and Transparency Act 2022 (Commonwealth)
Privacy Legislation Amendment (Enforcement and Other Measures) Act 2022
OAIC: Rights and Responsibilities under the Privacy Act
Framework to Guide the Secondary Use of My Health Record System Data

Key Concepts and Definitions

Australian privacy law includes several important concepts related to de-identification:

Concept	Definition
De-identification	The process of removing or altering information that identifies an individual or is reasonably likely to identify an individual. This involves both removing direct identifiers and addressing indirect identification risks through technical and administrative controls.
Personal information	Information or an opinion about an identified individual, or an individual who is reasonably identifiable, whether the information or opinion is true or not, and whether recorded in material form or not.
Sensitive information	A subset of personal information that includes health information, genetic information, biometric information, and other categories that receive additional protections under the Privacy Act.
Re-identification	The process of turning de-identified data back into identifiable data, either by restoring removed identifiers or by using other available information to infer identity.
Reasonably identifiable	A key threshold concept that depends on context, including the nature and amount of information, who will have access to it, and other information that may be available.
Disclosure risk	The likelihood that an individual could be re-identified from supposedly de-identified data, considering all relevant factors including other available information.
Data custodian	The entity responsible for managing and protecting data, including implementing appropriate de-identification measures.
Accredited Data Service Provider	Under the DATA Scheme, an organization accredited to provide data services including de-identification.

Example: "Reasonably Identifiable" in Practice

The OAIC provides the following example: A dataset contains patient health records with names and Medicare numbers removed, but includes full date of birth, gender, and detailed postcode information. While no direct identifiers remain, the combination of these quasi-identifiers could make individuals "reasonably identifiable" in small geographic areas where few people share the same characteristics. The OAIC would likely consider this information to still be personal information subject to the Privacy Act.
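The scenario above can be checked mechanically by counting how many records are unique on the remaining quasi-identifiers. The following is a minimal sketch only: the records, field names, and the `unique_records` helper are hypothetical, and a real assessment would also consider who can access the data and what other information is available to them.

```python
from collections import Counter

# Hypothetical records: names and Medicare numbers are already removed,
# but date of birth, gender, and postcode remain.
records = [
    {"dob": "1984-03-12", "gender": "F", "postcode": "0872", "diagnosis": "asthma"},
    {"dob": "1984-03-12", "gender": "F", "postcode": "2000", "diagnosis": "diabetes"},
    {"dob": "1990-07-01", "gender": "M", "postcode": "2000", "diagnosis": "asthma"},
    {"dob": "1990-07-01", "gender": "M", "postcode": "2000", "diagnosis": "fracture"},
]

QUASI_IDENTIFIERS = ("dob", "gender", "postcode")


def unique_records(rows, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the rows whose quasi-identifier combination appears exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


singled_out = unique_records(records)
print(f"{len(singled_out)} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
```

A high share of unique records suggests the dataset may still be reasonably identifiable, even though no direct identifier remains.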

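In 2016, the Department of Health published a de-identified sample of Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) claims data. University of Melbourne researchers subsequently showed that records in the dataset could be re-identified, including, in findings published in 2017, by linking patient records with other publicly available information. The dataset was withdrawn, and the incident led to significant reforms in Australia's approach to data release and de-identification practices.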

Under Australian law, properly de-identified information is no longer considered personal information and thus falls outside the scope of the Privacy Act. However, the test is contextual and depends on the reasonable likelihood of re-identification in the circumstances.

References:

OAIC: What is Personal Information?
De-identification Decision-Making Framework (2020)
OAIC: De-identification and the Privacy Act
National Data Commissioner: Accredited Data Service Providers

Office of the Australian Information Commissioner (OAIC) Guidelines

The OAIC has published extensive guidance on de-identification, including:

De-identification and the Privacy Act

This guidance outlines:

The process of de-identification involves both:

Removing direct identifiers
Taking additional steps to remove, obscure, aggregate, alter and/or protect information so that it is no longer reasonably identifiable

The importance of considering the context in which data will be released
The need to evaluate re-identification risk against available resources, skills, and motives
The requirement for ongoing risk assessment as technology and data availability evolve
The importance of governance frameworks for managing de-identified data

Example: Risk-Based Approach to De-identification

The OAIC recommends a risk-based approach that considers:

Data environment factors: Who can access the data, what other data is available, what controls exist
Data factors: What information remains in the dataset, how unique or distinguishable records are
Intent factors: The motivation, skills, and resources of potential attackers

For example, releasing hospital admission data might require different de-identification approaches depending on whether it's being:

Published openly on the internet (highest risk)
Shared with approved researchers in a secure environment (moderate risk)
Used internally for quality improvement (lower risk)

The De-identification Decision-Making Framework

Developed in partnership with CSIRO's Data61, this comprehensive framework provides a structured approach to de-identification that includes:

Establish context: Define the data situation and evaluate risks
Understand the data: Identify variables and assess disclosure risks
Choose de-identification methods: Select appropriate techniques based on risk assessment
Calculate re-identification risk: Quantify the likelihood of re-identification
Manage risk: Implement controls and governance frameworks
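One common way to make the "calculate re-identification risk" step concrete is to group records by their quasi-identifiers and treat each record's risk as one divided by the size of its group. The sketch below assumes that simple model; the `reidentification_risk` function, the sample data, and the thresholds per release context are illustrative assumptions, not values prescribed by the OAIC or the framework.

```python
from collections import Counter


def reidentification_risk(rows, quasi_identifiers):
    """Return (max_risk, average_risk) from equivalence class sizes.

    A record's risk is modelled as 1 / (number of records sharing its
    quasi-identifier values).
    """
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    max_risk = 1 / min(classes.values())
    # Summing 1/size over every record gives the number of classes.
    average_risk = len(classes) / len(rows)
    return max_risk, average_risk


# Hypothetical admissions extract, already generalised to age bands.
admissions = [
    {"age_band": "30-34", "sex": "F"},
    {"age_band": "30-34", "sex": "F"},
    {"age_band": "30-34", "sex": "F"},
    {"age_band": "35-39", "sex": "M"},
    {"age_band": "35-39", "sex": "M"},
]

# Illustrative thresholds only: stricter for open release, looser for internal use.
MAX_RISK_BY_CONTEXT = {"open_publication": 0.05, "secure_research": 0.2, "internal_use": 0.5}

max_risk, average_risk = reidentification_risk(admissions, ("age_band", "sex"))
acceptable = {context: max_risk <= limit for context, limit in MAX_RISK_BY_CONTEXT.items()}
print(max_risk, average_risk, acceptable)
```

The same dataset passes for one release context and fails for another, which is the point of assessing risk in context rather than against a single fixed rule.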

Case Study: Australian Census Data Release

The Australian Bureau of Statistics (ABS) applies sophisticated de-identification techniques to census data that contains sensitive health information:

Implements a multi-layered approach to protect individual privacy while maintaining data utility
Uses perturbation techniques to introduce small random adjustments to data
Applies different levels of geographic aggregation depending on sensitivity
Suppresses small counts for sensitive health conditions
Provides different access mechanisms based on user needs and trustworthiness
Conducts comprehensive disclosure risk assessments before each data release
Implements TableBuilder, a tool that applies automatic confidentiality protections

This approach has enabled the release of valuable population health data while protecting individual privacy.

References:

OAIC: De-identification and the Privacy Act
De-identification Decision-Making Framework (2020)
OAIC: Guide to Data Analytics and the Australian Privacy Principles
ABS: Data Confidentiality Guide
ABS: TableBuilder

Technical Approaches

Australian guidance recommends various technical approaches to de-identification:

1. Direct Identifier Removal

Removal of information that directly identifies individuals, such as:

Names and name-related information (initials, aliases)
Medicare numbers and healthcare identifiers
Tax File Numbers (TFNs)
Address details and geocodes
Contact information (phone, email, social media handles)
Biometric identifiers (fingerprints, retina scans)
Identification numbers (driver's license, passport)
Photos and video recordings
Indigenous status identifiers in small geographic areas
Unique device identifiers (medical devices, wearables)

Example: Direct Identifier Removal in Practice

The Australian Institute of Health and Welfare (AIHW) applies the following techniques when releasing health datasets:

Removal: Complete elimination of direct identifiers
Pseudonymization: Replacing identifiers with randomly generated codes that maintain the ability to link records while removing identifying information
Key separation: Storing linking keys separately from content data with strict access controls
Secure hash functions: Creating non-reversible identifiers for linkage purposes
Statistical linkage keys: Using standardized approaches to create linkage keys that don't reveal identity

For example, in the National Hospital Morbidity Database, patient names and Medicare numbers are replaced with randomly generated identifiers before data is provided to researchers.
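The "secure hash functions" and "key separation" items above can be sketched with a keyed hash (HMAC-SHA256) from the Python standard library. This is an illustration of one possible technique, not the AIHW's actual method: the text above describes randomly generated identifiers, whereas this sketch derives a stable pseudonym from the identifier and a secret key. The field names and the sample Medicare number are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumption: the key is generated once and held only by the data custodian,
# separately from the released content data (key separation).
LINKAGE_KEY = secrets.token_bytes(32)

DIRECT_IDENTIFIERS = ("name", "medicare_number")


def pseudonymise(medicare_number: str, key: bytes = LINKAGE_KEY) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    normalised = medicare_number.replace(" ", "").strip()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def release_record(record: dict, key: bytes = LINKAGE_KEY) -> dict:
    """Drop direct identifiers and attach a pseudonym so records can still be linked."""
    released = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    released["person_id"] = pseudonymise(record["medicare_number"], key)
    return released


admission = {"name": "Jane Citizen", "medicare_number": "2123 45670 1", "diagnosis": "J45"}
print(release_record(admission))
```

A plain unkeyed hash would be a weaker choice here, because identifier spaces such as Medicare numbers are small enough to enumerate; keeping the key secret is what prevents that.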

2. Statistical Techniques

Various methods to address indirect identification risks:

Technique	Description	Example Application
Aggregation	Combining values into categories (e.g., age ranges instead of specific ages)	Converting exact ages to 5-year age bands (e.g., 30-34, 35-39)
Suppression	Removing variables or records that present high re-identification risk	Removing rare disease codes that affect very few individuals in a dataset
Perturbation	Adding noise to data while preserving statistical properties	Adding random variations to laboratory values while maintaining overall distribution
Synthetic data	Creating artificial data that maintains statistical properties	Generating synthetic patient records that reflect real population characteristics
k-anonymity	Ensuring each combination of quasi-identifier values occurs in at least k records	Ensuring at least 5 people share each combination of age, gender, and postcode
l-diversity	Ensuring sensitive attributes have diverse values within each equivalence class	Ensuring multiple different diagnosis codes exist for each demographic group
Differential privacy	Adding statistical noise in a way that provides mathematical privacy guarantees	Adding calibrated noise to aggregate statistics about health conditions
Cell suppression	Hiding cells in tabular data that could reveal individuals	Suppressing counts less than 5 in health statistics tables
Data swapping	Exchanging values between records to break identifiability	Swapping certain demographic details between similar records
Microaggregation	Replacing individual values with averages from small groups	Replacing individual blood pressure readings with small group averages
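The aggregation, k-anonymity, and l-diversity rows of the table can be illustrated together. The following is a minimal sketch with hypothetical patient records and helper names; it checks the properties but does not search for the best generalisation, which dedicated tools handle.

```python
from collections import defaultdict


def age_band(age: int, width: int = 5) -> str:
    """Aggregate an exact age into a band such as '30-34'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = equivalence_classes(rows, quasi_identifiers).values()
    return all(len(group) >= k for group in groups)


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers).values()
    return all(len({row[sensitive] for row in group}) >= l for group in groups)


patients = [
    {"age": 30, "gender": "F", "postcode": "3000", "diagnosis": "J45"},
    {"age": 32, "gender": "F", "postcode": "3000", "diagnosis": "E11"},
    {"age": 34, "gender": "F", "postcode": "3000", "diagnosis": "J45"},
    {"age": 35, "gender": "M", "postcode": "3000", "diagnosis": "I10"},
    {"age": 38, "gender": "M", "postcode": "3000", "diagnosis": "I10"},
]
QI = ("age", "gender", "postcode")

banded = [{**p, "age": age_band(p["age"])} for p in patients]

print(is_k_anonymous(patients, QI, 2))            # exact ages: every record is unique
print(is_k_anonymous(banded, QI, 2))              # 5-year bands: classes of 3 and 2
print(is_l_diverse(banded, QI, "diagnosis", 2))   # the male class has one diagnosis only
```

The last check shows why k-anonymity alone may not be enough: everyone in the 35-39 male group has the same diagnosis, so knowing someone is in that group reveals it.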

Example: Differential Privacy Implementation

The Australian Bureau of Statistics has begun implementing differential privacy techniques for certain data releases:

Establishes a privacy budget (epsilon) that quantifies the privacy risk
Adds carefully calibrated noise to statistics based on sensitivity
Provides a mathematical bound on how much any one individual's data can change the published results, which limits what can be inferred about that individual
Balances privacy protection with data utility
Allows transparent communication about privacy protection levels

This approach represents the cutting edge of privacy-preserving data release in Australia.
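The idea of a privacy budget and calibrated noise can be shown with the Laplace mechanism applied to a single count. This is a textbook sketch under stated assumptions, not a description of how the ABS implements anything: the function names are hypothetical, the count is assumed to have sensitivity 1 (adding or removing one person changes it by at most 1), and production systems would normally rely on a vetted library rather than hand-rolled floating-point noise.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2024)  # seeded only so the illustration is repeatable
for epsilon in (0.1, 1.0):
    print(epsilon, round(noisy_count(120, epsilon, rng), 1))
```

A smaller epsilon means a larger noise scale and therefore stronger protection but less accurate statistics, which is the privacy and utility trade-off listed above.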

References:

AIHW Data Governance
AIHW Confidentiality Guidelines
Australian Bureau of Statistics: Data Confidentiality Guide
ABS: Differential Privacy in Census Data
CSIRO: Privacy-Preserving Techniques

The Five Safes Framework

Australia has widely adopted the "Five Safes" framework, originally developed in the UK, as a structured approach to managing sensitive data. This framework is now embedded in Australia's data sharing practices, including the Data Availability and Transparency Act 2022:

Safe Dimension	Description	Australian Implementation Example
Safe People	Ensuring data users are authorized, trained, and trustworthy	AIHW requires researchers to sign confidentiality undertakings and complete training before accessing sensitive health data
Safe Projects	Ensuring data use is appropriate and ethical	Human Research Ethics Committee approval required for health data research projects
Safe Settings	Controlling the environment in which data is accessed	The Secure Unified Research Environment (SURE) provides a secure virtual environment for analyzing sensitive health data
Safe Data	Applying technical controls to remove identifiers	Application of statistical disclosure control methods to health datasets before release
Safe Outputs	Ensuring results of analysis don't disclose sensitive information	Statistical output checking before publication of research findings based on sensitive data

Example: The Five Safes in the Australian DATA Scheme

The Data Availability and Transparency Act 2022 explicitly incorporates the Five Safes framework into Australia's data sharing legislation. For health data sharing under this scheme:

Safe People: Data recipients must be accredited by the National Data Commissioner
Safe Projects: Data sharing must be for an authorized purpose (delivering government services, informing policy, or research and development)
Safe Settings: Appropriate security controls must be in place
Safe Data: Only the minimum necessary data can be shared
Safe Outputs: Results must be checked before wider release

Case Study: The Secure Unified Research Environment (SURE)

SURE is a secure computing environment developed by the Sax Institute to enable safe access to sensitive health data:

Provides a virtual desktop infrastructure for researchers to access and analyze sensitive health data
Implements multiple security layers including two-factor authentication
Prevents data downloads or transfers outside the secure environment
Records all user actions for audit purposes
Requires all outputs to be checked for disclosure risk before release
Enables collaboration across institutions while maintaining data security
Has supported over 800 research projects using sensitive health data

SURE exemplifies the "Safe Settings" component of the Five Safes framework and has become a model for secure data access internationally.

This approach recognizes that de-identification is not solely about technical treatments of data but includes the entire context of data use.

References:

National Data Commissioner: Data Sharing Principles
AIHW: Five Safes Framework
ABS: Five Safes Framework
Sax Institute: Secure Unified Research Environment (SURE)
Best Practice Guide to Applying Data Sharing Principles

Sector-Specific Guidance

Several Australian organizations have developed sector-specific guidance for health data de-identification:

Australian Institute of Health and Welfare (AIHW)

The AIHW's Confidentiality Guidelines provide specific approaches for de-identifying health and welfare data, including:

Detailed protocols for handling small numbers in health statistics
Cell suppression techniques for tabular data
Confidentiality checks before data release
Specific guidance for Indigenous health data
Approaches for geographic data at different levels
Guidelines for longitudinal data linkage

Example: AIHW Small Numbers Protocol

The AIHW applies specific rules when reporting health statistics for small geographic areas:

Cells with counts of 1-4 are generally suppressed
Additional cells may be suppressed to prevent calculation of suppressed values (complementary suppression)
Rates based on small numbers are flagged as potentially unreliable
Indigenous health data has additional protections due to its sensitivity
Geographic areas may be combined to increase population size
Confidence intervals are provided to indicate statistical reliability
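The first two rules above can be sketched for a single row of counts. This is a simplified illustration, assuming the row total is published alongside the cells; the function name, the area labels, and the "n.p." (not published) marker are assumptions here, and real complementary suppression across two-dimensional tables is considerably more involved.

```python
def suppress_small_cells(counts: dict, threshold: int = 5) -> dict:
    """Suppress counts of 1 to threshold-1, plus one more cell if only one was hit.

    With a published total, a single suppressed cell could be recovered by
    subtraction, so a second cell is hidden as well (complementary suppression).
    """
    suppressed = {area for area, n in counts.items() if 0 < n < threshold}
    if len(suppressed) == 1:
        candidates = {a: n for a, n in counts.items() if a not in suppressed and n > 0}
        if candidates:
            # Hide the smallest remaining non-zero cell to limit information loss.
            suppressed.add(min(candidates, key=candidates.get))
    return {area: ("n.p." if area in suppressed else n) for area, n in counts.items()}


row = {"Area A": 3, "Area B": 12, "Area C": 48, "Area D": 27}
print(suppress_small_cells(row))
```

Here Area A is suppressed because its count is below 5, and Area B is suppressed as the complementary cell so that Area A cannot be derived from the total.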

Department of Health and Aged Care

Provides specific guidance for de-identifying Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) data, including:

The MBS and PBS 10% Sample datasets - de-identified longitudinal data for research
Protocols for researcher access to linked health data through secure environments
Requirements for ethics approval and public interest certification
Specific guidance for handling sensitive health information such as mental health and HIV status
Procedures for data linkage using the Statistical Linkage Key (SLK-581)
Guidelines for the release of COVID-19 related health data
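For readers unfamiliar with the SLK-581 mentioned above, the sketch below follows the commonly published AIHW construction: the 2nd, 3rd, and 5th letters of the family name, the 2nd and 3rd letters of the given name, the date of birth as DDMMYYYY, and a one-character sex code. The padding rules used here ('2' for a missing letter, '999' or '99' for a missing name) and the helper names are stated as assumptions; the official specification should be checked before relying on them.

```python
import re
from datetime import date


def _letters(name: str) -> str:
    """Upper-case the name and drop anything that is not a letter A-Z."""
    return re.sub(r"[^A-Z]", "", name.upper())


def _pick(name: str, positions, missing: str) -> str:
    """Take the letters at the given 1-based positions, padding short names with '2'."""
    cleaned = _letters(name)
    if not cleaned:
        return missing
    return "".join(cleaned[p - 1] if p <= len(cleaned) else "2" for p in positions)


def slk581(family_name: str, given_name: str, dob: date, sex_code: str) -> str:
    """Build a 14-character statistical linkage key (5 name letters, 8 date digits, 1 sex code)."""
    return (
        _pick(family_name, (2, 3, 5), "999")
        + _pick(given_name, (2, 3), "99")
        + dob.strftime("%d%m%Y")
        + sex_code
    )


print(slk581("Smith", "John", date(1970, 1, 15), "1"))
```

Because the key is derived from name fragments and a full date of birth, it supports linkage but is not anonymous on its own, which is why it is normally handled under the access controls described in this section.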

Australian Digital Health Agency

Provides guidance specific to My Health Record data de-identification, including:

Secondary use framework for My Health Record data
Strict controls on data access and use
Consumer opt-out options for secondary use
Prohibition on commercial use of de-identified data
Requirements for data minimization and purpose limitation
Technical standards for secure data access environments
Governance arrangements for data release decisions

Case Study: Population Health Research Network (PHRN)

The PHRN is a national data linkage infrastructure that enables privacy-preserving linkage of health datasets across Australia:

Implements a "separation principle" where identifying information is separated from content data
Uses specialized data linkage units in each state/territory
Applies privacy-preserving record linkage techniques
Creates linkage keys without revealing identities
Enables researchers to access linked data without seeing identifiers
Supports complex multi-jurisdictional data linkage projects
Has facilitated over 700 research projects using linked health data

This infrastructure has enabled valuable population health research while maintaining strong privacy protections.
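The separation principle can be sketched as splitting each record into an identifier part and a content part that share only a random key. This is a simplified illustration with hypothetical field names; in practice the two parts go to different parties (a linkage unit and the researchers), and keys are typically specific to each project.

```python
import secrets

# Hypothetical set of fields treated as identifying for this illustration.
IDENTIFYING_FIELDS = ("name", "date_of_birth", "address")


def separate(records):
    """Split records into an identifier file and a content file joined only by a random key."""
    identifiers, content = [], []
    for record in records:
        key = secrets.token_hex(8)  # random, so it carries no information about the person
        id_part = {f: record[f] for f in IDENTIFYING_FIELDS if f in record}
        content_part = {f: v for f, v in record.items() if f not in IDENTIFYING_FIELDS}
        identifiers.append({"record_key": key, **id_part})
        content.append({"record_key": key, **content_part})
    return identifiers, content


hospital_records = [
    {"name": "Jane Citizen", "date_of_birth": "1980-01-01", "address": "1 Example St",
     "diagnosis": "J45", "admitted": "2023-05-02"},
]
identifier_file, content_file = separate(hospital_records)
print(content_file)
```

Researchers working with the content file see clinical fields and an opaque key, while the party holding the identifier file sees no clinical information.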

References:

AIHW Confidentiality Guidelines
Framework to Guide the Secondary Use of My Health Record System Data
Services Australia: Statistical Information and Data
Population Health Research Network: Data Privacy and Security
Principles for Accessing, Linking and Using Health Data

How It Compares to HIPAA Safe Harbor

Australia's approach differs from HIPAA Safe Harbor in several key ways:

Principles vs. Prescriptive Rules: Australia takes a principles-based approach rather than providing a specific list of identifiers to remove (as in HIPAA's 18 identifiers)
Contextual Assessment: Places greater emphasis on context and intended data use when determining if data is "reasonably identifiable"
Holistic Framework: Incorporates the "Five Safes" framework to address the broader data environment beyond just the data itself
Risk-Based Approach: More explicitly recognizes the contextual nature of identifiability and requires risk assessment
Flexibility vs. Certainty: Provides greater flexibility but potentially less certainty about compliance compared to HIPAA's "safe harbor" approach
Organizational Responsibility: Places stronger emphasis on organizational responsibility to assess re-identification risk
Indigenous Data Considerations: Includes specific provisions for Indigenous health data not present in HIPAA
Jurisdictional Complexity: Navigates federal and state/territory legislation, creating a more complex compliance environment

Example: De-identifying a Patient Dataset

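Under HIPAA Safe Harbor: Remove the 18 specified identifiers (e.g., names, geographic subdivisions smaller than a state apart from the first three digits of a ZIP code in sufficiently populous areas, all elements of dates other than the year that relate directly to an individual, phone numbers, etc.)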
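The rule-based character of Safe Harbor can be seen in a short sketch that applies a few of its rules as fixed transformations, regardless of context. It covers only a subset of the 18 identifier types; the field names and the `low_population_zip3` parameter are hypothetical, and the set of sparsely populated ZIP prefixes would have to come from census data.

```python
def safe_harbor_subset(record: dict, low_population_zip3=frozenset()) -> dict:
    """Apply a few Safe Harbor rules; not the full list of 18 identifier types."""
    removed = {"name", "phone", "email", "birth_date", "zip"}
    out = {k: v for k, v in record.items() if k not in removed}

    # ZIP codes keep only their first three digits, or 000 for sparsely populated prefixes.
    zip3 = record["zip"][:3]
    out["zip3"] = "000" if zip3 in low_population_zip3 else zip3

    if record["age"] >= 90:
        # Ages over 89 are reported as a single 90+ category, without a birth year.
        out["age"] = "90+"
        out["birth_year"] = None
    else:
        # Dates are reduced to the year only (birth_date is an ISO string here).
        out["birth_year"] = record["birth_date"][:4]
    return out


patient = {"name": "Pat Doe", "phone": "555-0100", "email": "pat@example.com",
           "birth_date": "1982-05-17", "zip": "03601", "age": 42, "diagnosis": "E11"}
print(safe_harbor_subset(patient))
```

The same transformation is applied whether the data is published openly or held in a secure environment, which is the main contrast with the context-specific assessment used in Australia.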

Under Australian Framework: Conduct a context-specific risk assessment that considers:

Who will have access to the data
What other information they might have
How the data will be used and protected
The specific re-identification risks in the dataset
The full data environment, including access controls

Then, based on this assessment:

Apply appropriate technical and administrative controls
Implement governance frameworks for ongoing management

Feature	HIPAA Safe Harbor	Australian Framework
Approach	Prescriptive list of 18 identifiers to remove	Principles-based assessment of "reasonably identifiable"
Legal certainty	High - clear compliance pathway	Moderate - requires judgment and risk assessment
Flexibility	Low - same approach for all contexts	High - tailored to specific use cases and contexts
Environmental controls	Limited focus - primarily on data transformation	Comprehensive - incorporates Five Safes framework
Risk assessment	Limited - focused on "actual knowledge" test	Extensive - central to compliance approach
Documentation	Minimal requirements	Comprehensive documentation expected

References:

OAIC: De-identification and the Privacy Act
U.S. HHS: Guidance on HIPAA De-identification
De-identification Decision-Making Framework (2020)