Overview
Australia has developed a comprehensive approach to health data de-identification that combines national legislation, sector-specific guidelines, and technical standards. The Australian framework places strong emphasis on risk-based assessment and recognizes the contextual nature of de-identification.
The Australian approach treats de-identification not as a binary state but as a spectrum of risk. Accordingly, the regulatory framework focuses on whether information is "reasonably identifiable" in a given context rather than prescribing specific technical methods.
Key Regulatory Bodies:
- Office of the Australian Information Commissioner (OAIC) - Australia's independent national privacy regulator, responsible for privacy functions established by the Privacy Act 1988
- Australian Digital Health Agency - Leads Australia's digital health strategy and operates the My Health Record system
- Australian Institute of Health and Welfare (AIHW) - Australia's national agency for health and welfare information and statistics
- National Data Commissioner - Oversees the DATA Scheme, established under the Data Availability and Transparency Act 2022 to enable controlled access to public sector data
- Therapeutic Goods Administration (TGA) - Regulates medical devices and pharmaceuticals, including health data used in clinical trials
Legal Framework
Australia's health data de-identification framework is built upon several key pieces of legislation and regulation:
Primary Legislation
- Privacy Act 1988: The cornerstone of privacy regulation in Australia, which includes the Australian Privacy Principles (APPs). The Act has been significantly amended since 2022 to strengthen privacy protections, including higher maximum penalties for serious or repeated privacy breaches (for companies, the greater of $50 million, three times the value of any benefit obtained, or 30% of adjusted turnover in the relevant period).
- My Health Records Act 2012: Specific legislation governing Australia's national electronic health record system, with strict controls on data use and disclosure. The system became opt-out in 2019, meaning all Australians have a record unless they choose not to.
- Healthcare Identifiers Act 2010: Legislation establishing a national system for uniquely identifying healthcare providers and recipients.
- Data Availability and Transparency Act 2022: Establishes a scheme for controlled sharing of public sector data, including health data, with appropriate safeguards.
- Privacy Legislation Amendment (Enforcement and Other Measures) Act 2022: Significantly increased penalties for privacy breaches and enhanced regulatory powers.
Example: Privacy Act Enforcement Amendments
The Privacy Legislation Amendment (Enforcement and Other Measures) Act 2022, which commenced in December 2022, introduced significant changes, including:
- Increased maximum penalties for serious or repeated privacy breaches by companies to the greater of $50 million, three times the value of any benefit obtained through the misuse of information, or 30% of the company's adjusted turnover in the relevant period
- Enhanced powers for the OAIC to resolve privacy breaches
- Strengthened notification requirements for data breaches
- New information sharing powers to assist the OAIC in investigations
- Extraterritorial application to organizations doing business in Australia, even if not physically present
These amendments significantly increase the potential consequences for organizations that fail to properly de-identify health data.
State and Territory Legislation
Each Australian state and territory has its own health privacy legislation, creating a complex regulatory landscape:
- New South Wales: Health Records and Information Privacy Act 2002 - Contains 15 Health Privacy Principles
- Victoria: Health Records Act 2001 - Contains 11 Health Privacy Principles
- ACT: Health Records (Privacy and Access) Act 1997 - Contains 14 Privacy Principles
- Queensland: Information Privacy Act 2009 and Hospital and Health Boards Act 2011
- South Australia: Health Care Act 2008
- Western Australia: Health Services Act 2016
- Tasmania: Personal Information Protection Act 2004
- Northern Territory: Information Act 2002
Example: Jurisdictional Complexity
A healthcare provider operating across multiple Australian states must comply with both the federal Privacy Act and the relevant state/territory legislation in each jurisdiction where they operate. For instance, a telehealth service operating in both NSW and Victoria would need to comply with:
- Federal Privacy Act 1988 and APPs
- NSW Health Records and Information Privacy Act
- Victorian Health Records Act
This creates a complex compliance environment where de-identification practices may need to satisfy multiple regulatory frameworks.
Case Study: My Health Record Secondary Use Framework
The Framework to Guide the Secondary Use of My Health Record System Data, released in 2018 and updated in 2023, provides a comprehensive approach to de-identification of Australia's national electronic health record data:
- Establishes a dedicated Secondary Use of Data Governance Board
- Requires data to be de-identified before release except in specific circumstances
- Prohibits certain uses including commercial and non-health-related purposes
- Implements a multi-layered approval process for data access
- Requires ethics committee approval for research projects
- Establishes a secure data access environment
- Mandates public benefit testing for all data uses
- Allows individuals to opt out of having their data used for secondary purposes
This framework demonstrates Australia's comprehensive approach to balancing privacy protection with enabling beneficial uses of health data.
References:
- Privacy Act 1988 (Commonwealth)
- My Health Records Act 2012 (Commonwealth)
- Healthcare Identifiers Act 2010 (Commonwealth)
- Data Availability and Transparency Act 2022 (Commonwealth)
- Privacy Legislation Amendment (Enforcement and Other Measures) Act 2022
- OAIC: Rights and Responsibilities under the Privacy Act
- Framework to Guide the Secondary Use of My Health Record System Data
Key Concepts and Definitions
Australian privacy law includes several important concepts related to de-identification:
| Concept | Definition |
|---|---|
| De-identification | The process of removing or altering information that identifies an individual or is reasonably likely to identify an individual. This involves both removing direct identifiers and addressing indirect identification risks through technical and administrative controls. |
| Personal information | Information or an opinion about an identified individual, or an individual who is reasonably identifiable, whether the information or opinion is true or not, and whether recorded in material form or not. |
| Sensitive information | A subset of personal information that includes health information, genetic information, biometric information, and other categories that receive additional protections under the Privacy Act. |
| Re-identification | The process of turning de-identified data back into identifiable data, either by restoring removed identifiers or by using other available information to infer identity. |
| Reasonably identifiable | A key threshold concept that depends on context, including the nature and amount of information, who will have access to it, and other information that may be available. |
| Disclosure risk | The likelihood that an individual could be re-identified from supposedly de-identified data, considering all relevant factors including other available information. |
| Data custodian | The entity responsible for managing and protecting data, including implementing appropriate de-identification measures. |
| Accredited Data Service Provider | Under the DATA Scheme, an organization accredited to provide data services including de-identification. |
Example: "Reasonably Identifiable" in Practice
The OAIC provides the following example: A dataset contains patient health records with names and Medicare numbers removed, but includes full date of birth, gender, and detailed postcode information. While no direct identifiers remain, the combination of these quasi-identifiers could make individuals "reasonably identifiable" in small geographic areas where few people share the same characteristics. The OAIC would likely consider this information to still be personal information subject to the Privacy Act.
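The reasoning in this example can be checked mechanically by counting how many records are unique on their quasi-identifiers. The following is a minimal sketch, not an OAIC method: the field names, the sample records, and the `unique_fraction` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical records: names and Medicare numbers are already removed,
# but full date of birth, gender, and postcode remain.
records = [
    {"dob": "1984-03-12", "gender": "F", "postcode": "0872"},
    {"dob": "1984-03-12", "gender": "F", "postcode": "2000"},
    {"dob": "1984-03-12", "gender": "F", "postcode": "2000"},
    {"dob": "1951-11-30", "gender": "M", "postcode": "2000"},
    {"dob": "1990-07-01", "gender": "M", "postcode": "0872"},
]

QUASI_IDENTIFIERS = ("dob", "gender", "postcode")


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Return the share of rows whose quasi-identifier combination occurs once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique_rows = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique_rows / len(rows)


# Three of the five records are unique on (dob, gender, postcode).
print(unique_fraction(records))
```

A high unique fraction does not by itself settle the legal question, which also depends on who can access the data and what other information they hold, but it is a useful first signal that quasi-identifiers need further treatment.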
In 2016, the Department of Health published a de-identified sample of MBS and PBS claims data. University of Melbourne researchers subsequently showed that individuals in the dataset could be re-identified by linking it with other publicly available information. The incident prompted significant reforms in Australia's approach to data release and de-identification practices.
Under Australian law, properly de-identified information is no longer considered personal information and thus falls outside the scope of the Privacy Act. However, the test is contextual and depends on the reasonable likelihood of re-identification in the circumstances.
Office of the Australian Information Commissioner (OAIC) Guidelines
The OAIC has published extensive guidance on de-identification, including:
De-identification and the Privacy Act
This guidance outlines:
- The process of de-identification involves both:
  - Removing direct identifiers
  - Taking additional steps to remove, obscure, aggregate, alter and/or protect information so that it is no longer reasonably identifiable
- The importance of considering the context in which data will be released
- The need to evaluate re-identification risk against available resources, skills, and motives
- The requirement for ongoing risk assessment as technology and data availability evolve
- The importance of governance frameworks for managing de-identified data
Example: Risk-Based Approach to De-identification
The OAIC recommends a risk-based approach that considers:
- Data environment factors: Who can access the data, what other data is available, what controls exist
- Data factors: What information remains in the dataset, how unique or distinguishable records are
- Intent factors: The motivation, skills, and resources of potential attackers
For example, releasing hospital admission data might require different de-identification approaches depending on whether it's being:
- Published openly on the internet (highest risk)
- Shared with approved researchers in a secure environment (moderate risk)
- Used internally for quality improvement (lower risk)
The De-identification Decision-Making Framework
Developed in partnership with CSIRO's Data61, this comprehensive framework provides a structured approach to de-identification that includes:
- Establish context: Define the data situation and evaluate risks
- Understand the data: Identify variables and assess disclosure risks
- Choose de-identification methods: Select appropriate techniques based on risk assessment
- Calculate re-identification risk: Quantify the likelihood of re-identification
- Manage risk: Implement controls and governance frameworks
Case Study: Australian Census Data Release
The Australian Bureau of Statistics (ABS) applies sophisticated de-identification techniques to census data that contains sensitive health information:
- Implements a multi-layered approach to protect individual privacy while maintaining data utility
- Uses perturbation techniques to introduce small random adjustments to data
- Applies different levels of geographic aggregation depending on sensitivity
- Suppresses small counts for sensitive health conditions
- Provides different access mechanisms based on user needs and trustworthiness
- Conducts comprehensive disclosure risk assessments before each data release
- Implements TableBuilder, a tool that applies automatic confidentiality protections
This approach has enabled the release of valuable population health data while protecting individual privacy.
Technical Approaches
Australian guidance recommends various technical approaches to de-identification:
1. Direct Identifier Removal
Removal of information that directly identifies individuals, such as:
- Names and name-related information (initials, aliases)
- Medicare numbers and healthcare identifiers
- Tax File Numbers (TFNs)
- Address details and geocodes
- Contact information (phone, email, social media handles)
- Biometric identifiers (fingerprints, retina scans)
- Identification numbers (driver's license, passport)
- Photos and video recordings
- Indigenous status identifiers in small geographic areas
- Unique device identifiers (medical devices, wearables)
Example: Direct Identifier Removal in Practice
The Australian Institute of Health and Welfare (AIHW) applies the following techniques when releasing health datasets:
- Removal: Complete elimination of direct identifiers
- Pseudonymization: Replacing identifiers with randomly generated codes that maintain the ability to link records while removing identifying information
- Key separation: Storing linking keys separately from content data with strict access controls
- Secure hash functions: Creating non-reversible identifiers for linkage purposes
- Statistical linkage keys: Using standardized approaches to create linkage keys that don't reveal identity
For example, in the National Hospital Morbidity Database, patient names and Medicare numbers are replaced with randomly generated identifiers before data is provided to researchers.
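Pseudonymization combined with key separation can be sketched as follows. This is an illustrative assumption rather than the AIHW's actual procedure: it uses a keyed hash (HMAC-SHA256) instead of randomly generated codes, and the field names and the `split_record` helper are hypothetical. The secret key would be held by a separate linkage unit so that analysts holding the content data cannot recompute pseudonyms from known identifiers.

```python
import hashlib
import hmac
import secrets


def make_linkage_key() -> bytes:
    """Generate a secret key, assumed to be stored apart from the content data."""
    return secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Derive a stable, non-reversible pseudonym from an identifier with a keyed hash."""
    # Normalise so that "2123 45670 1" and "2123456701" map to the same pseudonym.
    normalised = "".join(identifier.split()).upper().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def split_record(record, key, id_field="medicare_number",
                 direct_fields=("name", "medicare_number")):
    """Return the content part of a record with direct identifiers replaced by a pseudonym."""
    content = {k: v for k, v in record.items() if k not in direct_fields}
    content["pseudo_id"] = pseudonymise(record[id_field], key)
    return content


key = make_linkage_key()
patient = {"name": "Jane Citizen", "medicare_number": "2123 45670 1", "diagnosis": "J45"}
print(split_record(patient, key))
```

Because the same identifier and key always produce the same pseudonym, records for one person can still be linked across extracts, while an unkeyed hash of a short numeric identifier could be reversed by simply hashing every possible value.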
2. Statistical Techniques
Various methods to address indirect identification risks:
| Technique | Description | Example Application |
|---|---|---|
| Aggregation | Combining values into categories (e.g., age ranges instead of specific ages) | Converting exact ages to 5-year age bands (e.g., 30-34, 35-39) |
| Suppression | Removing variables or records that present high re-identification risk | Removing rare disease codes that affect very few individuals in a dataset |
| Perturbation | Adding noise to data while preserving statistical properties | Adding random variations to laboratory values while maintaining overall distribution |
| Synthetic data | Creating artificial data that maintains statistical properties | Generating synthetic patient records that reflect real population characteristics |
| k-anonymity | Ensuring each combination of attributes occurs in at least k records | Ensuring at least 5 people share each combination of age, gender, and postcode |
| l-diversity | Ensuring sensitive attributes have diverse values within each equivalence class | Ensuring multiple different diagnosis codes exist for each demographic group |
| Differential privacy | Adding statistical noise in a way that provides mathematical privacy guarantees | Adding calibrated noise to aggregate statistics about health conditions |
| Cell suppression | Hiding cells in tabular data that could reveal individuals | Suppressing counts less than 5 in health statistics tables |
| Data swapping | Exchanging values between records to break identifiability | Swapping certain demographic details between similar records |
| Microaggregation | Replacing individual values with averages from small groups | Replacing individual blood pressure readings with small group averages |
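Three of the techniques in the table (aggregation into 5-year age bands, k-anonymity, and l-diversity) can be illustrated together. The sketch below uses hypothetical field names and sample records, and it checks the simplest "distinct values" form of l-diversity. It is a minimal illustration under those assumptions, not a complete disclosure control tool.

```python
from collections import defaultdict


def age_band(age, width=5):
    """Aggregate an exact age into a band such as '30-34'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same combination of quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def satisfies_k_anonymity(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_identifiers).values())


def satisfies_l_diversity(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers).values()
    return all(len({r[sensitive] for r in g}) >= l for g in groups)


# Hypothetical records with exact ages.
raw = [
    {"age": 30, "gender": "F", "postcode": "2000", "diagnosis": "J45"},
    {"age": 31, "gender": "F", "postcode": "2000", "diagnosis": "E11"},
    {"age": 32, "gender": "F", "postcode": "2000", "diagnosis": "J45"},
    {"age": 33, "gender": "F", "postcode": "2000", "diagnosis": "I10"},
    {"age": 34, "gender": "F", "postcode": "2000", "diagnosis": "E11"},
]
# Replace the exact age with its 5-year band.
banded = [{**{k: v for k, v in r.items() if k != "age"}, "age_band": age_band(r["age"])}
          for r in raw]

print(satisfies_k_anonymity(raw, ("age", "gender", "postcode"), 2))          # False
print(satisfies_k_anonymity(banded, ("age_band", "gender", "postcode"), 5))  # True
print(satisfies_l_diversity(banded, ("age_band", "gender", "postcode"), "diagnosis", 3))  # True
```

Every exact age is unique in the raw data, so even k = 2 fails. After banding, all five records fall into one class that also contains three distinct diagnosis codes.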
Example: Differential Privacy Implementation
The Australian Bureau of Statistics has begun implementing differential privacy techniques for certain data releases:
- Establishes a privacy budget (epsilon) that quantifies the privacy risk
- Adds carefully calibrated noise to statistics based on sensitivity
- Provides a mathematical bound on how much any one individual's data can influence a published result, which limits what can be learned about that individual
- Balances privacy protection with data utility
- Allows transparent communication about privacy protection levels
This approach represents the cutting edge of privacy-preserving data release in Australia.
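The idea of calibrating noise to a privacy budget can be made concrete with the Laplace mechanism applied to a single count. This is a textbook sketch and not a description of the ABS implementation. The function names are hypothetical, and the sensitivity of 1 assumes that adding or removing one person changes the count by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives each draw a mean of `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count, epsilon, rng=None, sensitivity=1.0):
    """Return a noisy count; the noise scale is sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# A smaller epsilon means a tighter privacy budget and therefore more noise.
for eps in (0.1, 1.0, 10.0):
    print(eps, round(dp_count(120, eps), 2))
```

Each released statistic spends part of the budget, so a real deployment would also track cumulative epsilon across queries. That tracking is omitted here.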
The Five Safes Framework
Australia has widely adopted the "Five Safes" framework, originally developed in the UK, as a structured approach to managing sensitive data. This framework is now embedded in Australia's data sharing practices, including the Data Availability and Transparency Act 2022:
| Safe Dimension | Description | Australian Implementation Example |
|---|---|---|
| Safe People | Ensuring data users are authorized, trained, and trustworthy | AIHW requires researchers to sign confidentiality undertakings and complete training before accessing sensitive health data |
| Safe Projects | Ensuring data use is appropriate and ethical | Human Research Ethics Committee approval required for health data research projects |
| Safe Settings | Controlling the environment in which data is accessed | The Secure Unified Research Environment (SURE) provides a secure virtual environment for analyzing sensitive health data |
| Safe Data | Applying technical controls to remove identifiers | Application of statistical disclosure control methods to health datasets before release |
| Safe Outputs | Ensuring results of analysis don't disclose sensitive information | Statistical output checking before publication of research findings based on sensitive data |
Example: The Five Safes in the Australian DATA Scheme
The Data Availability and Transparency Act 2022 incorporates the Five Safes framework into Australia's data sharing legislation through its data sharing principles (project, people, setting, data, and output). For health data sharing under this scheme:
- Safe People: Data recipients must be accredited by the National Data Commissioner
- Safe Projects: Data sharing must be for an authorized purpose (delivering government services, informing policy, or research and development)
- Safe Settings: Appropriate security controls must be in place
- Safe Data: Only the minimum necessary data can be shared
- Safe Outputs: Results must be checked before wider release
Case Study: The Secure Unified Research Environment (SURE)
SURE is a secure computing environment developed by the Sax Institute to enable safe access to sensitive health data:
- Provides a virtual desktop infrastructure for researchers to access and analyze sensitive health data
- Implements multiple security layers including two-factor authentication
- Prevents data downloads or transfers outside the secure environment
- Records all user actions for audit purposes
- Requires all outputs to be checked for disclosure risk before release
- Enables collaboration across institutions while maintaining data security
- Has supported over 800 research projects using sensitive health data
SURE exemplifies the "Safe Settings" component of the Five Safes framework and has become a model for secure data access internationally.
Taken together, the Five Safes framework recognizes that de-identification is not solely about technical treatments of data but includes the entire context of data use.
Sector-Specific Guidance
Several Australian organizations have developed sector-specific guidance for health data de-identification:
Australian Institute of Health and Welfare (AIHW)
The AIHW's Confidentiality Guidelines provide specific approaches for de-identifying health and welfare data, including:
- Detailed protocols for handling small numbers in health statistics
- Cell suppression techniques for tabular data
- Confidentiality checks before data release
- Specific guidance for Indigenous health data
- Approaches for geographic data at different levels
- Guidelines for longitudinal data linkage
Example: AIHW Small Numbers Protocol
The AIHW applies specific rules when reporting health statistics for small geographic areas:
- Cells with counts of 1-4 are generally suppressed
- Additional cells may be suppressed to prevent calculation of suppressed values (complementary suppression)
- Rates based on small numbers are flagged as potentially unreliable
- Indigenous health data has additional protections because of its sensitivity
- Geographic areas may be combined to increase population size
- Confidence intervals are provided to indicate statistical reliability
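The first two rules above (primary suppression of counts of 1-4 and complementary suppression) can be sketched for a single table row whose total is also published. The function name, the row layout, and the choice of the smallest remaining cell as the complementary partner are assumptions made for illustration. They are not taken from the AIHW guidelines.

```python
def suppress_row(counts, threshold=5):
    """Suppress small cells in one table row whose total is published.

    counts maps a category label to its count. Suppressed cells are returned as None.
    """
    published = dict(counts)
    # Primary suppression: hide counts of 1 up to threshold - 1 (1-4 by default).
    primary = [label for label, n in counts.items() if 0 < n < threshold]
    for label in primary:
        published[label] = None
    # Complementary suppression: a single hidden cell could be recovered by
    # subtracting the visible cells from the total, so hide one more cell.
    if len(primary) == 1:
        remaining = [label for label in counts if published[label] is not None]
        if remaining:
            partner = min(remaining, key=lambda label: counts[label])
            published[partner] = None
    return published


admissions = {"0-14": 3, "15-44": 27, "45-64": 12, "65+": 41}
print(suppress_row(admissions))
# {'0-14': None, '15-44': 27, '45-64': None, '65+': 41}
```

This handles one dimension only. In a real two-way table, column totals can also reveal a suppressed value, so complementary suppression has to be checked across both rows and columns.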
Department of Health and Aged Care
Provides specific guidance for de-identifying Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) data, including:
- The MBS and PBS 10% Sample datasets - de-identified longitudinal data for research
- Protocols for researcher access to linked health data through secure environments
- Requirements for ethics approval and public interest certification
- Specific guidance for handling sensitive health information such as mental health and HIV status
- Procedures for data linkage using the Statistical Linkage Key (SLK-581)
- Guidelines for the release of COVID-19 related health data
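To illustrate the Statistical Linkage Key (SLK-581) mentioned in the list above, the sketch below builds a key from the commonly published description of the format: the 2nd, 3rd, and 5th letters of the family name, the 2nd and 3rd letters of the given name, the date of birth as DDMMYYYY, and a one-character sex code. The padding character `2` for short names and the `999`/`99` codes for missing names reflect that description as understood here and should be checked against the current AIHW specification. The function names are hypothetical.

```python
from datetime import date


def _pick_letters(name, positions, missing_code):
    """Pick letters at 1-based positions, ignoring spaces, hyphens, and apostrophes."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return missing_code  # assumed code for a missing name component
    # Assumed rule: positions beyond the end of the name are filled with "2".
    return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)


def slk581(family_name, given_name, dob, sex_code):
    """Build a 14-character linkage key from name letters, date of birth, and sex code."""
    return (
        _pick_letters(family_name, (2, 3, 5), "999")
        + _pick_letters(given_name, (2, 3), "99")
        + dob.strftime("%d%m%Y")
        + sex_code
    )


print(slk581("Smith", "John", date(1975, 3, 9), "1"))    # MIHOH090319751
print(slk581("O'Brien", "Jo", date(2001, 12, 25), "2"))  # BREO2251220012
```

A key of this kind supports probabilistic linkage without storing full names, but it is built from personal details and is not unique to one person, so it still needs to be protected as potentially identifying information.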
Australian Digital Health Agency
Provides guidance specific to My Health Record data de-identification, including:
- Secondary use framework for My Health Record data
- Strict controls on data access and use
- Consumer opt-out options for secondary use
- Prohibition on commercial use of de-identified data
- Requirements for data minimization and purpose limitation
- Technical standards for secure data access environments
- Governance arrangements for data release decisions
Case Study: Population Health Research Network (PHRN)
The PHRN is a national data linkage infrastructure that enables privacy-preserving linkage of health datasets across Australia:
- Implements a "separation principle" where identifying information is separated from content data
- Uses specialized data linkage units in each state/territory
- Applies privacy-preserving record linkage techniques
- Creates linkage keys without revealing identities
- Enables researchers to access linked data without seeing identifiers
- Supports complex multi-jurisdictional data linkage projects
- Has facilitated over 700 research projects using linked health data
This infrastructure has enabled valuable population health research while maintaining strong privacy protections.
How It Compares to HIPAA Safe Harbor
Australia's approach differs from HIPAA Safe Harbor in several key ways:
- Principles vs. Prescriptive Rules: Australia takes a principles-based approach rather than providing a specific list of identifiers to remove (as in HIPAA's 18 identifiers)
- Contextual Assessment: Places greater emphasis on context and intended data use when determining if data is "reasonably identifiable"
- Holistic Framework: Incorporates the "Five Safes" framework to address the broader data environment beyond just the data itself
- Risk-Based Approach: More explicitly recognizes the contextual nature of identifiability and requires risk assessment
- Flexibility vs. Certainty: Provides greater flexibility but potentially less certainty about compliance compared to HIPAA's "safe harbor" approach
- Organizational Responsibility: Places stronger emphasis on organizational responsibility to assess re-identification risk
- Indigenous Data Considerations: Includes specific provisions for Indigenous health data not present in HIPAA
- Jurisdictional Complexity: Navigates federal and state/territory legislation, creating a more complex compliance environment
Example: De-identifying a Patient Dataset
Under HIPAA Safe Harbor: Remove the 18 specified identifiers (e.g., names, geographic subdivisions smaller than a state, all elements of dates other than the year that relate directly to an individual, phone numbers, etc.)
Under Australian Framework: Conduct a context-specific risk assessment that considers:
- Who will have access to the data
- What other information they might have
- How the data will be used and protected
- The specific re-identification risks in the dataset
Then, based on this assessment:
- Apply appropriate technical and administrative controls
- Implement governance frameworks for ongoing management
- Consider the full data environment, including access controls
| Feature | HIPAA Safe Harbor | Australian Framework |
|---|---|---|
| Approach | Prescriptive list of 18 identifiers to remove | Principles-based assessment of "reasonably identifiable" |
| Legal certainty | High - clear compliance pathway | Moderate - requires judgment and risk assessment |
| Flexibility | Low - same approach for all contexts | High - tailored to specific use cases and contexts |
| Environmental controls | Limited focus - primarily on data transformation | Comprehensive - incorporates Five Safes framework |
| Risk assessment | Limited - focused on "actual knowledge" test | Extensive - central to compliance approach |
| Documentation | Minimal requirements | Comprehensive documentation expected |