Japan Health Data De-identification Framework

Overview

Japan has established a framework for health data de-identification that balances privacy protection against the use of health data for research and innovation. The framework has two tiers: a general data protection law that applies to all sectors, and health-specific legislation that adds requirements and creates additional routes for using health data.

Key Developments in Japan's Health Data Framework

May 2017: Next Generation Medical Infrastructure Law enacted
May 2018: Next Generation Medical Infrastructure Law came into effect
December 2019: First certified medical information provider approved under NGMIL
June 2020: Major amendments to the Act on the Protection of Personal Information (APPI) passed
April 2022: Amended APPI came into effect, introducing the concept of pseudonymously processed information
April 2023: Enhanced guidelines for anonymization of medical data published
June 2023: Japan's Digital Health Strategy released, emphasizing secure health data utilization
March 2024: Updated PPC guidelines on anonymization techniques published
May 2024: New certification standards for medical information handling organizations

Legal Framework

Japan's health data de-identification framework is built on several key pieces of legislation:

Primary Legislation

Act on the Protection of Personal Information (APPI): Japan's comprehensive data protection law, significantly amended in 2020, with the amendments taking effect in April 2022. It establishes the general framework for personal data protection, including the category of special care-required personal information, which covers health data such as medical history.
Next Generation Medical Infrastructure Law (NGMIL): Health-specific legislation enacted in 2017 to facilitate the use of medical data for research and development. Its formal title is the Act on Anonymized Medical Data to Contribute to Research and Development in the Medical Field. It creates a framework for sharing anonymized medical data and sets specific rules for anonymizing medical data for research purposes.
Medical Practitioners' Act and Medical Care Act: Contain provisions on medical confidentiality and management of medical records that complement data protection requirements.
Digital Health Enhancement Act (2023): Promotes digital transformation in healthcare while ensuring appropriate data protection.

Reference Links:

Personal Information Protection Commission Japan (PPC): https://www.ppc.go.jp/en/
APPI English Translation (2022 version): https://www.ppc.go.jp/files/pdf/Act_on_the_Protection_of_Personal_Information.pdf
Ministry of Health, Labour and Welfare (MHLW): https://www.mhlw.go.jp/english/
Next Generation Medical Infrastructure Law Overview: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000148944.html (Japanese)
Japan Agency for Medical Research and Development (AMED): https://www.amed.go.jp/en/
Cabinet Office Healthcare Policy: https://www.kantei.go.jp/jp/singi/kenkouiryou/en/

Key Concepts and Definitions

Under APPI

The APPI establishes several important categories of data:

Concept	Definition	Regulatory Status	Examples in Health Context
Personal Information	Information that can identify a specific individual, including information that can be easily collated with other information to identify an individual	Fully regulated under APPI	Patient name, address, or phone number, alone or combined with a medical record number
Special Care-Required Personal Information	Sensitive data including medical history and healthcare records that requires special handling	Subject to stricter requirements under APPI, including explicit consent for collection in most cases	Diagnosis information, treatment history, genetic test results, disability status
Pseudonymously Processed Information	Personal information that has been processed so that it cannot identify a specific individual unless collated with other information	Still regulated, but exempt from some APPI requirements to support uses such as internal analysis	Medical records with patient identifiers replaced by codes, with the mapping table stored separately
Anonymously Processed Information	Information that has been irreversibly processed to prevent identification of specific individuals	Falls outside most APPI requirements and can be used and shared with fewer restrictions	Statistical summaries of patient outcomes with all identifiers removed and data generalized

Reference:

PPC Guidelines on Anonymously Processed Information (English): https://www.ppc.go.jp/files/pdf/211119_guidelines_anonymously_processed_information.pdf

PPC Guidelines on Pseudonymously Processed Information (English): https://www.ppc.go.jp/en/legal/guidelines/

Example: Different Data Categories in Practice

Original Data (Personal Information):

Name: Tanaka Hiroshi
Date of Birth: May 15, 1975
Address: 1-2-3 Chiyoda, Tokyo
Diagnosis: Type 2 Diabetes
Treatment: Metformin 500mg twice daily
Hospital: Tokyo Medical Center
Insurance ID: 12345678
Phone: 03-1234-5678
Email: tanaka.h@example.jp

Pseudonymously Processed Information:

Patient ID: PT-12345
Age: 50
Region: Tokyo
Diagnosis: Type 2 Diabetes
Treatment: Metformin 500mg twice daily
Hospital: Hospital A
Insurance Category: Employee Health Insurance

Anonymously Processed Information:

Age Group: 50-54
Region: Kanto
Diagnosis: Type 2 Diabetes
Treatment Category: Oral hypoglycemic agent
Hospital Type: Large Urban Medical Center
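
The progression shown above can be sketched in Python as follows. This is a minimal illustration, not a method prescribed by the APPI or the PPC guidelines: the field names, the `PT-` code format, the reference date used to compute age, the non-overlapping five-year age bands, and the lookup tables `REGION_OF` and `TREATMENT_CATEGORY` are all assumptions introduced for the example.

```python
import secrets
from datetime import date

# Illustrative record mirroring the example above (all values are fictional).
original = {
    "name": "Tanaka Hiroshi",
    "date_of_birth": date(1975, 5, 15),
    "prefecture": "Tokyo",
    "diagnosis": "Type 2 Diabetes",
    "treatment": "Metformin 500mg twice daily",
}

# Hypothetical lookup tables used only for this sketch.
REGION_OF = {"Tokyo": "Kanto", "Osaka": "Kansai"}
TREATMENT_CATEGORY = {"Metformin 500mg twice daily": "Oral hypoglycemic agent"}


def age_on(dob: date, ref: date) -> int:
    """Age in whole years on the reference date."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def pseudonymize(record: dict, key_table: dict, ref: date) -> dict:
    """Replace the direct identifier with a random code and record the mapping."""
    code = f"PT-{secrets.randbelow(100000):05d}"
    key_table[code] = record["name"]  # the mapping must be stored separately
    return {
        "patient_id": code,
        "age": age_on(record["date_of_birth"], ref),
        "region": record["prefecture"],
        "diagnosis": record["diagnosis"],
        "treatment": record["treatment"],
    }


def anonymize(pseudo: dict) -> dict:
    """Drop the code entirely and generalize the remaining attributes."""
    low = pseudo["age"] // 5 * 5
    return {
        "age_group": f"{low}-{low + 4}",
        "region": REGION_OF[pseudo["region"]],
        "diagnosis": pseudo["diagnosis"],
        "treatment_category": TREATMENT_CATEGORY[pseudo["treatment"]],
    }


key_table: dict = {}
pseudo = pseudonymize(original, key_table, ref=date(2025, 6, 1))
anon = anonymize(pseudo)
print(anon["age_group"], anon["region"])  # 50-54 Kanto
```

The key table is kept apart from the pseudonymized record so that it can be stored and access-controlled on its own, while the anonymized record carries no code that links back to it.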

Special Rules for Health Data

The APPI and NGMIL establish special rules for health data:

Enhanced Consent Requirements: Collection of health data generally requires explicit consent unless specific exceptions apply
Opt-out for Research: Under NGMIL, medical institutions can share data with certified providers on an opt-out basis, after notifying patients and giving them the opportunity to refuse
Data Minimization: Only necessary health data should be collected and processed
Purpose Limitation: Health data should only be used for specified purposes
Security Requirements: Enhanced security measures are required for health data

Example: NGMIL Opt-out Notice

Under the NGMIL, medical institutions must provide patients with an opt-out notice that includes:

The name and contact information of the medical institution
The name and contact information of the certified medical information provider
The types of medical information to be provided
The purposes for which the anonymized medical information will be used
The patient's right to opt out and the procedure for doing so
The fact that the data will be anonymized before being provided to third parties
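
As a rough aid, the required items listed above could be tracked with a small data structure that flags anything still missing from a draft notice. The class name `OptOutNotice`, its field names, and the sample text are hypothetical and chosen for this sketch; they do not come from the NGMIL or any official template.

```python
from dataclasses import dataclass, fields


@dataclass
class OptOutNotice:
    """Items an opt-out notice is expected to contain (field names are hypothetical)."""
    institution_contact: str
    provider_contact: str
    data_types: str
    purposes: str
    opt_out_procedure: str
    anonymization_statement: str


def missing_items(notice: OptOutNotice) -> list[str]:
    """Return the names of required items that are empty or whitespace only."""
    return [f.name for f in fields(notice) if not getattr(notice, f.name).strip()]


draft = OptOutNotice(
    institution_contact="Example Hospital, reception desk",
    provider_contact="Example certified provider, inquiry desk",
    data_types="Diagnoses, prescriptions, laboratory results",
    purposes="Medical research and development",
    opt_out_procedure="",  # not yet written
    anonymization_statement="Data is anonymized before being provided to third parties",
)
print(missing_items(draft))  # ['opt_out_procedure']
```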

Example notice from Keio University Hospital (translated):

"Keio University Hospital participates in the Next Generation Medical Infrastructure program to support medical research. Your medical information may be provided to Life Data Initiative (certified medical information provider) for anonymization and research use. If you do not wish your data to be used, please notify our reception desk. For more information, please see our website or ask our staff."

Next Generation Medical Infrastructure Law (NGMIL)

The NGMIL creates a specific framework for the use of medical data for research and innovation:

Key Features

Certified Medical Information Providers: Medical institutions can provide patient data to certified organizations (認定匿名加工医療情報作成事業者) that are authorized to collect and anonymize medical data
Opt-out Mechanism: Uses an opt-out rather than opt-in approach for data sharing, where patients must be notified but don't need to provide explicit consent
Anonymization Standards: Establishes specific standards for medical data anonymization that are more detailed than general APPI requirements
Data Utilization Framework: Creates a structured process for researchers to access anonymized health data
Prohibition of Re-identification: Explicitly prohibits attempts to re-identify anonymized medical data with significant penalties
Certification Requirements: Detailed requirements for organizations seeking certification, including technical capabilities, security measures, and governance structures

Reference:

MHLW Next Generation Medical Infrastructure Law Information: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000148944.html (Japanese)

Cabinet Office Healthcare Data Policy: https://www.kantei.go.jp/jp/singi/kenkouiryou/jisedai_kiban/ (Japanese)

MHLW Guidelines for Certified Medical Information Providers (2023 update): https://www.mhlw.go.jp/content/10800000/001088247.pdf (Japanese)

NGMIL Anonymization Process

The NGMIL establishes a specific process for medical data anonymization:

Medical institutions provide data to certified medical information providers after notifying patients (with opt-out option)
These providers process the data according to strict anonymization standards specified in MHLW guidelines
Anonymized data can then be provided to researchers and companies for approved purposes
Results of research using the anonymized data must be reported back to the certified provider
Certified providers must conduct regular audits and report to MHLW
MHLW conducts periodic inspections of certified providers
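
The first step of this process, in which only data from notified patients who have not opted out is passed on, might be sketched as below. The function name `records_to_provide`, the record layout, and the patient IDs are assumptions made for illustration rather than part of any official specification.

```python
def records_to_provide(records, notified_ids, opted_out_ids):
    """Select records eligible for transfer to a certified provider.

    A record qualifies only if the patient was notified and has not opted out.
    """
    return [
        r for r in records
        if r["patient_id"] in notified_ids and r["patient_id"] not in opted_out_ids
    ]


records = [
    {"patient_id": "A001", "diagnosis": "Type 2 Diabetes"},
    {"patient_id": "A002", "diagnosis": "Hypertension"},
    {"patient_id": "A003", "diagnosis": "Asthma"},
]
eligible = records_to_provide(
    records,
    notified_ids={"A001", "A002"},  # A003 has not been notified yet
    opted_out_ids={"A002"},         # A002 declined
)
print([r["patient_id"] for r in eligible])  # ['A001']
```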

Case Study: Life Data Initiative (LDI)

The Life Data Initiative (LDI) is a certified medical information provider under NGMIL. The initiative:

Collects medical data from over 100 participating hospitals and clinics
Processes approximately 10 million patient records using standardized anonymization protocols
Makes anonymized data available to approved researchers for medical innovation projects
Has supported research on treatment efficacy for various conditions including diabetes, cardiovascular disease, and cancer
Implements a secure data access environment with strict controls to prevent re-identification attempts
Provides detailed reports to MHLW on data utilization and security measures
Conducts regular risk assessments of anonymization techniques

This initiative demonstrates how the NGMIL framework enables large-scale health data utilization while maintaining privacy protections.

Example: NGMIL Certification Requirements

Organizations seeking certification under NGMIL must meet stringent requirements, including:

Technical capability: Demonstrated expertise in medical data anonymization
Security measures: Physical, technical, and administrative safeguards
Governance structure: Independent oversight committee with medical and ethics experts
Financial stability: Sufficient resources to maintain operations
Operational procedures: Documented processes for data handling
Personnel qualifications: Staff with appropriate expertise
Audit capabilities: Systems for tracking data access and use
Breach response plan: Procedures for handling security incidents

As of May 2024, only four organizations have received this certification, highlighting the rigorous nature of the requirements.

Technical Requirements for De-identification

Japanese regulations specify technical requirements for both pseudonymization and anonymization:

Pseudonymous Processing Requirements

To qualify as "Pseudonymously Processed Information" under APPI, business operators handling personal information (the APPI's counterpart to data controllers) must:

Replace all direct identifiers with codes or pseudonyms
Delete descriptions that could easily identify the individual (such as rare disease information)
Store any information linking pseudonyms to original identities separately and securely
Implement security measures to prevent unauthorized access
Not attempt to re-identify the information
Document the pseudonymization process and security measures
Limit use to internal analysis, testing, and research purposes only
Conduct risk assessment for potential re-identification vulnerabilities

Example: Hospital Pseudonymization Process

A large Tokyo hospital implements pseudonymization for internal quality improvement research as follows:

Patient names replaced with randomly generated codes (e.g., "PT-2024-78945")
Dates of birth converted to ages
Addresses generalized to prefecture level
Rare conditions (affecting fewer than 0.1% of population) grouped into broader categories
The pseudonymization key (mapping between patient identities and codes) stored in a separate secure database with access limited to the hospital's data protection officer
Access to pseudonymized data restricted to authorized research staff
All access logged and monitored
Regular audits of access logs and security controls
Specific training for staff on pseudonymized data handling
Prohibition on combining the pseudonymized data with external datasets
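
A minimal sketch of the core of such a process is shown below, assuming a simple in-memory key table. The class `KeyTable`, the prevalence figures, the condition names, and the broader categories are hypothetical placeholders; a real deployment would hold the key table in a separate, access-controlled system and add the logging and audit steps described above.

```python
import secrets

# Hypothetical reference data: population prevalence and broader categories.
PREVALENCE = {"Type 2 Diabetes": 0.08, "Condition X": 0.00002}
BROADER_CATEGORY = {"Condition X": "Rare metabolic disorder"}
RARE_THRESHOLD = 0.001  # 0.1% of the population, as in the example above


class KeyTable:
    """Mapping between patient identities and codes; store apart from research data."""

    def __init__(self, year: int):
        self._year = year
        self._codes: dict[str, str] = {}

    def _new_code(self) -> str:
        return f"PT-{self._year}-{secrets.randbelow(100000):05d}"

    def code_for(self, patient_name: str) -> str:
        """Return a stable random code, generating one on first use."""
        if patient_name not in self._codes:
            code = self._new_code()
            while code in self._codes.values():  # retry on the rare collision
                code = self._new_code()
            self._codes[patient_name] = code
        return self._codes[patient_name]


def generalize_diagnosis(diagnosis: str) -> str:
    """Group diagnoses below the rarity threshold into a broader category.

    Diagnoses with unknown prevalence are treated as rare, which is the
    conservative choice.
    """
    if PREVALENCE.get(diagnosis, 0.0) < RARE_THRESHOLD:
        return BROADER_CATEGORY.get(diagnosis, "Other condition")
    return diagnosis


def pseudonymize(record: dict, keys: KeyTable, ref_year: int) -> dict:
    """Build the research copy of a record without name or exact birth date."""
    return {
        "code": keys.code_for(record["name"]),
        "age": ref_year - record["birth_year"],  # approximate age in years
        "prefecture": record["prefecture"],
        "diagnosis": generalize_diagnosis(record["diagnosis"]),
    }


keys = KeyTable(year=2024)
visit = {"name": "Patient One", "birth_year": 1975,
         "prefecture": "Tokyo", "diagnosis": "Condition X"}
print(pseudonymize(visit, keys, ref_year=2024)["diagnosis"])  # Rare metabolic disorder
```

Reusing the same code for repeat visits keeps a patient's records linkable inside the research dataset, which is what distinguishes this from anonymization.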

Anonymous Processing Requirements

To qualify as "Anonymously Processed Information" under APPI, business operators handling personal information (the APPI's counterpart to data controllers) must:

Delete all direct personal identifiers
Process any code numbers or indirect identifiers so that individuals cannot be identified
Delete any free-text descriptions that could identify individuals
Take additional measures based on a risk assessment considering properties of the data and processing methods
Publicly disclose information about the categories of data being processed anonymously
Not combine the anonymized data with other information to attempt re-identification
Implement security measures to prevent unauthorized access to the data
Document the anonymization process and risk assessment
Regularly review the effectiveness of anonymization techniques

Reference:

PPC Guidelines on Anonymously Processed Information (English): https://www.ppc.go.jp/en/legal/guidelines/

MHLW Guidelines on Medical Data Anonymization (Japanese): https://www.mhlw.go.jp/content/10601000/000499627.pdf

PPC Handbook on Anonymization (2024 version, Japanese): https://www.ppc.go.jp/personalinfo/legal/anonymously_processed_information/

Technical Methods Recommended in Japanese Guidelines

Method	Description	Example in Health Context
K-anonymity	Ensuring each record is indistinguishable from at least k-1 other records with respect to its quasi-identifiers	Ensuring any combination of age, gender, and prefecture appears at least 5 times in the dataset (k = 5)
L-diversity	Ensuring sensitive attributes have at least l different values within each group of records sharing the same quasi-identifiers	Ensuring patients with the same demographic characteristics have at least 3 different diagnoses (l = 3)
T-closeness	Ensuring the distribution of sensitive attributes within each group is similar to the overall distribution	Ensuring the distribution of diagnoses within each demographic group is similar to the overall patient population
Generalization	Reducing precision of data	Converting exact ages to 5-year ranges, specific locations to prefecture level
Data Suppression	Removing high-risk values	Removing information about very rare conditions or treatments
Noise Addition	Adding statistical noise to numerical values	Adding small random variations to laboratory values or vital signs
Top/Bottom Coding	Grouping extreme values	Reporting ages as "90+" for all patients over 90 years old
Differential Privacy	Adding calibrated random noise to query results	Adding noise scaled to a privacy budget to statistical queries on health data to provide mathematical privacy guarantees
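
The first three measures in the table can be checked with a few lines of Python. The sketch below is illustrative only: the function names and sample records are assumptions, l-diversity is measured in its simplest "distinct values" form, and t-closeness is measured with total variation distance, which matches the usual Earth Mover's Distance only for categorical attributes where all values are treated as equally far apart.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_ids)].append(r)
    return groups


def k_anonymity(records, quasi_ids):
    """Smallest class size; the data is k-anonymous for any k up to this value."""
    return min(len(g) for g in equivalence_classes(records, quasi_ids).values())


def l_diversity(records, quasi_ids, sensitive):
    """Smallest number of distinct sensitive values found in any class."""
    groups = equivalence_classes(records, quasi_ids)
    return min(len({r[sensitive] for r in g}) for g in groups.values())


def t_closeness(records, quasi_ids, sensitive):
    """Largest distance between a class distribution and the overall distribution."""
    overall = Counter(r[sensitive] for r in records)
    n = len(records)
    worst = 0.0
    for g in equivalence_classes(records, quasi_ids).values():
        local = Counter(r[sensitive] for r in g)
        dist = 0.5 * sum(abs(local[v] / len(g) - overall[v] / n) for v in overall)
        worst = max(worst, dist)
    return worst


QUASI_IDS = ("age_band", "gender", "prefecture")
sample = [
    {"age_band": "40-44", "gender": "F", "prefecture": "Tokyo", "diagnosis": "Diabetes"},
    {"age_band": "40-44", "gender": "F", "prefecture": "Tokyo", "diagnosis": "Hypertension"},
    {"age_band": "40-44", "gender": "F", "prefecture": "Tokyo", "diagnosis": "Asthma"},
    {"age_band": "50-54", "gender": "M", "prefecture": "Osaka", "diagnosis": "Diabetes"},
    {"age_band": "50-54", "gender": "M", "prefecture": "Osaka", "diagnosis": "Diabetes"},
    {"age_band": "50-54", "gender": "M", "prefecture": "Osaka", "diagnosis": "Hypertension"},
]
print(k_anonymity(sample, QUASI_IDS))                          # 3
print(l_diversity(sample, QUASI_IDS, "diagnosis"))             # 2
print(round(t_closeness(sample, QUASI_IDS, "diagnosis"), 3))   # 0.167
```

With these sample records the dataset would fail a k = 5 requirement and an l = 3 requirement, so further generalization or suppression would be needed before release.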

Example: Anonymization Process for a National Health Survey

Japan's National Health and Nutrition Survey data is anonymized using these steps:

Direct identifier removal: Names, addresses, phone numbers, and other direct identifiers are completely removed
Age generalization: Exact ages are converted to 5-year age bands (e.g., 45-49)
Geographic aggregation: Specific locations are generalized to prefecture level
Rare value suppression: Very rare conditions or characteristics affecting fewer than 0.5% of the population are either suppressed or grouped into broader categories
K-anonymity verification: Data is checked to ensure each combination of quasi-identifiers appears at least 5 times
L-diversity implementation: Ensuring sensitive attributes have sufficient diversity within each demographic group
Date generalization: Exact dates are converted to months or seasons
Free text processing: Any free text fields are either removed or processed to remove potential identifiers
Outlier management: Extreme values are top/bottom coded
Risk assessment: Final dataset undergoes re-identification risk assessment
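
Several of the steps above (age banding, top coding, date generalization, rare value suppression, and a k-anonymity check) can be combined into a short pipeline. The sketch below is a simplified illustration under stated assumptions, not the procedure actually used for the survey: the column names are invented, rarity is approximated by share of the dataset rather than of the population, and records in groups smaller than k are simply dropped.

```python
from collections import Counter
from datetime import date

QUASI_IDS = ("age_band", "prefecture")  # assumed quasi-identifiers for this sketch


def age_band(age: int, width: int = 5, top: int = 90) -> str:
    """Generalize an exact age to a band, top-coding ages at or above `top`."""
    if age >= top:
        return f"{top}+"
    low = age // width * width
    return f"{low}-{low + width - 1}"


def to_month(d: date) -> str:
    """Generalize an exact date to year and month."""
    return d.strftime("%Y-%m")


def suppress_rare(values, min_share, placeholder="Other"):
    """Replace values whose share of the dataset is below `min_share`."""
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= min_share else placeholder for v in values]


def anonymize_survey(rows, k=5, min_share=0.005):
    """Apply generalization and suppression, then drop groups smaller than k."""
    conditions = suppress_rare([r["condition"] for r in rows], min_share)
    out = [
        {
            "age_band": age_band(r["age"]),
            "prefecture": r["prefecture"],
            "visit_month": to_month(r["visit_date"]),
            "condition": c,
        }
        for r, c in zip(rows, conditions)
    ]
    sizes = Counter(tuple(o[q] for q in QUASI_IDS) for o in out)
    return [o for o in out if sizes[tuple(o[q] for q in QUASI_IDS)] >= k]


print(age_band(47), age_band(93), to_month(date(2023, 11, 3)))  # 45-49 90+ 2023-11
```

Dropping small groups is the bluntest way to reach k-anonymity; widening the age bands or moving from prefecture to region are alternatives that keep more records at the cost of precision.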

Regulatory Oversight

Japan's framework includes strong regulatory oversight:

Personal Information Protection Commission (PPC): The primary regulatory authority for personal data protection, with powers to investigate, issue orders, and impose penalties for APPI violations
Ministry of Health, Labour and Welfare (MHLW): Oversees the implementation of the NGMIL and certifies medical information providers
Certified Organizations: Special entities certified under the NGMIL to process medical information, subject to strict operational requirements and regular audits
Healthcare Information Technical Committee: Advisory body that develops technical standards for health data security and anonymization
Japan Medical Association: Professional body that provides guidance on ethical handling of medical data
Digital Agency of Japan: Coordinates digital health initiatives and data protection standards

Enforcement Example: PPC Actions

In 2023, the PPC took enforcement action against a healthcare application provider for improper handling of health data. The company had:

Failed to properly pseudonymize health data used for product development
Not implemented adequate security measures for sensitive health information
Shared what it claimed was anonymized data with third parties without meeting APPI anonymization standards
Failed to disclose to users how their health data would be processed

The PPC issued an administrative guidance order requiring the company to:

Implement proper pseudonymization procedures
Enhance security measures
Cease sharing data until proper anonymization could be verified
Submit to regular audits for the following two years
Revise privacy notices to clearly explain data processing
Provide additional training to staff on data protection

Reference:

Personal Information Protection Commission: https://www.ppc.go.jp/en/

PPC Annual Report 2023 (English): https://www.ppc.go.jp/en/aboutus/roles/annual/

MHLW Certified Medical Information Providers List (Japanese): https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000202384.html

Digital Agency of Japan: https://www.digital.go.jp/en/

Penalties for Non-compliance

Japan's framework includes significant penalties for violations:

APPI Violations: Fines up to 100 million yen (approximately $700,000) for corporations
NGMIL Violations: Fines up to 1 million yen (approximately $7,000) and/or imprisonment up to 1 year for unauthorized re-identification attempts
Administrative Actions: The PPC can issue improvement orders, business suspension orders, and public announcements of violations
Certification Revocation: MHLW can revoke certification of medical information providers for serious violations
Civil Liability: Affected individuals can seek damages for privacy violations

Practical Implementation

In practice, Japan's framework supports several mechanisms for health data use:

1. Jisedai Iryo Kiban (Next-Generation Healthcare Platform)

A system established under the NGMIL that:

Collects and anonymizes medical data from hospitals and clinics
Makes anonymized data available for research and development
Operates under strict certification requirements
Includes multiple certified medical information providers
Supports both academic and commercial research initiatives
Implements standardized data formats and anonymization protocols
Provides secure access environments for researchers

Case Study: JMDC Health Data Bank

JMDC Inc. operates one of Japan's largest health data banks as a certified medical information provider. Their platform:

Contains anonymized data from over 10 million individuals
Includes health insurance claims, medical checkups, and prescription data
Implements the NGMIL's opt-out mechanism through participating healthcare providers
Applies standardized anonymization protocols developed with MHLW guidance
Provides data access to pharmaceutical companies, medical device manufacturers, and academic researchers
Has supported over 300 research projects, including COVID-19 treatment effectiveness studies
Operates a secure cloud environment for data analysis with access controls and audit logging
Conducts regular re-identification risk assessments
Publishes annual transparency reports on data utilization

In 2023, JMDC data was used in a major study on diabetes treatment outcomes that led to updated clinical guidelines, demonstrating the practical value of the NGMIL framework.

2. Clinical Innovation Network (CIN)

A network that shares de-identified clinical data for research purposes. It:

Connects multiple hospitals and research institutions
Standardizes clinical data collection and de-identification
Focuses on rare disease and specialty care research
Operates under MHLW oversight
Implements registry-based research using pseudonymized data
Supports clinical trial optimization and real-world evidence generation

Reference:

Clinical Innovation Network: https://cinc.ncgm.go.jp/en/

CIN Research Publications: https://cinc.ncgm.go.jp/en/achievements/publications.html

3. National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB)

A large database of health insurance claims and health checkup data that provides anonymized data for research. The NDB:

Contains data from approximately 95% of Japan's population
Includes over 16 billion medical claims records
Implements specialized anonymization techniques for large-scale health data
Provides data access to approved researchers through a controlled process
Supports policy research and healthcare system planning
Enables population-level health analyses
Implements tiered access controls based on research needs and data sensitivity

Reference:

NDB Open Data: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000177221.html (Japanese)

MHLW NDB User Guide (2023 version): https://www.mhlw.go.jp/content/12400000/001053185.pdf (Japanese)

4. Medical Information Database Network (MID-NET)

A specialized network for pharmacovigilance and drug safety research that:

Connects electronic health record systems from multiple hospitals
Enables near real-time monitoring of drug safety signals
Uses pseudonymized patient data for analysis
Implements distributed analysis to minimize data transfer
Supports regulatory decision-making on pharmaceuticals
Operates under the Pharmaceuticals and Medical Devices Agency (PMDA)
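
The idea behind distributed analysis is that each hospital computes a summary locally and only the summaries are combined. The sketch below illustrates that pattern in general terms; it is not a description of MID-NET's actual software, and the function names, record layout, drug name, and event name are assumptions.

```python
def local_summary(site_records, drug, event):
    """Run at each hospital: count exposed patients and those with the event."""
    exposed = [r for r in site_records if drug in r["drugs"]]
    return {
        "exposed": len(exposed),
        "events": sum(1 for r in exposed if event in r["events"]),
    }


def combine(summaries):
    """Run centrally: only aggregate counts arrive here, never patient rows."""
    exposed = sum(s["exposed"] for s in summaries)
    events = sum(s["events"] for s in summaries)
    return {
        "exposed": exposed,
        "events": events,
        "rate": events / exposed if exposed else None,
    }


site_a = [
    {"drugs": {"Drug A"}, "events": {"Liver injury"}},
    {"drugs": {"Drug A"}, "events": set()},
    {"drugs": {"Drug B"}, "events": {"Liver injury"}},
]
site_b = [
    {"drugs": {"Drug A"}, "events": set()},
    {"drugs": {"Drug A"}, "events": set()},
]
summaries = [local_summary(s, "Drug A", "Liver injury") for s in (site_a, site_b)]
print(combine(summaries))  # {'exposed': 4, 'events': 1, 'rate': 0.25}
```

In practice very small counts from a single site can themselves be identifying, so a real system would likely suppress or round them before they leave the hospital.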

Reference:

PMDA MID-NET: https://www.pmda.go.jp/english/safety/surveillance-analysis/0018.html

Case Study: COVID-19 Data Platform

During the COVID-19 pandemic, Japan established a specialized data platform that demonstrated the flexibility of its health data framework:

Combined data from multiple sources including testing centers, hospitals, and public health departments
Implemented expedited anonymization protocols while maintaining privacy protections
Created tiered access levels for different user groups (public health officials, researchers, policymakers)
Enabled rapid analysis of treatment outcomes, vaccine effectiveness, and variant impacts
Supported both domestic policy decisions and international research collaboration
Demonstrated how Japan's framework could adapt to emergency situations while maintaining privacy principles

Recent Developments and Future Directions

Japan continues to evolve its approach to health data de-identification:

AI and Healthcare: Development of specific guidelines for using de-identified health data in AI development (2023-2024)
International Data Transfers: Japan received an adequacy decision from the EU in January 2019, facilitating health data transfers for research
Genomic Data Initiatives: The Genomic Medicine Implementation Platform with specialized anonymization protocols for genetic information
Enhanced Technical Standards: Ongoing development of more sophisticated technical standards for health data anonymization
Public-Private Partnerships: Increased collaboration between government, academia, and industry for health data utilization
Digital Health Strategy: Japan's 2023 Digital Health Strategy emphasizes secure data sharing for innovation
Interoperability Standards: Development of standardized formats for health data exchange
Federated Learning: Exploration of privacy-preserving analytics that don't require data centralization

Emerging Approach: Differential Privacy

Japan's National Institute of Information and Communications Technology (NICT) has been developing differential privacy techniques specifically for healthcare applications. This approach:

Adds calibrated random noise to query results rather than to the underlying data
Provides provable privacy guarantees regardless of external information
Is being piloted in several Japanese healthcare research initiatives
May be incorporated into future MHLW guidelines for health data anonymization
Has been implemented in a pilot project with three university hospitals for rare disease research
Enables more precise analysis while maintaining strong privacy protections
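
The noise-on-query-results idea can be illustrated with the Laplace mechanism applied to a counting query. This is a textbook sketch and not the NICT implementation: the function names and sample data are assumptions, and the noise scale relies on the fact that adding or removing one patient changes a count by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon: float, rng: random.Random = None) -> float:
    """Return a noisy count of records matching `predicate`.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    A smaller epsilon means stronger privacy and a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


patients = [{"diagnosis": "Type 2 Diabetes"}] * 40 + [{"diagnosis": "Asthma"}] * 60
noisy = dp_count(patients, lambda r: r["diagnosis"] == "Type 2 Diabetes", epsilon=1.0)
print(round(noisy, 1))  # close to 40, but different on every run
```

Each query answered this way spends part of an overall privacy budget, so a real deployment would also need to track the cumulative epsilon used per dataset.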

Reference:

NICT Research on Healthcare Data Privacy: https://www.nict.go.jp/en/research/research-areas-and-research-centers.html

Japan's Digital Health Strategy (Cabinet Office): https://www.kantei.go.jp/jp/singi/kenkouiryou/suisin/ (Japanese)

AMED Genomic Medicine Program: https://www.amed.go.jp/en/program/list/04/01/genome_medicine.html

Challenges and Ongoing Work

Despite its sophisticated framework, Japan continues to address several challenges:

Balancing Innovation and Privacy: Finding the right balance between data access for innovation and privacy protection
Public Trust: Building and maintaining public trust in health data systems
Technical Complexity: Managing the technical complexity of advanced anonymization techniques
International Harmonization: Aligning Japanese standards with global frameworks
Emerging Technologies: Adapting the framework to emerging technologies like AI and genomics
Implementation Costs: Addressing the costs of implementing sophisticated de-identification systems

How It Compares to HIPAA Safe Harbor

Japan's approach differs from HIPAA Safe Harbor in several key ways:

Legal Categories: Creates a clear legal distinction between pseudonymized data (still regulated) and anonymized data (less regulated), whereas HIPAA has a binary approach (either PHI or de-identified)
Dedicated Framework: Provides a specific legal framework (NGMIL) dedicated to medical data sharing for innovation, more specialized than HIPAA
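Consent Approach: Uses an opt-out rather than opt-in approach for health data sharing for research under NGMIL, while HIPAA generally requires authorization for research use unless the data is de-identified or another exception applies (for example, a waiver approved by an IRB or Privacy Board)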
De-identification Approach: Takes a more risk-based approach to de-identification rather than a specific list of identifiers as in HIPAA Safe Harbor
Intermediary Model: Creates a certified intermediary model for processing health data, not present in HIPAA
Purpose Emphasis: Places greater emphasis on the purpose and context of data use in determining appropriate de-identification methods
Risk Assessment: Incorporates privacy impact assessments into the de-identification process more explicitly than HIPAA
Technical Standards: Provides more specific guidance on technical methods like k-anonymity and l-diversity
Certification Process: Implements a formal certification process for organizations handling health data
Transparency Requirements: Has more explicit requirements for transparency about data processing

Practical Comparison Example

For a clinical research project using patient data:

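Under HIPAA Safe Harbor: Remove the 18 specified categories of identifiers, with no actual knowledge that the remaining information could identify an individual, to create a de-identified dataset that can be used without patient authorization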
Under Japan's Framework: Either:
1. Work with a certified medical information provider under NGMIL, who collects data from medical institutions (with patient opt-out option) and anonymizes it according to MHLW standards, or
2. Create anonymously processed information according to APPI and PPC guidelines, with a documented risk assessment and disclosure of data categories being processed

The Japanese approach typically involves more stakeholders and formal processes, but may enable more flexible use of the data while maintaining strong privacy protections.
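
The difference between a fixed-list rule and a risk-based check can be made concrete with a short sketch. Both functions below are simplifications built on assumptions: `SAFE_HARBOR_FIELDS` is only an illustrative subset of the 18 identifier categories, the field names are invented, and k-anonymity stands in for the fuller, documented risk assessment that the Japanese guidelines call for.

```python
from collections import Counter

# Illustrative subset of Safe Harbor identifier fields (the rule lists 18 categories).
SAFE_HARBOR_FIELDS = {"name", "address", "phone", "email", "medical_record_number"}


def rule_based(record: dict) -> dict:
    """Fixed-list approach: drop every field that appears on the list."""
    return {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}


def risk_based_ok(records: list, quasi_ids: tuple, k: int) -> bool:
    """Risk-based approach: accept the dataset only if every combination of
    quasi-identifier values occurs at least k times."""
    sizes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(sizes.values()) >= k


patient = {"name": "Tanaka Hiroshi", "phone": "03-1234-5678",
           "age_band": "50-54", "region": "Kanto", "diagnosis": "Type 2 Diabetes"}
stripped = rule_based(patient)
print(sorted(stripped))  # ['age_band', 'diagnosis', 'region']
print(risk_based_ok([stripped], ("age_band", "region"), k=5))  # False
```

The single record passes the fixed-list rule but fails the risk-based check, which shows why the two approaches can reach different conclusions about the same data.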

Reference:

Comparative Analysis of Global Health Data Protection Frameworks (MHLW): https://www.mhlw.go.jp/content/10904750/000923639.pdf (Japanese)

Japan-US Digital Health Cooperation: https://www.mofa.go.jp/files/100064068.pdf