
Japan Health Data De-identification Framework

Act on the Protection of Personal Information (APPI) and Next Generation Medical Infrastructure Law

Overview

Japan has established a framework for health data de-identification that balances privacy protection with the use of health data for research and innovation. The framework takes a two-tiered approach: general data protection laws that apply to all sectors, and health-specific legislation that adds both requirements and opportunities for health data use.

Key Developments in Japan's Health Data Framework

  • May 2017: Next Generation Medical Infrastructure Law (NGMIL) enacted
  • May 2018: Next Generation Medical Infrastructure Law came into effect
  • June 2020: Major amendments to the Act on the Protection of Personal Information (APPI) passed
  • April 2022: Amended APPI came into effect, introducing the concept of pseudonymously processed information
  • October 2022: First certified medical information providers approved under NGMIL
  • April 2023: Enhanced guidelines for anonymization of medical data published
  • June 2023: Japan's Digital Health Strategy released, emphasizing secure health data utilization
  • March 2024: Updated Personal Information Protection Commission (PPC) guidelines on anonymization techniques published
  • May 2024: New certification standards for medical information handling organizations introduced

Legal Framework

Japan's health data de-identification framework is built on several key pieces of legislation:

Primary Legislation

Reference Links:

Key Concepts and Definitions

Under APPI

The APPI establishes several important categories of data:

| Concept | Definition | Regulatory Status | Examples in Health Context |
| --- | --- | --- | --- |
| Personal Information | Information that can identify a specific individual, including information that can be easily collated with other information to identify an individual | Fully regulated under APPI | Patient name, address, phone number, combined with medical record number |
| Special Care-Required Personal Information | Sensitive data including medical history and healthcare records that requires special handling | Subject to stricter requirements under APPI, including explicit consent for collection in most cases | Diagnosis information, treatment history, genetic test results, disability status |
| Pseudonymously Processed Information | Personal information that has been processed so that it cannot identify a specific individual without additional information | Still regulated but with some exemptions from APPI requirements, including for internal analysis purposes | Medical records with patient identifiers replaced by codes, with the mapping table stored separately |
| Anonymously Processed Information | Information that has been irreversibly processed to prevent identification of specific individuals | Falls outside most APPI requirements and can be used and shared with fewer restrictions | Statistical summaries of patient outcomes with all identifiers removed and data generalized |

Reference:

PPC Guidelines on Anonymously Processed Information (English): https://www.ppc.go.jp/files/pdf/211119_guidelines_anonymously_processed_information.pdf

PPC Guidelines on Pseudonymously Processed Information (English): https://www.ppc.go.jp/en/legal/guidelines/

Example: Different Data Categories in Practice

Original Data (Personal Information):

  • Name: Tanaka Hiroshi
  • Date of Birth: May 15, 1975
  • Address: 1-2-3 Chiyoda, Tokyo
  • Diagnosis: Type 2 Diabetes
  • Treatment: Metformin 500mg twice daily
  • Hospital: Tokyo Medical Center
  • Insurance ID: 12345678
  • Phone: 03-1234-5678
  • Email: tanaka.h@example.jp

Pseudonymously Processed Information:

  • Patient ID: PT-12345
  • Age: 50
  • Region: Tokyo
  • Diagnosis: Type 2 Diabetes
  • Treatment: Metformin 500mg twice daily
  • Hospital: Hospital A
  • Insurance Category: Employee Health Insurance

Anonymously Processed Information:

  • Age Group: 50-54
  • Region: Kanto
  • Diagnosis: Type 2 Diabetes
  • Treatment Category: Oral hypoglycemic agent
  • Hospital Type: Large Urban Medical Center
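
The generalization step in the example above can be sketched in code. This is a minimal illustration, not a procedure taken from the PPC or MHLW guidelines: the lookup tables, field names, and the reference date are assumptions made for the example, and generalization alone does not establish that a dataset qualifies as anonymously processed information.

```python
from datetime import date

# Hypothetical lookup tables for illustration only; real mappings would come
# from the organization's own coding standards.
PREFECTURE_TO_REGION = {"Tokyo": "Kanto", "Kanagawa": "Kanto", "Osaka": "Kansai"}
DRUG_TO_CATEGORY = {"Metformin": "Oral hypoglycemic agent"}


def age_on(birth: date, today: date) -> int:
    """Age in whole years on the given day."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday


def age_band(age: int, width: int = 5) -> str:
    """Map an exact age to a fixed-width band such as '50-54'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(record: dict, today: date) -> dict:
    """Drop direct identifiers and coarsen the remaining fields."""
    age = age_on(record["date_of_birth"], today)
    drug = record["treatment"].split()[0]
    return {
        "age_group": age_band(age),
        "region": PREFECTURE_TO_REGION.get(record["prefecture"], "Other"),
        "diagnosis": record["diagnosis"],
        "treatment_category": DRUG_TO_CATEGORY.get(drug, "Other"),
    }


record = {
    "name": "Tanaka Hiroshi",
    "date_of_birth": date(1975, 5, 15),
    "prefecture": "Tokyo",
    "diagnosis": "Type 2 Diabetes",
    "treatment": "Metformin 500mg twice daily",
}
# The reference date is an assumption chosen for the example.
result = generalize(record, today=date(2025, 6, 1))
print(result)
```

Only the fields named in the returned dictionary survive, so direct identifiers such as the name are dropped by construction rather than by an explicit delete step.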

Special Rules for Health Data

The APPI and NGMIL establish special rules for health data:

Example: NGMIL Opt-out Notice

Under the NGMIL, medical institutions must provide patients with an opt-out notice that includes:

  • The name and contact information of the medical institution
  • The name and contact information of the certified medical information provider
  • The types of medical information to be provided
  • The purposes for which the anonymized medical information will be used
  • The patient's right to opt out and the procedure for doing so
  • The fact that the data will be anonymized before being provided to third parties

Example notice from Keio University Hospital (translated):

"Keio University Hospital participates in the Next Generation Medical Infrastructure program to support medical research. Your medical information may be provided to Life Data Initiative (certified medical information provider) for anonymization and research use. If you do not wish your data to be used, please notify our reception desk. For more information, please see our website or ask our staff."

Next Generation Medical Infrastructure Law (NGMIL)

The NGMIL creates a specific framework for the use of medical data for research and innovation:

Key Features

Reference:

MHLW Next Generation Medical Infrastructure Law Information: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000148944.html (Japanese)

Cabinet Office Healthcare Data Policy: https://www.kantei.go.jp/jp/singi/kenkouiryou/jisedai_kiban/ (Japanese)

MHLW Guidelines for Certified Medical Information Providers (2023 update): https://www.mhlw.go.jp/content/10800000/001088247.pdf (Japanese)

NGMIL Anonymization Process

The NGMIL establishes a specific process for medical data anonymization:

  1. Medical institutions provide data to certified medical information providers after notifying patients (with opt-out option)
  2. These providers process the data according to strict anonymization standards specified in Ministry of Health, Labour and Welfare (MHLW) guidelines
  3. Anonymized data can then be provided to researchers and companies for approved purposes
  4. Results of research using the anonymized data must be reported back to the certified provider
  5. Certified providers must conduct regular audits and report to MHLW
  6. MHLW conducts periodic inspections of certified providers
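
Step 1 of the process above implies a gate at the medical institution: only records of patients who were notified and did not opt out should reach the certified provider. The following is a rough sketch of such a gate. The `PatientRecord` type and its field names are hypothetical and are not drawn from the NGMIL or any MHLW specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    patient_id: str
    notified: bool    # patient has received the opt-out notice
    opted_out: bool   # patient has asked not to be included
    payload: dict     # the medical data itself


def eligible_for_transfer(records):
    """Keep only records whose patients were notified and did not opt out."""
    return [r for r in records if r.notified and not r.opted_out]


records = [
    PatientRecord("A", notified=True, opted_out=False, payload={}),
    PatientRecord("B", notified=True, opted_out=True, payload={}),
    PatientRecord("C", notified=False, opted_out=False, payload={}),
]
transfer_batch = eligible_for_transfer(records)
print([r.patient_id for r in transfer_batch])
```

Treating "not yet notified" as ineligible is a conservative design choice in this sketch: a record is excluded unless both conditions are positively satisfied.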

Case Study: Life Data Initiative (LDI)

In 2022, the Life Data Initiative was certified as a medical information provider under NGMIL. The initiative:

  • Collects medical data from over 100 participating hospitals and clinics
  • Processes approximately 10 million patient records using standardized anonymization protocols
  • Makes anonymized data available to approved researchers for medical innovation projects
  • Has supported research on treatment efficacy for various conditions including diabetes, cardiovascular disease, and cancer
  • Implements a secure data access environment with strict controls to prevent re-identification attempts
  • Provides detailed reports to MHLW on data utilization and security measures
  • Conducts regular risk assessments of anonymization techniques

This initiative demonstrates how the NGMIL framework enables large-scale health data utilization while maintaining privacy protections.

Example: NGMIL Certification Requirements

Organizations seeking certification under NGMIL must meet stringent requirements, including:

  • Technical capability: Demonstrated expertise in medical data anonymization
  • Security measures: Physical, technical, and administrative safeguards
  • Governance structure: Independent oversight committee with medical and ethics experts
  • Financial stability: Sufficient resources to maintain operations
  • Operational procedures: Documented processes for data handling
  • Personnel qualifications: Staff with appropriate expertise
  • Audit capabilities: Systems for tracking data access and use
  • Breach response plan: Procedures for handling security incidents

As of May 2024, only four organizations have received this certification, highlighting the rigorous nature of the requirements.

Technical Requirements for De-identification

Japanese regulations specify technical requirements for both pseudonymization and anonymization:

Pseudonymous Processing Requirements

To qualify as "Pseudonymously Processed Information" under APPI, data controllers must:

Example: Hospital Pseudonymization Process

A large Tokyo hospital implements pseudonymization for internal quality improvement research as follows:

  1. Patient names replaced with randomly generated codes (e.g., "PT-2024-78945")
  2. Dates of birth converted to ages
  3. Addresses generalized to prefecture level
  4. Rare conditions (affecting fewer than 0.1% of the population) grouped into broader categories
  5. The pseudonymization key (mapping between patient IDs and codes) stored in a separate secure database with access limited to the hospital's data protection officer
  6. Access to pseudonymized data restricted to authorized research staff
  7. All access logged and monitored
  8. Regular audits of access logs and security controls
  9. Specific training for staff on pseudonymized data handling
  10. Prohibition on combining the pseudonymized data with external datasets
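
Steps 1 to 5 of the process above can be sketched as follows. This is an illustrative outline under stated assumptions, not the hospital's actual system: the `Pseudonymizer` class, the record fields, and the rare-condition mapping are all hypothetical, and the key table is held in memory here only for brevity, whereas the text calls for a separate secure database.

```python
import secrets
from datetime import date

# Hypothetical grouping of rare diagnoses into broader categories (step 4).
RARE_TO_BROAD = {"Fabry disease": "Lysosomal storage disorder"}


class Pseudonymizer:
    def __init__(self, year):
        self.year = year
        # Mapping from real patient IDs to codes (step 5). In practice this
        # would live in a separate, access-restricted store.
        self._key_table = {}

    def _new_code(self):
        # Random code in the style "PT-2024-78945" (step 1).
        return f"PT-{self.year}-{secrets.randbelow(100_000):05d}"

    def code_for(self, patient_id):
        """Return a stable random code, creating one on first use."""
        if patient_id not in self._key_table:
            used = set(self._key_table.values())
            code = self._new_code()
            while code in used:  # avoid assigning one code to two patients
                code = self._new_code()
            self._key_table[patient_id] = code
        return self._key_table[patient_id]

    def pseudonymize(self, record, today):
        birth = record["date_of_birth"]
        had_birthday = (today.month, today.day) >= (birth.month, birth.day)
        diagnosis = record["diagnosis"]
        return {
            "patient_code": self.code_for(record["patient_id"]),
            "age": today.year - birth.year - (0 if had_birthday else 1),  # step 2
            "prefecture": record["prefecture"],                           # step 3
            "diagnosis": RARE_TO_BROAD.get(diagnosis, diagnosis),         # step 4
        }


pseudonymizer = Pseudonymizer(year=2024)
record = {
    "patient_id": "H-000123",
    "name": "Tanaka Hiroshi",
    "date_of_birth": date(1975, 5, 15),
    "prefecture": "Tokyo",
    "diagnosis": "Fabry disease",
}
pseudonymized = pseudonymizer.pseudonymize(record, today=date(2024, 6, 1))
print(pseudonymized)
```

The codes come from the `secrets` module rather than from a hash of the patient ID, so a code cannot be recomputed from the identifier without access to the key table.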

Anonymous Processing Requirements

To qualify as "Anonymously Processed Information" under APPI, data controllers must:

Reference:

PPC Guidelines on Anonymously Processed Information (English): https://www.ppc.go.jp/en/legal/guidelines/

MHLW Guidelines on Medical Data Anonymization (Japanese): https://www.mhlw.go.jp/content/10601000/000499627.pdf

PPC Handbook on Anonymization (2024 version, Japanese): https://www.ppc.go.jp/personalinfo/legal/anonymously_processed_information/

Technical Methods Recommended in Japanese Guidelines

| Method | Description | Example in Health Context |
| --- | --- | --- |
| K-anonymity | Ensuring each record is indistinguishable from at least k-1 other records on its quasi-identifiers | Ensuring any combination of age, gender, and prefecture appears at least 5 times in the dataset |
| L-diversity | Ensuring sensitive attributes have at least l different values within each group | Ensuring each group of patients with the same demographic characteristics contains at least 3 different diagnoses |
| T-closeness | Ensuring the distribution of sensitive attributes within each group is similar to the overall distribution | Ensuring the distribution of diagnoses within each demographic group is similar to the overall patient population |
| Generalization | Reducing precision of data | Converting exact ages to 5-year ranges, specific locations to prefecture level |
| Data Suppression | Removing high-risk values | Removing information about very rare conditions or treatments |
| Noise Addition | Adding statistical noise to numerical values | Adding small random variations to laboratory values or vital signs |
| Top/Bottom Coding | Grouping extreme values | Reporting ages as "90+" for all patients over 90 years old |
| Differential Privacy | Adding calibrated random noise to query results | Adding calibrated noise to statistical queries on health data to provide mathematical privacy guarantees |
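
The k-anonymity and l-diversity checks listed above can be expressed compactly. The sketch below assumes each record is a dictionary and that the caller names the quasi-identifier and sensitive columns. The function names and sample data are illustrative and are not taken from the Japanese guidelines.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears at least k times."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


rows = [
    {"age_band": "45-49", "prefecture": "Tokyo", "diagnosis": "Diabetes"},
    {"age_band": "45-49", "prefecture": "Tokyo", "diagnosis": "Hypertension"},
    {"age_band": "50-54", "prefecture": "Osaka", "diagnosis": "Asthma"},
    {"age_band": "50-54", "prefecture": "Osaka", "diagnosis": "Asthma"},
]
quasi = ["age_band", "prefecture"]
print(is_k_anonymous(rows, quasi, k=2))
print(is_l_diverse(rows, quasi, "diagnosis", l=2))
```

In this sample every group has two rows, so the data is 2-anonymous, yet the Osaka group shares a single diagnosis. That is the situation l-diversity is meant to catch: membership in the group alone reveals the sensitive value.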

Example: Anonymization Process for a National Health Survey

Japan's National Health and Nutrition Survey data is anonymized using these steps:

  1. Direct identifier removal: Names, addresses, phone numbers, and other direct identifiers are completely removed
  2. Age generalization: Exact ages are converted to 5-year age bands (e.g., 45-49)
  3. Geographic aggregation: Specific locations are generalized to prefecture level
  4. Rare value suppression: Conditions or characteristics affecting fewer than 0.5% of the population are either suppressed or grouped into broader categories
  5. K-anonymity verification: Data is checked to ensure each combination of quasi-identifiers appears at least 5 times
  6. L-diversity enforcement: Sensitive attributes are checked for sufficient diversity within each demographic group
  7. Date generalization: Exact dates are converted to months or seasons
  8. Free text processing: Any free text fields are either removed or processed to remove potential identifiers
  9. Outlier management: Extreme values are top/bottom coded
  10. Risk assessment: Final dataset undergoes re-identification risk assessment
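
Several of the steps above (age banding, top coding, date coarsening, and suppression of small groups) compose naturally into a pipeline. The following sketch shows one way to chain them. It is an assumption-laden illustration, not the survey's actual procedure: the field names, the choice to suppress rather than regroup small cells, and the sample rows are all hypothetical.

```python
from collections import Counter
from datetime import date


def age_band(age, width=5, top=90):
    """5-year band, with everything at or above `top` coded as e.g. '90+'."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def to_month(d):
    """Coarsen an exact date to year and month."""
    return d.strftime("%Y-%m")


def anonymize(rows, k=5):
    """Generalize each row, then drop rows in groups smaller than k."""
    generalized = [
        {
            "age_band": age_band(r["age"]),
            "prefecture": r["prefecture"],
            "exam_month": to_month(r["exam_date"]),
            "finding": r["finding"],
        }
        for r in rows
    ]
    quasi = ("age_band", "prefecture", "exam_month")
    counts = Counter(tuple(g[q] for q in quasi) for g in generalized)
    return [g for g in generalized if counts[tuple(g[q] for q in quasi)] >= k]


rows = [
    {"age": 45 + i, "prefecture": "Tokyo",
     "exam_date": date(2024, 1, 10 + i), "finding": "normal"}
    for i in range(5)
]
rows.append({"age": 93, "prefecture": "Okinawa",
             "exam_date": date(2024, 2, 3), "finding": "normal"})
released = anonymize(rows, k=5)
print(len(released))
```

The single 93-year-old from Okinawa is top coded to "90+" but still forms a group of one, so the k threshold removes that row from the release.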

Regulatory Oversight

Japan's framework includes strong regulatory oversight:

Enforcement Example: PPC Actions

In 2023, the PPC took enforcement action against a healthcare application provider for improper handling of health data. The company had:

  • Failed to properly pseudonymize health data used for product development
  • Not implemented adequate security measures for sensitive health information
  • Shared what it claimed was anonymized data with third parties without meeting APPI anonymization standards
  • Failed to disclose to users how their health data would be processed

The PPC issued an administrative guidance order requiring the company to:

  • Implement proper pseudonymization procedures
  • Enhance security measures
  • Cease sharing data until proper anonymization could be verified
  • Submit to regular audits for the following two years
  • Revise privacy notices to clearly explain data processing
  • Provide additional training to staff on data protection

Reference:

Personal Information Protection Commission: https://www.ppc.go.jp/en/

PPC Annual Report 2023 (English): https://www.ppc.go.jp/en/aboutus/roles/annual/

MHLW Certified Medical Information Providers List (Japanese): https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000202384.html

Digital Agency of Japan: https://www.digital.go.jp/en/

Penalties for Non-compliance

Japan's framework includes significant penalties for violations:

Practical Implementation

In practice, Japan's framework supports several mechanisms for health data use:

1. Jisedai Iryo Kiban (Next-Generation Healthcare Platform)

A system established under the NGMIL that:

Case Study: JMDC Health Data Bank

JMDC Inc. operates one of Japan's largest health data banks as a certified medical information provider. Their platform:

  • Contains anonymized data from over 10 million individuals
  • Includes health insurance claims, medical checkups, and prescription data
  • Implements the NGMIL's opt-out mechanism through participating healthcare providers
  • Applies standardized anonymization protocols developed with MHLW guidance
  • Provides data access to pharmaceutical companies, medical device manufacturers, and academic researchers
  • Has supported over 300 research projects, including COVID-19 treatment effectiveness studies
  • Operates a secure cloud environment for data analysis with access controls and audit logging
  • Conducts regular re-identification risk assessments
  • Publishes annual transparency reports on data utilization

In 2023, JMDC data was used in a major study on diabetes treatment outcomes that led to updated clinical guidelines, demonstrating the practical value of the NGMIL framework.

2. Clinical Innovation Network (CIN)

A network for sharing clinical data for research purposes using de-identified information that:

Reference:

Clinical Innovation Network: https://cinc.ncgm.go.jp/en/

CIN Research Publications: https://cinc.ncgm.go.jp/en/achievements/publications.html

3. National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB)

A large database of health insurance claims and health checkup data that provides anonymized data for research. The NDB:

Reference:

NDB Open Data: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000177221.html (Japanese)

MHLW NDB User Guide (2023 version): https://www.mhlw.go.jp/content/12400000/001053185.pdf (Japanese)

4. Medical Information Database Network (MID-NET)

A specialized network for pharmacovigilance and drug safety research that:

Case Study: COVID-19 Data Platform

During the COVID-19 pandemic, Japan established a specialized data platform that demonstrated the flexibility of its health data framework:

  • Combined data from multiple sources including testing centers, hospitals, and public health departments
  • Implemented expedited anonymization protocols while maintaining privacy protections
  • Created tiered access levels for different user groups (public health officials, researchers, policymakers)
  • Enabled rapid analysis of treatment outcomes, vaccine effectiveness, and variant impacts
  • Supported both domestic policy decisions and international research collaboration
  • Demonstrated how Japan's framework could adapt to emergency situations while maintaining privacy principles

Recent Developments and Future Directions

Japan continues to evolve its approach to health data de-identification:

Emerging Approach: Differential Privacy

Japan's National Institute of Information and Communications Technology (NICT) has been developing differential privacy techniques specifically for healthcare applications. This approach:

  • Adds calibrated random noise to query results rather than to the underlying data
  • Provides provable privacy guarantees regardless of what external information an attacker holds
  • Is being piloted in several Japanese healthcare research initiatives
  • May be incorporated into future MHLW guidelines for health data anonymization
  • Has been implemented in a pilot project with three university hospitals for rare disease research
  • Enables more precise analysis while maintaining strong privacy protections
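
To make the idea of adding noise to query results concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is a textbook illustration and not a description of NICT's techniques: the function names, the epsilon value, and the sample records are assumptions, and a production system would need a hardened noise sampler and privacy budget accounting.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching the predicate."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # A counting query changes by at most 1 when one record is added or
    # removed, so its sensitivity is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


records = [{"diagnosis": "Type 2 Diabetes"}] * 40 + [{"diagnosis": "Asthma"}] * 60


def has_diabetes(record):
    return record["diagnosis"] == "Type 2 Diabetes"


noisy = private_count(records, has_diabetes, epsilon=1.0)
print(round(noisy, 2))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy. Each answered query also consumes part of an overall privacy budget, which this sketch does not track.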

Reference:

NICT Research on Healthcare Data Privacy: https://www.nict.go.jp/en/research/research-areas-and-research-centers.html

Japan's Digital Health Strategy (Cabinet Office): https://www.kantei.go.jp/jp/singi/kenkouiryou/suisin/ (Japanese)

AMED Genomic Medicine Program: https://www.amed.go.jp/en/program/list/04/01/genome_medicine.html

Challenges and Ongoing Work

Despite its sophisticated framework, Japan continues to address several challenges:

How It Compares to HIPAA Safe Harbor

Japan's approach differs from HIPAA Safe Harbor in several key ways:

Practical Comparison Example

For a clinical research project using patient data:

  • Under HIPAA Safe Harbor: Remove 18 specific identifiers to create a de-identified dataset that can be used without patient authorization
  • Under Japan's Framework: Either:
    1. Work with a certified medical information provider under NGMIL, who collects data from medical institutions (with patient opt-out option) and anonymizes it according to MHLW standards, or
    2. Create anonymously processed information according to APPI and PPC guidelines, with a documented risk assessment and disclosure of data categories being processed

The Japanese approach typically involves more stakeholders and formal processes, but may enable more flexible use of the data while maintaining strong privacy protections.

Reference:

Comparative Analysis of Global Health Data Protection Frameworks (MHLW): https://www.mhlw.go.jp/content/10904750/000923639.pdf (Japanese)

Japan-US Digital Health Cooperation: https://www.mofa.go.jp/files/100064068.pdf