Method and system for providing anonymized patient datasets

Inventors

Suppan, Santiago Reinhard; Cuellar Jaramillo, Jorge Ricardo; Rosenbaum, Ute

Assignees

Siemens Healthineers AG

Publication Number

US-12361170-B2

Publication Date

2025-07-15

Expiration Date

2042-07-26


Abstract

A computer-implemented method for providing anonymized patient datasets comprises: analyzing statistical population data to ascertain obfuscation parameters; and anonymizing patient datasets that include quasi-identifiers as attributes by obfuscating the quasi-identifiers of the patient datasets based on the obfuscation parameters to generate the anonymized patient datasets. A system includes at least one processor and a memory, and is configured to provide the anonymized patient datasets.

Core Innovation

The invention relates to a computer-implemented method and system for providing anonymized patient datasets where statistical population data is analyzed to ascertain obfuscation parameters. Patient datasets containing quasi-identifiers are anonymized by obfuscating these quasi-identifiers based on the obtained obfuscation parameters, generating anonymized patient datasets. This approach enables efficient anonymization without the need for laborious preprocessing of entire database content.

The problem being solved stems from conventional anonymization methods that rely on preprocessing large datasets of clinical patient data, which is often time-consuming or practically infeasible because of high data volumes and limited data transfer speeds in medical institutions. In addition, continuous real-time data streams from sensors cannot be preprocessed as a whole, because the data does not yet exist when anonymization has to begin. There is therefore a need for a method and system that can efficiently anonymize patient datasets, including data generated in real time, without extensive preprocessing.

Claims Coverage

The patent includes multiple independent claims covering computer-implemented methods and systems for anonymizing patient datasets by analyzing statistical population data to derive obfuscation parameters used to obfuscate quasi-identifiers.

Analyzing statistical population data to ascertain obfuscation parameters

The method analyzes statistical population data read from at least one statistical database to determine obfuscation parameters used for anonymization.

Anonymizing patient datasets by obfuscating quasi-identifiers based on obfuscation parameters

Patient datasets that include quasi-identifiers are anonymized by obfuscating the quasi-identifiers according to the ascertained obfuscation parameters to generate anonymized datasets.

Obfuscation including generalization, masking digits, or arithmetic/logic changes

Obfuscating a quasi-identifier can be done by generalizing its value to an interval, partially deleting or masking digits, or changing the value by a change value in an arithmetic or logic operation.
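
The three obfuscation techniques named above can be illustrated with a minimal Python sketch. The function names, the half-open interval convention, and the mask character are assumptions made for illustration only; the patent summary does not prescribe a concrete implementation.

```python
from typing import Tuple


def generalize(value: int, spread: int) -> Tuple[int, int]:
    """Replace a numeric value with the half-open interval [low, low + spread)
    that contains it. `spread` plays the role of an obfuscation parameter."""
    low = (value // spread) * spread
    return (low, low + spread)


def mask_digits(value: str, keep: int, mask_char: str = "*") -> str:
    """Keep the first `keep` characters and mask all remaining ones."""
    return value[:keep] + mask_char * max(len(value) - keep, 0)


def shift(value: int, change: int) -> int:
    """Change a value by a fixed change value using an arithmetic operation."""
    return value + change
```

For example, an age of 47 with a spread of 10 is generalized to the interval (40, 50), and a five-digit postal code with `keep=2` retains only its first two digits.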

Use of obfuscation parameters indicating spread, digit deletion positions, or change values

Obfuscation parameters derived from statistical population data indicate specifics for obfuscation such as spread of generalization intervals, number/position of digits to be deleted or masked, or values for changing quasi-identifiers.

Patient dataset attributes including identifiers, quasi-identifiers, and sensitive attributes

The patient datasets comprise identifiers uniquely identifying patients, quasi-identifiers that in combination identify patients, and sensitive attributes comprising personal data.

Automatic deletion or masking of identifiers during anonymization

Identifiers contained in patient datasets are automatically deleted or masked in the anonymization process.
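
A minimal sketch of this step, assuming a patient dataset is represented as a Python dictionary and that the set of identifier attribute names is known in advance. The attribute names used in the usage note are hypothetical.

```python
def strip_identifiers(record, identifiers, mask=None):
    """Return a copy of `record` without its identifier attributes.

    If `mask` is given, identifiers are kept as keys but their values are
    replaced by `mask`; otherwise they are deleted entirely.
    """
    if mask is None:
        return {k: v for k, v in record.items() if k not in identifiers}
    return {k: (mask if k in identifiers else v) for k, v in record.items()}
```

Calling `strip_identifiers({"name": "A. Example", "age": 47}, {"name"})` returns a record that contains only the `age` attribute, while passing `mask="***"` keeps the `name` key with a masked value.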

Reading datasets from project data source and generating datasets in real time from sensors

Patient datasets are read from a project data source and can also be generated in real time based on sensor data.

Continuous obfuscation of data streams and storage in an anonymous database

Patient datasets provided as a continuous data stream are continuously obfuscated based on obfuscation parameters and stored in an anonymous database.
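
Because the obfuscation parameters are fixed before any patient data arrives, each record can be obfuscated on its own as it is produced. The following sketch assumes records are dictionaries and that `obfuscators` maps each quasi-identifier name to a function that obfuscates its value; both are illustrative assumptions, and writing the results to an anonymous database is left to the consumer of the generator.

```python
from typing import Callable, Dict, Iterable, Iterator


def anonymize_stream(
    records: Iterable[dict],
    obfuscators: Dict[str, Callable],
) -> Iterator[dict]:
    """Lazily obfuscate each incoming record, one at a time.

    Attributes without an entry in `obfuscators` pass through unchanged.
    """
    for record in records:
        yield {
            key: obfuscators[key](value) if key in obfuscators else value
            for key, value in record.items()
        }
```

No record has to be buffered or compared against other records, which is what makes this shape suitable for sensor data that never exists as a complete table.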

Formation of clusters with identical obfuscated quasi-identifiers

Anonymized patient datasets form clusters characterized by a cluster size in which all obfuscated quasi-identifiers are identical.
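
One way to inspect such clusters, sketched here under the assumption that anonymized records are dictionaries with hashable obfuscated values, is to count how many records share each combination of obfuscated quasi-identifiers. The function name is hypothetical.

```python
from collections import Counter


def cluster_sizes(records, quasi_identifiers):
    """Count records per combination of obfuscated quasi-identifier values."""
    return Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
```

The smallest count in the result is the size of the smallest cluster in the anonymized dataset.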

Calculating population expected value based on statistical population data

A population expected value is calculated indicating the number of people in the catchment area satisfying the obfuscated quasi-identifiers, guiding the anonymization.

Obfuscation ensuring population expected value exceeds cluster size

Quasi-identifiers are obfuscated such that the calculated population expected value is greater than a selectable cluster size within the anonymized dataset.
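
A simplified sketch of how this condition could drive the choice of an obfuscation parameter. It assumes a single quasi-identifier (age), population statistics given as a mapping from age to the number of people of that age in the catchment area, and a fixed list of candidate interval spreads. None of these specifics come from the patent summary.

```python
def expected_population(age_counts, low, high):
    """Number of people in the catchment area with low <= age < high."""
    return sum(n for age, n in age_counts.items() if low <= age < high)


def choose_spread(age_counts, cluster_size, candidates=(1, 5, 10, 20, 50)):
    """Return the smallest candidate spread for which every populated age
    interval contains more people than `cluster_size`, or None if no
    candidate satisfies the condition."""
    for spread in sorted(candidates):
        totals = {}
        for age, n in age_counts.items():
            low = (age // spread) * spread  # interval the age falls into
            totals[low] = totals.get(low, 0) + n
        if all(total > cluster_size for total in totals.values()):
            return spread
    return None
```

Only the population statistics are consulted here, not the patient datasets themselves, which reflects the idea that parameters can be ascertained without preprocessing the clinical data.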

Training artificial intelligence modules based on anonymized datasets

Anonymized patient datasets stored in the anonymous database are used to train artificial intelligence modules, such as artificial neural networks.

Using anonymized data for setting medical device parameters and making medical diagnoses

Anonymized patient datasets are used to set device parameters of medical devices or to automatically create medical diagnoses relating to patients.

Detection of patient attributes by sensors and inclusion of various data types

Attributes of patient datasets can be at least partially detected by sensors and may include text, audio, or image data.

System configured to execute the described anonymization based on population data

A system including a processor and memory is configured to analyze statistical population data to ascertain obfuscation parameters and anonymize patient datasets read from a project data source by obfuscating quasi-identifiers accordingly.

The claims collectively cover computer-implemented methods and systems that determine obfuscation parameters from statistical population data to efficiently anonymize patient datasets by obfuscating quasi-identifiers using various techniques. They include provisions for automated deletion of identifiers, continuous anonymization of real-time data streams, cluster formation ensuring data anonymity, calculation of population expected values for guiding anonymization, and applications in AI training and medical device parameterization.

Stated Advantages

Efficient ascertainment of obfuscation parameters without requiring laborious data preprocessing of large patient datasets.

Ability to process continuous real-time data streams for anonymization.

Increased anonymization speed with fewer computing and storage resources.

Reduced exposure and processing of sensitive patient data, enhancing cybersecurity and data protection.

Capability to anonymize patient datasets with any number of different attributes.

Real-time capable system that anonymizes sensor-generated patient data as it is produced.

Possibility to ascertain or update obfuscation parameters in the background during anonymization, enabling dynamic adaptability.

Documented Applications

Use of anonymized patient datasets for training artificial intelligence modules, particularly artificial neural networks.

Setting device parameters of medical devices for patient examinations based on anonymized data.

Automatically creating medical diagnoses relating to patients using anonymized patient datasets.

Evaluation and analysis of medical studies using anonymized patient data.
