Method and system for providing anonymized patient datasets
Inventors
Suppan, Santiago Reinhard • Cuellar Jaramillo, Jorge Ricardo • Rosenbaum, Ute
Assignees
Publication Number
US-12361170-B2
Publication Date
2025-07-15
Expiration Date
2042-07-26
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
A computer-implemented method for providing anonymized patient datasets comprises: analyzing statistical population data to ascertain obfuscation parameters; and anonymizing patient datasets that include quasi-identifiers as attributes by obfuscating those quasi-identifiers based on the obfuscation parameters to generate the anonymized patient datasets. A system includes at least one processor and a memory, and is configured to provide the anonymized patient datasets.
Core Innovation
The invention relates to a computer-implemented method and system for providing anonymized patient datasets where statistical population data is analyzed to ascertain obfuscation parameters. Patient datasets containing quasi-identifiers are anonymized by obfuscating these quasi-identifiers based on the obtained obfuscation parameters, generating anonymized patient datasets. This approach enables efficient anonymization without the need for laborious preprocessing of entire database content.
Conventional anonymization methods rely on preprocessing large datasets of clinical patient data, which is often time-consuming or practically infeasible because of high data volumes and limited data transfer speeds in medical institutions. Continuous real-time data streams from sensors make such preprocessing impossible. There is therefore a need for a method and system that can efficiently anonymize patient datasets, including data generated in real time, without extensive preprocessing.
Claims Coverage
The patent includes multiple independent claims covering computer-implemented methods and systems for anonymizing patient datasets by analyzing statistical population data to derive obfuscation parameters used to obfuscate quasi-identifiers.
Analyzing statistical population data to ascertain obfuscation parameters
The method analyzes statistical population data read from at least one statistical database to determine obfuscation parameters used for anonymization.
Anonymizing patient datasets by obfuscating quasi-identifiers based on obfuscation parameters
Patient datasets that include quasi-identifiers are anonymized by obfuscating the quasi-identifiers according to the ascertained obfuscation parameters to generate anonymized datasets.
Obfuscation including generalization, masking digits, or arithmetic/logic changes
Obfuscating a quasi-identifier can be done by generalizing its value to an interval, partially deleting or masking digits, or changing the value by a change value in an arithmetic or logic operation.
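The three obfuscation operations named above can be illustrated with a minimal Python sketch. The function names, parameter names, and sample values are hypothetical and chosen for illustration only; the patent summary does not specify an implementation.

```python
def generalize(value: int, spread: int) -> tuple[int, int]:
    """Replace a number with the half-open interval [low, low + spread) containing it."""
    low = (value // spread) * spread
    return (low, low + spread)


def mask_digits(value: str, keep: int, mask_char: str = "*") -> str:
    """Keep the first `keep` characters and mask every remaining character."""
    return value[:keep] + mask_char * max(len(value) - keep, 0)


def shift(value: int, change: int) -> int:
    """Change a value by a fixed change value using an arithmetic operation."""
    return value + change


print(generalize(47, 10))       # (40, 50)
print(mask_digits("80331", 3))  # 803**
print(shift(1975, -2))          # 1973
```

In this sketch, `spread`, `keep`, and `change` play the role of obfuscation parameters; how they are actually derived from statistical population data is not shown here.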
Use of obfuscation parameters indicating spread, digit deletion positions, or change values
Obfuscation parameters derived from statistical population data indicate specifics for obfuscation such as spread of generalization intervals, number/position of digits to be deleted or masked, or values for changing quasi-identifiers.
Patient dataset attributes including identifiers, quasi-identifiers, and sensitive attributes
The patient datasets comprise identifiers uniquely identifying patients, quasi-identifiers that in combination identify patients, and sensitive attributes comprising personal data.
Automatic deletion or masking of identifiers during anonymization
Identifiers contained in patient datasets are automatically deleted or masked in the anonymization process.
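As a rough illustration of this step, the following Python sketch either drops or masks identifier attributes in a record. The field names in `IDENTIFIER_FIELDS` and the mask string are assumptions made for the example, not taken from the patent.

```python
# Hypothetical identifier attribute names, assumed for illustration.
IDENTIFIER_FIELDS = {"name", "patient_id", "insurance_number"}


def strip_identifiers(record: dict, mask: bool = False) -> dict:
    """Return a copy of the record with identifiers deleted, or masked if mask=True."""
    if mask:
        return {k: ("***" if k in IDENTIFIER_FIELDS else v) for k, v in record.items()}
    return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}


record = {"patient_id": "P-1001", "name": "Jane Doe", "age": 47, "diagnosis": "J45"}
print(strip_identifiers(record))             # {'age': 47, 'diagnosis': 'J45'}
print(strip_identifiers(record, mask=True))
```

The input record is left unmodified in both modes, so the original data source is not altered by the anonymization step.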
Reading datasets from project data source and generating datasets in real time from sensors
Patient datasets are read from a project data source and can also be generated in real time based on sensor data.
Continuous obfuscation of data streams and storage in an anonymous database
Patient datasets provided as a continuous data stream are continuously obfuscated based on obfuscation parameters and stored in an anonymous database.
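One way to picture continuous obfuscation is a generator that transforms each record as it arrives, using parameters fixed in advance rather than derived from the stream itself. The sketch below is an assumption-laden illustration: the attribute names (`age`, `zip`, `heart_rate`), the parameter names, the simulated sensor feed, and the in-memory list standing in for the anonymous database are all hypothetical.

```python
from typing import Iterable, Iterator


def obfuscate_stream(records: Iterable[dict], age_spread: int, zip_keep: int) -> Iterator[dict]:
    """Obfuscate records one at a time; no look-ahead at other records is needed."""
    for record in records:
        low = (record["age"] // age_spread) * age_spread
        zip_code = record["zip"]
        yield {
            **record,
            "age": (low, low + age_spread),                                  # generalization
            "zip": zip_code[:zip_keep] + "*" * (len(zip_code) - zip_keep),   # digit masking
        }


def sensor_feed() -> Iterator[dict]:
    """Simulated stand-in for a real-time sensor data source."""
    yield {"age": 47, "zip": "80331", "heart_rate": 72}
    yield {"age": 52, "zip": "80469", "heart_rate": 88}


anonymous_db = []  # stand-in for the anonymous database
for row in obfuscate_stream(sensor_feed(), age_spread=10, zip_keep=3):
    anonymous_db.append(row)

print(anonymous_db[0])  # {'age': (40, 50), 'zip': '803**', 'heart_rate': 72}
```

Because each record is processed independently, the same loop works for a finite batch and for an unbounded stream.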
Formation of clusters with identical obfuscated quasi-identifiers
Anonymized patient datasets form clusters, each characterized by a cluster size, within which all obfuscated quasi-identifiers are identical.
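The notion of a cluster can be made concrete by grouping anonymized records on their obfuscated quasi-identifier values and counting each group. The sketch below uses hypothetical attribute names and toy records; it only illustrates how cluster sizes could be measured, not how the patent forms clusters.

```python
from collections import Counter


def cluster_sizes(records: list[dict], quasi_identifiers: list[str]) -> Counter:
    """Count records per distinct combination of obfuscated quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


rows = [
    {"age": (40, 50), "zip": "803**", "diagnosis": "A"},
    {"age": (40, 50), "zip": "803**", "diagnosis": "B"},
    {"age": (50, 60), "zip": "804**", "diagnosis": "C"},
]

sizes = cluster_sizes(rows, ["age", "zip"])
print(sizes[((40, 50), "803**")])  # 2
print(min(sizes.values()))         # 1
```

The smallest count is the weakest point of the dataset: a cluster of size 1 means one record is unique with respect to its obfuscated quasi-identifiers.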
Calculating population expected value based on statistical population data
A population expected value is calculated that indicates the number of people in the catchment area who satisfy the obfuscated quasi-identifiers; this value guides the anonymization.
Obfuscation ensuring population expected value exceeds cluster size
Quasi-identifiers are obfuscated such that the calculated population expected value is greater than a selectable cluster size within the anonymized dataset.
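A hedged sketch of how such a condition could be checked follows. It assumes, purely for illustration, that the quasi-identifiers are statistically independent, so the population expected value is the catchment population multiplied by the share of people matching each obfuscated attribute. The summary does not state this formula; the function names, candidate spreads, and toy population figures are hypothetical.

```python
def expected_population(total: int, shares: list[float]) -> float:
    """Expected number of matching people, ASSUMING independent attributes."""
    result = float(total)
    for share in shares:
        result *= share
    return result


def age_share(age_counts: dict[int, int], low: int, high: int) -> float:
    """Fraction of the population whose age lies in [low, high)."""
    total = sum(age_counts.values())
    return sum(c for age, c in age_counts.items() if low <= age < high) / total


def choose_age_spread(age_counts, population, other_share, age, k, candidates=(1, 5, 10, 20)):
    """Return the narrowest candidate spread whose expected value exceeds k, else None."""
    for spread in candidates:
        low = (age // spread) * spread
        share = age_share(age_counts, low, low + spread)
        if expected_population(population, [share, other_share]) > k:
            return spread
    return None


# Toy statistics: 1,000 people per year of age, 100,000 people in total.
age_counts = {age: 1000 for age in range(100)}
spread = choose_age_spread(age_counts, population=100_000, other_share=0.001, age=47, k=3)
print(spread)  # 5
```

With a spread of 1 the expected value is roughly 1 person, which does not exceed the selected cluster size of 3; with a spread of 5 it is roughly 5, so the interval [45, 50) is the narrowest candidate that satisfies the condition. Only population statistics are consulted, not the patient database itself.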
Training artificial intelligence modules based on anonymized datasets
Anonymized patient datasets stored in the anonymous database are used to train artificial intelligence modules, such as artificial neural networks.
Using anonymized data for setting medical device parameters and making medical diagnoses
Anonymized patient datasets are used to set device parameters of medical devices or to automatically create medical diagnoses relating to patients.
Detection of patient attributes by sensors and inclusion of various data types
Attributes of patient datasets can be at least partially detected by sensors and may include text, audio, or image data.
System configured to execute the described anonymization based on population data
A system including a processor and memory is configured to analyze statistical population data to ascertain obfuscation parameters and anonymize patient datasets read from a project data source by obfuscating quasi-identifiers accordingly.
The claims collectively cover computer-implemented methods and systems that determine obfuscation parameters from statistical population data to efficiently anonymize patient datasets by obfuscating quasi-identifiers using various techniques. They include provisions for automated deletion of identifiers, continuous anonymization of real-time data streams, cluster formation ensuring data anonymity, calculation of population expected values for guiding anonymization, and applications in AI training and medical device parameterization.
Stated Advantages
Efficient ascertainment of obfuscation parameters without requiring laborious data preprocessing of large patient datasets.
Ability to process continuous real-time data streams for anonymization.
Increased anonymization speed with fewer computing and storage resources.
Reduced exposure and processing of sensitive patient data, enhancing cybersecurity and data protection.
Capability to anonymize patient datasets with any number of different attributes.
Real-time-capable system that anonymizes sensor-generated patient data as it is produced.
Possibility to ascertain or update obfuscation parameters in the background during anonymization, enabling dynamic adaptability.
Documented Applications
Use of anonymized patient datasets for training artificial intelligence modules, particularly artificial neural networks.
Setting device parameters of medical devices for patient examinations based on anonymized data.
Automatically creating medical diagnoses relating to patients using anonymized patient datasets.
Evaluation and analysis of medical studies using anonymized patient data.