Distribution-invariant data protection mechanism

Inventors

Bi, Xuan; Shen, Xiaotong

Assignees

University of Minnesota System

Publication Number

US-12248613-B2

Publication Date

2025-03-11

Expiration Date

2042-03-31

Abstract

A method includes storing a value in data storage so that a third party is prevented from accessing the value, retrieving the value and applying a first transform to the value to form a transformed value having a uniform distribution. Noise is added to the transformed value to form a sum and a second transform is applied to the sum to form a transformed sum having a uniform distribution. An inverse of the first transform is applied to the transformed sum to form a privatized value and the privatized value is provided to the third party.
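The abstract's steps can be sketched in Python for one continuous value. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the value follows a known normal distribution, handled with the standard-library `statistics.NormalDist`. It also assumes Laplace noise with scale 1/ε, which is the usual Laplace-mechanism choice when the first-transform output lies in [0, 1]. The function names (`dip_privatize`, `uniform_plus_laplace_cdf`, `laplace_noise`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def _laplace_cdf_antiderivative(t: float, b: float) -> float:
    # Antiderivative H of the Laplace(0, b) CDF, continuous at t = 0.
    if t < 0:
        return 0.5 * b * math.exp(t / b)
    return t + 0.5 * b * math.exp(-t / b)


def uniform_plus_laplace_cdf(s: float, b: float) -> float:
    # CDF G of U + E with U ~ Uniform(0, 1), E ~ Laplace(0, b): G(s) = H(s) - H(s - 1).
    return _laplace_cdf_antiderivative(s, b) - _laplace_cdf_antiderivative(s - 1.0, b)


def dip_privatize(x: float, dist: NormalDist, epsilon: float, rng: random.Random) -> float:
    # Hypothetical helper; assumes `dist` is the known distribution of x.
    b = 1.0 / epsilon                    # Z = F(x) lies in [0, 1], so sensitivity is 1
    z = dist.cdf(x)                      # first transform: Z ~ Uniform(0, 1)
    s = z + laplace_noise(b, rng)        # add noise to form the sum
    u = uniform_plus_laplace_cdf(s, b)   # second transform: G(s) ~ Uniform(0, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    return dist.inv_cdf(u)               # inverse of the first transform


# Usage: privatize 20,000 draws from an assumed N(50, 10^2) with epsilon = 0.5.
rng = random.Random(0)
model = NormalDist(mu=50.0, sigma=10.0)
raw = [rng.gauss(50.0, 10.0) for _ in range(20_000)]
private = [dip_privatize(x, model, epsilon=0.5, rng=rng) for x in raw]
print(f"raw:     mean={fmean(raw):.2f}  sd={stdev(raw):.2f}")
print(f"private: mean={fmean(private):.2f}  sd={stdev(private):.2f}")
```

The second transform is the exact CDF of the noisy sum. So the privatized sample should reproduce the mean and spread of the assumed distribution even at a small ε. The noise only obscures the link between an individual value and its released counterpart.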

Core Innovation

The invention provides a method and system for data privacy protection that generates a privatized value from data stored so that a third party cannot directly access the original value. In one example, the original value passes through a first transformation that yields a uniformly distributed value. Noise, often Laplace noise, is then added to obscure that value, and a second transformation restores a uniform distribution. Finally, the inverse of the first transformation maps the result back to the original scale. Only this privatized value is made available to the third party, which protects the original data.
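The second transformation restores uniformity, and for continuous data its form can be derived explicitly. The derivation below assumes continuous X with CDF F. It also assumes Laplace noise e with scale b = 1/ε, independent of X; that scale is an assumption, since the text above says only that the noise is often Laplacian. Under these assumptions the CDF G of the noisy sum has a closed form:

```latex
Z = F(X) \sim \mathrm{Unif}(0,1), \qquad e \sim \mathrm{Laplace}(0, b), \qquad S = Z + e

F_e(t) = \begin{cases} \tfrac{1}{2} e^{t/b}, & t < 0 \\ 1 - \tfrac{1}{2} e^{-t/b}, & t \ge 0 \end{cases}
\qquad
H(t) = \int_{-\infty}^{t} F_e(r)\,dr = \begin{cases} \tfrac{b}{2} e^{t/b}, & t < 0 \\ t + \tfrac{b}{2} e^{-t/b}, & t \ge 0 \end{cases}

G(s) = P(Z + e \le s) = \int_0^1 F_e(s - u)\,du = \int_{s-1}^{s} F_e(t)\,dt = H(s) - H(s - 1)
```

G is continuous and strictly increasing, since G'(s) = F_e(s) - F_e(s - 1) > 0. It follows that G(S) ~ Unif(0, 1), and the privatized value is F^{-1}(G(S)).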

The invention addresses limitations of existing privacy protection mechanisms, such as those designed to achieve differential privacy. Traditional methods often alter the original data distribution, require the data to have bounded support, or disrupt multivariate dependency structures, all of which reduce statistical accuracy in downstream analyses. The invention aims to overcome these drawbacks with a distribution-invariant privatization mechanism that protects privacy while retaining the statistical properties of the data.

The core innovation, named distribution-invariant privatization (DIP), applies to a wide range of data types, including continuous, discrete, mixed, and categorical variables, whether their distributions have bounded or unbounded support. DIP transforms and perturbs the data so that the privatized data follow the same distribution as the original data, which supports accurate statistical analysis, while also satisfying differential privacy. The method can be implemented on a computing device capable of executing the required transformations and noise addition.
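For the continuous case, two short derivations show why the privatized data keep the original distribution and why the release is differentially private. Both assume the Laplace-noise construction with scale b = 1/ε, which the text does not fix. Let F^{-1}(u) = inf{x : F(x) >= u} denote the generalized inverse, and let U = G(Z + e) ~ Unif(0, 1):

```latex
P(\tilde{X} \le x) = P\big(F^{-1}(U) \le x\big) = P\big(U \le F(x)\big) = F(x)

\frac{p(s \mid x)}{p(s \mid x')}
  = \exp\!\Big(\frac{|s - F(x')| - |s - F(x)|}{b}\Big)
  \le \exp\!\Big(\frac{|F(x) - F(x')|}{b}\Big)
  \le e^{1/b} = e^{\varepsilon}
```

The first line uses the equivalence F^{-1}(u) <= x if and only if u <= F(x). The second line bounds the Laplace density ratio for any two inputs x and x', using the fact that F(x) and F(x') both lie in [0, 1]. G and F^{-1} are applied only after the noise is added, so by the post-processing property of differential privacy they do not weaken the guarantee.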

Claims Coverage

The patent includes three independent claims. Each covers a method or system that uses transformation and noise addition to produce privatized values whose distribution matches that of the original values.

Generating distribution-invariant privatized values while preventing direct access

A method comprising: storing a plurality of values in data storage so that a third party is prevented from accessing them; retrieving a value and applying a first transform to it to form a transformed value having a uniform distribution; adding noise to the transformed value to form a sum; applying a second transform to the sum to form a transformed sum having a uniform distribution; forming a privatized value from the transformed sum such that the privatized value belongs to a distribution of privatized values matching the distribution of the original values; and providing the privatized value to the third party.

Computer system for transformation-based, distribution-preserving privatization

A computer comprising a memory and a processor that executes instructions to: store values so that they are inaccessible to third parties; retrieve a value; apply a first transform to form a uniformly distributed transformed value; add noise to the transformed value to form a sum; apply a second transform to the sum to regain a uniform distribution; convert the transformed sum into a privatized value whose distribution matches that of the original values; and provide the privatized value to a third party.
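The computer-system claim's flow can be sketched as a small Python class: store, retrieve, transform, add noise, transform again, invert, and release. Everything here is hypothetical. The class and method names are illustrative, and Python's leading-underscore convention stands in for real storage-level access control. The distribution is estimated by fitting a normal model with `statistics.NormalDist.from_samples`. That is one simple way to work from an estimated rather than a known distribution, not an estimator the patent prescribes. Note also that the fitted parameters are computed from the raw data and are not themselves privatized in this sketch.

```python
import math
import random
from statistics import NormalDist


class PrivatizingStore:
    """Hypothetical store: raw values stay inside; only privatized values are released."""

    def __init__(self, values, epsilon, seed=None):
        self._values = list(values)                      # raw data, never returned directly
        # Assumption: F is estimated by a normal fit (parameters are not privatized here).
        self._dist = NormalDist.from_samples(self._values)
        self._b = 1.0 / epsilon                          # Laplace scale for Z in [0, 1]
        self._rng = random.Random(seed)

    def _noisy_sum_cdf(self, s):
        # CDF of Uniform(0, 1) + Laplace(0, b), computed as H(s) - H(s - 1).
        b = self._b

        def h(t):
            return 0.5 * b * math.exp(t / b) if t < 0 else t + 0.5 * b * math.exp(-t / b)

        return h(s) - h(s - 1.0)

    def release(self, index):
        # Privatize the stored value at `index`; the raw value never leaves this method.
        x = self._values[index]                          # retrieve internally
        z = self._dist.cdf(x)                            # first transform
        noise = self._rng.expovariate(1.0 / self._b) - self._rng.expovariate(1.0 / self._b)
        u = self._noisy_sum_cdf(z + noise)               # second transform
        u = min(max(u, 1e-12), 1.0 - 1e-12)              # keep u strictly inside (0, 1)
        return self._dist.inv_cdf(u)                     # inverse of the first transform

    def release_all(self):
        return [self.release(i) for i in range(len(self._values))]


# Usage: a third party receives only the output of release_all().
data_rng = random.Random(1)
store = PrivatizingStore([data_rng.gauss(100.0, 15.0) for _ in range(10_000)], epsilon=1.0, seed=2)
released = store.release_all()
print(len(released), sum(released) / len(released))
```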

Privatizing values while preserving original data distribution using noise and transformation

A method comprising: storing values so that they are inaccessible to third parties; retrieving a value and applying a first transform to it so that the transformed value belongs to a uniform distribution of transformed values; using noise to form a privatized value from the transformed value such that the privatized value belongs to a distribution matching that of the original data; and providing the privatized value to a third party.

Together, these features define methods and systems that transform and perturb data to produce privatized values for third parties, while preserving the original data distribution and keeping the unprivatized data inaccessible.

Stated Advantages

Enables privatized data to preserve the original data distribution, supporting accurate downstream statistical analysis and learning.

Satisfies differential privacy for a wide range of data types, including continuous, discrete, mixed, and categorical variables with either bounded or unbounded support.

Maintains statistical accuracy even under strict privacy protection, overcoming the trade-off between privacy and analysis accuracy faced by existing methods.

Retains multivariate dependency structures, allowing accurate regression, classification, and graphical model analysis on privatized data.

Requires no knowledge of the true underlying data distribution; it works with empirical or estimated distributions instead.

Remains robust to changes in the privacy parameters and sample size, keeping both distributional error and estimation error low across the scenarios considered.

Provides privacy even for data with unbounded support, which existing mechanisms typically cannot handle effectively, because its nonlinear transformation maps values onto a bounded, uniformly distributed scale before noise is added.
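The dependency-preservation advantage can be illustrated for two continuous variables. The summary does not say how DIP handles multivariate data, so this sketch assumes a sequential (Rosenblatt) construction. X1 is transformed by its marginal CDF and X2 by its conditional CDF given X1; each coordinate is then privatized, and the transforms are inverted in the same order. The sketch also assumes a bivariate normal model with known parameters. Changing a record moves each transformed coordinate by at most 1, so it splits ε evenly across the two coordinates, with a Laplace scale of 2/ε each. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def laplace_noise(scale, rng):
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum_cdf(s, b):
    # CDF of Uniform(0, 1) + Laplace(0, b), computed as H(s) - H(s - 1).
    def h(t):
        return 0.5 * b * math.exp(t / b) if t < 0 else t + 0.5 * b * math.exp(-t / b)
    return h(s) - h(s - 1.0)


def privatize_uniform(z, b, rng):
    # Map Z ~ Uniform(0, 1) to G(Z + e) ~ Uniform(0, 1), kept strictly inside (0, 1).
    u = noisy_sum_cdf(z + laplace_noise(b, rng), b)
    return min(max(u, 1e-12), 1.0 - 1e-12)


def dip_bivariate_normal(x1, x2, mu1, mu2, sd1, sd2, rho, epsilon, rng):
    # Hypothetical helper; assumes (X1, X2) is bivariate normal with known parameters.
    b = 2.0 / epsilon                       # epsilon / 2 per coordinate
    marginal1 = NormalDist(mu1, sd1)

    def conditional2(v1):
        # Distribution of X2 given X1 = v1 under the bivariate normal model.
        return NormalDist(mu2 + rho * sd2 / sd1 * (v1 - mu1), sd2 * math.sqrt(1.0 - rho ** 2))

    z1 = marginal1.cdf(x1)                  # Rosenblatt step 1: F1(x1)
    z2 = conditional2(x1).cdf(x2)           # Rosenblatt step 2: F2|1(x2 | x1)
    u1 = privatize_uniform(z1, b, rng)
    u2 = privatize_uniform(z2, b, rng)
    y1 = marginal1.inv_cdf(u1)              # invert in the same order
    y2 = conditional2(y1).inv_cdf(u2)
    return y1, y2


def pearson(xs, ys):
    # Sample Pearson correlation.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Usage: assumed model mu = (1, 3), sd = (2, 1.5), rho = 0.8, epsilon = 1.
rng = random.Random(7)
raw = []
for _ in range(20_000):
    a, c = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    raw.append((1.0 + 2.0 * a, 3.0 + 1.5 * (0.8 * a + 0.6 * c)))
priv = [dip_bivariate_normal(x1, x2, 1.0, 3.0, 2.0, 1.5, 0.8, 1.0, rng) for x1, x2 in raw]
print("raw corr:", round(pearson(*zip(*raw)), 3))
print("privatized corr:", round(pearson(*zip(*priv)), 3))
```

The two noise draws are independent, so the privatized uniforms stay independent. The inverse Rosenblatt step therefore reproduces the model's joint distribution, including its correlation.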

Documented Applications

Privatizing continuous univariate variables for secure data sharing while enabling accurate statistical inference.

Privatizing discrete univariate variables, including categorical and mixed-type data, for privacy-preserving analysis.

Privatizing multivariate continuous variables to maintain joint distributions in complex data analysis tasks such as regression or graphical models.

Implementation in computing systems for privacy-preserving data storage and sharing, including but not limited to public data releases and analytics.

Application to real-world datasets, such as university salary records and bank marketing data, for privacy-preserving estimation and modeling.

Use in scenarios requiring differential privacy, such as census data publication or survey data analysis for public use.
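For the discrete and categorical applications listed above, F(X) is not uniform, so an exactly uniform first transform needs extra randomization. This sketch assumes the standard randomized probability integral transform, Z = P(X < x) + V * P(X = x) with V ~ Uniform(0, 1), which the summary does not spell out. It then applies the same Laplace noise and second transform as in the continuous case. Categories are placed in an arbitrary fixed order so that a CDF exists. The function names and probabilities are hypothetical.

```python
import bisect
import itertools
import math
import random


def laplace_noise(scale, rng):
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum_cdf(s, b):
    # CDF of Uniform(0, 1) + Laplace(0, b), computed as H(s) - H(s - 1).
    def h(t):
        return 0.5 * b * math.exp(t / b) if t < 0 else t + 0.5 * b * math.exp(-t / b)
    return h(s) - h(s - 1.0)


def dip_discrete(x, support, probs, epsilon, rng):
    # Hypothetical helper; `support` fixes an order and `probs` is the assumed pmf.
    cdf = list(itertools.accumulate(probs))
    k = support.index(x)
    below = cdf[k - 1] if k > 0 else 0.0             # P(X < x) in the chosen order
    z = below + rng.random() * probs[k]              # randomized PIT: Z ~ Uniform(0, 1)
    b = 1.0 / epsilon                                # Z lies in [0, 1]
    u = noisy_sum_cdf(z + laplace_noise(b, rng), b)  # second transform: Uniform(0, 1)
    j = bisect.bisect_left(cdf, u)                   # smallest j with cdf[j] >= u
    return support[min(j, len(support) - 1)]        # guard against round-off in cdf[-1]


# Usage: an assumed three-level categorical variable, epsilon = 1.
support = ["low", "mid", "high"]
probs = [0.2, 0.5, 0.3]
rng = random.Random(11)
raw = rng.choices(support, weights=probs, k=20_000)
priv = [dip_discrete(x, support, probs, 1.0, rng) for x in raw]
for level in support:
    print(level, raw.count(level) / len(raw), priv.count(level) / len(priv))
```

The last step takes the generalized inverse of the discrete CDF. Each category is therefore released with its original probability, whatever order was chosen for the categories.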
