Methods for mapping data into lower dimensions
Inventors
VIRKAR, Hemant • Stark, Karen • Borgman, Jacob
Publication Number
US-8812274-B2
Publication Date
2014-08-19
Expiration Date
2030-04-26
Abstract
Methods and systems are provided for creating ensembles of hypersurfaces in high-dimensional feature spaces, together with machines and systems relating thereto. More specifically, exemplary aspects of the invention relate to methods and systems for generating supervised hypersurfaces based on user domain expertise, machine learning techniques, or other supervised learning techniques. These supervised hypersurfaces may optionally be combined with unsupervised hypersurfaces derived from unsupervised learning techniques. Lower-dimensional subspaces may be determined by the methods and systems for creating ensembles of hypersurfaces in high-dimensional feature spaces. Data may then be projected onto the lower-dimensional subspaces for use, e.g., in further data discovery, visualization for display, or database access. Also provided are tools, systems, devices, and software implementing the methods, and computers embodying the methods and/or running the software, where the methods, software, and computers utilize various aspects of the present invention relating to analyzing data.
Core Innovation
The invention provides methods and systems for creating ensembles of hypersurfaces in high-dimensional feature spaces and for projecting data from those spaces into lower-dimensional subspaces. The hypersurfaces can be generated with supervised learning techniques, such as those based on user domain expertise or machine learning methods, and may optionally be combined with unsupervised hypersurfaces derived from unsupervised learning techniques. The lower-dimensional subspaces are determined from the ensemble of hypersurfaces, so that high-dimensional data can be projected onto them for further analysis or visualization.
A significant problem addressed by this invention is the difficulty of accessing, displaying, and interpreting large-scale, high-dimensional data. Traditional methods are limited to sequential or one-dimensional views, and modern data, which is often high-dimensional and complex, cannot be meaningfully captured or visualized this way. Existing unsupervised reduction techniques such as principal component analysis (PCA) or independent component analysis (ICA) ignore class labels, so the directions they retain need not be the ones that separate categories. They therefore cannot reliably represent the category separation achieved by supervised learning methods, which reduces interpretability and causes insights to be missed.
The present invention overcomes these shortcomings by combining supervised and unsupervised learning-derived hypersurfaces into a single lower-dimensional space, on which data may then be projected and visualized. This enables improved graphical representation of high-dimensional relationships, supports the direct incorporation of actual or hypothetical data patterns into models, and facilitates intuitive visualization, analysis, and access to large datasets without necessitating deep mathematical expertise from the user.
Claims Coverage
The patent includes several independent claims that define core inventive features around the generation of supervised and unsupervised hypersurfaces, projection into lower-dimensional subspaces, provision of software/computer program products for such analysis, methods for graphic indexing of databases, and methods for detecting deviations from normal in high-dimensional data.
Combined supervised and unsupervised hypersurface subspace projection
A method for analysis of high-dimensional feature spaces comprising labelled data by:
- Generating a supervised hypersurface (and its normal vector) using supervised learning techniques.
- Generating an unsupervised hypersurface (and its normal vector) using unsupervised learning techniques after label removal.
- Selecting a subspace comprising both hypersurfaces.
- Projecting data onto an orthonormal basis spanning the subspace defined by these normal vectors.
- Outputting the projected data into computer memory.
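The steps of this claim can be sketched in Python. This is a minimal illustration under our own assumptions, not the patented implementation: both hypersurfaces are hyperplanes (the simplest kind of hypersurface), the supervised one comes from a nearest-centroid classifier on 0/1 labels, and the unsupervised one is the hyperplane bisecting two clusters found by a 2-means run that never sees the labels. All function names are hypothetical. The two normal vectors are orthonormalized with Gram-Schmidt, and each point is projected onto the resulting basis.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def sq_dist(a: Vector, b: Vector) -> float:
    d = sub(a, b)
    return dot(d, d)


def mean(points: list[Vector]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def supervised_normal(points: list[Vector], labels: list[int]) -> Vector:
    """Assumed supervised learner: nearest-centroid classifier (labels 0/1).
    Its decision hyperplane has normal mean(class 1) - mean(class 0)."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == 0]
    return sub(mean(pos), mean(neg))


def unsupervised_normal(points: list[Vector], iterations: int = 20) -> Vector:
    """Assumed unsupervised learner: 2-means on unlabelled points.
    The hyperplane bisecting the two centroids has normal c1 - c0."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))  # deterministic farthest-point start
    for _ in range(iterations):
        near0 = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        near1 = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not near0 or not near1:
            break  # degenerate split: keep current centroids
        c0, c1 = mean(near0), mean(near1)
    return sub(c1, c0)


def gram_schmidt(vectors: list[Vector], tol: float = 1e-10) -> list[Vector]:
    """Orthonormalize; vectors (nearly) dependent on earlier ones are dropped."""
    basis: list[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def combined_projection(points: list[Vector], labels: list[int]):
    """Return the orthonormal basis and the projected coordinates (kept in memory)."""
    normals = [
        supervised_normal(points, labels),
        unsupervised_normal(points),  # labels removed: only the points are used
    ]
    basis = gram_schmidt(normals)
    projected = [[dot(p, b) for b in basis] for p in points]
    return basis, projected
```

On four points at (±1, ±5, 0) with labels split by the sign of the first coordinate, the supervised normal points along the first axis and the 2-means normal along the second, so the projection keeps both the class separation and the cluster structure. Any classifier or clustering method that yields a hyperplane normal could be swapped in.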
Computer program product for generating supervised and unsupervised hypersurfaces and projection
A non-transitory computer-usable medium with control logic to:
- Generate a first supervised hypersurface and normal vector using supervised learning on labelled data.
- Generate a second unsupervised hypersurface and normal vector using unsupervised learning after removing labels.
- Select a subspace defined by the hypersurfaces.
- Project high-dimensional data onto the orthonormal basis spanning the subspace.
- Output the projected data on an output device.
Projection of high-dimensional data using axes from supervised, hypothetical, or actual data patterns
A method for projecting high-dimensional data onto lower-dimensional subspaces by:
- Generating axes from labelled data, including vectors normal to supervised hypersurfaces, vectors from hypothetical data patterns, or vectors from actual data patterns.
- Generating axes normal to unsupervised hypersurfaces after label removal.
- Orthonormalizing the vectors.
- Projecting data onto these orthonormalized vectors to form a lower-dimensional subspace.
- Outputting the subspace to computer memory.
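A sketch of how axes from hypothetical or actual data patterns might be combined with unsupervised normals, under stated assumptions: a pattern is a vector over the same features as the data, each pattern is mean-centred so the axis captures its shape rather than its overall level (our choice, not the patent's), and the unsupervised normals are supplied precomputed. Gram-Schmidt orthonormalization drops any candidate axis that is linearly dependent on earlier ones. Function names are hypothetical.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def centered(pattern: Vector) -> Vector:
    """Assumption: subtract the pattern's own mean so the axis encodes its shape."""
    m = sum(pattern) / len(pattern)
    return [x - m for x in pattern]


def orthonormalize(vectors: list[Vector], tol: float = 1e-10) -> list[Vector]:
    """Gram-Schmidt; candidates (nearly) dependent on earlier axes are dropped."""
    basis: list[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def axes_from_patterns(
    hypothetical: list[Vector],
    actual: list[Vector],
    unsupervised_normals: list[Vector],
) -> list[Vector]:
    """Candidate axes in priority order: hypothetical, actual, then unsupervised."""
    candidates = [centered(p) for p in hypothetical]
    candidates += [centered(p) for p in actual]
    candidates += [list(v) for v in unsupervised_normals]
    return orthonormalize(candidates)


def project(points: list[Vector], axes: list[Vector]) -> list[Vector]:
    """Coordinates of each point in the lower-dimensional subspace."""
    return [[dot(p, a) for a in axes] for p in points]
```

With the hypothetical pattern [1, 1, -1, -1] (features 0 and 1 rise while 2 and 3 fall) and an observed profile [2, 0, 2, 0], a sample's first coordinate measures how closely it follows the hypothetical pattern; an unsupervised normal identical to the hypothetical pattern adds no new axis.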
Computer program product for generating projection axes from supervised, hypothetical, or actual data patterns
A computer-usable medium with code to:
- Generate axes from labelled data (including vectors normal to supervised hypersurfaces and vectors from hypothetical or actual data patterns).
- Generate axes normal to unsupervised hypersurfaces (after label removal).
- Orthonormalize the vectors and project data onto these axes, creating lower-dimensional subspaces.
- Output the subspaces into computer memory.
Graphic database indexing via lower-dimensional subspace projection
A graphic method of indexing a database by:
- Generating a lower-dimensional subspace using database data.
- Projecting the database data onto this subspace.
- Graphically representing the data within the subspace.
- Enabling access to database records by selecting the graphical data representation.
- Outputting the generated index to computer memory.
Deviation detection in high-dimensional data using typical models from unsupervised hypersurfaces
A method for detecting deviations from a complex typical state, and identifying the features responsible, by:
- Generating at least two hypersurfaces and their normal vectors using unsupervised learning from typical data in a high-dimensional feature space.
- Selecting a lower-dimensional subspace from these hypersurfaces.
- Projecting data onto the orthonormal basis spanning the subspace to create a typical model.
- Comparing new data to the typical model.
- Identifying features mismatched with the typical model.
- Outputting the projected typical model into computer memory.
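The claim leaves the comparison step open, so the sketch below assumes one reasonable choice. The typical model is the mean of the typical data plus an orthonormal basis built from at least two unsupervised normal vectors (passed in, since any clustering or other unsupervised method could supply them). A sample's residual is the part of its centred vector lying outside that subspace, and a feature is flagged when its residual exceeds `threshold` times the root-mean-square residual seen for that feature in typical data. No abnormal examples are used. The names, the residual rule, and the default threshold are hypothetical.

```python
import math
from dataclasses import dataclass

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: list[Vector], tol: float = 1e-10) -> list[Vector]:
    """Gram-Schmidt; drops vectors (nearly) dependent on earlier ones."""
    basis: list[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


@dataclass
class TypicalModel:
    center: Vector       # mean of the typical data
    basis: list[Vector]  # orthonormal basis of the typical subspace
    spread: Vector       # per-feature RMS residual of typical data (floored)


def residual(center: Vector, basis: list[Vector], x: Vector) -> Vector:
    """Centred sample minus its projection onto the subspace."""
    d = [xi - ci for xi, ci in zip(x, center)]
    for b in basis:
        c = dot(d, b)
        d = [di - c * bi for di, bi in zip(d, b)]
    return d


def fit_typical_model(typical: list[Vector], normals: list[Vector],
                      floor: float = 1e-9) -> TypicalModel:
    n, dim = len(typical), len(typical[0])
    center = [sum(x[j] for x in typical) / n for j in range(dim)]
    basis = orthonormalize(normals)
    residuals = [residual(center, basis, x) for x in typical]
    # Floor avoids division by zero for features typical data never leaves.
    spread = [max(math.sqrt(sum(r[j] ** 2 for r in residuals) / n), floor)
              for j in range(dim)]
    return TypicalModel(center, basis, spread)


def mismatched_features(model: TypicalModel, x: Vector,
                        threshold: float = 3.0) -> list[int]:
    """Indices of features whose residual is unusually large versus typical data."""
    r = residual(model.center, model.basis, x)
    return [j for j, (rj, sj) in enumerate(zip(r, model.spread))
            if abs(rj) / sj > threshold]
```

For typical data whose first two features move together, a new sample with features 0 and 1 pulled apart is flagged on exactly those two features, while a sample that stays near the typical subspace is not flagged.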
Collectively, these inventive features cover the generation and combination of supervised, unsupervised, hypothetical, and actual data patterns into lower-dimensional subspaces for projection, analysis, visualization, and database indexing, as well as specific computer implementations and methods for detecting deviations from normal in high-dimensional data.
Stated Advantages
Enables accurate and intuitive visualization of high-dimensional data in lower dimensions, revealing complex relationships not apparent in sequential or one-dimensional views.
Combines supervised and unsupervised learning within a single model, providing a more comprehensive and accurate representation than separate models.
Facilitates data discovery, efficient further analysis, and use of advanced analytical tools by reducing the computational burden of high-dimensional data.
Allows the incorporation of hypothetical and actual data patterns directly into models, supporting hypothesis exploration and validation without requiring advanced mathematical expertise.
Improves methods for detecting deviations from typical states in complex datasets without the need for abnormal training examples.
Provides novel graphical database indexing methods that go beyond ordered lists to visually represent and enable access to complex data relationships.
Supports applications and implementations on general computer systems, as interactive software, dashboards, or cloud computing environments.
Documented Applications
Visualization and analysis of high-dimensional biomedical, geospatial, climate, marketing, economic, financial, or surveillance data via projection into lower-dimensional spaces.
Graphical database indexing and exploration, where entire databases are displayed as spatial models, enabling intuitive access and retrieval of records.
Combining supervised and unsupervised models for improved data discovery, including class separation and identification of novel data patterns.
Incorporation of hypothetical and actual data patterns into data models for hypothesis exploration, model comparison, and discovery in research domains.
Detection of deviations from normal or typical states in large-scale data, such as monitoring human health (e.g., gene expression), equipment monitoring, and general data surveillance.
Implementation as software tools, interactive displays, dashboards, or cloud computing systems supporting visual data exploration and alerts.