Methods for mapping data into lower dimensions
Inventors
Virkar, Hemant • Stark, Karen • Borgman, Jacob
Assignees
Publication Number
US-10546245-B2
Publication Date
2020-01-28
Expiration Date
2030-04-26
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
Methods and systems are provided for creating ensembles of hypersurfaces in high-dimensional feature spaces, together with related machines and systems. More specifically, exemplary aspects of the invention relate to methods and systems for generating supervised hypersurfaces based on user domain expertise, machine learning techniques, or other supervised learning techniques. These supervised hypersurfaces may optionally be combined with unsupervised hypersurfaces derived from unsupervised learning techniques. The methods and systems may then be used to determine lower-dimensional subspaces, onto which data may be projected for use in, e.g., further data discovery, visualization for display, or database access. Also provided are tools, systems, devices, and software implementing the methods, as well as computers embodying the methods and/or running the software.
Core Innovation
The invention provides methods and systems for creating ensembles of hypersurfaces in high-dimensional feature spaces. These hypersurfaces are generated using supervised learning techniques based on user domain expertise, machine learning, or other supervised learning methods. The invention enables the combination of supervised hypersurfaces with unsupervised hypersurfaces derived from unsupervised learning techniques, such as principal component analysis (PCA) or independent component analysis (ICA). Through these methods, lower-dimensional subspaces are determined, allowing data from high-dimensional feature spaces to be projected onto these subspaces for purposes including data discovery, visualization for display, and database access.
The invention addresses significant challenges related to accessing and expressing relationships in large-scale, high-dimensional data, which existing methods typically reduce to sequential, one-dimensional lists that inadequately represent complex inter-relationships among data records. High-dimensional datasets, such as those found in biological or biomedical domains, make visualization and analysis computationally difficult, especially for non-specialists in mathematics. Moreover, existing display methods distort relationships when reducing data to just two or three dimensions, impeding understanding and discovery.
By projecting data onto lower-dimensional subspaces constructed from ensembles of supervised and optionally unsupervised hypersurfaces, the invention enables intuitive, accurate visualizations and analytical representations of complex data. The approach also supports incorporation of actual data patterns and hypothetical patterns into the model, expanding the analytical capabilities and facilitating both data discovery and hypothesis evaluation. The invention’s methods, systems, software tools, and integrated computer implementations provide improved means to visualize, analyze, and access high-dimensional data, making complex relationships in such data accessible and interpretable to users.
Claims Coverage
The patent includes two primary independent claims that define inventive features relating to the generation, combination, and visualization of hypersurfaces in high-dimensional biological or biomedical data spaces, as well as the corresponding system implementation.
Method for generating an integrated data model from supervised and unsupervised hypersurfaces in high-dimensional biological or biomedical data
This method includes:
- Obtaining a biological or biomedical data set with a high-dimensional feature space (at least four dimensions) from at least one data repository, the data relating to gene expression, protein expression, or clinical study data.
- Generating a supervised hypersurface by applying supervised learning techniques based on known categories in the data set.
- Generating an unsupervised hypersurface using unsupervised learning techniques on the same data set.
- Combining the supervised and unsupervised hypersurfaces to create a lower-dimensional subspace where the respective vectors normal to each hypersurface determine the subspace’s directions.
- Projecting the high-dimensional data onto the created lower-dimensional subspace to generate an integrated data model.
- Configuring and displaying the integrated data model as a user interface on a two-dimensional computer display in a pseudo-three dimensional representation.
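The generate, combine, and project steps of this method can be illustrated with a minimal sketch. It is not the patented implementation: the summary does not name the learners, so the class-mean difference used for the supervised normal and the first principal component used for the unsupervised normal are assumptions, as are the toy four-dimensional records and all function names.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def supervised_normal(rows, labels):
    # Stand-in supervised step (assumption): normal of a hyperplane separating
    # two known categories, taken as the difference of the class means.
    mean0 = column_means([r for r, y in zip(rows, labels) if y == 0])
    mean1 = column_means([r for r, y in zip(rows, labels) if y == 1])
    return normalize([b - a for a, b in zip(mean0, mean1)])

def unsupervised_normal(rows, iterations=200):
    # Stand-in unsupervised step (assumption): first principal component,
    # found by power iteration on the mean-centered data.
    mu = column_means(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    v = normalize([1.0] * len(mu))
    for _ in range(iterations):
        scores = [dot(r, v) for r in centered]
        v = normalize([sum(s * r[j] for s, r in zip(scores, centered))
                       for j in range(len(mu))])
    return v

def orthonormal_basis(directions):
    # Gram-Schmidt: the normal vectors become orthonormal subspace axes.
    basis = []
    for d in directions:
        w = list(d)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if math.sqrt(dot(w, w)) > 1e-12:  # skip directions already spanned
            basis.append(normalize(w))
    return basis

def project(rows, basis):
    # Coordinates of each record along each subspace axis.
    return [[dot(r, b) for b in basis] for r in rows]

# Hypothetical 4-dimensional records with two known categories.
rows = [
    [1.0, 0.2, 5.0, 0.1],
    [1.2, 0.1, 1.0, 0.3],
    [0.9, 0.3, 9.0, 0.2],
    [3.0, 0.2, 4.0, 0.1],
    [3.1, 0.1, 8.0, 0.2],
    [2.9, 0.3, 2.0, 0.3],
]
labels = [0, 0, 0, 1, 1, 1]

s_normal = supervised_normal(rows, labels)
u_normal = unsupervised_normal(rows)
basis = orthonormal_basis([s_normal, u_normal])
projected = project(rows, basis)
```

In this toy data the first axis follows the known categories while the second follows the direction of largest variance, so each projected record carries both kinds of information in two coordinates.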
System for accessing and modeling a high-dimensional feature space with integrated data model display
This system includes:
- A memory and a processor configured to store and execute instructions for:
  - Generating a supervised hypersurface and an unsupervised hypersurface from a biological or biomedical data set (gene expression, protein expression, or clinical study data) of at least four dimensions using supervised and unsupervised learning techniques based on known categories.
  - Combining these hypersurfaces to create a lower-dimensional subspace, with directions defined by the respective vectors normal to each hypersurface.
  - Projecting high-dimensional data onto the lower-dimensional subspace to form an integrated data model.
  - Configuring and displaying the integrated data model as a user interface on a two-dimensional computer display, using three ortho-normalized axes (with at least one axis defined by a vector normal to a supervised or unsupervised hypersurface), and presenting a pseudo-three dimensional representation of the data.
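One way to picture the pseudo-three dimensional display is as a rotatable view of points that have already been expressed on three orthonormal axes. The sketch below is an assumption about how such a view could work, not a description of the patented interface: it rotates each point by hypothetical `yaw` and `pitch` angles and then uses the first two rotated coordinates as screen position, keeping the third as a depth cue.

```python
import math

def rotate_about_vertical(point, angle):
    # Rotate a 3D point about the vertical (second) axis.
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def rotate_about_horizontal(point, angle):
    # Rotate a 3D point about the horizontal (first) axis.
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def to_screen(points, yaw=0.0, pitch=0.0):
    # Orthographic view: after rotating, the first two coordinates are the
    # 2D screen position; the third is kept only as a depth cue (for example,
    # to vary marker size or shading).
    screen = []
    for p in points:
        x, y, depth = rotate_about_horizontal(rotate_about_vertical(p, yaw), pitch)
        screen.append((x, y, depth))
    return screen

# Unit points along each of the three orthonormal axes (illustrative only).
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
front = to_screen(points)
turned = to_screen(points, yaw=math.pi / 2)
```

Re-running `to_screen` with a new angle each time the user drags the view gives the impression of turning a three-dimensional object on a flat display.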
The claims cover methods and systems for generating, combining, and visualizing supervised and unsupervised hypersurfaces from high-dimensional biological or biomedical data, enabling their projection and interactive display on a two-dimensional computer interface as a pseudo-three dimensional integrated data model.
Stated Advantages
Enables more accurate visualization and display of high-dimensional data by projecting it into comprehensible lower-dimensional subspaces.
Combines supervised and unsupervised methods into a single model, providing a more comprehensive data representation than either method alone.
Simplifies the use of complex analytical tools, reducing the computational burden and making advanced analysis more accessible to non-mathematicians.
Allows direct incorporation of hypothetical or actual data patterns into models, supporting hypothesis testing and data exploration.
Facilitates discovery from large-scale data by improving methods for data display, access, and database exploration.
Permits intuitive graphical user interfaces for data visualization and manipulation, including rotating and altering data views.
Assists in detecting deviations from normal data without requiring abnormal examples for training, beneficial for monitoring and surveillance applications.
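The last advantage, detecting deviations using only normal examples, can be sketched as follows. The summary does not say how deviations are scored, so the per-axis z-score distance, the threshold of 3.0, and the sample coordinates are all assumptions chosen for illustration.

```python
import math

def fit_normal_profile(normal_rows):
    # Learn the mean and sample standard deviation of each axis from
    # records known to be normal; no abnormal examples are needed.
    n = len(normal_rows)
    columns = list(zip(*normal_rows))
    mu = [sum(col) / n for col in columns]
    sd = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
          for col, m in zip(columns, mu)]
    return mu, sd

def deviation_score(row, mu, sd):
    # Distance from the normal profile, measured in standard deviations.
    # A small floor on sd avoids division by zero on constant axes.
    return math.sqrt(sum(((x - m) / max(s, 1e-12)) ** 2
                         for x, m, s in zip(row, mu, sd)))

def is_deviant(row, mu, sd, threshold=3.0):
    # The threshold is an illustrative assumption, not a value from the source.
    return deviation_score(row, mu, sd) > threshold

# Hypothetical normal records, already projected onto a 2D subspace.
normal_rows = [(0.1, 1.0), (0.0, 1.2), (-0.1, 0.9), (0.2, 1.1), (-0.2, 0.8)]
mu, sd = fit_normal_profile(normal_rows)

typical = is_deviant((0.05, 1.05), mu, sd)
unusual = is_deviant((1.5, 1.0), mu, sd)
```

Because the profile is fitted to normal data alone, any new record that lands far from it is flagged, whatever the cause of the deviation.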
Documented Applications
Visualization and analysis of high-dimensional biological or biomedical data such as gene expression, protein expression, and clinical study data.
Improved data discovery, visualization, and display for large-scale data in fields including geospatial, climate, marketing, economics, and surveillance.
Database access and exploration using graphical models as indexes, facilitating intuitive data retrieval and representation.
Detection of deviations from normal in complex data, including surveillance, health monitoring, and equipment monitoring.
Presentation and verification of hypotheses by incorporating hypothetical data patterns and comparing them visually to actual data.
Integration into scientific dashboards for alerts and overviews of data analysis results using the invented visualization methods.
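Two of the applications above, comparing a hypothetical pattern with actual data and using the model as an index for retrieval, both reduce to placing records in the same lower-dimensional subspace and measuring closeness there. The sketch below assumes a fixed, hypothetical two-axis orthonormal basis and made-up records; the source does not specify how comparison or lookup is performed.

```python
import math

def project(row, basis):
    # Coordinates of one record along each subspace axis.
    return [sum(x * b for x, b in zip(row, axis)) for axis in basis]

def nearest(point, candidates):
    # Index and Euclidean distance of the candidate closest to the point.
    best_index, best_distance = None, math.inf
    for i, c in enumerate(candidates):
        d = math.sqrt(sum((p - q) ** 2 for p, q in zip(point, c)))
        if d < best_distance:
            best_index, best_distance = i, d
    return best_index, best_distance

# Hypothetical orthonormal axes of a 2D subspace inside a 4D feature space.
basis = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]

# Made-up actual records and one hypothetical pattern to evaluate.
actual = [[1.0, 0.2, 5.0, 0.1],
          [3.0, 0.2, 4.0, 0.1],
          [1.2, 0.1, 1.0, 0.3]]
hypothetical = [2.8, 0.9, 4.2, 0.7]

actual_projected = [project(r, basis) for r in actual]
hypothetical_projected = project(hypothetical, basis)
match_index, match_distance = nearest(hypothetical_projected, actual_projected)
```

A small `match_distance` suggests the hypothetical pattern resembles an observed record in the modeled subspace; the same lookup could serve as a retrieval key when the model is used to explore a database.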