Machine learning method that modifies a core of a machine to adjust for a weight and selects a trained machine comprising a sequential minimal optimization (SMO) algorithm

Inventors

VIRKAR, Hemant; Stark, Karen; Borgman, Jacob

Assignees

DIGITAL INFUZION Inc

Publication Number

US-9082083-B2

Publication Date

2015-07-14

Expiration Date

2029-09-10


Abstract

Methods for training machines to categorize data, and/or recognize patterns in data, and machines and systems so trained. More specifically, variations of the invention relate to methods for training machines that include providing one or more training data samples encompassing one or more data classes, identifying patterns in the one or more training data samples, providing one or more data samples representing one or more unknown classes of data, identifying patterns in the one or more data samples of unknown class(es), and predicting one or more classes to which the data samples of unknown class(es) belong by comparing patterns identified in said one or more data samples of unknown class with patterns identified in said one or more training data samples. Also provided are tools, systems, and devices, such as support vector machines (SVMs) and other methods and features, software implementing the methods and features, and computers or other processing devices incorporating and/or running the software, where the methods and features, software, and processors utilize specialized methods to analyze data.

Core Innovation

The invention provides machine learning methods, systems, and software for training machines to categorize data and recognize patterns. Training involves providing training data samples with known classes, training multiple learning machines (such as those using a sequential minimal optimization (SMO) algorithm), and selecting the optimal trained machine based on a performance function that measures variables including divergence, cross-validation, number of support vectors, and other criteria derived from the training process. The selected trained machine can then be output or used to query databases for matching data samples.

A key aspect of the method is the ability to modify the core of the learning machine, particularly the SMO algorithm, to adjust the influence (weight) assigned to individual training samples. This allows incorporation of prior knowledge or data quality, and facilitates handling of noisy data, user-defined weights, or hypothetical negative examples by appropriately regulating their impact on the trained machine. This innovation provides a flexible way to manage how specific samples affect the training outcome in an automated or user-driven manner.

The problem addressed by the invention is the lack of efficient, accurate, and automated tools for using machine learning techniques, particularly support vector machines, to recognize patterns and categorize data, especially when the training data are noisy, limited, or imbalanced across classes. Current methods require expert mathematical knowledge, cannot adequately select the best generalizing machine without test data, and are not designed for flexible integration of sample quality or hypothetical examples. There is also a continuing need for advanced tools for searching large and diverse data repositories beyond annotation-driven approaches.

Claims Coverage

The patent includes one independent claim, which covers four main inventive features.

Training learning machines with SMO algorithm and adjustable sample weights

The invention provides a machine learning method that includes providing training data samples with known classes, training two or more learning machines employing a sequential minimal optimization (SMO) algorithm, and modifying the core of the learning machine to adjust a weight assigned to individual training samples. This approach allows the influence of each sample to be varied during training.
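The summary above does not say how the SMO core is changed. One common way to give each sample its own influence, assumed here purely for illustration, is to replace the single box constraint C with a per-sample bound C_i = C * w_i. The sketch below follows the widely taught simplified SMO procedure (random choice of the second multiplier, linear kernel); the function name train_weighted_smo, its parameters, and the toy data are hypothetical and are not taken from the patent.

```python
import random

def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))

def train_weighted_smo(X, y, weights, C=1.0, tol=1e-3,
                       max_passes=5, max_iter=1000, seed=0):
    """Simplified SMO in which sample i has its own upper bound C * weights[i]."""
    n = len(X)
    caps = [C * w for w in weights]          # assumed per-sample box constraints
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0
    rng = random.Random(seed)

    def error(i):
        # Current decision value for sample i minus its label.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            r = y[i] * E_i
            # Skip samples that already satisfy the KKT conditions within tol.
            if not ((r < -tol and alpha[i] < caps[i]) or (r > tol and alpha[i] > 0)):
                continue
            j = rng.choice([k for k in range(n) if k != i])
            E_j = error(j)
            a_i, a_j = alpha[i], alpha[j]
            # Feasible interval for alpha_j, using each sample's own cap.
            if y[i] != y[j]:
                lo = max(0.0, a_j - a_i)
                hi = min(caps[j], caps[i] + a_j - a_i)
            else:
                lo = max(0.0, a_i + a_j - caps[i])
                hi = min(caps[j], a_i + a_j)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if hi - lo < 1e-12 or eta >= 0:
                continue
            new_j = min(hi, max(lo, a_j - y[j] * (E_i - E_j) / eta))
            if abs(new_j - a_j) < 1e-7:
                continue
            new_i = a_i + y[i] * y[j] * (a_j - new_j)
            b1 = b - E_i - y[i] * (new_i - a_i) * K[i][i] - y[j] * (new_j - a_j) * K[i][j]
            b2 = b - E_j - y[i] * (new_i - a_i) * K[i][j] - y[j] * (new_j - a_j) * K[j][j]
            if 0 < new_i < caps[i]:
                b = b1
            elif 0 < new_j < caps[j]:
                b = b2
            else:
                b = (b1 + b2) / 2
            alpha[i], alpha[j] = new_i, new_j
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# Toy data: the last point looks mislabelled, so it gets a small weight.
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-1.5, -1.0)]
y = [1, 1, -1, -1, 1]
weights = [1.0, 1.0, 1.0, 1.0, 0.1]
alpha, b = train_weighted_smo(X, y, weights)
```

Because the suspect sample's multiplier can never exceed C * 0.1, it can pull the decision boundary only a tenth as hard as a fully trusted sample, which is the effect the claim describes.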

Selection of the optimal trained machine based on performance function

The trained learning machine is selected based on optimization of a performance function dependent on one or more variables between the known classes. The performance function can include, but is not limited to, divergence, cross-validation, number of support vectors, or other quantifiable training-derived criteria.
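As a rough illustration of selecting among trained machines without test data, the sketch below scores each candidate by how far apart its decision values are for the two known classes and keeps the highest scorer. The summary names divergence as one possible performance function but does not define it. The symmetric Kullback-Leibler divergence between Gaussian fits used here is an assumption, as are the names gaussian_divergence and select_machine and the example scores.

```python
from statistics import fmean, pvariance

def gaussian_divergence(pos_scores, neg_scores, eps=1e-9):
    """Symmetric KL divergence between Gaussians fitted to two score sets."""
    m1, m2 = fmean(pos_scores), fmean(neg_scores)
    v1 = pvariance(pos_scores) + eps   # eps avoids division by zero
    v2 = pvariance(neg_scores) + eps
    d2 = (m1 - m2) ** 2
    return 0.5 * ((v1 + d2) / v2 + (v2 + d2) / v1 - 2.0)

def select_machine(candidates, labels):
    """candidates maps a machine name to its decision values on the training set."""
    def score(name):
        values = candidates[name]
        pos = [v for v, t in zip(values, labels) if t == 1]
        neg = [v for v, t in zip(values, labels) if t == -1]
        return gaussian_divergence(pos, neg)
    return max(candidates, key=score)

# Hypothetical decision values from two trained machines on four samples.
labels = [1, 1, -1, -1]
candidates = {
    "narrow_kernel": [0.2, 0.1, -0.1, -0.3],
    "wide_kernel": [1.1, 0.9, -1.0, -1.2],
}
best = select_machine(candidates, labels)
```

Other criteria mentioned in the claim, such as cross-validation accuracy or the number of support vectors, could be substituted for or combined with the score function in the same selection loop.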

Flexible assignment of sample weights by user or automatic quality measures

The invention enables sample weights to be assigned either through user input or automatically. Automatic assignment can, for example, detect data quality measures within each training sample, reduce the weight of data that contain a high level of noise, reduce the weight of hypothetical negative examples, or adjust the weight given to false positive or false negative errors.
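A minimal sketch of how user-supplied and automatic weights might be combined is shown below. The weighting formula 1 / (1 + noise), the 0.5 factor for hypothetical negative examples, the dictionary keys, and the function name assign_weights are all assumptions made for illustration; the summary does not specify any of them.

```python
def assign_weights(samples, user_weights=None, hypothetical_factor=0.5):
    """Return one weight per sample; user-supplied weights take precedence."""
    weights = []
    for idx, sample in enumerate(samples):
        if user_weights and idx in user_weights:
            w = user_weights[idx]                 # explicit user override
        else:
            w = 1.0 / (1.0 + sample["noise"])     # noisier data counts for less
            if sample.get("hypothetical"):
                w *= hypothetical_factor          # down-weight hypothetical negatives
        weights.append(w)
    return weights

# Hypothetical samples with a noise estimate attached to each.
samples = [
    {"noise": 0.0},
    {"noise": 1.0},
    {"noise": 0.0, "hypothetical": True},
    {"noise": 3.0},
]
sample_weights = assign_weights(samples, user_weights={3: 0.9})
```

The resulting list can be passed to any trainer that accepts per-sample weights.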

Output of the selected machine to computer storage

After selection, the optimal trained learning machine is output to computer memory, making it available for further queries or deployment in analysis tasks.
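One simple way to make a selected machine available for later queries is to write its parameters to a file and reload them when needed. The sketch below assumes a linear-kernel machine described by its support vectors, labels, multipliers, and bias, stored as JSON. The file layout and the names save_machine, load_machine, and decision_value are hypothetical.

```python
import json
import os
import tempfile

def save_machine(path, support_vectors, labels, alphas, bias):
    model = {
        "kernel": "linear",
        "support_vectors": support_vectors,
        "labels": labels,
        "alphas": alphas,
        "bias": bias,
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh)

def load_machine(path):
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)

def decision_value(model, x):
    """Evaluate the stored linear-kernel machine on a new sample x."""
    total = model["bias"]
    for sv, t, a in zip(model["support_vectors"], model["labels"], model["alphas"]):
        total += a * t * sum(p * q for p, q in zip(sv, x))
    return total

# Round trip through a temporary file with hypothetical parameters.
path = os.path.join(tempfile.mkdtemp(), "machine.json")
save_machine(path, [[2.0, 2.0], [-2.0, -2.0]], [1, -1], [0.0625, 0.0625], 0.0)
restored = load_machine(path)
value = decision_value(restored, (2.0, 2.0))
```

A positive decision value assigns the queried sample to the positive class, so the reloaded machine can be used to screen database records without retraining.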

In summary, the claims broadly cover training of multiple learning machines using SMO with modifiable sample weights, objective selection of the best machine via performance-based optimization, flexible weight assignment based on user input or automated assessment, and storing the selected machine for use.

Stated Advantages

Enables efficient and automated identification of the optimal trained machine without relying on test data, improving ease of use for users without mathematical expertise.

Allows dynamic adjustment of individual sample weights, facilitating better handling of noisy or low-quality data and incorporation of prior knowledge.

Improves generalization and classification performance by preventing overfitting to outliers or noisy samples during training.

Permits automated or user-driven sample weight assignment based on quality measures or specific application needs, increasing flexibility.

Increases the speed and performance of machine learning tasks by leveraging an improved SMO algorithm.

Provides more accurate and information-rich assessment of training outcomes via performance functions such as divergence.

Documented Applications

Analyzing biological and gene expression data, such as identifying relationships between gene expression and disease states for diagnosis and prognosis.

Searching and querying large and complex data repositories, including bioinformatics datasets, for pattern recognition and classification.

Identifying features (e.g., genes) relevant for disease pathways, drug discovery, or research directions based on training outcomes.

Applying to other fields characterized by large volumes of data, including climate data, document classification, financial data mining, geospatial analysis, handwriting recognition, information retrieval, speech recognition, strategy-based domains, and vision recognition.

Developing diagnostic tests and treating diseases by detecting specific biological changes identified using the trained learning machine.
