Machine learning extraction of clinical variable values for subjects from clinical record data

Inventors

Wittmershaus, Brett; Amster, Guy; Waskom, Michael; Roher, Natalie; Singh, Nisha; Phadke, Sharang; Shapiro, Will

Assignees

Flatiron Health Inc

Publication Number

US-11854675-B1

Patent

Publication Date

2023-12-26

Abstract

Described herein are techniques of using machine learning to automatically extract clinical variable values for subjects from clinical record data. The techniques designate certain clinical variables as hybrid variables that can be assigned values by machine learning model prediction. The techniques process, using a machine learning model trained to predict a value of a hybrid variable, clinical record data associated with a subject to obtain a predicted hybrid variable value and an associated confidence score. The techniques set the value of the hybrid variable for the subject to the predicted hybrid variable value when the model prediction is of sufficiently high confidence.

Core Innovation

The invention uses machine learning to automatically extract the values of clinical variables for a plurality of subjects from clinical record data. Using the clinical record data, it generates a dataset for storing values of a plurality of clinical variables, one subset of which is designated as hybrid variables and another as non-hybrid variables. Hybrid variables can have their values assigned either by machine learning model prediction or by manual extraction, whereas non-hybrid variables cannot have their values assigned by machine learning prediction.
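
The hybrid/non-hybrid designation can be pictured as a per-variable flag that the dataset checks whenever a value is written. The following is a minimal Python sketch under that assumption; `ClinicalVariable`, `VariableDataset`, `set_value`, and the `"ml"`/`"manual"` source labels are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ClinicalVariable:
    name: str
    hybrid: bool  # True: ML prediction or manual extraction; False: manual only


class VariableDataset:
    """Per-subject store of clinical variable values (hypothetical schema)."""

    def __init__(self, variables: List[ClinicalVariable], subject_ids: List[str]):
        self.variables: Dict[str, ClinicalVariable] = {v.name: v for v in variables}
        # One row per subject; every variable starts unset (None).
        self.rows: Dict[str, Dict[str, Any]] = {
            sid: {name: None for name in self.variables} for sid in subject_ids
        }
        # Where each value came from: "ml" or "manual".
        self.sources: Dict[str, Dict[str, str]] = {sid: {} for sid in subject_ids}

    def set_value(self, subject_id: str, name: str, value: Any, source: str) -> None:
        if source not in ("ml", "manual"):
            raise ValueError(f"unknown source: {source!r}")
        if source == "ml" and not self.variables[name].hybrid:
            # Non-hybrid variables may only be set from manual extraction.
            raise PermissionError(f"{name} is non-hybrid; ML values are not allowed")
        self.rows[subject_id][name] = value
        self.sources[subject_id][name] = source
```

Rejecting a machine-learning write at the dataset level, rather than trusting the calling code, is one way to make the rule that non-hybrid variables are never assigned by prediction hold however the surrounding pipeline is wired.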

For each subject, the value of each hybrid variable in the dataset is set at least in part by processing the subject's clinical record data with a machine learning model trained to predict that variable's value. This processing yields a predicted hybrid variable value and an associated confidence score, and the method uses the confidence score to determine whether to set the subject's hybrid variable to the predicted value.
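
The patent summary does not say how the confidence score is computed. One common choice, assumed here purely for illustration, is to take a classifier's probability distribution over candidate values, predict the most probable value, and use its probability as the confidence score, accepting the prediction only at or above a threshold. `predict_with_confidence`, `accept_prediction`, and the 0.9 threshold are hypothetical.

```python
from typing import Mapping, Tuple

# Hypothetical threshold; the patent summary gives no value.
DEFAULT_THRESHOLD = 0.9


def predict_with_confidence(class_probs: Mapping[str, float]) -> Tuple[str, float]:
    """Return the most probable candidate value and its probability.

    Assumes the model emits one probability per candidate value
    (for example, a softmax output); that probability is used as the
    confidence score.
    """
    if not class_probs:
        raise ValueError("class_probs must not be empty")
    value = max(class_probs, key=lambda k: class_probs[k])
    return value, class_probs[value]


def accept_prediction(confidence: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True if the predicted value should be written to the dataset."""
    return confidence >= threshold
```

For example, a distribution over cancer stages of {"I": 0.05, "II": 0.10, "III": 0.80, "IV": 0.05} yields ("III", 0.80), which falls below the assumed 0.9 threshold and would therefore be routed to manual extraction.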

In response to determining to set the value of the hybrid variable to the predicted hybrid variable value, the method sets the value of the hybrid variable in the dataset to the predicted hybrid variable value. In response to determining to not set the value to the predicted hybrid variable value, the method obtains input indicating a manually extracted hybrid variable value and sets the value of the hybrid variable in the dataset to the manually extracted hybrid variable value. The method sets, for each subject, values of the non-hybrid variables to manually extracted values without obtaining machine learning predicted values of the non-hybrid variables.
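
Putting these steps together for a single subject: hybrid variables take the model's value when the confidence clears the threshold and otherwise fall back to a manual abstractor, while non-hybrid variables are always extracted manually and never passed to a model. The sketch below assumes hypothetical callable interfaces for the models and for the manual abstraction step (`Predictor`, `ManualExtractor`, `extract_subject`); the patent summary specifies neither.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical interfaces (not specified by the patent):
#   a hybrid-variable model maps a subject's record data to (value, confidence);
#   a manual extractor maps (record data, variable name) to an abstracted value.
Predictor = Callable[[Any], Tuple[Any, float]]
ManualExtractor = Callable[[Any, str], Any]


def extract_subject(
    record: Any,
    hybrid_models: Dict[str, Predictor],
    non_hybrid_vars: List[str],
    manual_extract: ManualExtractor,
    threshold: float = 0.9,  # hypothetical confidence threshold
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Return (values, sources) for one subject; sources are "ml" or "manual"."""
    values: Dict[str, Any] = {}
    sources: Dict[str, str] = {}
    for name, model in hybrid_models.items():
        predicted, confidence = model(record)
        if confidence >= threshold:
            # Confident enough: keep the model's value.
            values[name], sources[name] = predicted, "ml"
        else:
            # Low confidence: fall back to manual extraction.
            values[name], sources[name] = manual_extract(record, name), "manual"
    for name in non_hybrid_vars:
        # Non-hybrid variables are never sent to a model.
        values[name], sources[name] = manual_extract(record, name), "manual"
    return values, sources
```

Only the low-confidence hybrid variables and the non-hybrid variables reach a human abstractor, which is presumably where the reported throughput gain comes from.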

Claims Coverage

The independent claims cover three computer-implemented aspects of the same hybrid-variable extraction workflow: a method, a system, and a non-transitory computer-readable storage medium. The shared inventive features are the separation of clinical variables into hybrid variables and non-hybrid variables, confidence-scored machine learning prediction for hybrid variables, and manual extraction fallback, with non-hybrid variables set only by manual extraction.

Hybrid and non-hybrid variable datasets with controlled assignment sources

Generating, using the clinical record data, a dataset for storing values of the plurality of clinical variables for the plurality of subjects, in which a subset of the clinical variables is designated as hybrid variables, whose values can be assigned by machine learning model prediction or by manual extraction, and a subset is designated as non-hybrid variables, whose values cannot be assigned by machine learning prediction.

Confidence-scored prediction for hybrid variables

Processing, using a machine learning model trained to predict a value of the hybrid variable, clinical record data associated with the subject to obtain a predicted hybrid variable value and an associated confidence score; determining, using the confidence score associated with the predicted hybrid variable value, whether to set a value of the hybrid variable for the subject to the predicted hybrid variable value.

Manual extraction fallback for hybrid variables and manual-only setting for non-hybrid variables

In response to determining not to set the value of the hybrid variable for the subject to the predicted hybrid variable value, obtaining input indicating a manually extracted hybrid variable value for the subject and setting the value of the hybrid variable for the subject to the manually extracted hybrid variable value in the dataset; and setting, for each of the plurality of subjects, values of the non-hybrid variables to manually extracted values of the non-hybrid variables without obtaining machine learning predicted values of the non-hybrid variables.

Method implemented as a system

A system comprising at least one processor and at least one non-transitory computer-readable storage medium storing instructions that, when executed, cause the at least one processor to obtain clinical record data, generate the dataset with hybrid and non-hybrid variables, and perform the confidence-scored hybrid-variable assignment with manual extraction fallback and manual-only setting of non-hybrid variables.

Non-transitory medium implementing the hybrid-variable extraction method

At least one non-transitory computer-readable storage medium storing instructions that cause at least one processor to perform the method of obtaining clinical record data, generating the dataset with hybrid and non-hybrid variables, processing with a machine learning model to obtain a predicted hybrid variable value and associated confidence score, determining whether to set the hybrid variable to the predicted value, using manual input when not set to the predicted value, and setting non-hybrid variables without machine learning predicted values.

Across the independent claims, the shared inventive coverage is the dataset-based separation of clinical variables into hybrid variables and non-hybrid variables, where hybrid-variable values are selected based on a confidence score from machine learning prediction with a manual-extraction fallback, while non-hybrid-variable values are set only from manual extraction without obtaining machine learning predicted values.

Stated Advantages

Improves the throughput of clinical variable extraction (by up to 400%) without degrading accuracy.

Documented Applications

Extracting cancer-related clinical variables from clinical record data, including cancer stage, metastatic cancer diagnosis, and date of metastatic diagnosis.

Using a graphical user interface (GUI) to display predicted values and to restrict user modification of high-confidence predictions, optionally allowing the user to confirm or override them.
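
The GUI behavior can be reduced to a rule that decides whether a displayed prediction is locked. This is a minimal sketch, assuming a single confidence threshold (the hypothetical 0.9 default) and a boolean flag for a user-initiated override; `FieldState` and `field_state` are illustrative names, not part of any described interface.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldState:
    displayed_value: Any  # the model's predicted value, shown in the form
    editable: bool        # whether the user may change the displayed value


def field_state(
    predicted_value: Any,
    confidence: float,
    threshold: float = 0.9,  # hypothetical threshold
    override_requested: bool = False,
) -> FieldState:
    # High-confidence predictions are locked unless the user explicitly overrides.
    locked = confidence >= threshold and not override_requested
    return FieldState(displayed_value=predicted_value, editable=not locked)
```

Keeping the lock decision in one pure function makes it straightforward to test independently of whatever UI toolkit renders the form.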
