Multimodal dynamic attention fusion

Inventors

Kollada, Matthew; Banerjee, Tathagata

Assignees

Neumora Therapeutics Inc

Publication Number

US-12087446-B2

Patent

Publication Date

2024-09-10

Expiration Date


Abstract

Methods and systems are provided for diagnosing mental health conditions using multiple data modalities. In particular, a trained machine learning model is used for mental health diagnosis, wherein the trained model utilizes a dynamic fusion approach for capturing and preserving interactions as well as timing information between the multiple data modalities.

Core Innovation

The invention provides a device and a computing device for mental health evaluation that extract timing-preserving dynamic embeddings from time series data output by at least two types of sensors. The time series data are pre-processed to a common timing resolution and converted into dynamic embeddings that retain timing information for each data representation. The dynamic embeddings for the individual modalities are learned by a plurality of unimodal encoders.
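The pre-processing step can be pictured with a small sketch. The code below is only an illustration of bringing two sensor streams to a common timing resolution by averaging samples into fixed-width bins. The function name, the bin-averaging strategy, and the sample values are assumptions made for this example; the patent summary does not state how the resampling is performed.

```python
from collections import defaultdict


def resample_to_common_resolution(samples, step_s, start_s, end_s):
    """Average (timestamp_s, value) samples into fixed-width time bins.

    Hypothetical helper: returns one value per bin covering
    [start_s, end_s); bins without any sample are reported as None.
    """
    n_bins = int((end_s - start_s) // step_s)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in samples:
        if start_s <= t < end_s:
            idx = int((t - start_s) // step_s)
            if idx < n_bins:
                sums[idx] += v
                counts[idx] += 1
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]


# Two made-up sensor streams sampled at irregular, different times.
motion = [(0.0, 1.0), (0.5, 3.0), (1.2, 2.0), (2.7, 4.0)]
audio = [(0.1, 10.0), (1.1, 20.0), (1.9, 40.0)]

motion_1s = resample_to_common_resolution(motion, 1.0, 0.0, 3.0)
audio_1s = resample_to_common_resolution(audio, 1.0, 0.0, 3.0)

# After resampling, index i refers to the same time window in both streams.
aligned = list(zip(motion_1s, audio_1s))
print(aligned)
```

Once both streams share one time axis, each unimodal encoder can emit one embedding per time step, which is what lets the timing information carry through to the fusion stage.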

The invention combines the modality-specific dynamic embeddings using a modality combination model that includes one or more transformer encoders and one or more positional encoders. The transformer encoders take as input the data representations after they have been positionally encoded by the one or more positional encoders. This processing outputs a combined dynamic data representation and/or a combined set of multimodal dynamic embeddings that includes timing information.
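As a rough illustration of the fusion idea, the sketch below adds a positional encoding to each per-time-step embedding and then lets every token attend to every other token across modalities. It is a simplification under stated assumptions: the sinusoidal encoding, the single attention head with identity projections, and the choice to join modalities along the sequence axis are all choices made for this example, not details taken from the patent. A full transformer encoder would also include learned query/key/value projections, multiple heads, feed-forward layers, and normalization.

```python
import math


def sinusoidal_position(pos, dim):
    """Sinusoidal positional encoding for one time step (dim must be even)."""
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention, identity projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def fuse(modalities):
    """Fuse per-modality sequences of embeddings, each shaped [T][dim].

    Hypothetical helper: every embedding gets the encoding of its time
    step added, then all tokens from all modalities attend to each other.
    """
    tokens = []
    for seq in modalities:
        for t, emb in enumerate(seq):
            pe = sinusoidal_position(t, len(emb))
            tokens.append([e + p for e, p in zip(emb, pe)])
    return self_attention(tokens)


# Made-up embeddings: two modalities, three time steps, four dimensions.
seq_a = [[0.1, 0.2, 0.3, 0.4]] * 3
seq_b = [[0.5, 0.1, 0.0, 0.2]] * 3
combined = fuse([seq_a, seq_b])
print(len(combined), len(combined[0]))
```

Because the positional encoding depends on the time step, two otherwise identical embeddings at different times become distinguishable, so the attention step can weigh cross-modal interactions by when they occur.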

The invention further determines a mental health diagnosis by using diagnosis determination logic that includes a supervised machine learning model trained using a dataset of time series data relating to mood disorder symptoms collected from users and corresponding outcome labels of mental health conditions. The determination is based on the relevance of the combined dynamic data representation to a mental health diagnosis, and the diagnosis determination logic outputs one or more mental health diagnoses.
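The supervised step can be sketched in a few lines. The example below mean-pools a combined representation over time and fits a logistic regression to binary outcome labels. Everything specific here is an assumption for illustration: the summary does not say which supervised model is used, how the representation is pooled, or how labels are encoded, and the data are synthetic toy values rather than anything reported in the patent.

```python
import math


def mean_pool(sequence):
    """Average a [T][dim] sequence over time into one dim-length vector."""
    dim = len(sequence[0])
    return [sum(row[j] for row in sequence) / len(sequence) for j in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return the predicted probability of the positive outcome label."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic combined representations (T=2, dim=2) with hypothetical labels.
sequences = [
    [[0.0, 0.1], [0.2, 0.1]],
    [[0.1, 0.3], [0.1, 0.1]],
    [[0.9, 0.8], [0.7, 1.0]],
    [[1.0, 0.7], [0.8, 0.9]],
]
labels = [0, 0, 1, 1]

features = [mean_pool(s) for s in sequences]
w, b = train_logistic(features, labels)
predictions = [round(predict(w, b, x)) for x in features]
print(predictions)
```

Mean pooling discards the ordering of time steps, so a model meant to exploit the timing information in the combined representation would likely use a sequence-aware head instead; the pooling here only keeps the sketch short.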

Claims Coverage

The available claim text includes three independent claims. Across these claims, the coverage centers on timing-aware dynamic embeddings from multiple sensor modalities, multimodal fusion using transformer encoders with positional encodings, and diagnosis using supervised models trained on user-collected mood-disorder time-series data with outcome labels.

Timing-aware dynamic embeddings from at least two sensor types

A memory coupled to a control system performs modality processing logic that processes time series data output from at least two types of sensors, pre-processed to a common timing resolution, to output a set of data representations for each sensor type, where the set of data representations comprises a set of dynamic embeddings including timing information for each data representation.

Multimodal fusion using transformer encoders with positional encodings

A modality combination logic processes the set of data representations via a plurality of unimodal encoders to output a combined dynamic data representation including the timing information, where the modality combination logic comprises one or more transformer encoders and one or more positional encoders, and the one or more transformer encoders take as an input the set of data representations positionally encoded using the one or more positional encoders.

Supervised diagnosis using mood-disorder time-series labels

A diagnosis determination logic including a supervised machine learning model is trained using a dataset of time series data relating to mood disorder symptoms collected from users and a set of corresponding outcome labels of mental health conditions, and the modality combination logic determines a mental health diagnosis based on a relevance of the combined dynamic data representation to a mental health diagnosis.

Across the independent claims, the inventive scope is directed to processing multimodal time series into timing-aware dynamic embeddings using unimodal encoders, fusing the embeddings using transformer encoders with positional encodings to produce combined dynamic multimodal representations that include timing information, and diagnosing mental health conditions using supervised machine learning models trained on user-collected mood-disorder time series with outcome labels.

Stated Advantages

Provides interpretability via SHAP-based time-series feature importance.

Reports median F1 improvements over static or alternative fusion baselines.

Supports robustness and efficiency advantages, such as shorter monitoring periods.

Supports remote quality control flags.
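For readers unfamiliar with the metric named in the list above, the sketch below shows how an F1 score is computed for binary labels and how a median is taken across several evaluation splits. The per-split labels and predictions are invented for illustration and are not results from the patent; the summary also does not say how the splits were formed.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """F1 for binary labels: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative (true labels, predictions) pairs for three evaluation splits.
splits = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 1, 0], [1, 0, 1, 0]),
    ([1, 1, 0, 0], [1, 1, 1, 0]),
]

scores = [f1_score(t, p) for t, p in splits]
median_f1 = median(scores)
print(scores, median_f1)
```

Reporting the median rather than the mean makes the comparison less sensitive to a single unusually good or bad split.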

Documented Applications

Mental health evaluation and diagnosis of mood disorder conditions based on time series data relating to mood disorder symptoms collected from users.

Diagnosis/evaluation for depression using PHQ-9, anxiety using GAD-7, and anhedonia using SHAPS.

Remote smartphone data collection for monitoring, including use of remote quality control flags.
