Deep multi-task representation learning
Inventors
Amer, Mohamed R. • Shields, Timothy J. • Tamrakar, Amir • Ehrlich, Max • Almaev, Timur
Assignees
Publication Number
US-12073305-B2
Publication Date
2024-08-27
Expiration Date
2037-03-17
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
Technologies for analyzing multi-task multimodal data to detect multi-task multimodal events using deep multi-task representation learning are disclosed. A combined model with both generative and discriminative aspects is used to share information during both the generative and discriminative processes. The technologies can be used to classify data and also to generate data from classification events. The data can then be morphed into a desired classification event.
Core Innovation
The invention presents a deep multi-task representation learning system for analyzing multi-task multimodal data. This approach utilizes a combined generative and discriminative model to jointly learn a shared representation of the data through both generative and discriminative processes, trained in unison within a single, non-staged framework. The system processes data with multiple modalities and associated tasks to enable efficient analysis, reasoning, and event detection within complex, temporally- and spatially-varying datasets.
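The patent describes this combined model at an architectural level; the following is a minimal sketch, assuming a PyTorch implementation with two illustrative modalities (audio and video) and two task heads. The module names, layer sizes, and concatenation-based fusion are assumptions for illustration, not the patent's specification.

```python
# Hedged sketch: two modality encoders feed a shared representation, a
# generative decoder reconstructs each modality, and per-task discriminative
# heads classify from the same shared latent. Dimensions are assumptions.
import torch
import torch.nn as nn

class HybridMultiTaskModel(nn.Module):
    def __init__(self, audio_dim=40, video_dim=128, latent_dim=64,
                 task_classes=(5, 3)):  # e.g., action classes, affect classes
        super().__init__()
        # Bottom-up encoders: one per modality, fused into a shared latent.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, 64), nn.ReLU())
        self.fuse = nn.Linear(128, latent_dim)
        # Top-down generative decoders: reconstruct each modality from the latent.
        self.audio_dec = nn.Linear(latent_dim, audio_dim)
        self.video_dec = nn.Linear(latent_dim, video_dim)
        # Discriminative heads: one per task, all reading the same latent.
        self.task_heads = nn.ModuleList(
            [nn.Linear(latent_dim, n) for n in task_classes])

    def forward(self, audio, video):
        z = torch.relu(self.fuse(
            torch.cat([self.audio_enc(audio), self.video_enc(video)], dim=-1)))
        recons = (self.audio_dec(z), self.video_dec(z))
        task_logits = [head(z) for head in self.task_heads]
        return z, recons, task_logits
```

In this sketch every task head reads the same latent z, which is what lets a single set of shared parameters serve all tasks rather than training one model per task.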
A core problem addressed by this technology is the exponential increase in complexity as the number of features and tasks in machine learning algorithms grows. This complexity demands higher computational resources, such as increased memory and computing power, making it challenging to maintain accuracy and scalability. The disclosed deep multi-task representation learning model mitigates this problem by factorizing feature representations and optimizing parameter sharing, resulting in a more resource-efficient solution.
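To make the parameter-sharing argument concrete, the back-of-the-envelope comparison below uses assumed dimensions (not taken from the patent) to contrast independent per-task networks with a single shared trunk plus small task heads.

```python
# Illustrative arithmetic only; dimensions are assumptions, not patent figures.
# Compare weights for T independent task networks versus one shared trunk
# with small per-task heads on top of a common representation.
D, H, C, T = 512, 256, 10, 4   # input dim, hidden dim, classes per task, tasks

separate = T * (D * H + H * C)   # one full network per task
shared = (D * H) + T * (H * C)   # one shared trunk, T small heads

print(f"separate models: {separate:,} weights")  # 534,528
print(f"shared trunk:    {shared:,} weights")    # 141,312
```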
The system enables both classification and data generation/morphing (e.g., transforming one event into another) by learning inference models using iterative bottom-up (reconstructive) and top-down (generative) approaches. It is capable of inferring missing data within and across modalities, associating semantic event labels, and supporting various use cases, including automated event recognition and advanced data analysis applications. The technology enhances the ability to handle multimodal data streams for multi-task classification, feature identification, and event labeling.
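One plausible, hypothetical realization of the top-down "morphing" capability, reusing the sketch model above: encode an input bottom-up, iteratively refine the shared latent until a task head assigns the desired label, then decode top-down into data space. The gradient-based refinement, optimizer, and step count are illustrative assumptions, not the patent's prescribed procedure.

```python
# Hedged sketch of morphing toward a desired classification event.
import torch
import torch.nn.functional as F

def morph_toward_class(model, audio, video, task_idx, target_class,
                       steps=50, lr=0.1):
    z, _, _ = model(audio, video)                 # bottom-up encoding
    z = z.detach().requires_grad_(True)           # refine the latent directly
    opt = torch.optim.Adam([z], lr=lr)
    target = torch.full((z.shape[0],), target_class, dtype=torch.long)
    for _ in range(steps):                        # iterative refinement
        loss = F.cross_entropy(model.task_heads[task_idx](z), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # top-down generation of the morphed data from the refined latent
    return model.audio_dec(z), model.video_dec(z)
```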
Claims Coverage
The patent contains three independent claims, each introducing unique inventive features related to deep multi-task representation learning and multi-task multimodal data analysis.
Joint optimization of generative and discriminative processes in a non-staged framework
A data analyzer is configured to access multi-task, multimodal data and algorithmically learn a shared representation using a joint optimization of generative and discriminative processes integrated as a single, non-staged framework. The parameters are learned simultaneously and trained in unison, distinguishing this approach from staged or serial processing of separate models. A minimal illustration of such a joint objective appears after the claims summary below.
Learning inference model using iterative bottom-up reconstructive and top-down generative approach
The data analyzer and associated methods learn an inference model through an iterative process that encompasses both a bottom-up (reconstructive) approach and a top-down (generative) approach. This enables the system to efficiently handle classification and data generation tasks within the multi-task, multimodal context.
Algorithmic recognition and semantic labeling of multi-task multimodal events
A non-transitory computer readable medium includes instructions to access dataset instances with multiple modalities and tasks, classify instances using the integrated generative and discriminative processes (trained together), generate semantic labels for recognized events, and implement inference learning through iterative bottom-up and top-down processes.
In summary, the patent claims cover an integrated, jointly optimized hybrid model for deep multi-task representation learning, iterative learning of inference models combining reconstructive and generative processes, and application to event recognition and semantic labeling in multimodal, multi-task environments.
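As a rough illustration of the "single, non-staged" feature, the training step below (built on the same illustrative model sketched earlier) sums generative and discriminative terms into one objective and updates all parameters in a single optimizer step; the choice of MSE and cross-entropy losses and the weighting are assumptions.

```python
# Hedged sketch: one joint objective, one backward pass, no staged pretraining.
import torch
import torch.nn.functional as F

def joint_training_step(model, optimizer, audio, video, task_labels,
                        recon_weight=1.0):
    z, (audio_hat, video_hat), task_logits = model(audio, video)
    generative = F.mse_loss(audio_hat, audio) + F.mse_loss(video_hat, video)
    discriminative = sum(F.cross_entropy(logits, labels)
                         for logits, labels in zip(task_logits, task_labels))
    loss = recon_weight * generative + discriminative  # single joint objective
    optimizer.zero_grad()
    loss.backward()   # gradients reach shared parameters from both terms
    optimizer.step()  # all parameters trained in unison
    return loss.item()
```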
Stated Advantages
Provides a versatile approach that enables learning a shared representation using multiple sets of labels, resulting in a factored feature representation and a reduced number of model parameters.
Maintains superior classification performance while requiring substantially fewer computational resources, such as memory, processing power, time, and storage, compared to previous techniques.
Enables joint optimization of generative and discriminative aspects in a single, non-staged framework, eliminating the need for serial or staged processing.
Supports multi-tasking, allowing multiple sets and types of classes to be output, rather than being limited to a single class set or type.
Operates effectively even with incomplete data by inferring missing data within and across modalities, making a full vector of data unnecessary.
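The incomplete-data advantage can be illustrated with the same sketch model: below, a missing audio stream is inferred by iteratively refining an estimate so that the shared representation still explains the observed video, then the missing modality is generated top-down. The zero initialization, loss, and step count are assumptions for illustration only.

```python
# Hedged sketch: infer a missing modality from the observed one.
import torch
import torch.nn.functional as F

def infer_missing_audio(model, video, steps=100, lr=0.05):
    # start from a neutral (zero) placeholder for the missing modality
    audio_est = torch.zeros(video.shape[0], model.audio_dec.out_features,
                            requires_grad=True)
    opt = torch.optim.Adam([audio_est], lr=lr)
    for _ in range(steps):
        _, (_, video_hat), _ = model(audio_est, video)
        loss = F.mse_loss(video_hat, video)   # fit the observed modality
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                     # top-down generation of the
        _, (audio_hat, _), _ = model(audio_est, video)  # missing modality
    return audio_hat
```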
Documented Applications
Recognition of multi-task multimodal action-affect and modeling of interpersonal (social) interactions.
Enabling fluid, lifelike human-computer interaction for applications such as training, machine operation, remotely piloted aircraft, surveillance and security systems, flight control systems, video games, and navigation.
Interpretation, search, retrieval, and classification of multi-task multimodal data, including automatic interpretation and classification of online video content and content recommendation.
Multi-task multimodal interpretation, including action-affect or sentiment analysis, or complex event analysis for automated video analysis.
Correlating temporal and cause-and-effect relationships between different multi-task multimodal data streams even when events are not temporally aligned.
Modeling sub-phenomena, such as short-term events, and generalizing those to larger or more abstract event categories.
Improving virtual personal assistant applications, health and fitness monitoring, spoken dialog-based assistants, social media, and multimodal messaging applications.
Providing multi-task learning for body affect, including morphing human body actions/affect (e.g., morphing a neutral walk to a happy walk) for affective computing and motion analysis.