System and method for content comprehension and response
Inventors
Divakaran, Ajay • Sikka, Karan • Yao, Yi • Gong, Yunye • Nunn, Stephanie • Sahu, Pritish • Cogswell, Michael A. • Hostetler, Jesse • Rutherford-Quach, Sara
Assignees
Publication Number
US-11934793-B2
Publication Date
2024-03-19
Expiration Date
2041-11-01
Abstract
A method, apparatus and system for training an embedding space for content comprehension and response includes, for each layer of a hierarchical taxonomy having at least two layers including respective words resulting in layers of varying complexity, determining a set of words associated with a layer of the hierarchical taxonomy, determining a question answer pair based on a question generated using at least one word of the set of words and at least one content domain, determining a vector representation for the generated question and for content related to the at least one content domain of the question answer pair, and embedding the question vector representation and the content vector representations into a common embedding space where vector representations that are related are closer in the embedding space than unrelated embedded vector representations. Requests for content can then be fulfilled using the trained, common embedding space.
Core Innovation
The invention introduces a method, apparatus, and system for content comprehension and response, which centers on training an embedding space using a hierarchical taxonomy with at least two layers of varying complexity. For each layer of the taxonomy, a set of words is associated with the layer, and question answer pairs are generated based on questions formulated using these words within a specific content domain. Vector representations are then created for both the questions and the associated content, and these vectors are embedded into a common embedding space, ensuring that related question and content representations are closer together than unrelated ones.
Unlike existing content understanding systems that focus on memorization and treat questions independently of their difficulty or interrelationships, this approach systematically incorporates a hierarchical framework (for example, using Bloom’s Taxonomy) to categorize and structure questions by their complexity. The resulting embedding space contains question answer pairs from all layers of the hierarchy, enabling relationships between varying complexities to be determined and leveraged for content comprehension.
The problem addressed is that current neural network-based content comprehension systems can answer only the specific questions they were trained on and cannot generalize to similar questions or related topics that were not part of the training data. The patent’s solution facilitates adaptation and response to new questions or topics by structuring knowledge and embedding relationships in a way that mirrors human-like graded comprehension across varying difficulty levels and domains.
Claims Coverage
The claims cover six main inventive features.
Training a common embedding space using a hierarchical taxonomy
A method where, for each layer of a hierarchical taxonomy with at least two layers and varying complexity, a set of associated words is determined. For each layer, question answer pairs are generated based on questions using at least one word and at least one relevant content domain. Vector representations for both questions and content are embedded into a single embedding space, such that related representations are closer together than unrelated ones. This embedding space includes embedded question answer pairs for each of the layers, allowing for relationships between questions of varying complexity.
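The requirement that related representations sit closer together than unrelated ones can be illustrated with a triplet margin loss. The sketch below is only one common way to express that constraint and is not taken from the patent: the loss choice, the function names, the toy questions, and the two-dimensional content vectors are all assumptions, and the content vectors are held fixed for brevity, whereas the claim embeds both questions and content.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_question_embeddings(triplets, content_vecs, margin=1.0, lr=0.05,
                              epochs=50, seed=0):
    """Learn one vector per question with a triplet margin loss.

    triplets: (question, related_content_id, unrelated_content_id).
    content_vecs: fixed content vectors (an assumption made for brevity;
    a fuller system would learn the content encoder as well).
    """
    rng = random.Random(seed)
    dim = len(next(iter(content_vecs.values())))
    q_vecs = {q: [rng.uniform(-1, 1) for _ in range(dim)]
              for q, _, _ in triplets}
    for _ in range(epochs):
        for q, pos, neg in triplets:
            p, n = content_vecs[pos], content_vecs[neg]
            loss = margin + sq_dist(q_vecs[q], p) - sq_dist(q_vecs[q], n)
            if loss <= 0:
                continue  # margin already satisfied for this triplet
            # d(loss)/dq = 2 * (n - p); step against the gradient.
            q_vecs[q] = [qi - lr * 2 * (ni - pi)
                         for qi, pi, ni in zip(q_vecs[q], p, n)]
    return q_vecs

# Hypothetical toy data: two content items and questions from two layers.
content_vecs = {"recipe_text": [1.0, 0.0], "story_text": [0.0, 1.0]}
triplets = [
    ("Layer 1: List the ingredients in the recipe.",
     "recipe_text", "story_text"),
    ("Layer 2: Compare the two cooking methods in the recipe.",
     "recipe_text", "story_text"),
    ("Layer 1: Name the main character of the story.",
     "story_text", "recipe_text"),
]
q_vecs = train_question_embeddings(triplets, content_vecs)
```

After training, each question vector lies closer to its related content than to the unrelated content, and questions from different layers about the same content end up in the same region of the space.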
Question answer pair generation via domain-adapted questions
Generating question answer pairs further includes determining at least one stem question for a word in the set and then determining at least one domain-adapted question for the stem question based on a selected content domain. These domain-adapted questions are used to create the final question answer pairs for embedding.
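The stem-question and domain-adaptation steps can be sketched with simple templates. Everything in this example is hypothetical: the layer names and verbs loosely follow Bloom's Taxonomy, the stem templates and domain topics are invented for illustration, and pairing each question with an answer drawn from the content is left out.

```python
# Hypothetical words per taxonomy layer, loosely following Bloom's Taxonomy.
LAYER_WORDS = {
    "remember": ["list", "describe"],
    "understand": ["summarize", "explain"],
}

# Hypothetical stem questions: one template per taxonomy word.
STEM_QUESTIONS = {
    "list": "List {topic}.",
    "describe": "Describe {topic}.",
    "summarize": "Summarize {topic}.",
    "explain": "Explain {topic}.",
}

# Hypothetical content domains and the topic each stem is adapted to.
DOMAIN_TOPICS = {
    "recipe": "the steps of the recipe",
    "story": "the events of the story",
}

def generate_questions(layer_words, stems, domain_topics):
    """Return one record per (layer, word, domain) with its adapted question."""
    records = []
    for layer, words in layer_words.items():
        for word in words:
            stem = stems[word]
            for domain, topic in domain_topics.items():
                records.append({
                    "layer": layer,
                    "word": word,
                    "domain": domain,
                    "stem": stem,
                    # Domain adaptation here is plain slot filling.
                    "question": stem.format(topic=topic),
                })
    return records

questions = generate_questions(LAYER_WORDS, STEM_QUESTIONS, DOMAIN_TOPICS)
```

Keeping the layer and domain on every record lets later stages relate questions across complexity levels and across domains.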
Application of trained embedding space for content search and response
A method for content comprehension and response includes receiving a question, determining its vector representation, projecting it into the trained embedding space, and using a distance function to find content most related to the question by measuring proximity to embedded question answer pair vector representations.
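The retrieval step can be sketched as a nearest-neighbor lookup. In this minimal example the `embed` function is a toy bag-of-words stand-in for the trained embedding, cosine distance is one assumed choice of distance function, and the question answer pairs are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for the trained encoder: a bag-of-words count vector."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return Counter(words)

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def answer(question, index):
    """Return the content whose embedded pair is nearest to the question.

    index: list of (embedded_question_answer_vector, content) entries.
    """
    q_vec = embed(question)
    nearest = min(index, key=lambda entry: cosine_distance(q_vec, entry[0]))
    return nearest[1]

# Hypothetical question answer pairs already placed in the space.
qa_pairs = [
    ("What ingredients does the recipe need?", "Flour, eggs, and milk."),
    ("Who is the main character of the story?", "A young sailor named Ana."),
]
index = [(embed(q + " " + a), a) for q, a in qa_pairs]

result = answer("Which ingredients are in the recipe?", index)
```

The incoming question was never stored verbatim, yet it lands nearest to the recipe pair, which is the behavior the claim relies on for questions outside the training set.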
System and machine-readable medium implementations
Implementation of the above methods in a system comprising a processor and memory storing executable instructions, or as a non-transitory machine-readable medium carrying instructions that enable a processor-based system to perform all steps, including embedding, question generation, and response processes as described.
Model determination and adaptation for unrepresented content
Methods and systems include determining a content model for each question answer pair or collectively for all pairs within the taxonomy, and then enabling adaptation of these models to content not directly represented by the original model. Rules may be applied to constrain such adaptation for increased accuracy.
Computational representation of content for cross-domain and cross-layer relationships
Generation of computational representations for content associated with each content domain and for each taxonomy layer, which are then used to determine relationships not only between content domains but also across different taxonomy layers.
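One simple way to picture such representations is to average the embedded vectors for each (layer, domain) group and compare the averages. This is an assumed illustration rather than the patent's method: the centroid representation, the cosine comparison, and the three-dimensional vectors are all hypothetical.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two dense, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedded items: (taxonomy layer, content domain, vector).
items = [
    ("remember", "recipe", [0.9, 0.1, 0.0]),
    ("remember", "recipe", [0.8, 0.2, 0.0]),
    ("understand", "recipe", [0.7, 0.1, 0.2]),
    ("remember", "story", [0.1, 0.9, 0.0]),
    ("understand", "story", [0.0, 0.8, 0.3]),
]

# Group vectors by (layer, domain) and reduce each group to one vector.
groups = defaultdict(list)
for layer, domain, vec in items:
    groups[(layer, domain)].append(vec)
reps = {key: centroid(vecs) for key, vecs in groups.items()}

# Relationship across layers within one domain.
cross_layer = cosine(reps[("remember", "recipe")],
                     reps[("understand", "recipe")])
# Relationship across domains within one layer.
cross_domain = cosine(reps[("remember", "recipe")],
                      reps[("remember", "story")])
```

With these toy vectors the two recipe layers are far more similar to each other than the recipe and story domains are, which is the kind of cross-layer and cross-domain relationship the claim describes.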
In summary, the claims protect structured question answer embedding using a hierarchical taxonomy, together with domain adaptation, content model adaptation, and computational representations of content. These features are claimed as methods, as processor-based systems, and as non-transitory machine-readable media.
Stated Advantages
Provides a systematic graded approach to knowledge acquisition, advancing comprehension capabilities by structuring tasks according to increasing difficulty using a hierarchical taxonomy.
Enables adaptation and response to questions or topics that the system was not explicitly trained on, by leveraging relationships between question answer pairs in the embedding space.
Allows comprehensive content understanding across different domains and modalities, including text, images, video, and audio.
Facilitates the identification of relationships between content of different domains and complexity levels through the embedding of question answer pairs at multiple layers.
Documented Applications
Semantic content retrieval from the web.
Automatic document summarization.
Multimodal human-computer interaction.
Question answering and feedback for recipes or stories across various content modalities.