Multimodal retrieval and execution monitoring using rich recipe representation
Inventors
Srivastava, Biplav • Pallagani, Vishal • Chandrasekaran Venka, Revathy • Khandelwal, Vedant • Lakkaraju, Kausik
Assignees
Publication Number
US-12332873-B2
Publication Date
2025-06-17
Expiration Date
2043-08-25
Abstract
The disclosure describes a system and method for improved representation and retrieval of recipes or workflows. Recipes and workflows, such as those for preparing food, assembling furniture, or performing other complex activities, typically exist as textual or image documents, which makes it difficult for machines to read them, reason about them, and handle ambiguity. The present disclosure provides a Rich Recipe Representation (“R3”), which is enhanced with additional knowledge such as outcomes like allergen information, possible failures, and solutions for each atomic step (such as a cooking step). R3 is used in a web-based decision support system that helps users perform constrained queries using multiple modalities while also monitoring the execution of an agent that is cooking or otherwise acting on the recipe.
Core Innovation
The invention provides a system and method for improved representation and retrieval of recipes or workflows through a novel Rich Recipe Representation (R3). R3 is a machine-processable JavaScript Object Notation (JSON) format that unifies information about ingredients, actions, and recipes across multiple modes, such as text, image, and video. This representation is enhanced with additional knowledge for each atomic step, including outcomes like allergen information, possible failures, and solutions.
Existing recipes and workflows are typically formatted as textual or image documents, making it difficult for machines to read them, reason about them, and resolve ambiguity. The invention addresses this problem by enabling systems to understand recipes or workflows at a granular level, including implicit actions, allergen constraints, and error recovery, in order to support constrained multimodal search and execution monitoring.
The approach incorporates a searchable repository of rich recipe representations connected to a knowledge graph of external information. Each instruction in R3 can be traced back to its source and is broken down into tasks and atomic actions, each with input and output conditions, alternative actions, and associated tools or failures. This enables constrained queries via multiple modalities (text, image, or both), expressive searching, and fine-grained monitoring of execution, including real-time notifications and suggested solutions for detected failures during workflow execution.
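The structure described above can be illustrated with a minimal Python sketch that builds an R3-like record and serializes it to JSON. The patent summary does not give the actual R3 schema, so every class and field name below (`AtomicAction`, `input_conditions`, `possible_failures`, and so on) is a hypothetical stand-in chosen for illustration, not the format defined in the patent.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class AtomicAction:
    """One atomic step. All field names are illustrative assumptions."""
    description: str                         # text form of the action
    image: Optional[str] = None              # optional image reference
    input_conditions: list[str] = field(default_factory=list)
    output_conditions: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    alternatives: list[str] = field(default_factory=list)
    # Keys assumed to point into an external knowledge graph.
    possible_failures: list[str] = field(default_factory=list)


@dataclass
class Instruction:
    source_text: str                         # original sentence, kept for traceability
    actions: list[AtomicAction] = field(default_factory=list)


@dataclass
class Recipe:
    name: str
    ingredients: list[dict] = field(default_factory=list)
    instructions: list[Instruction] = field(default_factory=list)


recipe = Recipe(
    name="Scrambled eggs",
    ingredients=[{"name": "egg", "quantity": 2, "allergens": ["egg"]}],
    instructions=[
        Instruction(
            source_text="Whisk the eggs and cook over low heat.",
            actions=[
                AtomicAction(
                    description="whisk eggs",
                    input_conditions=["eggs cracked into bowl"],
                    output_conditions=["eggs uniformly mixed"],
                    tools=["whisk"],
                    alternatives=["beat eggs with a fork"],
                ),
                AtomicAction(
                    description="cook eggs over low heat",
                    input_conditions=["eggs uniformly mixed"],
                    output_conditions=["eggs softly set"],
                    tools=["pan", "spatula"],
                    possible_failures=["eggs_overcooked"],
                ),
            ],
        )
    ],
)

# asdict() recursively converts nested dataclasses, so the result is plain JSON.
r3_json = json.dumps(asdict(recipe), indent=2)
print(r3_json)
```

Keeping `source_text` on each instruction is one simple way to preserve the traceability the summary mentions: a single source sentence can expand into several atomic actions without losing the link to where they came from.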
Claims Coverage
The patent's independent claims outline three primary inventive features.
Methodology for multimodal recipe retrieval and execution monitoring
A method comprising:
- Providing a user interface to access the system.
- Maintaining a database with a knowledge graph of external information, including possible failures and solutions for each atomic action.
- Maintaining a searchable repository of rich recipe representations, with each instruction containing atomic actions described in text and images, conditions for execution, and references to the knowledge graph.
- Enabling users to input recipe queries as text, image, or their combination and receiving output in similar modalities.
- Monitoring the execution of recipe atomic actions and notifying users of failure conditions with corresponding solutions retrieved from the knowledge graph.
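The query step of this method can be sketched as a constrained search over a small in-memory repository. This is only an illustration under stated assumptions: the repository layout, the `RecipeQuery` class, and the `search` function are hypothetical, and the image part of a query is assumed to have already been turned into text labels by some upstream recognizer that is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class RecipeQuery:
    """Hypothetical query object combining text and image-derived terms."""
    text_terms: set[str] = field(default_factory=set)
    # Assumed output of an upstream image recognizer (not implemented here).
    image_labels: set[str] = field(default_factory=set)
    excluded_allergens: set[str] = field(default_factory=set)


def search(repository: list[dict], query: RecipeQuery) -> list[str]:
    """Return names of recipes matching all terms and avoiding excluded allergens."""
    wanted = {t.lower() for t in query.text_terms | query.image_labels}
    results = []
    for recipe in repository:
        ingredients = recipe["ingredients"]
        allergens = {a for i in ingredients for a in i.get("allergens", [])}
        if allergens & query.excluded_allergens:
            continue  # constraint violated: skip this recipe
        searchable = {i["name"].lower() for i in ingredients}
        searchable |= set(recipe["name"].lower().split())
        if wanted <= searchable:
            results.append(recipe["name"])
    return results


repository = [
    {
        "name": "Peanut noodles",
        "ingredients": [
            {"name": "noodles", "allergens": ["wheat"]},
            {"name": "peanut", "allergens": ["peanut"]},
        ],
    },
    {
        "name": "Tomato noodles",
        "ingredients": [
            {"name": "noodles", "allergens": ["wheat"]},
            {"name": "tomato"},
        ],
    },
]

# Text term "noodles", an image assumed to show a tomato, and a peanut constraint.
query = RecipeQuery(
    text_terms={"noodles"},
    image_labels={"tomato"},
    excluded_allergens={"peanut"},
)
matches = search(repository, query)
print(matches)
```

Treating image-derived labels and text terms as one set keeps the matching logic identical for text-only, image-only, and combined queries.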
Methodology for multimodal workflow retrieval and execution monitoring
A method comprising:
- Providing a user interface for workflow tasks.
- Maintaining a database with a knowledge graph of external information for workflows and their atomic actions.
- Maintaining a searchable repository of rich workflow representations, with each instruction recorded as atomic actions, including descriptions, conditions, and links to the knowledge graph.
- Enabling users to make workflow queries in text, image, or combination, and receiving workflow outputs in similar formats.
- Monitoring execution of workflow atomic actions, reporting failures, and providing solutions to users via the user interface.
System for multimodal recipe retrieval and execution monitoring
A system comprising:
- A user interface for interaction with users.
- A database with a knowledge graph containing information about possible failures and solutions for atomic actions.
- A searchable repository of rich recipe representations with unified instructions for each recipe, including text, images, execution conditions, and pointers to the knowledge graph.
- One or more processors programmed to: receive user recipe queries in text, image, or both; provide query outputs; monitor atomic action execution; notify users of detected failures; and provide relevant solutions based on the knowledge graph.
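The monitoring and notification behavior can be sketched as a loop over atomic actions that checks observations against each action's known failures and looks up a solution. Everything here is an assumption made for illustration: the knowledge graph is reduced to a plain dictionary, and `monitor`, `observe`, and `notify` are hypothetical names, with `observe` standing in for whatever sensing the real system uses.

```python
from typing import Callable

# Toy stand-in for the knowledge graph: failure id -> suggested solution.
KNOWLEDGE_GRAPH = {
    "eggs_overcooked": "Remove from heat and stir in a splash of milk.",
    "pan_too_hot": "Lower the heat and wait one minute.",
}


def monitor(
    actions: list[dict],
    observe: Callable[[str], set[str]],
    notify: Callable[[str], None],
) -> int:
    """Step through atomic actions and report detected failures.

    observe(description) returns the set of condition labels seen while
    the action runs; notify(message) delivers a message to the user.
    Returns the number of failures reported.
    """
    reported = 0
    for action in actions:
        observed = observe(action["description"])
        for failure in action.get("possible_failures", []):
            if failure in observed:
                solution = KNOWLEDGE_GRAPH.get(failure, "No known solution.")
                notify(f"{action['description']}: {failure} -> {solution}")
                reported += 1
    return reported


actions = [
    {"description": "whisk eggs", "possible_failures": []},
    {
        "description": "cook eggs over low heat",
        "possible_failures": ["eggs_overcooked", "pan_too_hot"],
    },
]

# Simulated observations keyed by action description.
observations = {
    "whisk eggs": {"eggs uniformly mixed"},
    "cook eggs over low heat": {"eggs_overcooked"},
}

messages: list[str] = []
count = monitor(actions, lambda d: observations.get(d, set()), messages.append)
print(count, messages)
```

Passing `observe` and `notify` in as callables keeps the loop independent of how observations are gathered or how the user interface delivers notifications.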
In summary, the inventive features protect methods and systems for representing, searching, monitoring, and providing solutions for recipes or workflows using a rich, multimodal, machine-processable representation combined with a knowledge graph of external information.
Stated Advantages
The system enables machines to better understand recipes and workflows by representing them with additional knowledge such as potential allergens, failures, and solutions at each atomic step.
The approach allows users to perform expressive and constrained queries about recipe outcomes, process constraints, allergens, and ingredient substitutions using text, images, or both.
The invention facilitates fine-grained execution monitoring and real-time notification of failures and solutions during the execution of recipes or workflows.
The unified recipe or workflow representation improves search quality and helps track progress during execution in various domains.
Documented Applications
Searching and retrieving food recipes based on multimodal queries and constraints such as ingredient availability, allergens, or desired outcome.
Monitoring and guiding the execution of recipes by users or agents, including detecting errors or failures and suggesting corrective actions.
Applying the rich recipe/workflow representation to other industries, including manufacturing, education, and hospitality, to improve workflow search and execution monitoring.
Assembling products or performing complex activities through workflows represented, searched, and monitored with R3.