Interactive complex task teaching system that allows for natural language input, recognizes a user's intent, and automatically performs tasks in document object model (DOM) nodes
Inventors
Allen, James F. • Chambers, Nathanael • Galescu, Lucian • Jung, Hyuckchul • Taysom, William
Assignees
Florida Institute for Human and Machine Cognition Inc (IHMC)
Publication Number
US-7983997-B2
Publication Date
2011-07-19
Expiration Date
Interested in licensing this patent?
MTEC can help explore whether this patent might be available for licensing for your application.
Abstract
A system that allows a user to teach a computational device how to perform complex, repetitive tasks that the user would usually perform through the device's graphical user interface (GUI), which is often, but not necessarily, a web browser. The system includes software running on the user's computational device. The user “teaches” task steps by inputting natural language and demonstrating actions with the GUI. The system uses a semantic ontology and natural language processing to create an explicit representation of the task that is stored on the computer. After a complete task has been taught, the system is able to automatically execute the task in new situations. Because the task is represented in terms of the ontology and the user's intentions, the system is able to adapt to changes in the computer code while still pursuing the objectives taught by the user.
Core Innovation
The invention comprises a system that allows a user to teach a computational device how to perform complex, repetitive tasks by providing input in natural language and by demonstrating actions with the graphical user interface. The software includes a language understanding module, an intent recognition component, action monitoring, task learning, a collaborative actor, and a task execution module. It uses a semantic ontology and a word lexicon to create an explicit representation of the task that is stored on the computer.
The system recognizes a user's intent for each taught step, identifies and generalizes step objectives and task input parameters, and learns to execute steps from demonstrations so that after a task has been taught the system can automatically execute the task in new situations. Because tasks are represented in terms of the ontology and the user's intentions, the system is able to adapt to changes in source code and web page structure and to perform iterative operations over Document Object Model nodes using an alternate GUI for list-based operations.
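To make the idea of an explicit, parameterized task representation more concrete, the following is a minimal sketch and not the patented implementation. It assumes a task is stored as a list of steps, each with an objective and an action, and that generalized input parameters are marked with a leading "$". The class names, the action names ("fill_field", "click") and the "$" convention are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    objective: str   # what the step is meant to achieve (hypothetical wording)
    action: str      # hypothetical action type, e.g. "fill_field" or "click"
    arguments: dict  # a value such as "$title" marks a task input parameter


@dataclass
class TaskDefinition:
    name: str
    parameters: list
    steps: list = field(default_factory=list)

    def instantiate(self, **bindings):
        """Return concrete steps with every "$param" replaced by its binding."""
        missing = [p for p in self.parameters if p not in bindings]
        if missing:
            raise ValueError(f"missing task parameters: {missing}")
        concrete = []
        for step in self.steps:
            args = {
                key: bindings[value[1:]]
                if isinstance(value, str) and value.startswith("$")
                else value
                for key, value in step.arguments.items()
            }
            concrete.append(Step(step.objective, step.action, args))
        return concrete


# A taught task, loosely modeled on the book-buying example (details assumed).
buy_book = TaskDefinition(
    name="buy a book",
    parameters=["title"],
    steps=[
        Step("find the book", "fill_field",
             {"field": "search box", "text": "$title"}),
        Step("submit the search", "click", {"target": "search button"}),
    ],
)

# Executing the task "in a new situation" means binding new parameter values.
plan = buy_book.instantiate(title="Moby-Dick")
```

Because the stored definition keeps the objective and the parameter placeholder rather than the literal text typed during teaching, the same definition can be instantiated again with a different title without being re-taught.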
Claims Coverage
The patent contains one independent claim. Its recited elements yield the five main inventive features listed below.
Natural dialog-based interface
Providing software which includes a natural dialog-based interface whereby the user can communicate with the software using natural dialog-based language and wherein the software describes tasks and steps in natural language, poses clarifying questions, summarizes task steps and adds to the lexicon as new words appear.
Intent recognition and step identification
For each executable task, recognizing the user's overall intent, identifying a plurality of steps needed to complete the task, detecting the start of each new step from language and observed user actions, and identifying action types and parameters in the ontology.
Learning from demonstrations and task storage
Learning to execute each step from demonstrations provided by the user, providing incremental execution and interaction with the user, and providing and storing task definitions in a database where each task definition includes the steps and step objectives.
Semantic characterization and retrieval
Learning semantic characterization of each task for later retrieval, encoding described tasks in terms of task ontology, retrieving particular task definitions from the database using semantic characterization, and improving task definitions through practice with user instruction.
List iteration with DOM-based GUI reconfiguration
When a task returns a list of results, displaying the list as multiple Document Object Model nodes in a first GUI; responding to user-provided natural language to indicate iteration; automatically creating a second GUI configuration displaying cells, with each row representing a DOM node; and, after the user demonstrates a task in a first DOM node, automatically performing that demonstrated task in all other DOM nodes and displaying the results in the second GUI.
The inventive features cover (1) a natural dialog-based interface with grammar and lexicon support, (2) ontology-driven intent recognition and step identification, (3) learning from demonstrations with task definition storage, (4) semantic characterization and retrieval of tasks, and (5) automated handling of list iteration via DOM-aware GUI reconfiguration.
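Feature (5) can be illustrated with a small sketch, written under stated assumptions rather than taken from the patent. It assumes the result list is well-formed markup, that each list item stands in for one DOM node, and that a demonstration on the first node can be recorded as a relative path per column. The sample markup, the column names and the function `apply_to_all` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed result list; each <li> stands in for one DOM node.
RESULTS = """
<ul>
  <li><span class="title">Paper A</span><span class="venue">Venue X</span></li>
  <li><span class="title">Paper B</span><span class="venue">Venue Y</span></li>
  <li><span class="title">Paper C</span></li>
</ul>
"""


def apply_to_all(nodes, demonstration):
    """Replay the recorded selections on every node, one table row per node."""
    rows = []
    for node in nodes:
        row = {}
        for column, path in demonstration.items():
            match = node.find(path)  # path is relative to the current node
            row[column] = match.text if match is not None else None
        rows.append(row)
    return rows


root = ET.fromstring(RESULTS.strip())
nodes = root.findall("li")

# What the user "demonstrated" in the first node, stored as relative paths
# (an assumed encoding of the demonstration).
demonstration = {
    "title": "span[@class='title']",
    "venue": "span[@class='venue']",
}

table = apply_to_all(nodes, demonstration)
for row in table:
    print(row)
```

Each dictionary in `table` corresponds to one row of the second, cell-based GUI described in the claim. A node that lacks a demonstrated field yields an empty cell (`None`) instead of stopping the iteration, which is one possible design choice and not something the source specifies.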
Stated Advantages
Robustness and adaptability to changes in host-server websites and computer code, allowing the system to continue functioning when interfaces change.
Greater flexibility and generalization compared to conventional macros, because the system reasons in terms of ontology and objectives rather than rote sequences of actions.
Ability to learn from natural language and demonstrated actions so tasks can be taught through spoken or typed instructions and GUI demonstrations.
Dynamic expansion of the lexicon and grammars so new words on web pages are added and recognized during operation.
Documented Applications
Application to virtually any computational device employing a graphical user interface, including computers, cell phones, video recorders, iPods and ATMs.
Operation within stand-alone programs such as spreadsheets and word processors.
Operation within web-based environments and web browsers using HTML, XHTML, CSS and scripts.
Teaching and automating online purchasing tasks, illustrated by the example of teaching the system how to buy a book.
Parsing and iterating over search result lists and tables presented as Document Object Model nodes, including extraction of titles, publications and other fields.
Proximity searches and termination-condition based iterations, illustrated by a MAPQUEST.COM hotel search example with numeric comparison termination.
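The termination-condition iteration in the last application can be sketched as a loop that stops at the first result satisfying a numeric comparison. This is only an illustration: the hotel names, the distances, the 2.0-mile threshold and the helper `iterate_until` are invented for the example and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Hotel:
    name: str
    distance_miles: float


def iterate_until(results, should_stop):
    """Visit results in order and stop at the first one that meets the condition."""
    visited = []
    for item in results:
        if should_stop(item):
            break
        visited.append(item)
    return visited


# Hypothetical search results in the order the page lists them.
results = [
    Hotel("Hotel A", 0.4),
    Hotel("Hotel B", 1.1),
    Hotel("Hotel C", 2.7),
    Hotel("Hotel D", 0.9),
]

# Assumed termination condition: stop once a hotel is more than 2.0 miles away.
nearby = iterate_until(results, lambda h: h.distance_miles > 2.0)
print([h.name for h in nearby])
```

Note that the loop ends at "Hotel C" and never inspects "Hotel D", even though that hotel is within range. A termination condition differs from a filter in exactly this way: it bounds how far the iteration proceeds, while a filter would examine every result.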