Machine learning system for technical knowledge capture

Inventors

Acharya, Girish
Yarnall, Louise
Roy, Anirban
Wessel, Michael
Yao, Yi
Byrnes, John J.
Freitag, Dayne
Weiler, Zachary
Kalmar, Paul

Assignees

SRI International Inc

Publication Number

US-12118773-B2

Publication Date

2024-10-15

Expiration Date

2040-12-21

Abstract

This disclosure describes machine learning techniques for capturing human knowledge for performing a task. In one example, a video device obtains video data of a first user performing the task and one or more sensors generate sensor data during performance of the task. An audio device obtains audio data describing performance of the task. A computation engine applies a machine learning system to correlate the video data to the audio data and sensor data to identify portions of the video, sensor, and audio data that depict a same step of a plurality of steps for performing the task. The machine learning system further processes the correlated data to update a domain model defining performance of the task. A training unit applies the domain model to generate training information for performing the task. An output device outputs the training information for use in training a second user to perform the task.

Core Innovation

The invention describes a system that employs machine learning techniques to capture both explicit and tacit human knowledge required to perform a task. This is achieved by collecting multi-modal data, including video recordings of a user (such as a subject matter expert) performing the task, audio narrations describing the task, and sensor data that records physical actions such as movements, pressure, and force. Additionally, the system can process textual materials such as manuals or domain documents relevant to the task.

A core component of the invention is a computation engine that applies a machine learning system to correlate the collected video, audio, and sensor data, identifying parts of each modality that depict the same step of a multi-step task. The machine learning system further processes this correlated data to update a domain model that formally defines the task's steps, entities, actions, and rules. This domain model can be derived from interviews, narrative explanations, and recognized activities, and is continually refined by integrating data from multiple experts or users.
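As a rough illustration of the correlation described above, the following Python sketch groups time-stamped segments from different modalities when their time spans overlap, treating each overlapping group as evidence for the same task step. The class and function names are assumptions for this sketch, not terminology from the patent.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    modality: str   # "video", "audio", or "sensor" (illustrative labels)
    start: float    # seconds from start of recording
    end: float
    label: str      # e.g. a recognized action or narrated step

def overlaps(a: Segment, b: Segment) -> bool:
    return a.start < b.end and b.start < a.end

def correlate(segments: list[Segment]) -> list[list[Segment]]:
    """Group segments from different modalities that overlap in time."""
    ordered = sorted(segments, key=lambda s: s.start)
    groups: list[list[Segment]] = []
    for seg in ordered:
        if groups and any(overlaps(seg, s) for s in groups[-1]):
            groups[-1].append(seg)
        else:
            groups.append([seg])
    return groups

video = Segment("video", 0.0, 5.0, "pick up wrench")
audio = Segment("audio", 1.0, 4.0, "first, grab the wrench")
sensor = Segment("sensor", 0.5, 4.5, "grip pressure rises")
steps = correlate([video, audio, sensor])
# All three segments overlap in time, so they form one correlated step.
```

A real system would combine such temporal alignment with the semantic matching the patent describes, rather than relying on timing alone.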

The updated domain model is then used by a training unit to generate training information, which is output to assist in training other users, such as novices. The output can take the form of augmented reality content, interactive multimedia manuals, or searchable databases where information is cross-referenced across video, audio, sensor, and textual data. The system addresses the need to efficiently and comprehensively transfer complex technical knowledge for tasks where details are often omitted or cannot be effectively conveyed using traditional methods such as videos, written guides, or one-on-one shadowing.

Claims Coverage

The patent includes several independent claims covering systems, methods, and computer-readable media for capturing and processing multi-modal knowledge of task performance using machine learning. The core inventive features can be grouped as follows:

System for capturing and correlating multi-modal data using a domain model

A system consisting of:

- A domain model defining a plurality of steps for performing a task.
- A video input device to obtain video data of a first user performing the task.
- An audio input device to obtain audio data describing performance of the task.
- Sensors to generate sensor data during performance of the task.
- A computation engine to correlate at least two kinds of data (video, audio, sensor) to identify portions that depict the same step, and to update the domain model based on this correlation.
- A training unit to generate training information from the updated model.
- An output device to output the training information for training a second user.

Machine learning-based semantic correlation and domain model updating

The computation engine includes a machine learning system configured to:

- Extract semantic information from video, audio, and sensor data.
- Identify which portions of each modality describe the same step in the task.
- Correlate references to objects and actions across video, audio, and sensor data, including physical measurements from sensors, and update the domain model based on these correlations, defining elements such as ontology, entities, actions, events, or rules.
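One way to picture the domain model being updated from correlated observations is as a structure of steps, each accumulating entities, actions, and rules as new cross-modal evidence arrives. The field names and update logic below are assumptions made for this sketch, not the patent's actual representation.

```python
class DomainModel:
    """Illustrative domain model: steps with entities, actions, and rules."""

    def __init__(self):
        self.steps: dict[str, dict] = {}

    def update_step(self, step_id: str, entities=(), actions=(), rules=()):
        # Merge new cross-modal evidence into the step's definition.
        step = self.steps.setdefault(
            step_id, {"entities": set(), "actions": set(), "rules": set()}
        )
        step["entities"].update(entities)
        step["actions"].update(actions)
        step["rules"].update(rules)

model = DomainModel()
# Evidence from video (object detection) and audio (narration) for one step:
model.update_step("step-1", entities={"wrench"}, actions={"pick up"})
# Sensor evidence later adds a physical-measurement rule for the same step:
model.update_step("step-1", rules={"grip pressure > threshold"})
```

Refinement across multiple experts, as described above, would amount to repeated `update_step` calls merging each user's observations into the same model.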

Comparative performance and feedback analysis between users

Extends the system to collect, correlate, and compare data from additional users (second or third), allowing:

- Alignment and correlation of performance steps between multiple users.
- Identification and analysis of semantic differences or performance discrepancies between users (e.g., expert vs. novice).
- Querying users for explanations of these differences and updating the domain model or generating feedback for trainees.
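A minimal sketch of the comparative analysis above: once steps from two users are aligned by step id, discrepancies can be flagged (here, by timing deviation or a missing step) for follow-up questioning or trainee feedback. The tolerance value and use of durations are illustrative assumptions, not details from the claims.

```python
def compare_performances(expert: dict[str, float],
                         novice: dict[str, float],
                         tolerance: float = 0.25) -> list[str]:
    """Return step ids where the novice's duration deviates from the
    expert's by more than `tolerance` (fractional), or is missing."""
    flagged = []
    for step_id, exp_dur in expert.items():
        nov_dur = novice.get(step_id)
        if nov_dur is None:
            flagged.append(step_id)   # step skipped entirely
        elif abs(nov_dur - exp_dur) / exp_dur > tolerance:
            flagged.append(step_id)   # timing discrepancy
    return flagged

expert_times = {"align": 2.0, "tighten": 3.0, "inspect": 1.0}
novice_times = {"align": 2.1, "tighten": 6.0}  # slow tighten, no inspect
discrepancies = compare_performances(expert_times, novice_times)
```

An actual system would compare semantic content (actions, tool use, sensor traces) in addition to timing, but the flag-and-query loop would follow the same shape.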

Multi-modal annotation, cross-referencing, and adaptive output

- Video data may be annotated with human pose or object detection data.
- The system can also incorporate and correlate textual data from domain documents.
- Sensor data may describe micromovements, pressures, angles, or interactions.
- Output can include augmented reality content, interactive technical manuals, and query-based retrieval of training information.

Method and computer-readable medium for knowledge capture and training

A method and a non-transitory computer-readable medium including instructions to cause:

1. Acquisition of video, audio, and sensor data from a first user performing a multi-step task.
2. Correlation of at least two data types to align steps across modalities.
3. Processing of correlated data to update a domain model describing steps of the task.
4. Generation of training information from the updated model.
5. Output of training information for training a second user.

The system supports personalization, querying, and adaptive feedback based on data and model updates.
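The five claimed steps can be strung together as a minimal end-to-end pipeline. Every function body below is a stand-in for the patent's actual processing; the data shapes are invented solely to show the flow.

```python
def acquire():
    # Stand-in for capturing one item per modality per step.
    return {"video": ["v1"], "audio": ["a1"], "sensor": ["s1"]}

def correlate(data):
    # Pair items across modalities that belong to the same step.
    return [dict(zip(data, vals)) for vals in zip(*data.values())]

def update_model(model, correlated):
    # Fold the correlated observations into the domain model.
    model["steps"] = correlated
    return model

def generate_training(model):
    # Derive training information from the updated model.
    return [f"Step {i + 1}: {step}" for i, step in enumerate(model["steps"])]

data = acquire()
model = update_model({"task": "demo"}, correlate(data))
training = generate_training(model)   # output for training a second user
```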

The claims collectively cover a comprehensive system and method for capturing, correlating, and processing multi-modal data using machine learning to define, update, and deliver structured training information for multi-step technical tasks.

Stated Advantages

Enables efficient capture of both explicit and tacit human knowledge, including subtle or hard-to-articulate aspects, for use in technical training.

Provides highly focused, experiential training information that may increase training efficiency and reduce costs compared to conventional training methods.

Allows creation of interactive multimedia instruction manuals and augmented reality content in different languages using AI and machine learning.

Facilitates the identification and transfer of important task information that may be unknown, unrecognized, or subjective to the original expert.

Supports faster and more comprehensive knowledge capture, potentially speeding up the process by three or more times over conventional techniques.

Documented Applications

Training users to perform technical, mechanical, or artisanal tasks using cross-referenced multi-modal training information.

Generating augmented reality content to provide an experiential, first-person perspective of expert task performance for training purposes.

Creating interactive technical manuals or multimedia instruction systems for workplace training in complex or specialized fields.

Capturing and transferring subject matter expert knowledge for tasks such as maintenance processes, machine operation, crafting activities, or instrument calibration.
