Interactive system using speech recognition and digital media

Inventors

Tomasic, Anthony; Thiessen, Erik; Eng, Cassondra; Ogan, Amy

Assignees

Carnegie Mellon University

Carnegie Mellon University is a global research institution based in Pittsburgh, Pennsylvania, recognized for interdisciplinary education, research, and innovation in science, engineering, arts, technology, and social sciences. The university leads advancements in artificial intelligence, robotics, digital health, and performing arts. Located in a technology-driven and culturally rich city, CMU powers real-world impact through research centers, industry engagement, workforce training, and initiatives that shape regional and global communities.

Publication Number

US-12444415-B2

Patent

Publication Date

2025-10-14

Expiration Date


Abstract

An interactive system using speech recognition and digital media is described. The system uses automated speech recognition and recognizes interactions from users to execute digital media items. The interactions are based on the behavior of the user. The user is given a prompt. If the user responds to the prompt correctly, the user is rewarded with an animation. Otherwise, the user experience continues without a reward. The system recognizes natural language responses for interactions of the user. The media item is dynamically generated as the user interacts with the digital media item.

Core Innovation

The invention describes an AI speech-recognition interactive digital media system that receives user input to interpret instructions in relation to a media state. The system operates on portions of an interactive media item and uses criteria associated with a first state to determine whether to provide reward media. When the criteria are satisfied, the system generates additional media representing objects, concepts, or both objects and concepts specified by the user input and incorporates the additional media into the portion for a constructive media experience.

For a first state, the system generates executable code representing a playback configuration that includes generation of additional media and incorporation of the additional media into the portion of the media item. The playback configuration further includes causing playback of the portion in a second state based on which objects, concepts, or both objects and concepts are specified in the user input. If the criteria are not satisfied, the portion is repeated without presentation of the reward media item, relative to one or more prior states.
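The reward-or-repeat decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Portion` structure, the `play_portion` function, the state names, and the keyword-matching criterion are all hypothetical, and the "generated" media is a placeholder string rather than real animation.

```python
from dataclasses import dataclass, field


@dataclass
class Portion:
    """A portion of an interactive media item (hypothetical structure)."""
    prompt: str
    accepted: set[str]                      # objects/concepts that satisfy the criteria
    media: list[str] = field(default_factory=list)


def play_portion(portion: Portion, user_input: str) -> tuple[str, Portion]:
    """Return the next state and the (possibly augmented) portion."""
    words = set(user_input.lower().split())
    named = words & portion.accepted        # objects/concepts the user specified
    if not named:
        # Criteria not satisfied: repeat the portion with no reward media.
        return "first", portion
    # Criteria satisfied: generate placeholder media for each named object
    # and incorporate it into the portion before playback in the second state.
    extra = [f"animation:{name}" for name in sorted(named)]
    return "second", Portion(portion.prompt, portion.accepted, portion.media + extra)
```

In this sketch, an input that names none of the accepted objects leaves the portion unchanged and keeps the first state, which mirrors the "repeat without reward" branch.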

The system includes a conversation architecture in which automated speech recognition (ASR) and natural language understanding (NLU) interpret user prompts and user input, and a backend produces machine learning actions and natural language generation to support interactive playback. The architecture uses an interaction script built on a finite-state automaton (state machine) and supports multiple interaction types, including regular voice, multi-choice, NLU constructive, compare and contrast, and evaluation. The document also describes logging for learning-science analytics and A/B experimental design, and a device/controller workflow that measures interactions via sensors, conditionally generates reward media, and causes either presentation of the reward or repetition without it.
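One way to picture an interaction script driven by a state machine, with a log record per interaction, is the sketch below. The interaction type names come from the description above; the script contents, the state names, the `run_script` function, and the log record fields are assumptions made for illustration only.

```python
from enum import Enum


class InteractionType(Enum):
    REGULAR_VOICE = "regular voice"
    MULTI_CHOICE = "multi-choice"
    NLU_CONSTRUCTIVE = "NLU constructive"
    COMPARE_CONTRAST = "compare and contrast"
    EVALUATION = "evaluation"


# Hypothetical script: state -> (interaction type, next state on success).
SCRIPT = {
    "intro": (InteractionType.REGULAR_VOICE, "choose"),
    "choose": (InteractionType.MULTI_CHOICE, "build"),
    "build": (InteractionType.NLU_CONSTRUCTIVE, "done"),
}


def run_script(outcomes: list[bool], start: str = "intro") -> list[dict]:
    """Step through the script, logging one record per interaction."""
    log, state = [], start
    for ok in outcomes:
        if state not in SCRIPT:
            break                            # terminal state reached
        kind, nxt = SCRIPT[state]
        log.append({"state": state, "type": kind.value, "satisfied": ok})
        if ok:
            state = nxt                      # advance; otherwise repeat the state
    return log
```

Each log record here stands in for the kind of per-interaction data that analytics or an A/B comparison could consume; the actual logged fields are not specified in this summary.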

Claims Coverage

The available portion of the patent text includes two independent claims. Each independent claim defines a state-based, criteria-driven interactive media presentation system that generates reward media incorporating user-interpreted objects/concepts and either advances with conditional playback or repeats without reward.

State-based interactive reward media playback

The system presents portions of an interactive media item and determines whether criteria are satisfied, where the meaning of the particular interaction is interpreted based on a state associated with the portion. In response to determining that the criteria are satisfied, the system generates a reward media item associated with objects, concepts, or both objects and concepts represented by the particular interaction, incorporates the reward into the portion, and causes presentation of the interactive media item incorporating the reward; otherwise, it repeats the portion without presentation of the reward media item.
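To illustrate what it could mean for an interaction to be interpreted based on the state of the current portion, the sketch below maps the same spoken word to different objects depending on the state. The state names, the vocabularies, and the `interpret` function are hypothetical; a real system would rely on ASR and NLU rather than word lookup.

```python
# Hypothetical per-state vocabularies: the same word maps to a different
# object or concept depending on the state of the current portion.
STATE_MEANINGS = {
    "pick_color": {"orange": "color:orange", "red": "color:red"},
    "pick_fruit": {"orange": "fruit:orange", "apple": "fruit:apple"},
}


def interpret(state: str, utterance: str) -> list[str]:
    """Map recognized words to objects/concepts valid in the given state."""
    vocab = STATE_MEANINGS.get(state, {})
    return [vocab[w] for w in utterance.lower().split() if w in vocab]
```

An empty result would correspond to the criteria not being satisfied, in which case the portion repeats without the reward media item.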

Executable playback configuration based on user-specified objects or concepts

A server system receives at least one instruction specifying a portion of a media item and at least one event associated with the portion, where the event represents a solicitation of a user input to a playback device during playback when the playback of the media portion is associated with a first state. The controller generates executable code representing a playback configuration, including generation of additional media based on the user input when the playback is associated with the first state, incorporation of the additional media into the portion for a constructive media experience relative to one or more prior states, and in a second state causing playback of the portion including the additional media based on which objects, concepts, or both objects and concepts are specified in the user input.
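The idea of generating executable code that represents a playback configuration can be approximated with a function that returns a callable. This is only a sketch under assumptions: `build_playback_config`, the state labels `"first"` and `"second"`, and the dictionary fields are invented for illustration, and the returned closure merely stands in for whatever code the server system actually generates.

```python
from typing import Callable


def build_playback_config(portion_id: str, prompt: str) -> Callable[[str, list[str]], dict]:
    """Return a callable standing in for the generated playback code."""

    def playback(state: str, specified: list[str]) -> dict:
        if state == "first":
            # First state: solicit input and generate media for what was named.
            generated = [f"media:{item}" for item in specified]
            return {"portion": portion_id, "prompt": prompt,
                    "additional_media": generated,
                    "next_state": "second" if generated else "first"}
        # Second state: play the portion with the additional media included.
        return {"portion": portion_id, "play": [f"media:{item}" for item in specified]}

    return playback
```

A playback device could call the returned function once per state, passing the objects or concepts recognized in the user input.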

Sensor-measured interactions interpreted using media-state meaning

A device configured for presenting an interactive media item includes a user interface configured to present portions of the item, at least one sensor configured to measure interaction of the user with the user interface and generate sensor data, and a controller that receives the sensor data. The controller obtains sensor data indicative of a particular interaction, determines whether the particular interaction satisfies one or more criteria based on a state associated with the portion, and uses the determination either to generate a reward media item and incorporate it into the portion or to repeat the portion without the reward.

Across the independent claims, the inventive core is a state-associated, criteria-driven interactive media system that interprets user input as objects/concepts, conditionally generates reward media, incorporates the reward into the presented media portion, and either plays an altered portion in a second state or repeats the portion without reward.

Stated Advantages

Not explicitly described in patent.

Documented Applications

Not explicitly described in patent.
