System and method of facial analysis

Inventors

Chan, John; Datta, Sandeep R.; Wiltschko, Alexander B.

Assignees

Neumora Therapeutics Inc

Publication Number

US-12232848-B2

Patent

Publication Date

2025-02-25

Expiration Date


Abstract

A system for facial analysis includes a camera, a data storage device and a data processing system. The camera takes video of a subject's face, and the data storage device receives and stores the video. The data processing system extracts a pose of the subject's face, and a representation of the subject's facial gesture state. The pose includes the angle and position of the subject's face. The representation includes facial keypoints that are a collection of points on the subject's face. The system then concatenates each data stream to align the data streams in time, extracts a plurality of facial syllables from the aligned data streams, and compiles the facial syllables into a series of state sequences. Based on the series of state sequences, the system extracts a behavioral fingerprint for the subject that provides a summary of the subject's state over a given period of time.

Core Innovation

The invention concerns a facial analysis system that takes a video of a subject's face having a plurality of frames, and generates pose data by extracting a plurality of poses from the frames. Each pose includes an angle and a position of the subject's face. In parallel, the system generates a facial gestures state data stream by extracting facial-gesture representations from the frames, where each representation includes facial keypoints as a collection of points on the subject's face.
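
As a minimal sketch of the two per-frame records described above, the following Python shows one way the pose and gesture-state data could be laid out before any modeling. The class names, the three-component angle and position, and the 2-D keypoints are illustrative assumptions, not details taken from the patent; the values are made up and stand in for the output of real extractors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pose:
    """Head pose for one frame: an angle and a position (hypothetical layout)."""
    angle: Tuple[float, float, float]     # assumed: yaw, pitch, roll in degrees
    position: Tuple[float, float, float]  # assumed: x, y, z relative to the camera


@dataclass(frozen=True)
class GestureState:
    """Facial gesture state for one frame: a collection of facial keypoints."""
    keypoints: Tuple[Tuple[float, float], ...]  # assumed: (x, y) per keypoint


def pose_stream(poses: List[Pose]) -> List[List[float]]:
    """Flatten each pose into one numeric row per frame."""
    return [list(p.angle) + list(p.position) for p in poses]


def gesture_stream(states: List[GestureState]) -> List[List[float]]:
    """Flatten each frame's keypoints into one numeric row per frame."""
    return [[c for point in s.keypoints for c in point] for s in states]


# Two frames of made-up values.
poses = [Pose((0.0, 5.0, -1.0), (0.1, 0.2, 1.5)),
         Pose((2.0, 4.0, -1.0), (0.1, 0.2, 1.4))]
states = [GestureState(((10.0, 20.0), (30.0, 40.0))),
          GestureState(((11.0, 21.0), (31.0, 41.0)))]

pose_rows = pose_stream(poses)
gesture_rows = gesture_stream(states)
```

Keeping one row per frame in each stream means the two streams can later be matched up by frame index.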

The system concatenates the pose data stream and the facial gestures state data stream to align the data streams in time. From the aligned data streams, the system extracts a plurality of facial syllables and compiles the plurality of facial syllables into a series of state sequences. The extracted facial syllables and state sequences form the basis for summarizing facial behavior over time.
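
The concatenation step can be pictured as joining the two streams frame by frame, so that each row of the result holds the pose features and the keypoint features for the same instant. The sketch below assumes both streams already have one row per video frame in the same order; the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def concatenate_streams(pose_rows: Sequence[Sequence[float]],
                        gesture_rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Join the two per-frame streams into one row per frame.

    Assumes row i of each stream describes the same video frame.
    """
    if len(pose_rows) != len(gesture_rows):
        raise ValueError("streams must have the same number of frames")
    return [list(p) + list(g) for p, g in zip(pose_rows, gesture_rows)]


# Made-up values: 3 frames, 2 pose features and 4 keypoint coordinates each.
pose_rows = [[0.0, 1.5], [2.0, 1.4], [4.0, 1.4]]
gesture_rows = [[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]]

aligned = concatenate_streams(pose_rows, gesture_rows)
```

If the two streams were sampled at different rates, they would need resampling to a common frame index first; that case is not covered here.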

The invention further extracts a behavioral fingerprint for the subject based on the series of state sequences, where the behavioral fingerprint provides a summary of the subject's state over a given period of time. Extracting the behavioral fingerprint includes calculating a histogram of state frequencies based on the series of state sequences, with the histogram indicating how often each of the plurality of facial syllables occurs in the series of state sequences.
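
The histogram calculation can be sketched in a few lines. This version assumes each state sequence is a list of integer syllable labels in the range 0 to `num_syllables - 1`, and it normalizes the counts so that they sum to 1; the normalization and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def behavioral_fingerprint(state_sequences: Sequence[Sequence[int]],
                           num_syllables: int) -> List[float]:
    """Normalized histogram of how often each syllable label occurs."""
    counts: Counter = Counter()
    for seq in state_sequences:
        counts.update(seq)
    if any(label < 0 or label >= num_syllables for label in counts):
        raise ValueError("syllable label outside range(num_syllables)")
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_syllables
    return [counts[k] / total for k in range(num_syllables)]


# Two made-up state sequences over four syllables.
fingerprint = behavioral_fingerprint([[0, 0, 1, 2], [1, 1, 0, 3]], num_syllables=4)
# fingerprint == [0.375, 0.375, 0.125, 0.125]
```

The result is a fixed-length vector regardless of how long the recording was, which is what makes it usable as a summary over a chosen period of time.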

In certain implementations, facial syllables are extracted using a sticky hierarchical Dirichlet process with an autoregressive-emission hidden Markov model. The framework models facial expressions as smoothly varying trajectories over time, models grammatical structure between syllables with the hidden Markov model, and uses the sticky modifier to model syllable durations.
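
To give a feel for what the "sticky" modifier and the autoregressive emissions contribute, the sketch below simulates a heavily simplified stand-in: a hidden Markov model with a fixed number of states, a one-dimensional AR(1) emission per state, and a parameter `kappa` that adds extra weight to self-transitions. It is not the hierarchical Dirichlet process model named above (which does not fix the number of states in advance) and it performs no inference; all names and parameter values are illustrative assumptions.

```python
import random
from typing import List, Tuple


def sticky_transition_matrix(num_states: int, kappa: float) -> List[List[float]]:
    """Uniform transition weights with extra mass `kappa` on self-transitions."""
    matrix = []
    for i in range(num_states):
        weights = [1.0 + (kappa if j == i else 0.0) for j in range(num_states)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def simulate(num_frames: int, trans: List[List[float]],
             ar_coef: List[float], ar_bias: List[float],
             noise_std: float, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Sample syllable labels and a 1-D autoregressive observation per frame."""
    rng = random.Random(seed)
    state, x = 0, 0.0
    states, obs = [], []
    for _ in range(num_frames):
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
        # AR(1) emission: the next value depends on the previous value,
        # with dynamics chosen by the current syllable.
        x = ar_coef[state] * x + ar_bias[state] + rng.gauss(0.0, noise_std)
        states.append(state)
        obs.append(x)
    return states, obs


def mean_run_length(states: List[int]) -> float:
    """Average number of consecutive frames spent in one syllable."""
    runs = 1 + sum(1 for a, b in zip(states, states[1:]) if a != b)
    return len(states) / runs


ar_coef, ar_bias = [0.9, 0.5, -0.5], [0.0, 1.0, -1.0]
sticky_states, _ = simulate(2000, sticky_transition_matrix(3, 20.0), ar_coef, ar_bias, 0.1)
plain_states, _ = simulate(2000, sticky_transition_matrix(3, 0.0), ar_coef, ar_bias, 0.1)
```

Comparing `mean_run_length(sticky_states)` with `mean_run_length(plain_states)` shows the effect of the self-transition weight: with a larger `kappa` the simulated syllables persist over many frames instead of switching almost every frame.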

Claims Coverage

The independent claims are claim 1 (system for facial analysis), claim 10 (method of facial analysis), and claim 19 (system for subject analysis for a portion of a subject). Across the independent claims, the inventive features cover time-aligned pose and facial-gesture/keypoint representations, extraction of facial syllables, compilation into state sequences, and behavioral fingerprint extraction using a histogram of state/syllable frequencies.

Video-based pose stream extraction

Extracting, from the plurality of frames, a plurality of poses of the subject's face (or portion of the subject), each pose including an angle and a position of the subject's face (or portion of the subject).

Keypoint-based facial gesture state representation extraction

Extracting, from the plurality of frames, a plurality of representations of the subject's facial gesture state (or gesture state), each representation including facial keypoints (or keypoints) as a collection of points on the subject's face (or portion of the subject).

Time alignment by concatenating pose and gesture state streams

Concatenating the pose data stream and the facial gestures state data stream (or gestures state data stream) to align the data streams in time.

Facial syllable extraction from aligned streams

Extracting a plurality of facial syllables (or syllables) from the aligned data streams.

State sequence compilation of syllables

Compiling the plurality of facial syllables (or syllables) into a series of state sequences.

Behavioral fingerprint extraction via histogram of syllable/state frequencies

Extracting a behavioral fingerprint for the subject based on the series of state sequences, the behavioral fingerprint providing a summary of the subject's state over a given period of time, wherein extracting the behavioral fingerprint includes calculating a histogram of state frequencies based on the series of state sequences, the histogram indicative of how often each of the plurality of facial syllables (or syllables) occurs in the series of state sequences.

The claim set centers on deriving a time-aligned sequence representation from pose and facial-gesture/keypoint representations, transforming that representation into facial syllables and state sequences, and then producing a behavioral fingerprint summary by calculating a histogram of state/syllable frequencies. Dependent claim refinements specify particular camera modalities, face-region extraction, selected facial keypoints, latent embeddings for gesture-state extraction, and a sticky hierarchical Dirichlet process with an autoregressive-emission hidden Markov model for syllable extraction.
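
This summary does not spell out how syllables are compiled into state sequences. One possible reading, shown here purely as an assumption, is that per-frame syllable labels are collapsed into runs, giving an ordered list of (syllable, duration) pairs. The function name and sample labels are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def compile_state_sequence(frame_labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse runs of identical labels into (syllable, duration_in_frames)."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]


# Made-up per-frame syllable labels for ten frames.
frame_labels = [2, 2, 2, 0, 0, 1, 1, 1, 1, 2]
sequence = compile_state_sequence(frame_labels)
# sequence == [(2, 3), (0, 2), (1, 4), (2, 1)]
```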

Stated Advantages

Provides a summary of the subject's state over a given period of time based on the behavioral fingerprint.

Computes a behavioral fingerprint using a histogram of state frequencies indicative of how often each facial syllable occurs in the series of state sequences.

Documented Applications

Predicting pain level, anxiety level, depression level, hunger, satiety, and fatigue from the behavioral fingerprint, including pre-event and post-event behavioral summaries.

Providing decision-support device outputs based on behavioral-fingerprint summaries and classifications (including pre-event vs post-event).

Detecting placebo-related facial differences as a described example use case.

Comparing performance against FACS/expert scoring for pain in described example results.
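
For the pre-event and post-event summaries mentioned above, one simple way to quantify how far two fingerprints differ is a divergence between their histograms. The sketch below uses the Jensen-Shannon divergence, which is a choice made here for illustration and is not named in this summary; it assumes both fingerprints are normalized histograms over the same set of syllables, and the sample values are made up.

```python
import math
from typing import Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits between two normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must cover the same syllables")

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a zero count contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


pre_event = [0.375, 0.375, 0.125, 0.125]   # made-up fingerprint
post_event = [0.125, 0.125, 0.375, 0.375]  # made-up fingerprint
shift = js_divergence(pre_event, post_event)
```

The value is 0 for identical fingerprints and 1 bit for fingerprints with no syllables in common. A classifier trained on labeled fingerprints would be a separate step and is not sketched here.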
