Video based detection of pulse waveform

Inventors

Speth, Jeremy; Flynn, Patrick; Czajka, Adam; Bowyer, Kevin; Carpenter, Nathan; Olie, Leandro

Assignees

University of Notre Dame; Securiport LLC

Publication Number

US-12343177-B2

Publication Date

2025-07-01

Expiration Date

2042-02-03

Interested in licensing this patent?

MTEC can help explore whether this patent might be available for licensing for your application.


Abstract

The video based detection of pulse waveform includes systems, devices, methods, and computer-readable instructions for capturing a video stream including a sequence of frames, processing each frame of the video stream to spatially locate a region of interest, cropping each frame of the video stream to encapsulate the region of interest, processing the sequence of frames, by a 3-dimensional convolutional neural network, to determine the spatial and temporal dimensions of each frame of the sequence of frames and to produce a pulse waveform point for each frame of the sequence of frames, and generating a time series of pulse waveform points to generate the pulse waveform of the subject for the sequence of frames.

Core Innovation

The invention provides systems, devices, methods, and computer-readable instructions for accurately capturing a pulse waveform from a subject without physical contact and with minimal constraints on the subject's movement or position. This is achieved by capturing a video stream including a sequence of frames, processing each frame to spatially locate a region of interest, cropping each frame to encapsulate that region, and processing the sequence of frames with a 3-dimensional convolutional neural network (3DCNN) that operates over the spatial and temporal dimensions and produces a pulse waveform point for each frame. The resulting time series of pulse waveform points constitutes the subject's pulse waveform.

The problem being addressed is the difficulty of accurate, reliable, and contactless measurement of pulse rate and pulse waveform from video. Existing techniques either require physical devices attached to the subject or impose constraints such as controlled posture or close camera positioning. Prior video-based remote pulse estimation methods operate on the spatial and temporal dimensions separately, making them sensitive to subject movement and limiting their robustness.

This invention addresses these limitations by concurrently processing the spatial and temporal dimensions with a 3DCNN that applies three-dimensional kernels with temporally dilated convolutions, increasing the temporal receptive field without increasing model size or computational burden. This approach improves robustness to subject movement and talking, enabling accurate pulse waveform estimation over short video intervals, which is advantageous over prior methods that rely on long-term frequency descriptions or handcrafted features.
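The effect of temporal dilation can be sketched in plain numpy: a dilated temporal convolution samples frames that are `dilation` steps apart, so stacking layers with growing dilations widens the temporal receptive field while the kernel (and hence the parameter count) stays the same size. This is an illustrative sketch, not the patented network; the function names and the kernel are invented for the example.

```python
import numpy as np

def dilated_temporal_conv(frames, kernel, dilation):
    """Apply a 1-D temporal convolution with the given dilation at every
    spatial location of a (T, H, W) frame stack ('valid' in time)."""
    T = frames.shape[0]
    k = len(kernel)
    span = (k - 1) * dilation          # temporal extent the kernel covers
    out = np.zeros((T - span,) + frames.shape[1:])
    for i, w in enumerate(kernel):
        out += w * frames[i * dilation : i * dilation + T - span]
    return out

def receptive_field(kernel_size, dilations):
    """Temporal receptive field of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, three layers with kernel size 3 and dilations 1, 2, 4 see 15 frames, whereas three undilated layers of the same size would see only 7.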

Claims Coverage

The patent includes three independent claims directed to a computer-implemented method, a system, and a non-transitory computer-readable medium for generating a pulse waveform with several main inventive features.

Video stream capturing and processing for region of interest

Capturing a video stream including a sequence of frames and processing each frame to spatially locate a region of interest, such as a face or multiple body parts, followed by cropping each frame to encapsulate the region of interest.
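A minimal sketch of the cropping step, assuming a bounding box has already been produced by some face or body-part detector (the box format `(x, y, w, h)` and the margin parameter are assumptions for the example): the box is expanded by a relative margin and clamped to the frame bounds before cropping.

```python
import numpy as np

def crop_roi(frame, box, margin=0.25):
    """Crop a frame to a detected region of interest, expanded by a
    relative margin and clamped to the frame bounds. `box` is (x, y, w, h)
    as a hypothetical detector might return it."""
    h_img, w_img = frame.shape[:2]
    x, y, w, h = box
    mx, my = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - mx), max(0, y - my)
    x1, y1 = min(w_img, x + w + mx), min(h_img, y + h + my)
    return frame[y0:y1, x0:x1]
```

Running the detector and crop per frame yields the fixed region-of-interest sequence that is fed to the 3DCNN.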

Use of 3-dimensional convolutional neural network with temporal dilations

Processing the sequence of frames by a 3-dimensional convolutional neural network to determine spatial and temporal dimensions of each frame and produce a pulse waveform point per frame, further modifying the temporal dimension with one or more dilations to enhance the temporal receptive field.

Generating pulse waveform and handling subsequences with overlapping frames

Generating a time series of pulse waveform points to form the pulse waveform, with optional partitioning of the sequence into partially overlapping subsequences, applying a Hann function to each subsequence, and recombining the outputs to prevent edge effects.

Fusion of multi-modality video streams

Combining at least two video streams from visible-light, near-infrared, and thermal modalities into a fused video stream, with the streams aligned using synchronization devices or video analysis techniques.
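One simple software-side alignment, sketched here under the assumption that each stream carries per-frame timestamps (the function name and nearest-timestamp pairing are illustrative, not the patented method): pair each visible-light frame with the near-infrared frame closest in time, then stack the modalities as channels.

```python
import numpy as np

def fuse_streams(rgb_frames, rgb_ts, nir_frames, nir_ts):
    """Fuse a visible-light and a near-infrared stream into one
    multi-channel stream by pairing each RGB frame with the NIR frame
    nearest in time, a software alternative to hardware sync devices."""
    nir_ts = np.asarray(nir_ts)
    fused = []
    for frame, t in zip(rgb_frames, rgb_ts):
        j = int(np.argmin(np.abs(nir_ts - t)))   # nearest NIR frame
        nir = nir_frames[j][..., None]           # add a channel axis
        fused.append(np.concatenate([frame, nir], axis=-1))
    return np.stack(fused)
```

The fused (T, H, W, C) stack can then be cropped and processed by the 3DCNN like a single-modality stream.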

Calculation of heart rate and heart rate variability from pulse waveform

Calculating heart rate or heart rate variability from the generated pulse waveform to assess physiological parameters of the subject.
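A minimal sketch of this calculation, assuming a clean waveform (a real pipeline would band-pass filter first, and the peak criterion here is an illustrative simplification): locate systolic peaks, convert peak spacing to inter-beat intervals, and derive heart rate and SDNN, one common HRV measure.

```python
import numpy as np

def heart_rate_and_hrv(pulse, fps):
    """Estimate heart rate (bpm) and SDNN (s) from a pulse waveform by
    finding local maxima above the signal mean as beat peaks."""
    mean = np.mean(pulse)
    peaks = [i for i in range(1, len(pulse) - 1)
             if pulse[i] > pulse[i - 1] and pulse[i] >= pulse[i + 1]
             and pulse[i] > mean]
    ibis = np.diff(peaks) / fps                  # inter-beat intervals (s)
    hr_bpm = 60.0 / np.mean(ibis)
    sdnn = float(np.std(ibis))
    return hr_bpm, sdnn
```

For a 72 bpm pulse the intervals cluster around 0.83 s, and variation among them (SDNN) reflects heart rate variability.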

The independent claims cover computer-implemented methods, systems, and computer-readable media that capture video streams, spatially locate and crop regions of interest, process sequences with a 3-dimensional convolutional neural network with temporal dilations, generate pulse waveforms as time series, optionally fuse multi-modality video streams, and extract heart rate and variability diagnostics from the pulse waveform.

Stated Advantages

Accurate capture of pulse waveform without physical contact and minimal constraints on subject movement and position.

Increased robustness to movement, talking, and uncontrolled subject posture compared to existing methods.

Improved heart rate estimation without increasing model size or computational requirements by using temporal dilation in the 3DCNN.

Ability to estimate reliable pulse waveforms over short video intervals, not limited to long-term frequency descriptions.

Documented Applications

Use at immigration kiosks, border control booths, and entry gates to capture pulse waveform biometrics.

Integration with electronic devices such as tablets, mobile phones, or computers for video analysis applications including social media or health monitoring.

Distinguishing between liveness and synthetic video, such as detecting deep fake videos, by analyzing pulse waveform differences across regions of interest.

Health monitoring and telemedicine applications including injury precursor detection, impairment detection, vital sign monitoring, stroke and concussion assessment, cognitive testing, recovery tracking, diagnostics, physical therapy, stress and anxiety detection, epidemic monitoring, infant monitoring for sudden infant death syndrome (SIDS), monitoring interest or engagement in activities, and monitoring non-verbal communication cues and deception.

Applications in exercise engagement, entertainment, audience monitoring, and other biometric collection scenarios.
