Eye tracking method and system

Inventors

De Villers-Sidani, Etienne; DROUIN-PICARO, Paul Alexandre

Assignees

Innodem Neurosciences

Publication Number

US-11074714-B2

Patent

Publication Date

2021-07-27

Expiration Date


Abstract

A method for training a neural network to determine a gaze position of at least one eye in an initial image comprising the at least one eye. A plurality of training initial images is obtained, each comprising at least one eye and a known gaze position, and at least one training color component image is extracted from them. The training color component images are fed into a neural network, which outputs a respective internal representation for each one of the at least one component image. The neural network is trained by readjusting its weights so that the respective internal representation for each training color component image becomes more consistent with the respective known gaze position. Once trained, the neural network is used to determine the estimated gaze position relative to a screen of an electronic device.

Core Innovation

The invention relates to training a neural network for determining a gaze position of at least one eye in an initial image comprising the at least one eye. The training uses a plurality of training initial images in which at least one training color component image is extracted from each training initial image, and each training initial image respectively comprises at least one eye and a known gaze position. The method feeds the training color component image(s) into the neural network so the neural network outputs a respective internal representation for each component image, and training readjusts network weights so the internal representations become more consistent with the respective known gaze position.

In the disclosed architecture, a primary stream produces the internal representation for each training color component image, processing each component image separately. Downstream of the primary stream, the neural network includes an internal stream that determines an estimated gaze position; the internal stream includes at least one fusion layer and at least one fully-connected layer. The invention further describes decomposing the initial image into component images, each in a single, distinct color, generating a respective internal representation for each component image, and then determining the gaze position in the initial image from those internal representations.
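
The data flow described above (one primary stream per color component, then a shared internal stream with fusion and fully-connected layers) can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions, not the patented implementation: the layer sizes, the use of ReLU, the treatment of the fusion layer as concatenation followed by a dense layer, and all names (`primary_stream`, `internal_stream`, `estimate_gaze`, `init_params`) are hypothetical, and the weights are random and untrained.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of one single-channel image with one kernel."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def primary_stream(component, p):
    """Convolutional layer followed by a fully-connected layer, for one component image."""
    features = relu(conv2d_valid(component, p["kernel"]))
    return relu(p["W"] @ features.ravel() + p["b"])

def internal_stream(representations, q):
    """Fusion layer (assumed: concatenation + dense) then a fully-connected output layer."""
    fused = relu(q["W_fuse"] @ np.concatenate(representations) + q["b_fuse"])
    return q["W_out"] @ fused + q["b_out"]  # estimated (x, y)

def init_params(rng, image_shape, n_components=3, k=3, rep_dim=4, fused_dim=6):
    """Random, untrained weights; every size here is illustrative."""
    flat = (image_shape[0] - k + 1) * (image_shape[1] - k + 1)
    primary = [{"kernel": 0.1 * rng.normal(size=(k, k)),
                "W": 0.1 * rng.normal(size=(rep_dim, flat)),
                "b": np.zeros(rep_dim)} for _ in range(n_components)]
    internal = {"W_fuse": 0.1 * rng.normal(size=(fused_dim, rep_dim * n_components)),
                "b_fuse": np.zeros(fused_dim),
                "W_out": 0.1 * rng.normal(size=(2, fused_dim)),
                "b_out": np.zeros(2)}
    return primary, internal

def estimate_gaze(eye_image, primary, internal):
    """Send each color component through its own primary stream, then fuse."""
    components = [eye_image[:, :, c] for c in range(eye_image.shape[2])]
    representations = [primary_stream(c, p) for c, p in zip(components, primary)]
    return internal_stream(representations, internal)

rng = np.random.default_rng(0)
eye_image = rng.random((8, 12, 3))  # stand-in for a cropped eye image
primary, internal = init_params(rng, eye_image.shape[:2])
gaze_xy = estimate_gaze(eye_image, primary, internal)
print(gaze_xy.shape)  # (2,)
```

Keeping separate parameters per component mirrors the "separate processing" wording; whether the actual network shares any weights across components is not stated in this summary.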

The system and method also describe extracting eye regions and handling head-pose effects by using facial landmark recognition, including eye localization and cropping to obtain a cropped eye image used for component extraction. The disclosed approach can incorporate illuminant value information and facial landmark features when combining internal representations through fusion and fully-connected layers to output estimated gaze coordinates. In addition, the document describes calibration image models and calibration images associated with a calibration position to determine a calibrated estimated gaze position, and mapping gaze estimates to electronic-device screen coordinates for user-interface interaction.
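
The summary does not say how the calibrated estimated gaze position is computed or how it is mapped to screen coordinates. As one possible illustration, the sketch below fits an affine correction by least squares from raw estimates recorded at known calibration positions, then converts a calibrated estimate to pixels. The affine model, the centimetre units, the top-left screen origin, and the function names are all assumptions introduced for this example.

```python
import numpy as np

def fit_affine_calibration(raw_estimates, calibration_positions):
    """Least-squares affine map from raw gaze estimates to known calibration positions."""
    raw = np.asarray(raw_estimates, dtype=float)
    target = np.asarray(calibration_positions, dtype=float)
    design = np.hstack([raw, np.ones((raw.shape[0], 1))])      # columns: x, y, 1
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)   # shape (3, 2)
    return coeffs

def apply_calibration(raw_estimate, coeffs):
    """Apply the fitted affine correction to one raw (x, y) estimate."""
    x, y = raw_estimate
    return np.array([x, y, 1.0]) @ coeffs

def to_screen_pixels(gaze_cm, px_per_cm, screen_size_px):
    """Convert a gaze position in cm (assumed origin: top-left of screen) to clamped pixels."""
    px = np.asarray(gaze_cm, dtype=float) * px_per_cm
    return np.clip(px, 0, np.asarray(screen_size_px) - 1)

# Hypothetical calibration session: five targets and the (distorted) raw estimates.
targets = np.array([[0, 0], [10, 0], [0, 6], [10, 6], [5, 3]], dtype=float)
raw = targets * 1.1 + np.array([0.5, -0.3])   # synthetic scale + offset error

coeffs = fit_affine_calibration(raw, targets)
calibrated = apply_calibration(raw[4], coeffs)
pixels = to_screen_pixels(calibrated, px_per_cm=40.0, screen_size_px=(400, 240))
```

Because the synthetic error here is itself affine, the fit recovers the targets almost exactly; real estimation errors would generally leave a residual.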

Claims Coverage

The document includes two independent claims: a method for training a neural network using training initial images with known gaze positions and extracted color component images, and a determination method for computing gaze position from component-decomposed images using respective internal representations. Across the dependent claims, the inventive features emphasize per-component primary-stream processing, downstream internal-stream gaze estimation using fusion and fully-connected layers, and optional eye localization, illuminant and landmark feature integration, and calibration/screen-coordinate transformation.

Training a neural network with component images and known gaze positions

obtaining a plurality of training initial images of which at least one training color component image is extracted, each of the training initial images respectively comprising at least one eye and a known gaze position; feeding into a neural network the at least one training color component image of the training initial images, the neural network outputting a respective internal representation for each one of the at least one component image; training the neural network by readjusting weights in the neural network as the at least one training color component image of the training initial images are fed into the neural network to have the respective internal representation for each one of the at least one training color component image more consistent with a respective one of the known gaze position.
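
The weight-readjustment step in this claim can be pictured as ordinary gradient descent on the error between estimated and known gaze positions. The sketch below is a deliberately simplified assumption: each component image is flattened and passed through a purely linear layer (standing in for the primary stream), the representations are concatenated and mapped linearly to (x, y), and the data are synthetic. The loss function, learning rate, and all variable names are illustrative and do not come from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, pixels, rep_dim, n_components = 64, 24, 4, 3

# Synthetic stand-ins: flattened, zero-centred component images and known gaze positions.
X = [rng.random((n, pixels)) - 0.5 for _ in range(n_components)]
true_map = 0.1 * rng.normal(size=(pixels * n_components, 2))
Y = np.hstack(X) @ true_map                      # "known gaze position" per image

# Trainable weights: one matrix per component, plus a shared output matrix.
W = [0.1 * rng.normal(size=(pixels, rep_dim)) for _ in range(n_components)]
V = 0.1 * rng.normal(size=(rep_dim * n_components, 2))

def forward(X, W, V):
    """Return concatenated internal representations and the estimated gaze positions."""
    R = np.hstack([x @ w for x, w in zip(X, W)])
    return R, R @ V

def loss_fn(G, Y):
    """Half the mean squared distance between estimated and known gaze positions."""
    return 0.5 * np.mean(np.sum((G - Y) ** 2, axis=1))

learning_rate = 0.1
losses = []
for step in range(300):
    R, G = forward(X, W, V)
    losses.append(loss_fn(G, Y))
    dG = (G - Y) / n                 # gradient of the loss w.r.t. the estimates
    dV = R.T @ dG                    # gradient w.r.t. the output weights
    dR = dG @ V.T                    # gradient flowing back to the representations
    for c in range(n_components):
        block = slice(c * rep_dim, (c + 1) * rep_dim)
        W[c] -= learning_rate * (X[c].T @ dR[:, block])   # readjust component weights
    V -= learning_rate * dV                                # readjust output weights

final_loss = loss_fn(forward(X, W, V)[1], Y)
print(f"loss: {losses[0]:.4f} -> {final_loss:.4f}")
```

A practical system would more likely rely on an automatic-differentiation framework and convolutional layers; the manual gradients are used here only to make each readjustment visible.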

Primary-stream internal representations per color component image

the feeding step applies a primary stream separately to each training color component image to generate an internal representation using at least one convolutional layer followed by at least one fully-connected layer for each component image.

Internal-stream gaze estimation using fusion and fully-connected layers

the estimated gaze position is determined using an internal stream of a neural network downstream of a primary stream, where the internal stream includes at least one fusion layer and at least one fully-connected layer.

Per-component single-color decomposition to obtain internal representations and determine gaze

feeding into a neural network at least one component image, each of the at least one component image comprising a decomposition from the initial image in a single, distinct color, to obtain a respective internal representation for each one of the at least one component image; determining the gaze position in the initial image using the respective internal representation for each one of the at least one component image.
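
As a minimal sketch of the decomposition step, assuming the initial image is stored as a height x width x 3 array in red, green, blue channel order (the claim text quoted here does not fix a color space or storage layout), each single-color component image is simply one channel of the array. The function name and channel labels are hypothetical.

```python
import numpy as np

def decompose_color_components(image, names=("red", "green", "blue")):
    """Split an H x W x C image into C single-color component images."""
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[2] != len(names):
        raise ValueError("expected an H x W x C array with one channel per name")
    # Copy each channel so later edits to a component do not alter the original image.
    return {name: image[:, :, i].copy() for i, name in enumerate(names)}

rng = np.random.default_rng(1)
initial_image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
components = decompose_color_components(initial_image)
print(sorted(components))  # ['blue', 'green', 'red']
```

Each entry of `components` could then be fed to its own primary stream to obtain the respective internal representation.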

Overall, the claim set is centered on decomposing an initial image into distinct single-color component images, processing each component through primary stream(s) to obtain respective internal representations, and then determining an estimated gaze position using a downstream internal stream that includes fusion layers and fully-connected layers. The training claim further grounds this structure by readjusting network weights so component-image internal representations become consistent with known gaze positions, with dependent features adding eye localization/cropping, color-component constraints, illuminant/landmark integration, and calibration-based output adjustment.

Stated Advantages

Not explicitly described in patent.

Documented Applications

Not explicitly described in patent.
