Method of enhancing distorted signal, a mobile communication device and a computer program product

Inventors

Kuropatwinski, Marcin

Assignees

MED EL Elektromedizinische Geraete GmbH

Publication Number

US-11393485-B2

Patent

Publication Date

2022-07-19

Expiration Date


Abstract

A method of enhancing a distorted signal having a speech component and a noise component, using a processing device whose memory stores training information T, comprises a step of removing noise in the spectrum domain according to a noise and speech model to obtain a clear signal spectrum. The training information T comprises dictionaries of compact spectra prototypes of speech and noise, speech gains and noise gains that together form composite states, and probabilities of state history. The dictionaries comprise compact spectra prototypes of a first dimension P1 and of a second dimension P2, where P2 is greater than or equal to P1. A mobile communication device according to the invention, having a battery and a processing device with connected memory, a radio transceiver and an audio input, is adapted to receive an audio signal from the audio input, execute the method on this signal, and then transmit the resulting signal with the radio transceiver. A computer program product according to the invention, when executed by a mobile communication device, causes execution of the method.

Core Innovation

The invention relates to a method of enhancing a distorted signal having a speech and noise component by processing each acquired distorted signal frame through an orthogonal transform. The transform provides a distorted signal magnitude spectrum and a distorted signal phase spectrum, and noise is removed in the spectrum domain according to a noise and speech model to obtain a clear amplitude spectrum for the frame. A clear speech signal frame is then synthesized by inverse transforming the clear amplitude spectrum from the spectrum domain to the time domain using the distorted signal phase spectrum.
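The frame pipeline described above can be illustrated with a minimal sketch. It assumes that a plain discrete Fourier transform (orthogonal up to scaling) stands in for the orthogonal transform, which the text does not name, and that the model-based noise removal is passed in as a hypothetical callable `remove_noise`. The function names are illustrative and do not come from the patent.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); keeps the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def enhance_frame(frame, remove_noise):
    """Transform, clean the magnitude, resynthesize with the distorted phase."""
    spectrum = dft(frame)
    magnitude = [abs(c) for c in spectrum]        # distorted magnitude spectrum d
    phase = [cmath.phase(c) for c in spectrum]    # distorted phase spectrum
    clean_magnitude = remove_noise(magnitude)     # stand-in for the model-based step
    clean_spectrum = [cmath.rect(m, p) for m, p in zip(clean_magnitude, phase)]
    return idft(clean_spectrum)


# Toy usage: a "noise removal" that simply halves every magnitude bin.
frame = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.2, -0.1]
halved = enhance_frame(frame, lambda d: [0.5 * m for m in d])
```

Because only the magnitude is modified and the phase is reused unchanged, a uniform scaling of the magnitude scales the time-domain frame by the same factor, which makes the toy case easy to check.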

The noise and speech model is defined through training information T that includes dictionaries of compact spectra prototypes of speech and noise. Speech gains and noise gains form composite states i=(i1, i2, i3, i4), and probabilities of speech PS(l) and noise PN(l) state history are included as part of T. The dictionaries comprise compact spectra prototypes of a first dimension P1 and of a second dimension P2, with P2 greater than or equal to P1.
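One possible in-memory layout for the training information T is sketched here. The field names, the order of indices inside a composite state, and the use of history tuples as keys for PS(l) and PN(l) are all assumptions made for illustration; the text only lists what T contains.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Assumed index order: (speech prototype, speech gain, noise prototype, noise gain).
CompositeState = Tuple[int, int, int, int]


@dataclass
class TrainingInfo:
    speech_protos_p1: List[List[float]]  # speech dictionary at dimension P1
    speech_protos_p2: List[List[float]]  # speech dictionary at dimension P2
    noise_protos_p1: List[List[float]]   # noise dictionary at dimension P1
    noise_protos_p2: List[List[float]]   # noise dictionary at dimension P2
    speech_gains: List[float]
    noise_gains: List[float]
    speech_history_prob: Dict[Tuple[int, ...], float]  # PS(l), keyed by history (assumed)
    noise_history_prob: Dict[Tuple[int, ...], float]   # PN(l), keyed by history (assumed)

    def dimensions(self) -> Tuple[int, int]:
        """Return (P1, P2) and enforce the stated constraint P2 >= P1."""
        p1 = len(self.speech_protos_p1[0])
        p2 = len(self.speech_protos_p2[0])
        if p2 < p1:
            raise ValueError("P2 must be greater than or equal to P1")
        return p1, p2

    def composite_states(self) -> Iterator[CompositeState]:
        """Enumerate every combination of prototype and gain indices."""
        return product(range(len(self.speech_protos_p2)),
                       range(len(self.speech_gains)),
                       range(len(self.noise_protos_p2)),
                       range(len(self.noise_gains)))


# Toy instance with P1 = 4 and P2 = 7.
toy = TrainingInfo(
    speech_protos_p1=[[1.0] * 4, [2.0] * 4],
    speech_protos_p2=[[1.0] * 7, [2.0] * 7],
    noise_protos_p1=[[0.5] * 4],
    noise_protos_p2=[[0.5] * 7],
    speech_gains=[0.5, 1.0, 2.0],
    noise_gains=[1.0, 2.0],
    speech_history_prob={(0, 0): 0.6, (0, 1): 0.4},
    noise_history_prob={(0, 0): 1.0},
)
```

The number of composite states grows as the product of the dictionary and gain-table sizes, which is why the method preselects a subset before the more expensive evaluation.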

Noise removal includes computing a first compact spectrum d1 of the first dimension P1 from the magnitude spectrum d and computing a second compact spectrum d2 of the second dimension P2 from the same magnitude spectrum d. A set of composite states matching the first compact spectrum d1 is preselected from the training information T. Likelihoods and transition probabilities are then evaluated for the preselected composite states only, using an a priori knowledge model and the second compact spectrum d2.
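The two-resolution idea can be sketched as follows. How a compact spectrum is derived from d is not stated here, so this example assumes simple averaging over contiguous frequency bands, and it assumes that "matching" means smallest squared Euclidean distance. The mapping `states_p1` from composite state to a P1-dimensional model spectrum is hypothetical.

```python
def compact_spectrum(magnitude, bands):
    """Average the magnitude spectrum over `bands` contiguous bands (assumed reduction)."""
    n = len(magnitude)
    if not 0 < bands <= n:
        raise ValueError("bands must be between 1 and len(magnitude)")
    edges = [b * n // bands for b in range(bands + 1)]
    return [sum(magnitude[lo:hi]) / (hi - lo) for lo, hi in zip(edges, edges[1:])]


def preselect(d1, states_p1, keep):
    """Return the `keep` composite states whose P1 model spectrum is closest to d1."""
    def squared_distance(state):
        return sum((a - b) ** 2 for a, b in zip(d1, states_p1[state]))
    return sorted(states_p1, key=squared_distance)[:keep]


# Toy magnitude spectrum with 16 bins, reduced to P1 = 4 and P2 = 8 bands.
magnitude = [float(k) for k in range(1, 17)]
d1 = compact_spectrum(magnitude, 4)   # coarse spectrum used only for preselection
d2 = compact_spectrum(magnitude, 8)   # finer spectrum used for the evaluation stage

# Hypothetical P1-dimensional model spectra for three composite states.
states_p1 = {
    (0, 0, 0, 0): [2.0, 6.0, 10.0, 14.0],
    (1, 0, 0, 0): [0.0, 0.0, 0.0, 0.0],
    (0, 1, 1, 0): [3.0, 7.0, 11.0, 16.0],
}
shortlist = preselect(d1, states_p1, keep=2)
```

The cheap P1 comparison runs over all states, while the costlier P2 evaluation is restricted to the shortlist, which is the source of the complexity saving the summary refers to.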

A posteriori probabilities are computed using forward and backward probabilities, which are calculated from the likelihoods and from the transition probabilities corresponding to at least three frames. A noise compact spectrum d̂2n and a speech compact spectrum d̂2s are estimated based on the second compact spectrum d2, the training information T, and the a posteriori probabilities. A speech magnitude spectrum d̂sk is then recovered based on the estimated noise compact spectrum, the estimated speech compact spectrum, and the distorted signal magnitude spectrum d.
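The forward and backward computation can be illustrated with the standard forward-backward recursion for a hidden Markov model. This is a generic sketch, not the patented procedure: it assumes the same preselected state set in every frame, a single transition matrix, and a known prior, none of which the text specifies.

```python
def normalize(values):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(values)
    return [v / total for v in values]


def posteriors(likelihoods, transition, prior):
    """Forward-backward smoothing over a short window of frames.

    likelihoods[t][i]: likelihood of state i given the observation at frame t
    transition[i][j]:  probability of moving from state i to state j
    prior[i]:          probability of state i before the first frame
    Returns gamma[t][i], the a posteriori probability of state i at frame t.
    """
    frames, states = len(likelihoods), len(prior)
    alpha = [None] * frames
    beta = [[1.0] * states for _ in range(frames)]

    # Forward pass, rescaled at every frame to avoid underflow.
    alpha[0] = normalize([prior[i] * likelihoods[0][i] for i in range(states)])
    for t in range(1, frames):
        alpha[t] = normalize([
            likelihoods[t][j] * sum(alpha[t - 1][i] * transition[i][j]
                                    for i in range(states))
            for j in range(states)])

    # Backward pass, also rescaled per frame.
    for t in range(frames - 2, -1, -1):
        beta[t] = normalize([
            sum(transition[i][j] * likelihoods[t + 1][j] * beta[t + 1][j]
                for j in range(states))
            for i in range(states)])

    return [normalize([a * b for a, b in zip(alpha[t], beta[t])])
            for t in range(frames)]


# Three frames, two states, with "sticky" transitions.
window = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]]
sticky = [[0.9, 0.1], [0.1, 0.9]]
gamma = posteriors(window, sticky, prior=[0.5, 0.5])
```

With a three-frame window, the posterior for the middle frame draws on one past and one future frame, which is the smallest window in which both the forward and the backward pass contribute.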

Claims Coverage

Independent claim 1 defines a complete frame-by-frame speech enhancement method for a distorted signal containing speech and noise, including a probabilistic noise and speech model trained with dictionaries of compact spectra, composite states, and state-history probabilities. Dependent claims further constrain the features of claim 1, including dimension bounds for the compact spectra and specific components of the likelihood and transition probability evaluation.

Frame-based orthogonal transform with magnitude/phase separation

Computing an orthogonal transform over the distorted signal frame r to obtain a distorted signal magnitude spectrum d and a distorted signal phase spectrum.

Spectrum-domain noise removal using a noise and speech model

Removing noise in the spectrum domain according to a noise and speech model to obtain a clear amplitude spectrum d̂sk of the frame.

Inverse transformation using clear amplitude spectrum and distorted phase

Synthesizing a clear speech signal frame by inverse transformation of the clear amplitude spectrum d̂sk from the spectrum domain to the time domain, using the distorted signal phase spectrum.

Training information with compact spectral prototype dictionaries and composite states

Wherein the training information T comprises dictionaries of compact spectra prototypes of speech and noise, speech gains and noise gains forming together composite states i=(i1, i2, i3, i4), and probabilities of speech PS(l) and noise PN(l) state history.

Compact spectra prototypes at two dimensions with P2 ≥ P1

Wherein the dictionaries of compact spectra prototypes comprise compact spectra prototypes of a first dimension P1 and of a second dimension P2, where the second dimension P2 is greater than or equal to the first dimension P1.

Two-stage compact spectrum computation with preselection and probabilistic composite-state evaluation

The step of removing noise includes computation of a first compact spectrum d1 of the first dimension P1 from the magnitude spectrum d; computation of a second compact spectrum d2 of the second dimension P2 from the magnitude spectrum d; preselection from the training information T of a set of composite states matching the first compact spectrum d1; evaluation of likelihoods and transition probabilities of composite states with an a priori knowledge model, using the second compact spectrum d2 over the set of preselected composite states; and computation of a posteriori probabilities of the composite states using forward and backward probabilities calculated from the likelihoods of the preselected composite states and from the transition probabilities corresponding to at least three frames.

Estimating noise and speech compact spectra from a posteriori composite-state probabilities

Estimation of a noise compact spectrum d̂2n based on the second compact spectrum d2, the training information T and the a posteriori probabilities, and estimation of a speech compact spectrum d̂2s based on the second compact spectrum d2, the training information T and the a posteriori probabilities.
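The estimation rule itself is not given in this summary. The sketch below shows one plausible reading, stated purely as an assumption: for each composite state, d2 is split between speech and noise in proportion to that state's model spectra, and the per-state splits are averaged with the a posteriori probabilities as weights. The arguments `speech_models` and `noise_models` (gain-scaled P2 prototypes per state) are hypothetical.

```python
def estimate_compact_spectra(d2, posterior, speech_models, noise_models):
    """Posterior-weighted split of d2 into speech and noise parts (assumed rule).

    d2:               second compact spectrum, length P2
    posterior[i]:     a posteriori probability of composite state i
    speech_models[i]: P2-dimensional speech model spectrum of state i
    noise_models[i]:  P2-dimensional noise model spectrum of state i
    """
    d2s, d2n = [], []
    for band, observed in enumerate(d2):
        speech_share = 0.0
        noise_share = 0.0
        for i, weight in enumerate(posterior):
            s = speech_models[i][band]
            n = noise_models[i][band]
            total = s + n
            if total > 0:  # skip states whose model is empty in this band
                speech_share += weight * s / total
                noise_share += weight * n / total
        d2s.append(observed * speech_share)
        d2n.append(observed * noise_share)
    return d2s, d2n


# Two states, two bands.
d2_hat_s, d2_hat_n = estimate_compact_spectra(
    d2=[4.0, 8.0],
    posterior=[0.75, 0.25],
    speech_models=[[3.0, 1.0], [1.0, 1.0]],
    noise_models=[[1.0, 1.0], [1.0, 3.0]],
)
```

Under this assumed rule the speech and noise estimates add up to d2 in every band where at least one state has a non-empty model, so no observed energy is lost or created.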

Recovering speech magnitude spectrum from estimated compact spectra and distorted magnitude

Recovering a speech magnitude spectrum d̂sk based on the estimated noise compact spectrum, the estimated speech compact spectrum, and the distorted signal magnitude spectrum d.
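A common way to go from band-level estimates back to a full-resolution magnitude spectrum is a Wiener-style gain, and the sketch below uses it as a stand-in; the summary does not say which recovery rule the patent applies. It assumes the compact bands are contiguous and of near-equal width, and that one gain is applied to every bin in a band.

```python
def recover_speech_magnitude(d, d2s, d2n):
    """Apply a per-band speech/(speech + noise) gain to the full spectrum d (assumed rule)."""
    n, bands = len(d), len(d2s)
    edges = [b * n // bands for b in range(bands + 1)]
    recovered = [0.0] * n
    for b in range(bands):
        total = d2s[b] + d2n[b]
        gain = d2s[b] / total if total > 0 else 0.0
        for k in range(edges[b], edges[b + 1]):
            recovered[k] = gain * d[k]   # same gain for every bin in the band
    return recovered


# Four bins, two bands: gains are 0.75 and 0.25.
clean = recover_speech_magnitude(d=[4.0, 4.0, 8.0, 8.0],
                                 d2s=[3.0, 2.0],
                                 d2n=[1.0, 6.0])
```

The recovered magnitudes keep the fine structure of d inside each band, and only the band-level balance between speech and noise comes from the compact estimates.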

Claim 1 covers speech enhancement that combines frame-wise orthogonal transform magnitude and phase handling with spectrum-domain noise removal driven by a noise and speech model. The model is trained via dictionaries of compact spectral prototypes held at two dimensions (P2 ≥ P1), composite states built from speech and noise gains, and state-history probabilities. The method narrows the search to preselected composite states matching a first compact spectrum, evaluates likelihoods and transition probabilities, computes a posteriori probabilities using forward and backward probabilities over at least three frames, estimates noise and speech compact spectra, and recovers a speech magnitude spectrum that is used for inverse transformation to the time domain.

Stated Advantages

Reduced computational complexity and energy consumption for speech enhancement; a reduction in energy use on an FPGA is described.

A tradeoff between quality and computational cost, set by numeric bounds on the compact spectra prototype dimensions; P1 between 4 and 10 and P2 between 7 and 30 are described.

Documented Applications

Enhancement of distorted speech with noise on a mobile communication device including a microphone and a radio transceiver, with battery/memory considerations, is described.

Implementation as a computer program product on a non-transitory computer-readable memory is described.
