Title: Gain Adapted Optimum Mixture Estimator for Single Channel Speech Separation
Authors: Kapoor, Divneet Singh
Supervisor: Kohli, Amit Kumar
Keywords: Single channel speech separation (SCSS), optimum mixture estimator, mixture-maximization (MIXMAX), quadratic estimator, gain adaptation
Issue Date: 20-Jul-2011
Abstract: While automatic speech recognition has become useful and convenient in daily life, as well as an important enabler for other modern technologies, speech recognition accuracy is still far from sufficient to guarantee stable performance. It can be severely degraded when speech is subjected to additive noise. Although speech may encounter various types of noise, the work described in this thesis concerns one of the most difficult problems in robust speech recognition: corruption by an interfering speech signal with only a single channel of information. This thesis deals with the separation of mixed speech signals from a single acquisition channel, a problem commonly referred to as Single Channel Speech Separation (SCSS). The problem is especially difficult because the acoustical characteristics of the desired speech signal are easily confused with those of the interfering masking signal, and because useful information about the location of the sound sources is not available from a single channel. Single channel speech mixtures commonly occur when speech signals from simultaneous and independent sources combine into one signal at the receiving microphone, or when two speech signals are transmitted simultaneously over a single channel. An efficient single channel speech separation system is an important front-end component in many applications, such as Automatic Speech Recognition (ASR), Speaker Identification (SID), and hearing aids. The separation of single channel speech consists mainly of three stages: analysis, separation, and reconstruction. The central separation stage is the heart of the system, in which the target speech is separated from the interfering speech. Because the separation process works on one segment of single channel speech at a time, a means must first be found in the analysis stage to accurately classify each segment as single- or multi-speaker before separation.
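The three-stage structure described in the abstract (analysis, separation, reconstruction) can be sketched as follows. This is an illustrative outline only, not the thesis's implementation: the frame length, hop size, and Hann-windowed STFT are assumed parameters, and the separation stage is a placeholder mask standing in for a model-based estimator.

```python
# Illustrative sketch of the three-stage SCSS pipeline (assumed parameters,
# not the thesis's implementation): analysis (framing + FFT), separation
# (placeholder mask), reconstruction (inverse FFT + overlap-add).
import numpy as np

def analyze(x, frame_len=256, hop=128):
    """Analysis stage: split the signal into windowed frames and take FFTs."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i*hop:i*hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def separate(spectra):
    """Separation stage: placeholder all-pass mask. A real system would
    apply a model-based estimator (e.g. MIXMAX) here."""
    mask = np.ones(spectra.shape[1])
    return spectra * mask

def reconstruct(spectra, frame_len=256, hop=128):
    """Reconstruction stage: inverse FFT followed by overlap-add synthesis."""
    frames = np.fft.irfft(spectra, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, f in enumerate(frames):
        out[i*hop:i*hop + frame_len] += f
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)          # stand-in for a speech segment
y = reconstruct(separate(analyze(x)))  # with a unity mask, y ~ x
```

With a 50%-overlapped Hann window and a unity mask, overlap-add reconstruction recovers the input almost exactly in the fully-overlapped region, which is why this analysis/synthesis pairing is a common front end for separation systems.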
Precise estimation of each speaker's speech model parameters is another important task in the analysis stage. The speech signal of the desired speaker is finally synthesized from its estimated parameters in the reconstruction stage. A reliable overall speech separation system therefore requires improvements in all three stages. The goal of this thesis is to recover the target component of speech mixed with interfering speech, and to improve the recognition accuracy obtained from the recovered speech signal. Various techniques are employed to separate the sources' signals from their linear mixture, including model-based SCSS, Blind Source Separation, and Computational Auditory Scene Analysis. Model-based separation is the most common, in which the sources are modelled by Composite Source Modelling using Gaussian models. This thesis introduces an optimum mixture estimator for estimating the mixture of the two underlying sources' signals at different mixing ratios. The mixture is also estimated with other estimators, namely the mixture-maximization (MIXMAX) and quadratic estimators. The estimators are compared on the basis of mean squared error, and finally the sources' signals are estimated from the mixture estimate in the Minimum Mean Squared Error (MMSE) sense.
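The MIXMAX idea mentioned above can be illustrated numerically. In the log-spectral domain, the exact log-spectrum of a sum of two sources is the log-sum-exp of their individual log-spectra, and MIXMAX approximates it by their element-wise maximum. The sketch below is a minimal assumed setup (independent Gaussian log-spectra, not the thesis's exact source models) that measures the mean squared error of this approximation, in the spirit of the MSE comparison the abstract describes.

```python
# Minimal numerical sketch (assumed Gaussian log-spectral models, not the
# thesis's exact setup): the MIXMAX approximation replaces the exact
# log-sum z = log(exp(x) + exp(y)) with z ~ max(x, y), and we measure
# the mean squared error of that approximation.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)   # source 1 log-spectrum
y = rng.normal(loc=-1.0, scale=1.0, size=100_000)  # source 2 log-spectrum

z_exact  = np.logaddexp(x, y)   # exact log-spectrum of the power sum
z_mixmax = np.maximum(x, y)     # MIXMAX approximation

mse = np.mean((z_exact - z_mixmax) ** 2)
print(f"MIXMAX approximation MSE: {mse:.4f}")
```

The per-bin error is log(1 + exp(-|x - y|)), which is bounded above by log 2 and shrinks as the two sources' log-spectra move apart; this is why MIXMAX works well when one source dominates each frequency bin.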
Description: M.E. (ECED)
Appears in Collections:Masters Theses@ECED

Files in This Item:
File: ThesisForLibrary(DivneetSinghKapoor).pdf
Size: 1.41 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.