Speaker Dependent Hindi Speech Recognition using Optimized Classifiers
| dc.contributor.author | Mittal, Teena | |
| dc.contributor.supervisor | Sharma, R. K. | |
| dc.date.accessioned | 2016-08-02T06:58:55Z | |
| dc.date.available | 2016-08-02T06:58:55Z | |
| dc.date.issued | 2016-08-02 | |
| dc.description.abstract | Speech is the most natural way of human communication. The variability in the speech signal makes automatic speech recognition (ASR) a challenging task. This variability depends on environmental conditions, on speaker attributes such as emotion, age and gender, and on many other factors. Speech recognition is one of the most promising fields of current research due to its versatile applications, and many international organizations and research groups are working in this field. The performance of ASR systems has improved over the last decade, and ASR is now used in many practical applications. Even so, there is considerable scope to improve recognition rate, speed, vocabulary size and usefulness for end users. Another issue is that ASR systems have not been developed for many languages, owing to limited data availability and the lack of a proper statistical framework for acoustic and language models. Hindi is an official language of India, and it is easily understood and spoken by people in several other Asian countries, so there is a need to develop an efficient ASR system for Hindi. The main objective of ASR is to build a system that maps the acoustic signal into a string of words. An ASR system has two main elements, i.e., a front-end processor and a back-end classifier. The front-end processor extracts speech features or parameters, which are then processed by the back-end classifier for speech recognition. Artificial neural network (ANN), support vector machine (SVM) and hidden Markov model (HMM) classifiers have been widely used for speech recognition. An ANN is inspired by biological neural networks; it processes input information using an interconnected group of artificial neurons and a connectionist approach to computation. Training an ANN is difficult because the search space is high-dimensional and multimodal. 
ANN training requires efficient optimization techniques to search for the set of weights and biases that minimizes the error. The most commonly used training algorithm is the back-propagation algorithm. It is based on gradient descent and may get trapped in a local optimum for non-linearly separable pattern classification problems. Another promising classifier is the SVM, which works on the principle of structural risk minimization. SVMs have generated considerable interest in the pattern recognition community in recent years; however, selecting optimum SVM kernel parameters remains a vital issue. Despite the universal acceptance of HMMs for speech recognition, a main concern with HMMs is the training phase. HMM training is computationally expensive, and the solution usually stagnates at a local optimum. The Baum-Welch (BW) algorithm is widely used to train HMMs, but it is a conventional optimization method whose solution quality depends strongly on the initial search point, so the solution it obtains may converge to a local optimum. In this thesis work, various aspects of speech recognition systems have been explored, and techniques have been proposed to improve the speech recognition rate. The main contributions of this research work are as follows: (i) Two databases, namely a Hindi speech words database and a Hindi sentences database, have been prepared. The Hindi words database consists of twenty words with fifty utterances of each word spoken by two male and two female speakers. The Hindi sentences database consists of sixteen sentences with four utterances of each sentence spoken by two male and two female speakers. The recordings were made in a quiet room at a sampling frequency of 44.1 kHz. (ii) Two optimization techniques have been proposed to search for the optimum weights and biases of the ANN. 
The first technique is predator-influenced civilized swarm optimization (PCSO). In PCSO, the swarm particles are divided into a number of societies, and the global best particle of the swarm is chased by a predator particle. The predator effect helps to exploit the search area more effectively. The second technique integrates global and local search: predator prey optimization (PPO) serves as the global search technique and the Hooke-Jeeves method as the local search technique. In predator prey optimization with the Hooke-Jeeves method (PPO-HJ), the initial search is performed by PPO. To refine the search further, the global best solution obtained from PPO is given as input to the Hooke-Jeeves method. (iii) For the SVM classifier, the hyper-parameters have been optimized by the proposed PPO-HJ technique. (iv) A mixed variable PPO (MVPPO) technique has also been proposed. The mixed variable PPO with Hooke-Jeeves (MVPPO-HJ) method is applied to select an appropriate feature set as well as optimized hyper-parameters. (v) For training the HMM classifier, the PPO and PCSO optimization techniques have been integrated with the BW algorithm. (vi) For continuous speech recognition, two hybrid classifier models have been proposed, namely optimized ANN-HMM and optimized SVM-HMM classifiers. In the optimized ANN-HMM hybrid model, the weights and biases of the ANN are optimized with the PPO-HJ technique, and the ANN output is used to estimate the HMM posterior probabilities. In the optimized SVM-HMM hybrid model, the SVM hyper-parameters are optimized with the PPO-HJ technique, and the HMM posterior probabilities are computed from the SVM. (vii) An interface has also been developed for the speech recognition system. Chapter one presents the history of ASR systems and details of ASR system components. It also sets out the need for the study and its objectives, and outlines the organization of the thesis. 
Chapter two provides a brief literature review on various aspects of ASR systems. It also reviews Hindi speech recognition and optimization techniques in the field of pattern recognition. Chapter three aims to recognize isolated speech words using ANN classifiers. In this chapter, two hybrid optimization techniques are proposed to search for the optimum set of weights and biases of the ANN. Linear predictive coding coefficient (LPCC), Mel-frequency cepstral coefficient (MFCC) and wavelet packet Mel-frequency cepstral coefficient (WPMFCC) features are extracted from the speech signal to conduct the experiments. In chapter four, the SVM classifier is explored for speech recognition. This chapter investigates the effect of dynamic frame size on feature extraction. Other important issues are the selection of an appropriate speech feature set and of the SVM kernel parameters. A hybrid optimization technique (MVPPO-HJ) is therefore proposed to improve the learning ability of the SVM and to select the most appropriate feature set. The experimental results obtained by the proposed technique with the SVM classifier show a satisfactory recognition rate. ROC curves have also been analyzed to verify the sensitivity and specificity of the results obtained by the MVPPO-HJ technique with SVM. In chapter five, a Hindi speech recognition system for isolated words and continuous speech is developed using HMM. In this chapter, two global search techniques are integrated with the BW algorithm to search for the HMM parameters, i.e., transition and emission probabilities. To evaluate performance, average log-likelihood values are computed during the training process. Chapter six aims to recognize isolated words and continuous speech using optimized hybrid classifiers. Two optimized hybrid classifiers, ANN-HMM and SVM-HMM, are proposed. 
The PPO-HJ technique is applied to optimize the weights and biases of the ANN in the ANN-HMM classifier and the RBF kernel parameters in the SVM-HMM classifier. Finally, chapter seven presents the inferences drawn from the experiments conducted in this thesis. It also briefly discusses directions for further research on the topic. | en_US |
| dc.identifier.uri | http://hdl.handle.net/10266/3981 | |
| dc.language.iso | en | en_US |
| dc.subject | SVM | en_US |
| dc.subject | ANN | en_US |
| dc.subject | HMM | en_US |
| dc.subject | Parameter optimization | en_US |
| dc.subject | Speech recognition | en_US |
| dc.title | Speaker Dependent Hindi Speech Recognition using Optimized Classifiers | en_US |
| dc.type | Thesis | en_US |
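The abstract describes PPO-HJ only at a high level. A population-based PPO stage performs the global search, and its global best solution is then handed to the Hooke-Jeeves method for local refinement. The Python sketch below illustrates that two-stage structure under stated assumptions.

The PPO stage is a generic particle swarm with two additions:
- a predator that chases the global best;
- a random, distance-decaying repulsion of prey away from that predator.

These update rules are an assumption, not the thesis's equations. The function names `ppo`, `hooke_jeeves` and `ppo_hj` and every parameter value (`w`, `c1`, `c2`, `fear_prob`, `a`, `b`, `step`, `shrink`) are hypothetical. The objective is a toy function standing in for, e.g., ANN training error or SVM validation error.

```python
import numpy as np


def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-8, max_iter=10_000):
    """Hooke-Jeeves pattern search: derivative-free, unconstrained local search."""

    def explore(base, f_base, h):
        # Exploratory move: try +h, then -h, along each coordinate in turn.
        x, fx = base.copy(), f_base
        for i in range(len(x)):
            for delta in (h, -h):
                trial = x.copy()
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx = trial, ft
                    break
        return x, fx

    base = np.asarray(x0, dtype=float)
    f_base = f(base)
    h, it = step, 0
    while h > tol and it < max_iter:
        it += 1
        x_new, f_new = explore(base, f_base, h)
        if f_new >= f_base:
            h *= shrink  # no improvement: reduce the step size
            continue
        # Pattern moves: extrapolate along the successful direction while it helps.
        while it < max_iter:
            it += 1
            pattern = x_new + (x_new - base)
            base, f_base = x_new, f_new
            x_new, f_new = explore(pattern, f(pattern), h)
            if f_new >= f_base:
                break
    return base, f_base


def ppo(f, dim, bounds, n_prey=20, iters=200, w=0.7, c1=1.5, c2=1.5,
        fear_prob=0.1, a=1.0, b=1.0, seed=0):
    """Simplified predator-prey PSO (assumed update rules, not the thesis's)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    vmax = 0.2 * (hi - lo)  # hypothetical velocity clamp
    x = rng.uniform(lo, hi, (n_prey, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    g, g_f = pbest[pbest_f.argmin()].copy(), pbest_f.min()
    predator = rng.uniform(lo, hi, dim)
    for _ in range(iters):
        # The predator moves a random fraction of the way toward the global best.
        predator += rng.random(dim) * (g - predator)
        for i in range(n_prey):
            r1, r2 = rng.random(dim), rng.random(dim)
            v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (g - x[i])
            if rng.random() < fear_prob:
                # Prey flees the predator; the push decays with distance.
                d = np.linalg.norm(x[i] - predator)
                v[i] += a * np.exp(-b * d) * np.sign(x[i] - predator)
            v[i] = np.clip(v[i], -vmax, vmax)
            x[i] = np.clip(x[i] + v[i], lo, hi)
            fi = f(x[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i].copy(), fi
                if fi < g_f:
                    g, g_f = x[i].copy(), fi
    return g, g_f


def ppo_hj(f, dim, bounds, **ppo_kwargs):
    """Two-stage search: the PPO global best seeds Hooke-Jeeves refinement."""
    g, _ = ppo(f, dim, bounds, **ppo_kwargs)
    return hooke_jeeves(f, g)
```

Example usage: `ppo_hj(lambda x: float(np.sum(x ** 2)), 5, (-5.0, 5.0))` returns a point very close to the origin. The split mirrors the motivation stated in the abstract. The swarm stage locates a promising region of a multimodal search space. The derivative-free pattern search then refines it without needing gradients, which back-propagation or BW would otherwise supply.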
