Development of Phonetic Engine for Punjabi Language
| dc.contributor.author | Mittal, Sakshi | |
| dc.contributor.supervisor | Sharma, R. K. | |
| dc.date.accessioned | 2014-08-19T05:52:40Z | |
| dc.date.available | 2014-08-19T05:52:40Z | |
| dc.date.issued | 2014-08-19T05:52:40Z | |
| dc.description.abstract | Hidden Markov Models (HMMs) have been used to model speech in many speech processing areas. This work presents a 'Phonetic Engine' developed to segment and label continuous Punjabi speech signals and to provide phoneme-level recognition. The system is based on acoustic-phonetic HMMs, which provide a statistical representation of each of the distinct sounds that make up a word. The continuous speech signal is segmented at the phone level, and each distinct sound is assigned a 'phoneme' label. To obtain the statistical representation of each distinct sound, HMMs are trained at the phoneme level using the HTK toolkit, a statistical tool for building HMMs. The HMMs use continuous-density Gaussian mixture output distributions. In this work, HMMs are trained for three modes of speech: read speech, lecture speech and conversational speech. In read speech mode, HMMs are trained separately for male and female speakers. In each mode, the performance of the speaker-independent Phonetic Engine has been evaluated to measure its accuracy. Each HMM has 5 states, of which the first and last are non-emitting; only the three remaining states (states 2, 3 and 4) have output distributions. For every phone HMM, the number of Gaussian mixture components is increased step by step: for example, the monophone HMM 'i' is first trained with one Gaussian mixture component per state, then with two per state, and so on. Here, 32 Gaussian mixtures are used, meaning that the process continues until every state of every HMM has 32 mixture components. In read speech mode the Phonetic Engine achieved 61.48% accuracy, in lecture speech mode 46.96%, and in conversational speech mode 22.39%. 
The overall accuracies of the Phonetic Engine for male and female speakers individually in read speech mode are 61.58% and 52.01%, respectively. In read speech mode, the first male speaker obtained an accuracy of 61.21%, the second male speaker 57.91%, the first female speaker 49.96% and the second female speaker 57.76%. The HTK toolkit was used in an Ubuntu 12.04 32-bit environment. This dissertation is divided into five chapters, briefly outlined below. Chapter 1 introduces the tools used and the files needed to implement the Phonetic Engine. Chapter 2 presents the literature survey in two sections: the first reviews existing systems implemented using the HTK toolkit, and the second reviews existing systems developed using HMMs but with a different tool. Chapter 3 describes the working of the Phonetic Engine and the algorithms used to train the HMMs, to check the accuracy of the engine and to obtain phoneme-level recognition of unknown utterances. Chapter 4 describes the data collected in all three modes and the accuracy of the Phonetic Engine in each mode. Chapter 5 concludes the work done in this dissertation and outlines its future scope. | en |
| dc.format.extent | 2943392 bytes | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.uri | http://hdl.handle.net/10266/2965 | |
| dc.language.iso | en_US | en |
| dc.subject | Phonetic Transcription, HMM, Phoneme, ASR | en |
| dc.title | Development of Phonetic Engine for Punjabi Language | en |
| dc.type | Thesis | en |
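
The HMM topology and mixture growth described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's HTK configuration. The self-loop probability of 0.6, the function names, and the doubling schedule (1, 2, 4, ... up to 32) are assumptions made for illustration; the abstract only states that each state starts with one mixture component, then two, and ends with 32.

```python
def left_to_right_transitions(n_states=5, self_loop=0.6):
    """Build a left-to-right transition matrix whose first and last
    states are non-emitting (entry and exit), as in the abstract.
    The self_loop value is an assumed placeholder, not a thesis value."""
    a = [[0.0] * n_states for _ in range(n_states)]
    a[0][1] = 1.0  # entry state moves directly to the first emitting state
    for i in range(1, n_states - 1):
        a[i][i] = self_loop            # stay in the same emitting state
        a[i][i + 1] = 1.0 - self_loop  # advance to the next state
    # The last row stays all zeros: the exit state has no outgoing transitions.
    return a


def emitting_states(n_states=5):
    """Return the 1-based indices of the states that carry output distributions."""
    return list(range(2, n_states))


def mixture_schedule(target=32):
    """Assumed doubling schedule for the number of Gaussian mixture
    components per state, ending at the target count."""
    n = 1
    schedule = [n]
    while n < target:
        n = min(n * 2, target)
        schedule.append(n)
    return schedule


if __name__ == "__main__":
    for row in left_to_right_transitions():
        print(row)
    print("emitting states:", emitting_states())
    print("mixtures per state at each round:", mixture_schedule())
```

Each emitting row of the matrix sums to one, and the schedule lists how many mixture components every emitting state would hold after each round of re-estimation under the assumed doubling scheme.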
