Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/1775
Title: Segmentation of Punjabi Speech Signals into Phonemes Using Hidden Markov Models
Authors: Bansal, Divya
Supervisor: Jindal, Khushneet
Keywords: Hidden Markov Models;HMM Tool Kit;Punjabi Speech Corpus;Speech Synthesis;Phoneme Generation
Issue Date: 23-Jul-2012
Abstract: The use of computers for speech synthesis has recently become an important area of research among speech scientists, computer scientists, and linguists. Speech synthesis refers to the artificial production of human speech. For this purpose, speech synthesis systems, often called text-to-speech (TTS) systems, are developed that "read" text from a document, web page, etc. and generate speech in the form of audio wave or MP3 files. TTS systems are particularly useful for visually impaired people, especially those with poor vision or visual dyslexia, for illiterate people who understand the spoken native language, and for educational and research purposes. All TTS systems aim to produce high-quality synthesized speech that is natural and intelligible, and that can be correctly understood and interpreted by the user. This thesis attempts to implement speech synthesis support for the Punjabi language on mobile devices. This is achieved by segmenting a speech database into smaller units with the HMM Toolkit (HTK), based on hidden Markov models (the HTS approach); these units are then concatenated to generate speech signals. The proposed system converts English text, in the form of a caller's name stored in the contact list, into Punjabi speech on mobile phones. The input text is first processed in a pre-processing stage to handle titles (e.g., Mr. Tapas), numbers (e.g., Bharat1234), and initials (e.g., K.K. Sharma); the processed data is then used in the training and testing phases of HTK. With the help of HTK, HMM acoustic models are first trained on spectral features (Mel-cepstral coefficients) extracted from the recorded Punjabi speech corpus, and context-independent monophone and context-dependent triphone models are generated; for example, for the word "bharat" the generated monophones include a, bh, and t, and the triphones include bh-a+r. Later, in the testing phase, the correct phoneme sequence for a test word is selected from a network of all possible combinations using the trained HMM models and feature vectors; for example, for the word "Tapas" the output phoneme sequence is ਤ, ਪ, ਸ rather than ਟ, ਪ, ਸ. These phoneme sequences are then given as input to the application, which generates speech signals by concatenating the phonemes.
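
The following is a minimal, hypothetical sketch of two of the steps the abstract describes: the text pre-processing stage (expanding titles such as "Mr.", separating digits embedded in names such as "Bharat1234", and splitting initials such as "K.K."), and the final concatenation of pre-recorded phoneme waveforms into an output signal. The function names, the title-expansion table, and the assumption that all phoneme recordings share one WAV format are illustrative and are not taken from the thesis itself.

import re
import wave

# Assumed expansion table for honorific titles; the thesis does not list one.
TITLES = {"Mr.": "Mister", "Mrs.": "Misses", "Dr.": "Doctor"}

def preprocess_name(text: str) -> str:
    """Normalise a contact-list entry before it is phonetised."""
    # Expand honorific titles, e.g. "Mr. Tapas" -> "Mister Tapas".
    for short, full in TITLES.items():
        text = text.replace(short, full)
    # Separate trailing digits from names, e.g. "Bharat1234" -> "Bharat 1234".
    text = re.sub(r"([A-Za-z])(\d)", r"\1 \2", text)
    # Split initials, e.g. "K.K. Sharma" -> "K K Sharma".
    text = re.sub(r"\b([A-Z])\.", r"\1 ", text)
    return " ".join(text.split())

def concatenate_phonemes(phoneme_wavs: list[str]) -> bytes:
    """Concatenate pre-recorded phoneme waveforms (assumed to share one PCM format)."""
    frames = []
    for path in phoneme_wavs:
        with wave.open(path, "rb") as w:
            frames.append(w.readframes(w.getnframes()))
    return b"".join(frames)

if __name__ == "__main__":
    print(preprocess_name("Mr. Tapas"))    # -> "Mister Tapas"
    print(preprocess_name("Bharat1234"))   # -> "Bharat 1234"
    print(preprocess_name("K.K. Sharma"))  # -> "K K Sharma"

In the actual system, the phoneme sequence fed to the concatenation step would come from the HTK testing phase (recognition over the network of possible phoneme combinations), as described in the abstract.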
Description: M.Tech. (Computer Science Applications)
URI: http://hdl.handle.net/10266/1775
Appears in Collections:Masters Theses@CSED

Files in This Item:
File: 1775.pdf
Size: 2.18 MB
Format: Adobe PDF
