Title: Implementation of Statistical Speech to Text Recognition System for Punjabi Language
Authors: Mittal, Shama
Supervisor: Kaur, Rupinderdeep
Keywords: Automatic Speech Recognition; Hidden Markov toolkit; Computer Science
Issue Date: 10-Aug-2016
Abstract: To take advantage of a machine's capabilities, human beings must be able to interact with it effectively. Human-machine interaction has therefore attracted the attention of many researchers in recent years. The process of recognizing and translating spoken words using computerized devices, such as smart technologies and robotics, is known as Automatic Speech Recognition (ASR). This work explains the implementation of a statistical speech-to-text recognition system. Punjabi was chosen for this implementation because it is a highly prosodic language and very little work has been done on prosodic languages. In developing this speech recognition system, Mel-frequency cepstral coefficients (MFCC) are used for feature extraction, and these features are classified with a Hidden Markov Model (HMM). The HMM is implemented using the HTK Toolkit. The first step is data collection: a total of 7 hours of data is collected in read speech, lecture speech and conversational speech modes. After data collection, all of the data is transcribed using the International Phonetic Alphabet (IPA) chart. As a result of transcription, the prosodic database has a vocabulary of 266 unique words, comprising 39 mono-syllable words, 43 bi-syllable words, 110 tri-syllable words and 74 multi-syllable words. The second step is data preparation, in which the hmmlist, grammar and dictionary files are created from this vocabulary. Once the data is prepared, 75% of it is used for training with the HTK Toolkit and the remaining 25% is used for testing. The experimental results show that the accuracy of the recognition system increases from 61.84% to 69.95% in read speech mode, from 41.45% to 53.32% in lecture speech mode and from 20.24% to 21.00% in conversational speech mode. The accuracy of the system increases as the number of mixtures increases from 29 to 33 (excluding silence) and as the amount of data increases from 3 hours to 7 hours. When a part of this data is trained and tested as a word-level speech recognition system, the system achieves an accuracy of 57.54% using triphone models.
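The 75%/25% split and the accuracy figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's code, and the abstract does not say how accuracy was computed. The sketch assumes the usual word-accuracy definition, 100 * (N - S - D - I) / N, where N is the number of reference words and S, D and I are substitutions, deletions and insertions. The function names `split_train_test`, `edit_distance` and `word_accuracy` are hypothetical, and the unit-cost alignment used here may count errors slightly differently from HTK's own scoring tool.

```python
import random


def split_train_test(utterances, train_fraction=0.75, seed=0):
    """Shuffle utterance IDs and split them into training and test lists."""
    items = list(utterances)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def edit_distance(reference, hypothesis):
    """Minimum number of substitutions, deletions and insertions (unit costs)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def word_accuracy(reference, hypothesis):
    """Percent accuracy, 100 * (N - S - D - I) / N, with N reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    errors = edit_distance(reference, hypothesis)
    return 100.0 * (len(reference) - errors) / len(reference)


if __name__ == "__main__":
    # Placeholder utterance IDs and word tokens, not data from the thesis.
    train, test = split_train_test([f"utt{k:02d}" for k in range(8)])
    print(len(train), len(test))
    print(word_accuracy(["w1", "w2", "w3", "w4"], ["w1", "x", "w3", "w4"]))
```

Because insertions are penalized, this accuracy measure can fall below zero when the recognizer outputs many spurious words, which is one reason conversational speech tends to score much lower than read speech.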
Description: Master of Engineering-Software Engineering
Appears in Collections: Masters Theses@CSED

Files in This Item:
File: 4056.pdf
Size: 3.36 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.