Speaker Independent Isolated word speech to text conversion using HTK

dc.contributor.author: Mittal, Shweta
dc.contributor.supervisor: Verma, Karun
dc.date.accessioned: 2014-08-05T11:27:18Z
dc.date.available: 2014-08-05T11:27:18Z
dc.date.issued: 2014-08-05T11:27:18Z
dc.description: Master of Engineering-Thesis [en]
dc.description.abstract: Speech-to-text conversion, or speech recognition, allows a computer to identify the words that a person speaks into a microphone or similar hardware and convert them into written text. This thesis describes the implementation of an HMM (Hidden Markov Model) based speaker-independent, isolated-word speech-to-text conversion system. The system is developed using HTK (Hidden Markov Model Toolkit) for Punjabi, an Indo-Aryan language spoken by about 130 million people, mainly in West Punjab in Pakistan and East Punjab in India. To implement the system, data were first gathered from 10 different speakers: 1010 recordings in total, 10 for each of 101 distinct words, namely the Punjabi numbers 0 to 100. Two data sets were then prepared: the first consists of the recordings as gathered, and the second consists of the same recordings after a noise reduction technique (Auto Spectral Subtraction) has been applied. For both data sets, 760 of the 1010 recordings are used to train the system and 250 to test it. The system uses Mel Frequency Cepstral Coefficients (MFCCs) to extract features from the speech files. For both data sets, the system is trained and tested at three levels: word, mono-phone and tri-phone. For the first data set, the accuracy obtained is 84.8% at word level, 88% at mono-phone level and 97.2% at tri-phone level; for the second, it is 89.2% at word level, 92.4% at mono-phone level and 98.4% at tri-phone level. [en]
dc.description.sponsorship: Computer Science and Engineering, Thapar University, Patiala [en]
dc.format.extent: 2178834 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10266/2828
dc.language.iso: en_US [en]
dc.subject: Autospectral Subtraction feature [en]
dc.subject: Noise Reduction [en]
dc.subject: Speech to Text Conversion [en]
dc.subject: Triphone Model [en]
dc.subject: Monophone [en]
dc.subject: computer science [en]
dc.title: Speaker Independent Isolated word speech to text conversion using HTK [en]
dc.type: Thesis [en]
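
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the thesis: it assumes that each reported accuracy is simply the share of the 250 test recordings recognised correctly, and the names `reported` and `implied_correct` are hypothetical. Under that assumption every percentage corresponds to a whole number of test recordings.

```python
# Figures quoted in the abstract.
TOTAL = 101 * 10          # 101 distinct words, 10 recordings of each
TRAIN, TEST = 760, 250    # train/test split stated in the abstract

# Reported accuracy (%) per data set and modelling level.
reported = {
    ("original", "word"): 84.8,
    ("original", "mono-phone"): 88.0,
    ("original", "tri-phone"): 97.2,
    ("noise-reduced", "word"): 89.2,
    ("noise-reduced", "mono-phone"): 92.4,
    ("noise-reduced", "tri-phone"): 98.4,
}


def implied_correct(percent, n_test=TEST):
    """Number of correct test recordings implied by a percentage.

    Assumption (not stated in the source): accuracy = correct / n_test.
    """
    return percent * n_test / 100


# Round to the nearest whole recording for display.
counts = {key: round(implied_correct(pct)) for key, pct in reported.items()}

if __name__ == "__main__":
    print(f"total recordings: {TOTAL}, train + test: {TRAIN + TEST}")
    for (dataset, level), n in counts.items():
        print(f"{dataset:14s} {level:11s} {n} of {TEST} correct")
```

Read this way, noise reduction adds between 3 and 11 correct test recordings out of 250, depending on the level.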

Files

Original bundle

Name: 2828.pdf
Size: 2.08 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission