Speaker Independent Isolated word speech to text conversion using HTK
Abstract
Speech-to-text conversion, or speech recognition, allows a computer to identify the
words that a person speaks into a microphone or similar hardware and convert them
into written text.
This thesis describes the implementation of an HMM (Hidden Markov Model) based
speaker-independent, isolated-word speech-to-text conversion system. The system is
developed using HTK (Hidden Markov Model Toolkit) for the Punjabi language, an
Indo-Aryan language spoken by about 130 million people, mainly in West Punjab in
Pakistan and East Punjab in India. To implement the system, 1010 recordings are
first gathered from 10 different speakers: 10 recordings of each of 101 distinct
words, namely the Punjabi numerals 0 to 100. Two data sets are then prepared. The
first set consists of the recordings as gathered, and the second set consists of the
same recordings after a noise reduction technique (Auto Spectral Subtraction) has
been applied. For both sets, 760 of the 1010 recordings are used to train the system
and 250 are used to test it. The system uses Mel Frequency Cepstral Coefficients
(MFCCs) to extract features from the speech files. For both data sets, the system is
trained and tested at three levels: word level, mono-phone level and tri-phone level.
The accuracy obtained is 84.8% at word level, 88% at mono-phone level and 97.2% at
tri-phone level for the first data set, and 89.2% at word level, 92.4% at mono-phone
level and 98.4% at tri-phone level for the second data set.
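The figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the thesis: it assumes that each reported accuracy is the number of correctly recognized test recordings divided by the 250 test recordings, and all names in it (`REPORTED_ACCURACY`, `correct_count`, `implied_counts`) are hypothetical. The resulting counts are derived values, not numbers reported by the author.

```python
# Dataset sizes as stated in the abstract.
DISTINCT_WORDS = 101        # Punjabi numerals 0 to 100
RECORDINGS_PER_WORD = 10
TRAIN_COUNT = 760
TEST_COUNT = 250

# Accuracies (percent) as stated in the abstract, per data set and level.
REPORTED_ACCURACY = {
    "as recorded": {"word": 84.8, "mono-phone": 88.0, "tri-phone": 97.2},
    "noise-reduced": {"word": 89.2, "mono-phone": 92.4, "tri-phone": 98.4},
}


def correct_count(accuracy_percent, total=TEST_COUNT):
    """Number of correct recognitions implied by an accuracy figure.

    ASSUMPTION: accuracy = correct / total * 100. The abstract does not
    state how accuracy was computed.
    """
    return round(accuracy_percent * total / 100)


def implied_counts():
    """Map every reported accuracy to its implied correct-recognition count."""
    return {
        data_set: {level: correct_count(acc) for level, acc in levels.items()}
        for data_set, levels in REPORTED_ACCURACY.items()
    }


if __name__ == "__main__":
    print("total recordings:", DISTINCT_WORDS * RECORDINGS_PER_WORD)
    print("train + test:", TRAIN_COUNT + TEST_COUNT)
    for data_set, levels in implied_counts().items():
        for level, count in levels.items():
            print(f"{data_set:14s} {level:11s} {count} of {TEST_COUNT}")
```

Under that assumption, every reported percentage maps to a whole number of test recordings (for example, 84.8% corresponds to 212 of 250 and 98.4% to 246 of 250), which is consistent with a 250-recording test set.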
Description
Master of Engineering-Thesis
