Title: Deep Learning Model to Recognize Punjabi Language Speech Commands
Authors: Kaushal, Pranav
Supervisor: Singh, Maninder
Keywords: Deep Learning;Speech Recognition;Voice Recognition;Punjabi Language;Spectrograms
Issue Date: 31-Jul-2019
Abstract: This dissertation presents a deep learning model to recognize Punjabi-language speech commands. The goal is to classify spoken Punjabi words using a deep neural network. In earlier research, the main hindrance in speech recognition has been achieving high accuracy, and only GMM-HMM models had been explored for the task. With the adoption of deep learning, these traditional models have been surpassed. Advances in GPU computing power, together with cloud computing for cases where local hardware cannot train large deep neural networks on large datasets, have allowed deep learning to spread widely and achieve accurate results, confirming that it outperforms the GMM-HMM model. In this work, a convolutional neural network (CNN) deep learning model has been implemented. The dataset comprises audio recordings (.wav files) of Punjabi words such as "ਹਾਂ" (yes), "ਨਾ" (no), "ਉੱਪਰ" (up), and "ਥੱਲੇ" (down). The dataset was curated to achieve a good balance of distribution among the captured audio files, including noisy samples, so that training the network could attain the highest accuracy. The training data (speech waveforms) are converted into log-mel spectrograms by specifying the duration of each speech clip, the duration of each frame, the number of Mel filters, and the time step between columns of the spectrogram. The CNN architecture comprises five stacks of a convolutional layer, a ReLU unit, and a max-pooling unit, and the output of these stacks is passed to one fully connected layer. A validation accuracy of approximately 87% was achieved with an 80:10:10 (train/validation/test) distribution of the speech dataset. The results obtained in this work show better performance than the existing work.
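As a minimal sketch of the pipeline described in the abstract (not the author's actual code), the following Python snippet shows how a .wav clip could be converted into a log-mel spectrogram and how a five-stack conv/ReLU/max-pool CNN with one fully connected output layer could be built. It assumes the librosa and TensorFlow/Keras libraries; the sample rate, clip length, number of Mel filters, hop length, filter counts, and number of command classes are illustrative assumptions, not values taken from the thesis.

```python
import librosa
import numpy as np
import tensorflow as tf

# Assumed parameters (not taken from the thesis): 16 kHz audio,
# 1-second clips, 40 Mel filters, 10 ms hop between spectrogram columns.
SAMPLE_RATE = 16000
CLIP_SECONDS = 1.0
N_MELS = 40
HOP_LENGTH = 160          # 10 ms at 16 kHz
FRAME_LENGTH = 400        # 25 ms analysis frames
NUM_CLASSES = 4           # e.g. the yes / no / up / down commands

def wav_to_log_mel(path):
    """Load a .wav clip and convert it to a log-mel spectrogram."""
    y, _ = librosa.load(path, sr=SAMPLE_RATE, duration=CLIP_SECONDS)
    # Pad short clips so every spectrogram has the same shape.
    y = np.pad(y, (0, max(0, int(SAMPLE_RATE * CLIP_SECONDS) - len(y))))
    mel = librosa.feature.melspectrogram(
        y=y, sr=SAMPLE_RATE, n_fft=FRAME_LENGTH,
        hop_length=HOP_LENGTH, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel)
    return log_mel[..., np.newaxis]  # add a channel axis for the CNN

def build_model(input_shape):
    """Five conv/ReLU/max-pool stacks followed by one fully connected layer."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    for filters in (16, 32, 64, 64, 128):   # filter counts are assumptions
        model.add(tf.keras.layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(tf.keras.layers.ReLU())
        model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding="same"))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The resulting spectrograms could then be divided 80:10:10 into training, validation, and test sets and passed to model.fit, mirroring the split and validation-accuracy evaluation reported above.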
Description: Master Thesis
URI: http://hdl.handle.net/10266/5551
Appears in Collections:Masters Theses@CSED

Files in This Item:
File          Description    Size       Format
THESIS.pdf                   3.79 MB    Adobe PDF

