Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks (CNNs)

Thakur, Aditi

Files

4629.pdf (1.73 MB)

Date

2017-08-11

Authors

Thakur, Aditi

Supervisors

Verma, Karun

Abstract

Speech is one of the most natural ways for humans to interact and express themselves, and it is among the most convenient forms of input to a system. With advances in technology, almost every object that surrounds us is moving towards automation, which suggests that in the near future most devices will be controlled by voice or gestures. The number of speech-enabled devices we encounter in daily life is steadily increasing, for example ATMs for visually impaired people, and speech recognition systems can support a range of applications that provide employment opportunities for differently abled people. Achieving good accuracy and making the system robust to noise have nevertheless always been among the main concerns of this research area, and accuracy remains a major obstacle in the domain of Natural Language Processing. The model that has long dominated speech recognition is GMM-HMM. With advances in big data, parallel processing and GPU computing power, deep learning models have leveraged these gains to outperform the GMM-HMM model, yet the race to minimize the error rate continues. In this research work we implemented a deep learning algorithm, the Convolutional Neural Network (CNN), with the aim of achieving good accuracy on our data set. The data consists of audio recordings (.wav files) of counting from 0 to 100 in the Punjabi language, collected with the aim of a good balance between male and female speakers.
The CNN architecture comprises four stacks, each consisting of a convolutional layer, a ReLU unit and a max pooling unit; the output of these stacks is passed to two fully connected layers, the first of which has a dropout of 25%. The results obtained in this work show better performance than the existing work.
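
The layer arrangement described above can be traced with a small shape calculation. The sketch below is illustrative only: the abstract gives the number of stacks, the two fully connected layers and the 25% dropout, but not the input representation, kernel sizes, channel counts or layer widths. The 64x64 single-channel input, the 3x3 kernels with padding 1, the 2x2 pooling, the channel counts (16, 32, 64, 128), the 256-unit first fully connected layer and the 101 output classes (one per numeral from 0 to 100) are all assumptions chosen for the example, not the thesis's actual settings.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    """One convolution -> ReLU -> max-pooling stack (all hyperparameters hypothetical)."""
    out_channels: int
    kernel: int = 3    # assumed square convolution kernel
    padding: int = 1   # assumed padding that preserves size for a 3x3 kernel
    pool: int = 2      # assumed 2x2 max pooling with stride 2


def conv_out(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Standard output-length formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels, stacks):
    """Return the (channels, height, width) shape at the input and after each stack."""
    shapes = [(channels, height, width)]
    for s in stacks:
        # Convolution, then pooling; ReLU is element-wise and keeps the shape.
        height = conv_out(height, s.kernel, s.padding) // s.pool
        width = conv_out(width, s.kernel, s.padding) // s.pool
        shapes.append((s.out_channels, height, width))
    return shapes


# Hypothetical input: one 64x64 single-channel feature map per utterance.
stacks = [Stack(16), Stack(32), Stack(64), Stack(128)]  # four stacks, as in the abstract
shapes = trace_shapes(64, 64, 1, stacks)
flat_features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

FC1_UNITS = 256      # hypothetical width of the first fully connected layer
NUM_CLASSES = 101    # assumes one class per numeral from 0 to 100
DROPOUT_FC1 = 0.25   # 25% dropout on the first fully connected layer (from the abstract)

for i, shape in enumerate(shapes):
    print("input" if i == 0 else f"stack {i}", shape)
print("flattened features:", flat_features)
print(f"fc1: {flat_features} -> {FC1_UNITS} (dropout {DROPOUT_FC1})")
print(f"fc2: {FC1_UNITS} -> {NUM_CLASSES}")
```

Under these assumed settings each stack halves the spatial size, so the first fully connected layer receives 128 x 4 x 4 = 2048 features. Different kernel, padding or pooling choices would change these numbers but not the overall structure.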

Description

Master of Engineering - CSE

Keywords

Convolutional Neural Network, Speech Recognition, Dropout, Pooling, Back Propagation, Gradient Descent

URI

http://hdl.handle.net/10266/4629

Collections

Masters Theses@CSED
