Automatic Identification of Modal, Breathy and Creaky Voices

Abstract

Computers have changed a great deal over the past few decades, shrinking from almost the size of a room to the size of one's palm. Nowadays even a mobile phone is equivalent to a small computer. Interaction with computers has moved from punched cards to fingertips, yet speech is still not widely used as an interaction medium. This is mainly due to the problems faced in recognizing speech. Variation in voice quality is one of the reasons for the slow growth of the speech recognition domain. This thesis deals with the identification of modal, creaky and breathy voices, and presents an algorithm that successfully identifies these three types of voice quality. The thesis is divided into five chapters. A brief outline of each chapter is given in the following paragraphs.

Chapter 1 first discusses the basic model of speech recognition. It then discusses the issues in Automatic Speech Recognition: noise, voice quality, and the detection of voiced, unvoiced and silence regions. Finally, it surveys the literature on the algorithms and methods used to identify different types of voice quality.

Chapter 2 is divided into three parts: data collection, preprocessing and computation of features. The data collection part describes how the data was collected and for how many users. The preprocessing part discusses the technique applied (windowing) before features are extracted. Finally, the feature extraction part explains the features used, such as zero crossing rate, fundamental frequency and short-time energy.

Chapter 3 discusses the observations and results obtained from these features, which are then used to distinguish the different voice qualities. Finally, an algorithm is designed using these features and applied to the collected data.

Chapter 4 is divided into two parts. The first part displays the output obtained from the algorithm for words spoken in different voice qualities. The second part shows the accuracy obtained for each voice quality along with the overall accuracy of the algorithm. The proposed algorithm achieves 90.1% accuracy in identifying modal voices, 89.8% for breathy voices and 80.7% for creaky voices.

Chapter 5 concludes the work. The overall accuracy achieved using the proposed algorithm is 87.2%. The future scope in this domain is also discussed in this chapter.
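The three features named in the outline of Chapter 2 can be illustrated with a short sketch. This is not the thesis's implementation: the abstract does not give frame sizes, the window type or the pitch estimation method, so the 30 ms frames, the 10 ms hop, the Hamming window and the autocorrelation-based estimate of fundamental frequency below are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames, window):
    """Sum of squared windowed samples in each frame."""
    return np.sum((frames * window) ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def autocorr_f0(frame, sr, fmin=50.0, fmax=500.0):
    """Estimate F0 of one frame from its autocorrelation peak (assumed method)."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)                       # shortest lag searched
    hi = min(int(sr / fmin), len(ac) - 1)     # longest lag searched
    if ac[0] <= 0 or hi <= lo:
        return 0.0                            # silent or too-short frame
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

# Synthetic stand-in for a recording: a 120 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.sin(2 * np.pi * 120.0 * t)

frame_len, hop = int(0.030 * sr), int(0.010 * sr)  # assumed 30 ms / 10 ms
frames = frame_signal(signal, frame_len, hop)
window = np.hamming(frame_len)                     # assumed window type

energy = short_time_energy(frames, window)
zcr = zero_crossing_rate(frames)
f0 = np.array([autocorr_f0(f, sr) for f in frames])
```

Each array holds one value per frame. How these values are combined into a decision between modal, breathy and creaky voice is the subject of Chapter 3 and is not reproduced here.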

Description

Master of Technology (Computer Science and Applications)
