A Novel Approach Towards Devanagari Transliteration Using Statistical and Structural Feature Extraction
| DC field | Value | Language |
|---|---|---|
| dc.contributor.author | Kaur, Jasmine | |
| dc.contributor.supervisor | Kumar, Vinay | |
| dc.date.accessioned | 2016-08-29T10:26:48Z | |
| dc.date.available | 2016-08-29T10:26:48Z | |
| dc.date.issued | 2016-08-29 | |
| dc.description.abstract | The majority of ancient Indian literature, such as the Bhagavad Gita, the Vedas, the Mahabharata and the Ramayana, is written in the Devanagari script. Although Devanagari is widely used in India, it is known to only a small fraction of the world's population, whereas the Roman script is used all over the world. To make this rich and voluminous Indian literature readily available to people who are unfamiliar with Devanagari, transliterating Devanagari documents into the more familiar Roman script is the way forward. This dissertation attempts the Romanization of Devanagari documents using character recognition based on the underlying statistical and structural properties of the characters. The character recognition process interprets document images and converts the text into an editable format. Moreover, automating this process greatly reduces the human intervention needed to convert Devanagari text documents into familiar, editable Roman script. The task is challenging, however, because of the complex structure and large size of the Devanagari character set compared with the limited Roman alphabet. Segmentation is one of the first tasks performed, in order to isolate the constituent characters. The line segmentation methodology in this dissertation addresses overlapping and skewed lines. Overlapping lines are segmented by making the number of connected components equal to the number of individual lines in the image. Mathematical morphological operations, specifically closing and dilation, are used to narrow the range of candidate skew angles, thereby speeding up the projection-profile method of skew correction. The presented skew correction method works for the full range of angles. The proposed character segmentation algorithm is designed to segment conjuncts and to separate shadow characters. The shadow-character segmentation scheme uses the connected-component method to isolate each character while keeping the constituent characters intact. Statistical features, namely moments of different orders (area, variance, skewness and kurtosis), are employed together with structural features of the characters in a two-phase recognition process. After recognition, the constituent Devanagari characters are mapped to Roman letters so that the resulting Roman text has a pronunciation similar to that of the source characters. The algorithm is evaluated comprehensively on various Devanagari documents with positive results. | en_US |
| dc.identifier.uri | http://hdl.handle.net/10266/4194 | |
| dc.language.iso | en | en_US |
| dc.subject | Image processing | en_US |
| dc.subject | Devanagari | en_US |
| dc.subject | Sanskrit | en_US |
| dc.subject | Character Recognition | en_US |
| dc.subject | Feature Extraction | en_US |
| dc.title | A Novel Approach Towards Devanagari Transliteration Using Statistical and Structural Feature Extraction | en_US |
| dc.type | Thesis | en_US |

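The abstract segments overlapping lines by making the number of connected components equal the number of text lines. The following is a minimal sketch of one way to do that, not the thesis's exact procedure. It assumes a binarized image given as a list of lists (1 = ink) and assumes that a horizontal dilation of width `2 * half_width + 1` fuses each line into a single component without bridging neighbouring lines. `dilate_horizontal`, `label_components`, `segment_lines` and `half_width` are hypothetical names.

```python
from collections import deque

# Hypothetical helpers illustrating connected-component line segmentation;
# the names and the fixed dilation width are assumptions, not from the thesis.


def dilate_horizontal(img, half_width):
    """Binary dilation with a 1 x (2*half_width + 1) horizontal structuring element."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for k in range(max(0, c - half_width), min(cols, c + half_width + 1)):
                    out[r][k] = 1
    return out


def label_components(img):
    """Label 8-connected ink components with BFS; returns (label grid, count)."""
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def segment_lines(img, half_width=3):
    """Return one (top, bottom) row span per component of the smeared image."""
    labels, _ = label_components(dilate_horizontal(img, half_width))
    spans = {}
    for r, row in enumerate(labels):
        for lab in row:
            if lab:
                top, bottom = spans.get(lab, (r, r))
                spans[lab] = (min(top, r), max(bottom, r))
    return sorted(spans.values())


# Two lines whose row ranges overlap at row 2: a horizontal projection
# profile has no empty row to cut at, but the components stay separate.
IMG = [
    [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
]
print(segment_lines(IMG, half_width=1))  # [(0, 2), (2, 4)]
```

The returned spans overlap, which is exactly the case a projection-profile cut cannot handle. Lines whose ink physically touches would still merge into one component, so a real implementation needs an extra splitting rule.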
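The projection-profile skew correction named in the abstract can be sketched as a search over candidate angles. For each candidate, the search rotates the ink coordinates and scores how sharply the horizontal projection peaks, using the sum of squared row counts as the score. This sketch assumes the candidate window `[lo, hi]` is already known. In the thesis that window is narrowed by morphological closing and dilation, which is not implemented here, and the ±15 degree defaults and 0.5 degree `step` are hypothetical.

```python
import math

# Hypothetical helpers: a brute-force projection-profile skew search over a
# given angle window (the morphological narrowing step is not implemented).


def projection_score(points, angle_deg):
    """Sum of squared horizontal-projection counts after rotating ink by -angle.

    The score is largest when ink concentrates in few rows, i.e. when the
    candidate angle cancels the skew.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    bins = {}
    for x, y in points:
        row = round(y * cos_t - x * sin_t)  # y after rotating (x, y) by -theta
        bins[row] = bins.get(row, 0) + 1
    return sum(n * n for n in bins.values())


def estimate_skew(img, lo=-15.0, hi=15.0, step=0.5):
    """Return the candidate angle in [lo, hi] degrees with the sharpest profile.

    A positive result means text lines descend to the right in image
    coordinates (row index grows downward).
    """
    points = [(c, r) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    best_angle, best_score = lo, -1
    for i in range(int(round((hi - lo) / step)) + 1):
        angle = lo + i * step
        score = projection_score(points, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Two synthetic text lines that drop 1 px every 10 px (about 5.7 degrees).
page = [[0] * 100 for _ in range(40)]
for y0 in (5, 25):
    for x in range(100):
        page[y0 + x // 10][x] = 1
print(estimate_skew(page))  # an angle near atan(0.1), i.e. about 5.7 degrees
```

The cost grows linearly with the number of candidate angles. That is why shrinking the window first, as the abstract describes, speeds up the method.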
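For shadow characters, the abstract says connected components isolate a character whose parts extend under or over a neighbour, so that no empty column exists to cut at. The sketch below assumes the header line (shirorekha) is the single pixel row with the most ink, which is a common heuristic and not necessarily the thesis's rule. It blanks that row and then takes bounding boxes of the remaining 8-connected components. Conjunct segmentation is not covered. `remove_header_line` and `component_boxes` are hypothetical names.

```python
from collections import deque

# Hypothetical helpers; real header lines are often several pixels thick,
# whereas this sketch removes only one row.


def remove_header_line(img):
    """Blank the row with the most ink, assumed to be the shirorekha."""
    counts = [sum(row) for row in img]
    header = counts.index(max(counts))
    out = [row[:] for row in img]
    out[header] = [0] * len(img[header])
    return out, header


def component_boxes(img):
    """Bounding boxes (left, top, right, bottom) of 8-connected ink components."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                left, top, right, bottom = c, r, c, r
                while queue:
                    y, x = queue.popleft()
                    left, right = min(left, x), max(right, x)
                    top, bottom = min(top, y), max(bottom, y)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((left, top, right, bottom))
    return sorted(boxes)  # left-to-right order of the isolated characters


WORD = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # header line joining both characters
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],  # second glyph's stroke runs under the first
]
print(len(component_boxes(WORD)))  # 1: everything hangs off the header line
stripped, header_row = remove_header_line(WORD)
print(component_boxes(stripped))  # [(1, 1, 3, 2), (1, 1, 5, 4)]
```

Each box keeps the whole component, so the parts of a character stay together even though its column range overlaps its neighbour's.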
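The statistical features listed in the abstract (area, variance, skewness and kurtosis) can be computed as moments of a glyph image. This is a minimal sketch under one assumption: the moments are taken over the x and y marginal distributions of ink-pixel coordinates. The thesis may instead compute them over projection profiles, zones or another representation. `moment_features` and its dictionary keys are hypothetical.

```python
import math

# Hypothetical feature extractor: moments of the ink-pixel coordinate
# marginals of a binary glyph image (1 = ink).


def moment_features(img):
    """Area plus variance, skewness and (non-excess) kurtosis along x and y."""
    xs = [c for row in img for c, v in enumerate(row) if v]
    ys = [r for r, row in enumerate(img) for v in row if v]
    area = len(xs)  # zeroth-order moment: number of ink pixels
    if area == 0:
        raise ValueError("glyph image contains no ink")
    feats = {"area": area}
    for axis, vals in (("x", xs), ("y", ys)):
        mean = sum(vals) / area
        var = sum((v - mean) ** 2 for v in vals) / area  # 2nd central moment
        if var == 0:
            skew = kurt = 0.0  # degenerate: all ink in one row or column
        else:
            skew = sum((v - mean) ** 3 for v in vals) / area / var ** 1.5
            kurt = sum((v - mean) ** 4 for v in vals) / area / var ** 2
        feats[f"var_{axis}"] = var
        feats[f"skew_{axis}"] = skew
        feats[f"kurt_{axis}"] = kurt
    return feats


PLUS = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
print(moment_features(PLUS)["skew_x"])  # 0.0 for this symmetric glyph
```

Skewness captures left/right or top/bottom imbalance of a stroke pattern, which makes it useful for separating mirror-like shapes that have the same area and variance.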
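The final step maps recognized Devanagari characters to Roman letters with similar pronunciation. The sketch below assumes IAST as the target scheme and works on Unicode text in logical order rather than on recognized glyph classes. Its tables are hypothetical and deliberately partial, and they are not the thesis's mapping. The key rule it shows is the inherent vowel: a consonant is read with "a" unless a vowel sign (matra) or the virama follows it.

```python
# Hypothetical, deliberately partial Devanagari-to-IAST tables; a full
# scheme also needs vocalic r/l, nukta forms, digits, punctuation, etc.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
    "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
MATRAS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū",
          "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
VIRAMA = "्"
MARKS = {"ं": "ṃ", "ः": "ḥ"}  # anusvara, visarga


def transliterate(text):
    """Map Devanagari text (Unicode logical order) to IAST."""
    out = []
    for i, ch in enumerate(text):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Inherent 'a' unless a vowel sign or the virama follows.
            if nxt not in MATRAS and nxt != VIRAMA:
                out.append("a")
        elif ch in MATRAS:
            out.append(MATRAS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in MARKS:
            out.append(MARKS[ch])
        elif ch != VIRAMA:
            out.append(ch)  # spaces and unmapped characters pass through
    return "".join(out)


print(transliterate("धर्म"))  # dharma
```

Because the vowel sign ि is stored after its consonant in Unicode even though it is drawn before it, a recognizer that emits glyphs in visual order would need to reorder them before applying this kind of mapping.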