Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/3844
Title: Efficient Pre-processing, Feature Extraction and Post-Processing Algorithms for Recognition of Online Handwritten Gurmukhi Script
Authors: Kumar, Ravinder
Supervisor: Sharma, R. K.
Sharma, Anuj
Keywords: Pre-processing;Post-processing;Gurmukhi script;Online handwriting recognition;computer science;CSE
Issue Date: 1-Nov-2015
Abstract: Human beings start recognizing digits, letters and symbols at a very early stage of childhood, and this ability develops rapidly with age. Although this natural process of recognizing characters and objects is taken for granted, its complexity becomes apparent when the same task has to be taught to a machine. Online Handwriting Recognition (OHWR) is one such area of research that has gained worldwide attention. One main reason for this popularity is that recognition of online handwriting has a wide range of applications at the interface between man and machine, where it offers an easy way of giving input to a computer. Recognition of online handwritten characters in any script is a difficult task because of the different handwriting styles of different writers, the complexity of the content and the different handwriting devices in use. Extensive research on handwritten characters has made it feasible to recognize characters of English, Chinese and Japanese, and a good number of methods have been proposed for handwritten English characters in particular. However, less attention has been paid to Indian languages, especially regional languages, and the technology is not at the same level for Indian scripts. This holds especially for Gurmukhi script, which is complicated in terms of its structure and writing style. Hence, an attempt has been made in this thesis to provide an online handwriting recognition system for Gurmukhi script. The main objective of this research was to propose efficient pre-processing, feature extraction and post-processing algorithms by introducing new constraints, steps and parameters for the recognition of online handwritten Gurmukhi script. A further objective was to validate the proposed algorithms, targeting about 95% accuracy at the character level.
For this purpose, a stroke-based approach has been adopted for developing a recognition engine for online handwritten Gurmukhi script. This approach uses all the standard steps required for designing a handwriting recognition engine, viz. pre-processing, feature extraction, recognition and post-processing. A preliminary examination of the area made it clear that no public database of online handwritten Gurmukhi characters was available for recognition experiments. Hence, creating a reliable and authentic database became part of the current research work. Raw data for online handwritten Gurmukhi words have been collected from 144 users residing in Punjab. These users were placed in four categories on the basis of their proficiency in writing Gurmukhi script. The handwritten words collected from the writers were formed from different combinations of 35 characters, 9 vowel modifiers, 3 nasals and 10 numerals of Gurmukhi script. A total of 39,871 sample words of Gurmukhi script were collected across the user categories. These samples yielded 2,28,442 strokes, of which 2,26,330 have been annotated. The annotated strokes are used for training a LibSVM classifier. Experimental results have been obtained using the open source software LibSVM with a Radial Basis Function (RBF) kernel. In this work, efficient pre-processing, feature extraction and post-processing algorithms for recognition of online handwritten Gurmukhi script have been proposed. These algorithms are intended to lead to the development of an efficient WI, open-vocabulary and cursive writing recognition system. In pre-processing, a new methodology for the association of strokes has been proposed.
Here, different strokes are associated to form a character on the basis of their positional coordinates and overlapping windows. A set of features that represents a stroke of handwritten Gurmukhi script has been defined and analyzed. 2-fold, 3-fold, 4-fold and 5-fold cross-validation has been used for three engines, viz. Engine A, Engine B and Engine C. New algorithms for post-segmentation of stroke sets, sequencing of stroke sets and merging of stroke sets have been presented for post-processing in handwritten Gurmukhi script recognition. SVMs have achieved excellent recognition results for various pattern recognition problems. In the current study, a recognition accuracy of 95.6% has been achieved for Gurmukhi characters, and an overall recognition accuracy of 96.8% has been achieved for Gurmukhi numerals.
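Illustrative note: the abstract describes associating strokes into characters using positional coordinates and overlapping windows, but does not give the procedure. The following is only a minimal sketch of one way such an association could work, not the algorithm proposed in the thesis. It assumes each stroke is a list of (x, y) pen points, takes a stroke's "window" to be its horizontal extent, and uses hypothetical names (`x_window`, `overlap_ratio`, `associate_strokes`) and a hypothetical threshold of 0.5.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Window = Tuple[float, float]


def x_window(stroke: Stroke) -> Window:
    """Return the horizontal extent (min x, max x) of a stroke."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)


def overlap_ratio(a: Window, b: Window) -> float:
    """Overlap length divided by the width of the narrower window."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap < 0:
        return 0.0  # the windows are disjoint
    narrower = min(a[1] - a[0], b[1] - b[0])
    if narrower == 0:
        return 1.0  # a zero-width stroke (e.g. a dot) lying within the other window
    return overlap / narrower


def associate_strokes(strokes: List[Stroke], threshold: float = 0.5) -> List[List[int]]:
    """Group stroke indices, in writing order, into character candidates.

    A stroke joins the current group when its window overlaps the group's
    combined window by at least `threshold` (an assumed value); otherwise
    it starts a new group.
    """
    groups: List[List[int]] = []
    group_window: Window = (0.0, 0.0)
    for index, stroke in enumerate(strokes):
        window = x_window(stroke)
        if groups and overlap_ratio(group_window, window) >= threshold:
            groups[-1].append(index)
            group_window = (min(group_window[0], window[0]),
                            max(group_window[1], window[1]))
        else:
            groups.append([index])
            group_window = window
    return groups


# Four made-up strokes: two overlapping near x = 0..10, two near x = 20..30.
example = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(2.0, 0.0), (8.0, -10.0)],
    [(20.0, 0.0), (30.0, -10.0)],
    [(24.0, -12.0), (26.0, -14.0)],
]
print(associate_strokes(example))
```

A purely horizontal overlap test is a simplification; a practical system for Gurmukhi would likely also need vertical position, for example to handle the headline and the modifiers written above or below a character.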
Description: Doctor of Philosophy, Computer Science, Thesis
URI: http://hdl.handle.net/10266/3844
Appears in Collections:Doctoral Theses@CSED

Files in This Item:
File: 3844.pdf
Size: 6.84 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.