Please use this identifier to cite or link to this item:
http://hdl.handle.net/10266/1057
Title: | Online Handwritten Gurmukhi Character Recognition |
Authors: | Sharma, Anuj |
Supervisor: | Sharma, R. K.; Kumar, Rajesh |
Keywords: | Online Handwriting Recognition;Preprocessing;Feature Extraction;Postprocessing |
Issue Date: | 19-Nov-2009 |
Abstract: | Computers greatly influence human life, and their use is increasing at a tremendous rate. The ease with which information can be exchanged between user and computer is of immense importance today, because input devices such as the keyboard and mouse have limitations compared with input through natural handwriting. Online handwriting recognition offers a quick and natural way of communication between computers and human beings. Handwriting recognition has been researched for over four decades and has attracted many researchers across the world. Variation in handwriting is one prominent problem, and achieving a high degree of accuracy is a difficult task. The main goal of this thesis is to develop an online handwritten Gurmukhi character recognition system. Gurmukhi is the script of the Punjabi language, which is widely spoken across the globe. This thesis is divided into six chapters. A brief outline of each chapter follows. Chapter 1 includes three sections, namely, issues in online handwriting recognition systems, a literature review and an overview of Gurmukhi script. Issues in online handwriting recognition systems include: variations in handwriting styles; constrained and unconstrained handwriting; personal, situational and material factors; and writer-dependent vs. writer-independent recognition systems. The literature review presents a detailed survey of each phase of the established procedure for online handwriting recognition. This established procedure includes data collection, preprocessing, feature extraction, segmentation, recognition and post-processing. We have also reviewed the literature on different recognition methods: statistical, syntactical and structural, neural network and elastic matching methods. In addition, we have discussed some of the results reported in the literature on online handwriting recognition. 
This literature review covers different languages such as English, Chinese, Japanese, Urdu, Hindi, Bengali, Tamil and Telugu. In the overview of Gurmukhi script, we have included the nature of handwriting in Gurmukhi script and the different characters of the script. Chapter 2 contains the work carried out for three phases of online handwritten character recognition: data collection, preprocessing and feature extraction. These phases are discussed in three sections entitled data collection phase, preprocessing phase and computation of features phase. In the data collection phase, input handwritten strokes are collected, and we have discussed the procedure to collect the data at stroke level. The preprocessing phase follows the data collection phase. In the preprocessing phase, we have considered size normalization and centering of the stroke, interpolation of missing points, smoothing, slant correction and resampling of points in the stroke. We have proposed algorithms for the respective stages. In the computation of features phase, features are computed after preprocessing of the input handwritten stroke. The high-level features are computed on the basis of low-level features. The high-level features include loops, crossings, straight lines, headlines and dots. The common low-level features are position of stroke, area, length, curliness, slope, etc. We have introduced algorithms to recognize these high-level features. We have noted improvements of 5%, 3.33%, 6.66% and 8.34% in the recognition of loop, headline, straight line and dot features, respectively, after using the preprocessing stage. Chapter 3 focuses on recognition of online handwritten Gurmukhi characters using the elastic matching method. This chapter also illustrates the use of the post-processing stage. In this chapter, we have presented a process to recognize online handwritten Gurmukhi characters, which in turn uses forty unique dependent strokes for 41 Gurmukhi characters. 
These dependent strokes are assigned unique stroke ids. This process recognizes a Gurmukhi character in two stages. In the first stage, a stroke id is recognized, and in the second stage, the character is recognized on the basis of the recognized stroke ids. For this process, two databases, namely, a stroke database and a character database, have been prepared. Strokes are recognized using the stroke database and characters are recognized using the character database. We have used elastic matching as the recognition method in this chapter. The post-processing phase is applied after the recognition method. The recognition rate achieved without post-processing steps is 87.40%, whereas it is 90.08% when post-processing steps are included. As such, we could achieve an improvement of 2.68 percentage points in the recognition of Gurmukhi characters when post-processing steps are in place. It has been noted that 24 characters have shown improvement in their recognition rate after using post-processing steps. A maximum improvement of 6.67% has been found for some of the characters. In Chapter 4, we have recognized online handwritten Gurmukhi characters using two methods, namely, small line segments and the hidden Markov model. We have proposed a new recognition method based on elastic matching and chain code techniques, called the small line segments method. The proposed method includes a procedure that converts the stroke database to a small line segments directions database. The overall recognition rate using the small line segments method is 94.59% when tested on 2460 handwritten Gurmukhi characters collected from 60 writers. We have noted that 24 of the 41 characters have been recognized correctly for all writers. 
It is worth mentioning that when the stroke database is converted to the format of the small line segments directions database, the size of the latter is reduced to approximately 1.60% of the size of the stroke database. The hidden Markov model is a method based on statistical techniques. We have implemented a hidden Markov model to recognize input handwritten Gurmukhi characters and presented this procedure from a software development point of view. In this method, we have presented the procedure to evaluate A, B and π from the small line segments directions database. A, B and π are the three important elements of the hidden Markov model as given by Rabiner (1989). The database used in the implementation of this method has been prepared using 130 handwritten samples for 41 Gurmukhi characters. Recognition has been performed using 60 writers, where each writer has contributed all 41 Gurmukhi characters. The overall recognition rate achieved using the hidden Markov model method is 91.95%. We have noted that the procedure discussed in this chapter is able to recognize at least 30 characters correctly for all sixty writers. We have also noticed a hundred percent recognition rate for 13 characters. In Chapter 5, we have extended the present study to recognize online handwritten Gurmukhi words. The segmentation phase has been discussed in the context of online handwritten Gurmukhi word recognition. We have implemented a point-based segmentation procedure that segments large strokes into sub-strokes on the basis of the average number of points. We have proposed a new phase in online handwritten Gurmukhi word recognition, ‘rearrangement of strokes’. The rearrangement of strokes includes: identification of strokes as dependent or major dependent strokes, rearrangement of strokes with respect to their positions from the y-axis, and combination of strokes to recognize a character. The hidden Markov model has been used as the recognition method in the recognition phase. 
A group of 50 writers was asked to write 200 Gurmukhi words. These 200 words include characters in their ‘upper zone and middle zone’, ‘middle zone and lower zone’, ‘upper zone, middle zone and lower zone’ or ‘middle zone’ only. The overall recognition rate achieved for all writers is 83.04%. Chapter 6 presents the contributions of the present work. These contributions include inferences drawn from the various experiments conducted in this thesis. This chapter also includes some directions for related work that can be carried out in future. |
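Illustrative note: the abstract says the small line segments method builds on chain code techniques but does not give the details. As a rough sketch only, assuming an 8-direction Freeman-style chain code and a y-axis that points upward (both assumptions, not taken from the thesis), a stroke's point sequence could be reduced to direction codes as follows. The names `chain_code` and `direction_gap` are hypothetical.

```python
import math


def chain_code(points, n_directions=8):
    """Quantize each consecutive point pair of a stroke into a direction code.

    Assumption: codes run counter-clockwise from 0 (east) in equal angular
    bins; the thesis may use a different number of directions or layout.
    """
    step = 2 * math.pi / n_directions
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # repeated point: no direction to record
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Shift by half a bin so each code is centred on its direction.
        codes.append(int((angle + step / 2) // step) % n_directions)
    return codes


def direction_gap(a, b, n_directions=8):
    """Cyclic distance between two direction codes (7 and 0 are 1 apart)."""
    d = abs(a - b) % n_directions
    return min(d, n_directions - d)
```

For instance, the unit square traced counter-clockwise from the origin, `[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]`, maps to the codes `[0, 2, 4, 6]`. Storing one small integer per segment instead of a coordinate pair is one plausible reason a directions database could be much smaller than the raw stroke database, though the abstract does not state how its 1.60% figure arises.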
URI: | http://hdl.handle.net/10266/1057 |
Appears in Collections: | Doctoral Theses@SOM |