Please use this identifier to cite or link to this item:
http://hdl.handle.net/10266/2302
Title: | Cleaning and Recognition of Handwritten English Numerals Poor Quality Document consisting of variation in the Brightness |
Authors: | Sharma, Rupali |
Supervisor: | Kumar, Rajiv |
Keywords: | OCR, Brightness, Water Reservoir |
Issue Date: | 16-Aug-2013 |
Abstract: | The human mind deciphers handwritten characters accurately with ease, but the task is difficult for machines. Optical Character Recognition (OCR) systems have been developed to solve this problem. The output of a scanner or camera is a non-editable image of text: the text is visible, but it cannot be edited or changed. This is the motivation for OCR. The overall OCR process consists of three major sub-processes: preprocessing, segmentation and recognition. Preprocessing is the first of these phases and aims to produce data that character recognition systems can operate on accurately. The proposed approach begins by preprocessing the image with techniques such as average binarization, modified average binarization and skew correction. Binarization converts a grayscale image into a binary (black and white) image. Scanned or camera-captured documents often have varying degrees of brightness and require more careful treatment than average binarization alone. The proposed binarization approach addresses this problem. The cleaned-up image is then decomposed into lines, words and characters using line segmentation, word segmentation and character segmentation, respectively. The segmented characters are adjusted in size by a proposed size normalization technique before recognition. They are then analyzed for the presence of features such as sidebars, closed loops and water reservoirs. Based on these features, a classification structure has been developed that assigns each segmented character to a predefined class. Most approaches to recognition are based on neural networks and their adaptations.
These techniques are computationally expensive and require considerable time to train before they give good results. Thus, there is a need for a simpler approach to character recognition. The authors have therefore tried to develop a simple and efficient recognition algorithm. All of the proposed algorithms were applied to several documents, and the authors obtained satisfactory results. |
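The abstract notes that average binarization alone struggles when brightness varies across a page. The following is a minimal sketch of why that happens. It is not the thesis's modified average binarization, which the abstract does not define. The function names, the tile size and the synthetic page are all assumptions made for illustration: one function thresholds the whole image at its mean intensity, the other thresholds each tile at its own mean.

```python
import numpy as np


def average_binarize(gray):
    """Global threshold at the mean intensity; True marks ink (dark) pixels."""
    gray = np.asarray(gray, dtype=float)
    return gray < gray.mean()


def tiled_average_binarize(gray, tile=8):
    """Threshold each tile x tile block at its own mean.

    Hypothetical illustrative variant, not the method proposed in the thesis.
    """
    gray = np.asarray(gray, dtype=float)
    ink = np.zeros(gray.shape, dtype=bool)
    for r in range(0, gray.shape[0], tile):
        for c in range(0, gray.shape[1], tile):
            block = gray[r:r + tile, c:c + tile]
            ink[r:r + tile, c:c + tile] = block < block.mean()
    return ink


# Synthetic 8 x 16 page (assumed data): bright left half, dim right half,
# with one vertical pen stroke in each half.
page = np.empty((8, 16))
page[:, :8] = 200.0   # bright background
page[:, 8:] = 100.0   # dim background
page[2:6, 3] = 120.0  # stroke on the bright side
page[2:6, 11] = 20.0  # stroke on the dim side

truth = np.zeros(page.shape, dtype=bool)
truth[2:6, 3] = True
truth[2:6, 11] = True

global_ink = average_binarize(page)
tiled_ink = tiled_average_binarize(page, tile=8)

print("ink pixels, global mean:", int(global_ink.sum()))
print("ink pixels, tiled mean:", int(tiled_ink.sum()))
print("tiled result matches strokes:", bool(np.array_equal(tiled_ink, truth)))
```

On this synthetic page the global mean falls between the two background levels, so the entire dim half is classified as ink, while the per-tile threshold recovers only the two strokes. On real scans a tile that contains no ink would be split by noise, so a practical method would need an additional rule for such tiles.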
Description: | MT, SMCE |
URI: | http://hdl.handle.net/10266/2302 |
Appears in Collections: | Masters Theses@SCBC |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.