Cleaning and Recognition of Handwritten English Numerals Poor Quality Document consisting of variation in the Brightness

Sharma, Rupali

Files

2302.pdf (2.38 MB)

Date

2013-08-16T11:39:17Z

Authors

Sharma, Rupali

Supervisors

Kumar, Rajiv

Abstract

The human mind deciphers handwritten characters accurately with ease, but the same task is difficult for machines. Optical Character Recognition (OCR) systems have been developed to solve this problem. The output of a scanner or camera is a non-editable text image: the text is visible, but it cannot be edited or changed when required. This limitation is the motivation for OCR. The overall OCR process consists of three major sub-processes: preprocessing, segmentation and recognition. Preprocessing is the first phase, and it aims to produce data that character recognition systems can operate on accurately. The proposed approach begins by preprocessing the image with techniques such as average binarization, modified average binarization and skew correction. Binarization converts a grayscale image into a binary (black and white) image. Scanned or camera-captured documents often have varying degrees of brightness and require more careful treatment than average binarization alone; the proposed binarization approach addresses this problem. The cleaned image is then decomposed into lines, words and characters using line segmentation, word segmentation and character segmentation, respectively. The segmented characters are adjusted in size with a proposed size normalization technique before recognition. They are then analyzed for the presence of features such as sidebars, closed loops and water reservoirs. Based on these features, a classification structure has been developed that assigns each segmented character to a predefined class. Most approaches to recognition are based on neural networks and their adaptations. These techniques are computationally expensive and require a considerable amount of training time to give good results, so there is a need for a simpler approach to character recognition. The authors have therefore tried to develop a simple and efficient recognition algorithm. All of the proposed algorithms were applied to several documents, and satisfactory results were obtained.
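
The preprocessing and line segmentation steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the "modified average binarization", so the block-wise local mean used here is a stand-in assumption, and the names `average_binarize`, `blockwise_binarize`, `segment_lines`, `block` and `min_contrast` are hypothetical. The sample image has a bright left half and a dark right half to mimic uneven brightness.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def average_binarize(img: Image) -> Image:
    """Global average binarization: 1 = ink (darker than the image mean)."""
    pixels = [p for row in img for p in row]
    threshold = sum(pixels) / len(pixels)
    return [[1 if p < threshold else 0 for p in row] for row in img]


def blockwise_binarize(img: Image, block: int = 4, min_contrast: int = 30) -> Image:
    """Assumed local variant: threshold each tile by its own mean.

    Tiles whose intensity range is below min_contrast are treated as
    pure background, so flat regions are not split by noise.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            tile = [img[r][c] for r in rows for c in cols]
            if max(tile) - min(tile) < min_contrast:
                continue  # uniform tile: leave as background
            threshold = sum(tile) / len(tile)
            for r in rows:
                for c in cols:
                    out[r][c] = 1 if img[r][c] < threshold else 0
    return out


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Split a binary image into text lines at rows that contain no ink.

    Returns (start_row, end_row) pairs with end_row exclusive.
    """
    lines = []
    start = None
    for r, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = r
        elif not has_ink and start is not None:
            lines.append((start, r))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


# Sample page: left half bright (background 200, ink 120),
# right half dark (background 100, ink 20), two rows of ink.
background_row = [200] * 4 + [100] * 4
ink_row = [200, 120, 120, 200, 100, 20, 20, 100]
sample = [background_row, ink_row, background_row, ink_row, background_row]

global_lines = segment_lines(average_binarize(sample))
local_lines = segment_lines(blockwise_binarize(sample))
```

On this sample the single global threshold classifies the whole dark half as ink, so the two written rows merge into one line, while the per-tile threshold keeps them separate. Word and character segmentation could follow the same idea using column-wise instead of row-wise gaps.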

Description

MT, SMCE

Keywords

OCR, Brightness, Water Reservoir

URI

http://hdl.handle.net/10266/2302

Collections

Masters Theses@SCBC
