Cleaning and Recognition of Handwritten English Numerals Poor Quality Document consisting of variation in the Brightness
Abstract
The human mind deciphers handwritten characters easily and accurately, but for machines
the task is difficult. Optical Character Recognition (OCR) systems have been developed to
solve this problem. The output of a scanner or camera is a non-editable image of the text:
although the text is visible, it can be neither edited nor changed. This provides the basis
for OCR. The overall OCR process consists of three major sub-processes: preprocessing,
segmentation and recognition. Preprocessing is the first phase of an OCR system and aims
to produce data on which character recognition systems can operate accurately. The
proposed approach begins by preprocessing the image with techniques such as average
binarization, modified average binarization and skew correction.
Binarization is a technique to convert a grayscale image into a binary (black and white) image.
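
As an illustration of average binarization only, the following minimal Python sketch
thresholds a grayscale image at its global mean intensity. The function name
average_binarize, the list-of-rows image representation and the convention that ink is 1
and background is 0 are assumptions made for this example; the modified average
binarization proposed in the work is not reproduced here.

```python
def average_binarize(gray):
    """Binarize a grayscale image with a single global mean threshold.

    `gray` is assumed to be a non-empty list of rows of 0-255 intensities.
    Returns a list of rows where 1 marks ink and 0 marks background.
    """
    pixels = [p for row in gray for p in row]
    if not pixels:
        raise ValueError("image must contain at least one pixel")
    # The threshold is the average intensity of the whole image.
    threshold = sum(pixels) / len(pixels)
    # Pixels darker than the average are treated as ink (assumed convention).
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# Tiny hypothetical image: a dark stroke on a bright page.
sample = [
    [250, 240, 245, 250],
    [250,  20,  30, 250],
    [245,  25, 240, 250],
]
binary = average_binarize(sample)
```

Because the threshold is a single value for the whole page, a region that is uniformly
darker or brighter than the rest can be misclassified.
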
Scanned or camera-captured documents often have varying degrees of brightness and
require more careful treatment than average binarization alone. The proposed
binarization approach addresses this problem. The cleaned-up image is then decomposed
into lines, words and characters using line segmentation, word segmentation and
character segmentation techniques, respectively.
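
The abstract does not say how the segmentation is performed. One common way to split a
binarized page into text lines is a horizontal projection profile, sketched below in
Python as an assumption rather than as the authors' method. The function name
segment_lines and the ink-is-1 convention are hypothetical choices for this example.

```python
def segment_lines(binary):
    """Split a binary image into text lines using a horizontal projection.

    `binary` is assumed to be a list of rows where 1 marks ink and 0 marks
    background. Returns (start_row, end_row) pairs with end_row exclusive,
    one pair per run of consecutive rows that contain ink.
    """
    lines = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) > 0  # projection value of this row
        if has_ink and start is None:
            start = y  # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))  # blank row closes the current line
            start = None
    if start is not None:
        # The last line reaches the bottom edge of the image.
        lines.append((start, len(binary)))
    return lines


# Hypothetical page with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
line_spans = segment_lines(page)
```

The same idea applied to the columns of each line (a vertical projection) is one possible
way to find gaps between words and characters, although touching or overlapping
handwritten strokes usually need additional handling.
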
The segmented characters are adjusted in size by a proposed size normalization technique
before recognition. They are then analyzed for the presence of features such as sidebars,
closed loops and water reservoirs. Based on these features, a classification structure has
been developed that assigns each segmented character to a predefined class. Most
approaches to recognition are based on neural networks and their adaptations. These
techniques are computationally expensive and require considerable training time to give
good results. Thus, there is a need for a simpler approach to character recognition, and
the authors have tried to develop a simple and efficient recognition algorithm. All
proposed algorithms were applied to several documents, and the authors obtained
satisfactory results.
Description
MT, SMCE
