A hybrid technique to remove back-to-front interference in historical document images
Abstract
The study of historical documents presents major challenges for researchers in fields
such as history, political science, psychology, and computer science. Historical
documents contain information of significant cultural and scientific value. Historical
artifacts include documents, letters, newspapers, pictures, and maps, many of which are
stored in libraries, museums, or government archives.
However, because of preservation requirements, few people have access to this material,
and such documents frequently degrade over time. Digitization is a possible way to ease
access to this rich source of knowledge about the history of a society. Digitized
degraded documents require specialized processing to remove different kinds of noise and
to improve readability. However, handling these documents is extremely delicate.
Software that automatically does what the user would otherwise do manually can bring
great financial and historical benefits, along with better preservation. The problem is
aggravated when the document is written on both sides, because over time the ink from
the back of the page tends to seep through and obscure the text on the other side when
the paper is digitized. This effect is called “ink bleed-through” or “back-to-front
interference”. Among
the document image processing steps, segmentation is one of the most important, as it
identifies what needs to be recognized. The first step of segmentation is thresholding
(or binarization) of the image. Binarization identifies which pixels belong to the
foreground and which belong to the background. Misclassified pixels can impair
subsequent stages of processing. We present a
new approach to this problem that first filters the background using ideas from visual
perception theory. When an observer stands back from a document, the details of the
image are lost, because the acuity of human vision decreases with distance: distant
objects project smaller images onto the retina. As the distance from the object
increases, details are lost and only the main colors remain. This idea is used to
binarize degraded historical documents and remove “back-to-front interference”.
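
The idea of viewing the page from a distance can be illustrated with a minimal Python sketch. It is not the thesis's actual algorithm, which the abstract does not specify. It assumes that block averaging is an acceptable stand-in for loss of visual acuity, and the function names and the `factor` and `margin` parameters are hypothetical choices made for illustration only.

```python
import numpy as np


def view_from_distance(gray, factor):
    """Approximate a distant view: average each factor x factor block,
    then expand the coarse image back to (cropped) full size.
    Block averaging is an assumed proxy for reduced visual acuity."""
    h, w = gray.shape
    hh, ww = h - h % factor, w - w % factor  # crop to a multiple of factor
    cropped = gray[:hh, :ww].astype(np.float64)
    blocks = cropped.reshape(hh // factor, factor, ww // factor, factor)
    coarse = blocks.mean(axis=(1, 3))  # only the "main colors" remain
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


def binarize(gray, factor=8, margin=40.0):
    """Return a boolean mask, True where a pixel is treated as foreground ink.
    A pixel counts as ink if it is darker than the distant-view background
    estimate by more than `margin` gray levels (hypothetical parameter)."""
    background = view_from_distance(gray, factor)
    hh, ww = background.shape
    diff = background - gray[:hh, :ww].astype(np.float64)
    return diff > margin


# Synthetic page: light paper, one dark stroke, one faint bleed-through stroke.
page = np.full((16, 16), 200, dtype=np.uint8)
page[:, 3] = 30    # front-side ink
page[:, 11] = 170  # faint ink seeping through from the back
mask = binarize(page)
```

On this synthetic page the dark stroke differs strongly from the coarse background estimate and is kept, while the faint stroke stays within the margin and is assigned to the background. Real scans would need the block size and margin tuned to the stroke width and the strength of the interference.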
Description
M.Tech-Computer Science Applications
