Domain based Bilingual Handwriting Recognition for Gurumukhi-English Script

Kaur, Sukhandeep

dc.contributor.author	Kaur, Sukhandeep
dc.contributor.supervisor	Bawa, Seema
dc.contributor.supervisor	Kumar, Ravinder
dc.date.accessioned	2024-05-16T06:18:16Z
dc.date.available	2024-05-16T06:18:16Z
dc.date.issued	2024-05-16
dc.description.abstract	Handwriting is the most popular and essential means of written communication in day-to-day life. Even in the digital age, handwritten documents are still widely used in public offices to maintain records, and many people are more comfortable writing by hand than typing. This creates a need for handwritten text recognition systems to digitize such documents. Because the world's many writing systems and scripts use a wide variety of character sets, it is very difficult to develop a single text recognition system for all scripts. Many researchers have explored machine learning and traditional feature extraction approaches for multilingual text recognition in handwritten and printed text. An efficient text recognition system should recognize text despite significant variations in individual handwriting styles, regardless of the age, gender, and educational background of the writers. Many scripts and languages have similarly shaped characters, which is a major challenge for multilingual text recognition. Every domain has a different category of users and writing content. Hence, to improve recognition performance, a text recognition system needs to be developed for each domain, which limits the targeted users and content in the training data. Multilingual text recognition has several phases, such as data acquisition, pre-processing, segmentation, script identification, and character recognition. Numerous research attempts have been made to improve the performance of each phase using traditional, deep learning, machine learning, transfer learning, and ensemble machine learning approaches. Due to a shortage of benchmark datasets, deep learning based techniques are less explored for regional languages. In this thesis, a domain based bilingual handwritten text recognition system for Gurumukhi-English script is proposed.
The proposed text recognition system, Bilingual Handwritten Text Recognition for Academic Domain (BHTRforAD), introduces two new datasets, three segmentation approaches, and three OCRs. A bilingual dataset from the Academic domain has been designed, containing text written in Gurumukhi and English script with large variations in content, writing style, and document style. Further, for composite character recognition of Gurumukhi script, a composite character dataset with 307 classes has been designed. To segment the handwritten documents, three heuristic segmentation approaches are proposed, for lines, words, and characters respectively. The proposed segmentation approaches efficiently segment curved, skewed, closely spaced, and touching text lines and words with inter- and intra-word gaps. Further, to evaluate deep learning and traditional feature extraction methods for text recognition, this work considers three traditional methods (GLCM, HOG, Gabor) and three deep learning based approaches (LeNet, VGG19, ResNet50). Similarly, for classification, machine learning approaches such as SVM, RF, and KNN are considered along with ensemble machine learning approaches. For script identification, various combinations of traditional and deep learning based features and classifiers are evaluated to find the best combination of feature set and classifier. For OCR, three OCRs are designed: a Gurumukhi OCR, an English OCR, and an alphanumeric OCR. For Gurumukhi script, a segmentation-free approach to recognizing composite characters is proposed, using two-stage classification. To assess the performance of the proposed algorithms, various performance measures are used, including accuracy, precision, recall, F1 score, detection rate, recognition accuracy, and CPU time.
The experimental results demonstrate the efficiency of the proposed algorithms for text recognition of bilingual handwritten documents. Finally, a case study on Academic domain documents has been conducted to test the proposed system. A number of test cases have been designed for each phase of bilingual text recognition. Documents containing text with large variations are used as input for text recognition. The test cases designed for line, word, and character segmentation pass for skewed, straight, and curved text with inter- and intra-word gaps. Similarly, test cases for script identification and the OCRs have been conducted and the results analyzed.	en_US
dc.identifier.uri	http://hdl.handle.net/10266/6728
dc.language.iso	en	en_US
dc.subject	Handwritten Text recognition	en_US
dc.subject	Gurumukhi-English script	en_US
dc.subject	Deep Learning	en_US
dc.subject	Segmentation	en_US
dc.title	Domain based Bilingual Handwriting Recognition for Gurumukhi-English Script	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: final_thesis_sukhan_14.5.24.pdf
Size: 5.97 MB
Format: Adobe Portable Document Format
Description: PhD thesis of Sukhandeep Kaur

License bundle

Name: license.txt
Size: 2.03 KB
Format: Item-specific license agreed upon to submission

Collections

Doctoral Theses@CSED