Domain based Bilingual Handwriting Recognition for Gurumukhi-English Script
| dc.contributor.author | Kaur, Sukhandeep | |
| dc.contributor.supervisor | Bawa, Seema | |
| dc.contributor.supervisor | Kumar, Ravinder | |
| dc.date.accessioned | 2024-05-16T06:18:16Z | |
| dc.date.available | 2024-05-16T06:18:16Z | |
| dc.date.issued | 2024-05-16 | |
| dc.description.abstract | Handwriting is the most popular and essential form of written communication in day-to-day life. Even in this digital age, handwritten documents are still widely used in public offices to maintain records, and many people are more comfortable writing by hand than typing. Hence there is a need for handwritten text recognition systems to digitize such documents. Because the world has a large number of writing systems and scripts with a variety of character sets, it is very difficult to develop a single text recognition system for all scripts. Many studies have explored machine learning and traditional feature extraction approaches for multilingual text recognition in handwritten and printed text. An efficient text recognition system should recognize text despite significant variations in individual handwriting styles, regardless of the age, gender, and educational background of the writers. Many scripts and languages have similarly shaped characters, which is a major challenge for multilingual text recognition. Every domain has a different category of users and writing content. Hence, to improve recognition performance, a text recognition system should be developed for each domain, which limits the targeted users and the content of the training data. Multilingual text recognition has several phases, such as data acquisition, pre-processing, segmentation, script identification, and character recognition. Numerous research attempts have been made to improve the performance of each phase using traditional, deep learning, machine learning, transfer learning, and ensemble machine learning approaches. Due to a shortage of benchmark datasets, deep learning based techniques are less explored for regional languages. In this thesis, a domain based bilingual handwritten text recognition system for Gurumukhi-English script is proposed.
In the proposed text recognition system, Bilingual Handwritten Text Recognition for Academic Domain (BHTRforAD), two new datasets, three segmentation approaches, and three OCRs are introduced. A bilingual dataset from the Academic domain has been designed, containing text written in Gurumukhi and English script with large variations in content, writing style, and document style. Further, for composite character recognition of Gurumukhi script, a composite character dataset with 307 classes has been designed. To segment the handwritten documents, three heuristic segmentation approaches are proposed, for lines, words, and characters. The proposed approaches efficiently segment text with curved, skewed, closely spaced, and touching text lines and words, with inter- and intra-word gaps. Further, to compare deep learning and traditional feature extraction methods for text recognition, this work considers three traditional methods (GLCM, HOG, Gabor) and three deep learning based approaches (LeNet, VGG19, ResNet50). Similarly, for classification, machine learning approaches such as SVM, RF, and KNN are considered, along with ensemble machine learning approaches. For script identification, various combinations of traditional and deep learning based features and classifiers are evaluated to find the best combination of feature set and classifier. Three OCRs are designed: a Gurumukhi OCR, an English OCR, and an alphanumeric OCR. For Gurumukhi script, a segmentation-free approach to recognizing composite characters is proposed, using two-stage classification. The performance of the proposed algorithms is assessed using measures such as accuracy, precision, recall, F1 score, detection rate, recognition accuracy, and CPU time.
The experimental results demonstrate the efficiency of the proposed algorithms for text recognition of bilingual handwritten documents. Finally, a case study on Academic domain documents has been conducted to test the proposed system. A number of test cases have been designed for each phase of bilingual text recognition, with documents containing text with large variations used as input. The test cases for line, word, and character segmentation pass for skewed, straight, and curved text with inter- and intra-word gaps. Similarly, test cases for script identification and the OCRs have been conducted and the results analyzed. | en_US |
| dc.identifier.uri | http://hdl.handle.net/10266/6728 | |
| dc.language.iso | en | en_US |
| dc.subject | Handwritten Text recognition | en_US |
| dc.subject | Gurumukhi-English script | en_US |
| dc.subject | Deep Learning | en_US |
| dc.subject | Segmentation | en_US |
| dc.title | Domain based Bilingual Handwriting Recognition for Gurumukhi-English Script | en_US |
| dc.type | Thesis | en_US |
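The line-segmentation phase named in the abstract can be illustrated with a minimal sketch. The abstract says only that the thesis uses heuristic approaches; the horizontal projection profile below is a common textbook heuristic chosen here as an assumption, not the thesis's method, and the function name `segment_lines` and the `min_ink` parameter are hypothetical. It handles only straight, well-separated lines, not the curved, skewed, or touching lines the thesis targets.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    `image` is a binarized page: 1 = ink pixel, 0 = background.
    `min_ink` (hypothetical parameter) is the ink count a row needs
    to be treated as part of a text line.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                  # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # blank row closes the current line
            start = None
    if start is not None:              # line runs to the bottom of the page
        lines.append((start, len(profile)))
    return lines


# Toy 5x4 page with two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

The same profile idea, applied column-wise within each detected line, is one simple way to look for inter-word gaps. Skewed or touching lines would need additional handling that this sketch does not attempt.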
Files
Original bundle
- Name: final_thesis_sukhan_14.5.24.pdf
- Size: 5.97 MB
- Format: Adobe Portable Document Format
- Description: PhD thesis of Sukhandeep Kaur
License bundle
- Name: license.txt
- Size: 2.03 KB
- Format: Item-specific license agreed upon to submission
