Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/5871
Title: Language Model Based Online Handwritten Recognition System for Punjabi Language
Authors: Singh, Harjeet
Supervisor: Sharma, R. K.
Singh, V. P.
Keywords: OHWR;SVM;Gurmukhi Script;n-gram modeling;Language Models
Issue Date: 22-Oct-2019
Abstract: Handwriting is an important method of communication between human beings. It is also used as an interface between humans and machines in a few applications. As such, the development of handwriting recognition systems is an important area of research. Handwriting recognition systems can be developed for offline handwriting and for online handwriting. In this work, an attempt has been made to develop an online handwriting recognition system for the Punjabi language, a popular language of North India that is written in the Gurmukhi script. Owing to variations in the handwriting styles of writers, handwriting recognition is a challenging task. A good amount of research has been done on online handwriting recognition for non-Indic scripts such as Arabic, Chinese, Japanese, and Korean. In the recent past, many researchers in India have also shown interest in this direction for Indic scripts, namely Assamese, Bangla, Devanagari, Gurmukhi, Telugu, Malayalam, and Tamil. The primary objective of this thesis is to build an efficient online handwriting recognition system for Gurmukhi script. The work done in this direction has been organized into seven chapters. Chapter 1 introduces online handwriting recognition systems for various scripts. It also includes a brief discussion of the Gurmukhi script symbols, the various phases of online handwriting recognition systems, the major challenges in developing such a system, and the motivation behind the development of the proposed system. In Chapter 2, a comprehensive review of the literature on the tools and technologies used in online handwriting recognition systems has been carried out. The literature review covers articles on Indic and non-Indic scripts. Chapter 3 focuses on the data collection, pre-processing and feature extraction phases.
These are the three necessary phases required before the recognition phase in online handwriting recognition systems. The data collection process, including its metadata and the XML format used for storing the data, has been explained in this chapter. The pre-processing techniques and the features obtained from the data after pre-processing have also been explained in this chapter. In Chapter 4, the zone-wise stroke classification approach has been discussed. The key idea of dividing the strokes into two zones is motivated by the analysis of the writing habits of writers. An efficient algorithm for zone identification has been proposed in this chapter. Chapter 5 illustrates the character formation process for Gurmukhi script. A Finite State Automata (FSA) based post-processing algorithm has been proposed in this chapter for the formation of Gurmukhi characters. The proposed algorithm arranges the recognized Unicode characters in their original writing sequence. Additionally, the major challenges in the formation of Gurmukhi characters have also been tackled in this chapter. The stroke classification has been performed using an SVM classifier. In this work, a dataset of 21,945 online handwritten Gurmukhi words has primarily been used. In Chapter 6, the objective of predicting the next possible character (word) in a word (sentence) has been addressed. In this chapter, we have discussed forecasting the probability of the next possible character (word) in a word (sentence), which depends on the preceding character (word), as it is written in a real-time environment. Bigram and trigram language models are utilized at the character and word levels in order to produce suggestions for the next possible character (word). The bigram and trigram probabilities in these models have been calculated using the corpus Punjabi Monolingual Text Corpus-AnglaMT (available at https://tdil-dc.in), which contains 83,937 Punjabi sentences. Chapter 7 summarizes the work done in this thesis.
Based on the work done on the online handwriting recognition system for Gurmukhi script, we have also discussed a few directions that can further improve the recognition performance of the online handwritten word recognition system for Gurmukhi script.
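
The character-level bigram prediction described for Chapter 6 can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `train_char_bigrams` and `suggest_next`, the start-of-word marker `^`, and the four-word toy list are assumptions made for illustration only, and the toy list is not drawn from the corpus named in the abstract. The sketch treats each Unicode code point as one character and estimates the probability of the next character from relative bigram counts, without smoothing.

```python
from collections import Counter, defaultdict


def train_char_bigrams(words):
    """Count character bigrams; '^' (an assumed marker) stands for word start."""
    counts = defaultdict(Counter)
    for word in words:
        prev = "^"
        for ch in word:
            counts[prev][ch] += 1
            prev = ch
    return counts


def suggest_next(counts, prev_char, k=3):
    """Return up to k (character, probability) pairs, most probable first."""
    following = counts.get(prev_char)
    if not following:
        return []  # prev_char was never followed by anything in training
    total = sum(following.values())
    return [(ch, n / total) for ch, n in following.most_common(k)]


# Toy training data (hypothetical, for illustration only).
toy_words = ["ਕਰ", "ਕਰਨ", "ਕਮ", "ਘਰ"]
model = train_char_bigrams(toy_words)

# After "ਕ" the toy data contains "ਰ" twice and "ਮ" once,
# so "ਰ" is suggested first with relative frequency 2/3.
suggestions = suggest_next(model, "ਕ")
```

A trigram variant would condition on the two preceding characters instead of one, and a word-level model would apply the same counting to tokens of a sentence rather than to the characters of a word.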
URI: http://hdl.handle.net/10266/5871
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Thesis_901303002_AsOn_19-10-2019.pdf
Size: 5.23 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.