Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/6204
Title: Word Sense Disambiguation for Punjabi Language
Authors: Singh, Varinder Pal
Supervisor: Kumar, Parteek
Keywords: Word Sense Disambiguation;WSD;NLP
Issue Date: 31-Jan-2022
Abstract: The recent explosion of data in different natural languages on the internet has necessitated the development of Natural Language Processing (NLP) systems. The major impediments to developing and implementing NLP are the scarcity of standard datasets, knowledge resources and language tools, and the difficulty of ambiguity resolution. Word Sense Disambiguation (WSD) is a critical task for NLP applications such as machine translation, information retrieval, question answering and sentiment analysis. The objective of WSD is to automatically select the appropriate sense of an ambiguous word based on its context. The WSD process first identifies the candidate senses of every word relevant to the text or discourse under consideration, drawing on sense inventories such as dictionaries, thesauri, ontologies and WordNet. It then requires a means of assigning the appropriate sense to each occurrence of a word in context. WSD therefore needs a representation of common-sense and encyclopaedic knowledge to resolve the sense of ambiguous words. Having a computer recognise the proper sense of a word in context is regarded as an AI-complete problem. There are two types of WSD: targeted WSD and all-words WSD. Targeted WSD resolves the ambiguity of a single ambiguous target word, usually one per sentence. All-words WSD disambiguates all open-class words (nouns, verbs, adjectives and adverbs) in a text. In this research work, targeted WSD was implemented. India is a multilingual country with 22 scheduled languages. Cross-language processing tasks such as machine translation, question answering, sentiment analysis and cross-lingual search are highly relevant in India. Punjabi is the official language of the Indian state of Punjab and the world’s 10th most widely spoken language. The Punjabi diaspora is spread across the globe, and Punjabi NLP tools are needed to keep it connected to the language.
This motivated us to explore the field of WSD for the Punjabi language. WSD has been successfully designed and developed for the English language. However, there are many differences between the language structures of English and Punjabi, which give rise to different challenges when performing sense disambiguation on Punjabi text. In this research work, a systematic review of WSD in Indian languages is presented. A review methodology was followed, guided by a set of framed research questions. Renowned electronic databases and leading conferences were searched to identify the relevant studies of WSD for Indian languages. The existing status of WSD for Indian languages has been categorised according to the different families of Indian languages. The evolution of WSD for Indian languages and the publication timeline of the studies are reported. This research has reviewed and analysed WSD for Indian languages in terms of techniques, knowledge resources and evaluation methods. A review of the standard Senseval/SemEval evaluation workshops for the WSD field is also presented. The findings of this research work, such as the available raw corpora, sense-tagged corpora, dictionaries, WordNets and pre-processing linguistic tools for different Indian languages, will help researchers. The comprehensive survey presented in this research work should assist researchers in choosing the most suitable WSD technique for a specific domain and in identifying pertinent future directions. The availability of the Punjabi WordNet has motivated us to implement Punjabi WSD.
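As an illustration of the targeted WSD task described in the abstract (look up the candidate senses of one target word in a sense inventory, then pick the sense that best fits the context), the following is a minimal sketch of a simplified Lesk-style gloss-overlap heuristic. It is not the method used in the thesis. The toy English inventory, the sense labels, the stopword list and the function names are all hypothetical and chosen only for illustration; a real system would draw senses and glosses from a resource such as a WordNet.

```python
import re

# Hypothetical toy sense inventory: sense label -> gloss for each ambiguous word.
# Illustration only; not taken from the thesis or from any real WordNet.
SENSE_INVENTORY = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or other body of water",
    },
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "at", "of", "or", "that", "the", "to",
             "in", "on", "beside", "other"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def disambiguate(target, sentence, inventory=SENSE_INVENTORY):
    """Return the sense of `target` whose gloss shares the most words
    with the sentence context. Ties go to the first listed sense."""
    context = tokenize(sentence) - {target}
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[target].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(disambiguate("bank", "She went to the bank to deposit money"))
    print(disambiguate("bank", "They fished from the bank of the river"))
```

Exact word overlap is a deliberately crude context model: it cannot match "deposit" with "deposits", which is one reason practical systems add morphological analysis and richer knowledge resources.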
URI: http://hdl.handle.net/10266/6204
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Thesis_WSD_Lib1.pdf
Size: 4.85 MB
Format: Adobe PDF

