Word Sense Disambiguation for Punjabi Language
Abstract
The recent explosion of data in different natural languages on the internet has necessitated the
development of Natural Language Processing (NLP) tools and tasks. The major impediments to the
development and implementation of NLP are the scarcity of standard datasets, knowledge
resources and language tools, and the problem of ambiguity resolution. Word Sense Disambiguation
(WSD) is a critical and essential task for NLP applications such as machine translation,
information retrieval, question answering and sentiment analysis.
The objective of WSD is to automatically select the appropriate sense of an ambiguous word
based on its context. The WSD process first identifies, for every word relevant to the text or
discourse under consideration, the different senses listed in sense inventories such as
dictionaries, thesauri, ontologies and WordNet. It then needs a means to assign the
appropriate sense to each occurrence of a word in context. WSD therefore requires a representation
of common-sense and encyclopaedic knowledge to resolve the sense of ambiguous words.
Recognizing the proper sense of a word in context by a computer is regarded as an AI-complete
problem. There are two types of WSD, namely targeted WSD and all-words WSD. Targeted WSD
resolves the ambiguity of a single target word, usually occurring once per sentence. All-words
WSD disambiguates all open-class words (nouns, verbs, adjectives and adverbs) in a text. In this
research work, targeted WSD was implemented.
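The two steps described above (looking up candidate senses in a sense inventory, then choosing one from the context) can be illustrated with the simplified Lesk algorithm, a classic knowledge-based technique that picks the sense whose dictionary gloss shares the most words with the surrounding context. This is a minimal sketch under stated assumptions, not the method implemented in this thesis: the tiny English sense inventory (`SENSE_INVENTORY`), the stop-word list and the helper names (`simplified_lesk`, `targeted_wsd`, `all_words_wsd`) are hypothetical stand-ins for a real resource such as the Punjabi WordNet. It shows both settings: targeted WSD for one word, and an all-words loop over every word found in the inventory.

```python
import re
from typing import Dict, List, Optional

# Hypothetical toy sense inventory (word -> {sense_id: gloss}).
# A real system would query a lexical resource such as the Punjabi WordNet.
SENSE_INVENTORY: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank.finance": "an institution that accepts deposits of money and lends money",
        "bank.river": "the sloping land beside a river or stream",
    },
    "deposit": {
        "deposit.money": "money placed in a bank account",
        "deposit.sediment": "a layer of sediment left by water or wind",
    },
}

# Small illustrative stop-word list; function words carry little sense information.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "on", "at", "by",
             "to", "that", "he", "she", "his", "her"}


def tokenize(text: str) -> List[str]:
    """Lower-case, keep alphabetic tokens, drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def simplified_lesk(word: str, context: List[str]) -> Optional[str]:
    """Return the sense whose gloss overlaps most with the context words.

    Ties keep the sense listed first; words missing from the inventory return None.
    """
    context_set = set(context) - {word}
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense_id, gloss in SENSE_INVENTORY.get(word, {}).items():
        overlap = len(context_set & set(tokenize(gloss)))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


def targeted_wsd(sentence: str, target: str) -> Optional[str]:
    """Targeted WSD: disambiguate a single target word in the sentence."""
    return simplified_lesk(target, tokenize(sentence))


def all_words_wsd(sentence: str) -> Dict[str, Optional[str]]:
    """All-words WSD, restricted here to words in the toy inventory (one entry per distinct word)."""
    tokens = tokenize(sentence)
    return {t: simplified_lesk(t, tokens) for t in tokens if t in SENSE_INVENTORY}


if __name__ == "__main__":
    print(targeted_wsd("He sat on the bank of the river and watched the stream", "bank"))
    print(all_words_wsd("She made a deposit of money at the bank"))
```

The overlap count ignores inflection: `deposits` in one gloss does not match `deposit` in the context, which is one reason pre-processing tools such as stemmers and morphological analysers matter when applying gloss-overlap methods to a language like Punjabi.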
India is a multilingual country with 22 scheduled languages. Inter-language processing tasks
such as machine translation, question answering, sentiment analysis and cross-lingual search are
highly relevant problems in India. Punjabi is the official language of the Indian state of
Punjab, and it is the world's 10th most widely spoken language. The Punjabi diaspora is spread
across the globe, and Punjabi NLP tools are needed to connect it successfully with the language.
This motivated us to explore WSD for the Punjabi language. WSD systems have been successfully
designed and developed for English. However, English and Punjabi differ considerably in language
structure, which gives rise to distinct challenges when performing sense disambiguation on
Punjabi text.
In this research work, a systematic review explicitly portrays WSD in Indian languages. The
review followed a methodology built around a set of framed research questions. Renowned
electronic databases and top conferences were searched to include the relevant studies of WSD
for Indian languages. The existing status of WSD for Indian languages is categorised according
to the different language families of India, and the evolution of WSD for Indian languages over
publication time is reported. This research reviews and analyses WSD for Indian languages based
on techniques, knowledge resources and evaluation methods. A review of the standard
Senseval/SemEval evaluation workshops for the WSD field is also presented. The findings of this
work, such as the available raw corpora, sense-tagged corpora, dictionaries, WordNets and
pre-processing linguistic tools for different Indian languages, will help researchers. The
comprehensive survey presented here should assist researchers in choosing the most suitable WSD
approach for a specific domain and in identifying pertinent future directions. The availability
of the Punjabi WordNet motivated us to implement Punjabi WSD.
