Lexical Disambiguation using English WordNet with Natural Language Toolkit
Abstract
The expansion of information technology has produced large amounts of unstructured
data, such as Web pages, document warehouses, and blog corpora. Consequently, there
is a growing demand for automated methods of lexical disambiguation, i.e. Word Sense
Disambiguation (WSD), to process this information. The task is difficult because it
requires overcoming the complexities of natural language and recognizing semantic
structure in unstructured text, and research continues to improve the accuracy that
can be achieved. WSD is regarded as an artificial intelligence problem: identifying
the meaning of a word from the context in which it appears. In this work, lexical
ambiguity in a sentence is resolved with the Lesk Algorithm, modified so that the
Part of Speech (POS) of the ambiguous word is first predicted by a Decision Tree
Classifier. This improves the accuracy of POS determination to a great extent and
lets the Lesk Algorithm restrict its search to the senses of a single Part of Speech
of the ambiguous word. The output is the ‘sense’ that best matches the context of the
sentence in which the word occurs. Experimental results showed that the accuracy of
determining the sense of a word improved.
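To illustrate the idea of restricting Lesk to one Part of Speech, the following is a minimal sketch of a simplified Lesk procedure, not the implementation used in this work. The sense inventory `TOY_SENSES`, the stopword list, and the function names are hypothetical stand-ins for WordNet glosses, and the POS tag is assumed to be supplied by an upstream predictor (a Decision Tree Classifier in this work, which is not reproduced here).

```python
from typing import Optional, Set

# Hypothetical toy sense inventory: (sense id, part of speech, gloss).
# A real system would read senses and glosses from WordNet instead.
TOY_SENSES = {
    "bank": [
        ("bank.n.01", "n", "sloping land beside a body of water such as a river"),
        ("bank.n.02", "n", "a financial institution that accepts deposits and lends money"),
        ("bank.v.01", "v", "tip an aircraft laterally while turning"),
    ],
}

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"a", "an", "the", "of", "that", "and", "as", "such",
             "to", "in", "on", "at", "i", "my"}


def tokens(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str, pos: Optional[str] = None) -> Optional[str]:
    """Return the sense id whose gloss overlaps most with the sentence context.

    If `pos` is given, only senses with that part of speech are considered.
    Ties are resolved in favour of the sense listed first.
    """
    context = tokens(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, sense_pos, gloss in TOY_SENSES.get(word, []):
        if pos is not None and sense_pos != pos:
            continue  # skip senses outside the predicted part of speech
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


print(simplified_lesk("bank", "I went to the bank to deposit my money", pos="n"))
print(simplified_lesk("bank", "He sat on the river bank near the water", pos="n"))
```

Passing `pos` shrinks the candidate set before any gloss comparison, which is the effect the POS prediction step is described as having on the Lesk Algorithm.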
The resulting sense was further evaluated by computing its similarity scores (the
Wu-Palmer and Jiang-Conrath similarity scores) with the words in the context bag. The
modified Lesk Algorithm also helped in obtaining the correct translation of the
ambiguous words into Punjabi and Hindi.
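As a rough illustration of the Wu-Palmer score mentioned above, the sketch below computes it on a small hand-made taxonomy using the usual definition, 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest common ancestor and the root has depth 1. The taxonomy `PARENT` and the function names are hypothetical; the work itself uses WordNet. Jiang-Conrath similarity is not sketched because it additionally requires information-content counts from a corpus.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "object": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "object",
}


def path_to_root(node: str) -> List[str]:
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity of two nodes in the toy tree."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node that is also an ancestor of a
    # is the deepest common ancestor in a tree.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("dog", "cat"))  # shares the ancestor "animal"
print(wu_palmer("dog", "car"))  # shares only the root "entity"
```

In this toy tree, "dog" scores higher against "cat" than against "car", which is the kind of comparison a context bag check relies on.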
Description
ME, CSED
