Comparative Analysis of Measures of Similarity and Semantic Relatedness for Text Classification
Abstract
This thesis discusses techniques for text classification, including Latent Semantic
Indexing (LSI) and measures of semantic relatedness and similarity.
Latent Semantic Indexing is based upon the assumption that there is an underlying
semantic structure in textual data, and that the relationship between terms and documents
can be re-described in terms of this semantic structure. The key idea of LSI is to map
documents onto a vector space of reduced dimensionality, called the latent semantic
space. This mapping is done using Singular Value Decomposition (SVD).
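As an illustration of the mapping described above, the following is a minimal sketch of
LSI with a truncated SVD, assuming NumPy is available. The term-document counts, the
function names and the choice of k = 2 are invented for the example and are not taken
from the thesis.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The counts are invented purely for illustration.
A = np.array([
    [3, 2, 0, 0],   # term "car"
    [2, 3, 0, 0],   # term "engine"
    [0, 0, 2, 1],   # term "fruit"
    [0, 0, 1, 2],   # term "apple"
], dtype=float)

def lsi_document_vectors(term_doc, k):
    """Return one k-dimensional latent vector per document (hypothetical helper)."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; row j of the result is document j.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

docs = lsi_document_vectors(A, k=2)
sim_same_topic = cosine(docs[0], docs[1])    # two "car/engine" documents
sim_diff_topic = cosine(docs[0], docs[2])    # "car/engine" vs "fruit/apple"
```

In this toy case the first two documents end up close together in the latent semantic
space, while documents on different topics stay apart. A classifier could then operate
on the rows of `docs` instead of the raw term counts.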
Semantic relatedness measures quantify the degree to which words or concepts are
related, considering not only similarity but any possible semantic relationship among
them. In this thesis, various semantic relatedness measures that use WordNet as their
knowledge source are computed, along with other measures of semantic relatedness (MSRs),
such as Normalized Google Distance (NGD) and Normalized Compression Distance (NCD), which
use the Web as their knowledge base. These measures are evaluated by checking their
correlation with human judgement.
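For reference, NGD and NCD can be sketched in a few lines using their standard
definitions. This is only an illustration under stated assumptions: the hit counts passed
to `ngd` are hypothetical numbers rather than real search-engine results, and zlib is
used as the compressor for `ncd` as a convenient stand-in, since the abstract does not
say which compressor or search engine the thesis used.

```python
import math
import zlib

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance.

    fx, fy: number of pages containing each term; fxy: pages containing both;
    n: total number of indexed pages. All counts are supplied by the caller.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with zlib as an assumed compressor."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical counts: two terms that always co-occur have distance 0.
d_identical = ngd(1000, 1000, 1000, 10**6)
# Hypothetical counts: the terms rarely co-occur, so the distance grows.
d_rare = ngd(1000, 1000, 10, 10**6)

a = b"the quick brown fox jumps over the lazy dog " * 20
c = b"latent semantic indexing maps documents to a reduced vector space " * 20
ncd_same = ncd(a, a)   # a text compared with itself
ncd_diff = ncd(a, c)   # two unrelated texts
```

Lower values indicate more closely related inputs for both measures. Scores of this kind,
computed over a list of word pairs, are what would be compared against human relatedness
ratings.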
Description
M.E.
