Efficient Framework for Semantic Search on Web
Abstract
With the rapid growth of the Web and increasing dependence on it for information retrieval,
search engines have become the most popular and powerful tools for accessing information
online. However, the Web pages returned by even well-known search engines are often not
sufficiently relevant. The need to find the most relevant information has given rise to
research in the field of semantic search. Traditional Web search methods, whose basic
relevance criteria rely primarily on the presence of query keywords within the returned pages,
need to be replaced with more effective semantic search techniques. Semantic search can give
users a more intelligent way of finding what they are looking for within the global source of
information available online.
In this thesis, various approaches to semantic search on the Web are studied and analyzed,
resulting in the identification of two broad perspectives on semantic search, as elaborated in
the chapter on literature review. Fundamental limitations identified in the existing approaches
are the major motivation for proposing an efficient semantic search approach.
A framework for QUery-context based Information retrieval using Corpus Knowledge (QUICK) is
then proposed and elaborated in the chapter on the proposed framework. Here, the Web pages
returned by a baseline system in response to the original query are used to generate a corpus
of words related to the query category. Word tokens lying in close proximity to the query
keywords are assumed to be semantically related to the original query.
The relative position and frequency of words with respect to the query word are given due
importance through the probabilistic feature of the proposed approach, which in turn increases
the probability of reaching the context of the query. The approach shows that a set of context
features can be generated efficiently in order to produce a more accurate model of the query
topic. This context-oriented semantic search approach has been implemented in Python using
NLTK, an open source library of language processing features. The details are presented in the
chapter on design and implementation of QUICK. A category-specific user query is submitted to a
standard search engine in order to retrieve the most relevant documents pertaining to that domain.
The top-ranked returned documents are stored, and techniques are applied to filter out
non-lexical tokens such as stop-words and non-alphabetic strings. The words lying in close
proximity to the
query keywords are extracted and used as the context vector. The strength of association of
each context vector feature with the category is calculated and presented as a list. The
features with the highest strength of association with the category are selected and treated as
the context features of the category, to be used for semantic expansion of queries pertaining
to that category. Experiments comparing the result set precision of the proposed QUICK-based
semantic search with that of standard keyword-based search have been performed and are
elaborated in the chapter on testing and validation. The proposed semantic search approach
shows a significant improvement in precision over the standard keyword-based approach. Finally,
the findings of the thesis are summarized, along with the potential scope for future work in
this domain.
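
The pipeline outlined in the abstract (filter tokens, collect words near the query keywords, score their association, expand the query) can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not state the association measure, so the 1/distance weighting, the normalisation to a probability-like score, the window size, and the tiny stop-word list are all assumptions, and the function names are hypothetical. The thesis uses NLTK, whereas this sketch uses only the Python standard library so that it stays self-contained.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption); the thesis
# relies on NLTK for language processing instead.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in", "is",
              "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text and keep runs of letters that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def context_scores(documents, query, window=3):
    """Score words that occur within `window` tokens of a query keyword.

    Assumption: each co-occurrence contributes 1/distance, so words that are
    closer and more frequent score higher; scores are normalised to sum to 1.
    """
    keywords = set(tokenize(query))
    scores = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok not in keywords:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                # Query keywords themselves are skipped, so i == j never
                # reaches the division below.
                if tokens[j] not in keywords:
                    scores[tokens[j]] += 1.0 / abs(i - j)
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total else {}


def expand_query(query, scores, top_n=3):
    """Append the top_n highest-scoring context words to the query terms."""
    best = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return query.split() + best
```

In use, `documents` would hold the text of the top-ranked pages returned by the baseline search engine for a category-specific query, and the list returned by `expand_query` would be resubmitted as the semantically expanded query.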
Description
PHD, CSED
