Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/3975
Title: Efficient Framework for Semantic Search on Web
Authors: Jindal, Vikas
Supervisor: Bawa, Seema; Batra, Shalini
Keywords: Semantic Web;Ontologies;Knowledge Corpus;Semantic Search
Issue Date: 1-Aug-2016
Abstract: With the rapid growth of the Web and increasing dependence on it for retrieving relevant information, search engines have become the most popular and powerful tools for accessing information online. However, the Web pages returned by even well-known search engines are often not accurately matched to what the user needs. The need to find the most relevant information has given rise to research in the field of semantic search. Traditional Web search methods, whose basic relevance criteria rely primarily on the presence of query keywords within the returned pages, need to be replaced with more effective semantic search techniques. Semantic search would give users a more intelligent way of finding what they are looking for within the global source of information available online. In this thesis, various approaches to semantic search on the Web are studied and analyzed, resulting in the identification of two broad perspectives of semantic search, as elaborated in the chapter on literature review. Fundamental limitations identified in the existing approaches are the major motivation for proposing an efficient semantic search approach. A framework for QUery-context based Information retrieval using Corpus Knowledge (QUICK) is then proposed and elaborated in the chapter on the proposed framework. Here, the Web pages returned by a baseline system in response to the original query are used to generate a corpus of words related to the query category. The word tokens lying in close proximity to the query keywords are assumed to be semantically related to the original query. The relative position and frequency of words with respect to the query word are given due importance through the probabilistic feature of the proposed approach, which in turn increases the probability of reaching the context of the query.
The approach shows that a set of context features can be generated efficiently in order to produce a more accurate model of the query topic. This context-oriented semantic search approach has been implemented in Python using NLTK, an open-source library of language processing features. The details are presented in the chapter on design and implementation of QUICK. A category-specific user query is submitted to a standard search engine in order to retrieve the most relevant documents pertaining to that domain. The top-ranked returned documents are stored, and filtering techniques are applied to remove non-lexical tokens such as stop-words and non-alphabetic strings. The words lying in close proximity to the query keywords are extracted and used as the context vector. The strength of association of each context vector feature to the category is calculated and presented in the form of a list. The features with the strongest association to the category are selected and treated as the context features of that category, to be used for the semantic expansion of queries pertaining to it. Experiments comparing the result-set precision of the proposed QUICK-based semantic search with that of standard keyword-based search have been performed and are elaborated in the chapter on testing and validation. The proposed semantic search approach shows a significant improvement over the standard keyword-based approach. Finally, the findings of the thesis are summarized along with the potential scope for future work in this domain.
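The pipeline described in the abstract (filter tokens, collect words near the query keywords, score them, expand the query) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the tiny stop-word list, the window size, and the inverse-distance weighting are all assumptions chosen for illustration, since the abstract does not specify the actual probabilistic measure. Only the standard library is used here, whereas the thesis reports using NLTK.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# the thesis reports using NLTK, which ships a much fuller list.
STOP_WORDS = {"a", "an", "and", "at", "the", "of", "in", "is", "to", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens that are not stop-words."""
    raw = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in raw if t.isalpha() and t not in STOP_WORDS]


def context_scores(documents, query_terms, window=3):
    """Score words that occur within `window` tokens of any query term.

    Each co-occurrence adds 1/distance, so words that appear nearer to the
    query term, and more often, score higher. This weighting is an assumed
    stand-in for the probabilistic measure mentioned in the abstract.
    """
    query = {q.lower() for q in query_terms}
    scores = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok not in query:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in query:
                    scores[tokens[j]] += 1.0 / abs(i - j)
    return scores


def expand_query(query_terms, documents, top_k=3, window=3):
    """Append the `top_k` highest-scoring context words to the original query."""
    scores = context_scores(documents, query_terms, window)
    features = [word for word, _ in scores.most_common(top_k)]
    return list(query_terms) + features
```

In this sketch, `documents` stands for the text of the top-ranked pages returned by the baseline search engine, and the list returned by `expand_query` would be resubmitted as the semantically expanded query.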
Description: PHD, CSED
URI: http://hdl.handle.net/10266/3975
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: 3975.pdf
Size: 2.11 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.