Title: Identify Similar Research Papers Using Locality Sensitive Hashing
Authors: Gupta, Divya
Supervisor: Batra, Shalini
Keywords: Hashing; Locality
Issue Date: 8-Aug-2016
Abstract: Identifying the research papers of a particular domain is a tedious and time-consuming job for academics and researchers. A lot of effort can be saved if all papers related to a particular domain are combined in a single group, but it is not feasible to manually cluster similar papers on the basis of topics, keywords or abstracts. This thesis presents an approach to clustering similar papers using Locality Sensitive Hashing (LSH), a probabilistic data structure that places similar documents in a single bucket by splitting the input text into shingles and using min-hashing, which approximates Jaccard similarity, to generate a signature matrix. Our work explores how similar research papers can be clustered by considering the title, keywords and abstract of each paper. Experimental analysis shows that, using LSH, the majority of papers from a similar domain are categorized into one bucket in less time. In particular, we combine locality sensitive hashing of the abstract with the authors, keywords and journal of the paper. The basic methodology we adopt is to turn each document into a vector model by shingling and to measure the similarity between the resulting sets using Jaccard similarity. From the shingles we build a characteristic matrix, from which a technique called "minhashing" generates a signature for each document, reducing the size of the matrix.
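The pipeline described in the abstract (shingling, minhash signatures, then LSH bucketing) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the character-level shingle length of 5, the 64 hash functions, the 16 bands and the use of seeded BLAKE2b hashes are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=5):
    """Return the set of k-character shingles of lower-cased, whitespace-normalised text."""
    text = " ".join(text.lower().split())
    # Texts shorter than k yield a single shingle containing the whole text.
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def _hash(shingle, seed):
    # Assumption: one seeded 64-bit hash per seed stands in for a random row permutation.
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=64):
    """One column of the signature matrix: the minimum hash value per hash function."""
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)]


def lsh_buckets(signatures, bands=16):
    """Split each signature into bands; documents that agree on a whole band share a bucket."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    return buckets


def candidate_pairs(buckets):
    """All pairs of documents that fall into at least one common bucket."""
    pairs = set()
    for docs in buckets.values():
        ordered = sorted(docs)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs
```

A possible usage: build `signatures = {paper_id: minhash_signature(shingles(text))}` from each paper's title, keywords and abstract, then call `candidate_pairs(lsh_buckets(signatures))`. The fraction of positions where two signatures agree estimates the Jaccard similarity of the underlying shingle sets, and using more bands with fewer rows each makes it more likely that moderately similar papers share a bucket.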
Appears in Collections: Masters Theses@CSED

Files in This Item:
File        Description   Size     Format
4043.pdf                  1.8 MB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.