Similarity Search Using Locality Sensitive Hasing and Bloom Filter

dc.contributor.authorChauhan, Sachendra Singh
dc.contributor.supervisorBatra, Shalini
dc.date.accessioned2014-08-05T10:07:06Z
dc.date.available2014-08-05T10:07:06Z
dc.date.issued2014-08-05T10:07:06Z
dc.descriptionMaster of Engineering-Thesisen
dc.description.abstractSimilarity search of text documents can be reduced to Approximate Nearest Neighbor Search by converting text documents into sets by using Shingling. In the proposed scheme conversion of a document into set is done by using k-shingle and similarity between sets, is calculated using Jaccard similarity. Characteristic matrix, created by searching shingles in each document is generated using the shingles and hash functions. Since the complexity of the matrix is significantly high when shingle set is very large, a technique called ‘minhashing’ is used to reduce the size of the matrix. Search time can be drastically reduced if characteristics matrix are created by using Bloom Filter. Bloom Filter is a probabilistic data structure that will reduce O(n) search time to constant time. Minhashing creates a signature matrix which is very less in size as compared with characteristics matrix but gives almost same result. Further, finding the similarity among all pairs of column of signature matrix is still a big problem because comparing n columns take O(n2) time. The time of comparing columns for similarity is reduced by using another technique, Locality Sensitive Hashing. The proposed scheme has been successfully implemented and comparative analysis of existing methodology and proposed methodology has been provided.en
dc.description.sponsorshipComputer Science and Engineering, Thapar University, Patialaen
dc.format.extent4671894 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10266/2826
dc.language.isoen_USen
dc.subjectLocality Sensitive Hashingen
dc.subjectBloom Filteren
dc.subjectSignature Matrixen
dc.subjectcomputer scienceen
dc.titleSimilarity Search Using Locality Sensitive Hasing and Bloom Filteren
dc.typeThesisen

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
2826.pdf
Size:
4.45 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.79 KB
Format:
Item-specific license agreed upon to submission
Description: