Similarity Search Using Locality Sensitive Hasing and Bloom Filter
| dc.contributor.author | Chauhan, Sachendra Singh | |
| dc.contributor.supervisor | Batra, Shalini | |
| dc.date.accessioned | 2014-08-05T10:07:06Z | |
| dc.date.available | 2014-08-05T10:07:06Z | |
| dc.date.issued | 2014-08-05T10:07:06Z | |
| dc.description | Master of Engineering-Thesis | en |
| dc.description.abstract | Similarity search of text documents can be reduced to Approximate Nearest Neighbor Search by converting text documents into sets by using Shingling. In the proposed scheme conversion of a document into set is done by using k-shingle and similarity between sets, is calculated using Jaccard similarity. Characteristic matrix, created by searching shingles in each document is generated using the shingles and hash functions. Since the complexity of the matrix is significantly high when shingle set is very large, a technique called ‘minhashing’ is used to reduce the size of the matrix. Search time can be drastically reduced if characteristics matrix are created by using Bloom Filter. Bloom Filter is a probabilistic data structure that will reduce O(n) search time to constant time. Minhashing creates a signature matrix which is very less in size as compared with characteristics matrix but gives almost same result. Further, finding the similarity among all pairs of column of signature matrix is still a big problem because comparing n columns take O(n2) time. The time of comparing columns for similarity is reduced by using another technique, Locality Sensitive Hashing. The proposed scheme has been successfully implemented and comparative analysis of existing methodology and proposed methodology has been provided. | en |
| dc.description.sponsorship | Computer Science and Engineering, Thapar University, Patiala | en |
| dc.format.extent | 4671894 bytes | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.uri | http://hdl.handle.net/10266/2826 | |
| dc.language.iso | en_US | en |
| dc.subject | Locality Sensitive Hashing | en |
| dc.subject | Bloom Filter | en |
| dc.subject | Signature Matrix | en |
| dc.subject | computer science | en |
| dc.title | Similarity Search Using Locality Sensitive Hasing and Bloom Filter | en |
| dc.type | Thesis | en |
