Title: Similarity analysis of web crawled data
Authors: Garg, Parul
Supervisor: Arora, Vinay
Keywords: Similarity analysis;Web crawling;association rule mining;web crawler;data mining;computer science;CSED
Issue Date: 3-Sep-2015
Abstract: As the internet has grown, searching the web has become increasingly important. A web crawler is used to retrieve web pages automatically: it starts from a seed URL and visits the URLs it subsequently discovers to gather information. The processed information is stored in JSON documents. Because a web crawler can retrieve millions of pages, there is a need to find associations between web pages, which can be done with an efficient data mining technique called association rule mining. In this thesis, frequent items are found using the Apriori algorithm, and association rules are formed from these frequent items. We use a crawler that crawls a recipe site and propose a technique for finding similarity within the data on recipe items. Association rules are then derived from the structured data in the JSON file. These rules can be used to retrieve data and answer queries in desktop as well as web applications.
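The Apriori step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the recipe records, the `name` and `ingredients` field names, and the support and confidence thresholds below are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical records, shaped like JSON documents a recipe crawler might store.
recipes = [
    {"name": "pancakes", "ingredients": ["flour", "egg", "milk", "butter"]},
    {"name": "omelette", "ingredients": ["egg", "milk", "butter"]},
    {"name": "bread", "ingredients": ["flour", "butter", "yeast"]},
    {"name": "custard", "ingredients": ["egg", "milk", "sugar"]},
]

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[lhs] is always present.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules

transactions = [frozenset(r["ingredients"]) for r in recipes]
frequent = apriori(transactions, min_support=0.5)        # assumed threshold
found = association_rules(frequent, min_confidence=1.0)  # assumed threshold

for lhs, rhs, sup, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With these assumed records, one rule that meets both thresholds is egg -> milk, since every sample recipe containing egg also contains milk. Real crawled data would need the thresholds tuned to the number of documents.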
Description: ME-CSE-Thesis
Appears in Collections: Masters Theses@CSED

Files in This Item:
File: 3757.pdf
Size: 2.49 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.