Similarity analysis of web crawled data

Garg, Parul

Files

3757.pdf (2.44 MB)

Date

2015-09-03T12:52:06Z

Authors

Garg, Parul

Supervisors

Arora, Vinay

Abstract

With the growth of the internet, searching the web has become increasingly important. Web pages are retrieved automatically by a web crawler, which starts from a seed URL and visits all subsequent URLs to gather information. The processed information is stored in files known as JSON documents. Because a web crawler retrieves millions of pages, there is a need to find associations between web pages, and this can be done with an efficient data mining technique called association rule mining. In this thesis, frequent itemsets are found using the Apriori algorithm, and association rules are formed from these frequent itemsets. We use a crawler that crawls a recipe site and propose a technique for finding similarity within the data on recipe items. Association rules are then derived from the structured data in the JSON files. These rules can be used to retrieve data and answer queries in many desktop and web applications.
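The pipeline the abstract describes (JSON documents in, frequent itemsets via Apriori, association rules out) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the `ingredients` field, the sample recipes, the function names, and the support and confidence thresholds are all hypothetical.

```python
import json
from itertools import combinations

# Hypothetical crawler output: one JSON object per recipe page.
# The "ingredients" field and these records are invented for illustration.
RAW = '''[
  {"name": "pancakes", "ingredients": ["flour", "egg", "milk"]},
  {"name": "omelette", "ingredients": ["egg", "milk", "butter"]},
  {"name": "cake",     "ingredients": ["flour", "egg", "butter", "milk"]},
  {"name": "toast",    "ingredients": ["bread", "butter"]}
]'''


def frequent_itemsets(transactions, min_support):
    """Apriori: return {itemset: count} for itemsets seen in at least
    min_support transactions (min_support is an absolute count)."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples, where
    confidence = count(itemset) / count(antecedent)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


transactions = [frozenset(doc["ingredients"]) for doc in json.loads(RAW)]
frequent = frequent_itemsets(transactions, min_support=2)
rules = association_rules(frequent, min_confidence=0.9)

for lhs, rhs, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

Each recipe is treated as one transaction and each ingredient as one item. The lookup `frequent[lhs]` is safe because every subset of a frequent itemset is itself frequent, which is the property Apriori's prune step relies on.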

Description

ME-CSE-Thesis

URI

http://hdl.handle.net/10266/3757

Collections

Masters Theses@CSED
