Similarity analysis of web crawled data

Abstract

As the internet has grown, searching the web has become increasingly important. Web crawlers are used to retrieve web pages automatically: a crawler starts from a seed URL and visits the URLs it subsequently discovers to gather information. The processed information is stored as JSON documents. Because a crawler can retrieve millions of pages, there is a need to find associations between web pages, and this can be done with a data mining technique called association rule mining. In this thesis, frequent itemsets are found using the Apriori algorithm, and association rules are formed from these frequent itemsets. We use a crawler that crawls a recipe site and propose a technique to find similarity within the set of data related to the recipe items. Association rules are then derived from the structured data in the JSON file. These rules can be used to retrieve data and answer queries in many desktop and web applications.
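The mining step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the JSON layout (the `name` and `ingredients` keys), the sample recipes, and the thresholds are all assumptions made for illustration. Each recipe is treated as a transaction, Apriori collects the itemsets that meet a minimum support, and rules are kept when their confidence (support of the whole itemset divided by support of the antecedent) meets a minimum.

```python
import json
from itertools import combinations

# Hypothetical crawler output; the "name" and "ingredients" keys are assumed.
CRAWLED_JSON = """[
  {"name": "cake",    "ingredients": ["flour", "sugar", "butter", "egg"]},
  {"name": "cookies", "ingredients": ["flour", "sugar", "egg"]},
  {"name": "pastry",  "ingredients": ["flour", "butter", "egg"]},
  {"name": "salsa",   "ingredients": ["tomato", "onion", "garlic"]},
  {"name": "sauce",   "ingredients": ["tomato", "garlic", "basil"]}
]"""


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so this lookup is safe.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


recipes = [doc["ingredients"] for doc in json.loads(CRAWLED_JSON)]
frequent = apriori(recipes, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.8)

for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(sup, 2), round(conf, 2))
```

Recomputing support with a full scan for each candidate keeps the sketch short; on millions of crawled pages, a counting pass per level or an existing library implementation would be the more practical choice.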

Description

ME-CSE-Thesis
