Similarity analysis of web crawled data

dc.contributor.author: Garg, Parul
dc.contributor.supervisor: Arora, Vinay
dc.date.accessioned: 2015-09-03T12:52:06Z
dc.date.available: 2015-09-03T12:52:06Z
dc.date.issued: 2015-09-03T12:52:06Z
dc.description: ME-CSE-Thesis (en)
dc.description.abstract: With the growth of the internet, web search has become increasingly significant. Web pages are retrieved automatically by a web crawler, which starts from a seed URL and visits all subsequent URLs to gather information. The processed information is stored in JSON documents. Since a web crawler retrieves millions of pages, there is a need to find associations between web pages; this can be done with an efficient data mining technique called association rule mining. In this thesis, frequent items are found using the Apriori algorithm, and association rules are formed from these frequent items. We use a crawler that crawls a recipe site and propose a technique to find similarity within the set of data related to the recipe items. Association rules are then derived from the structured data in the JSON file. These association rules can be used to obtain data and answer queries in many desktop and web applications. (en)
dc.description.sponsorship: Computer Science, Thapar University, Patiala (en)
dc.format.extent: 2550831 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10266/3757
dc.language.iso: en_US (en)
dc.subject: Similarity analysis (en)
dc.subject: Web crawling (en)
dc.subject: association rule mining (en)
dc.subject: web crawler (en)
dc.subject: data mining (en)
dc.subject: computer science (en)
dc.subject: CSED (en)
dc.title: Similarity analysis of web crawled data (en)
dc.type: Thesis (en)
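
The mining step described in the abstract (frequent items via Apriori, then association rules from those items) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the recipe records, the JSON field names (`name`, `ingredients`), the function names, and the support and confidence thresholds are all assumptions made for the example.

```python
import json
from itertools import combinations

# Hypothetical crawled recipe records; the thesis's actual JSON schema is not given.
RECIPES_JSON = """[
  {"name": "pancakes", "ingredients": ["flour", "egg", "milk", "sugar"]},
  {"name": "omelette", "ingredients": ["egg", "milk", "salt"]},
  {"name": "cake",     "ingredients": ["flour", "egg", "sugar", "butter"]},
  {"name": "bread",    "ingredients": ["flour", "salt", "yeast"]}
]"""


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in sets for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so frequent[a] exists.
                confidence = sup / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, sup, confidence))
    return rules


recipes = json.loads(RECIPES_JSON)
transactions = [r["ingredients"] for r in recipes]
frequent = apriori(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.8)

for a, c, sup, conf in rules:
    print(sorted(a), "->", sorted(c), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Treating each recipe's ingredient list as one transaction is a design choice made for this sketch; the thesis may define transactions over other recipe attributes.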

Files

Original bundle

Name: 3757.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.78 KB
Format: Item-specific license agreed upon to submission