Web-Crawling Approaches in Search Engines

Sharma, Sandeep

dc.contributor.author	Sharma, Sandeep
dc.contributor.supervisor	Kumar, Ravinder
dc.date.accessioned	2008-09-01T12:32:19Z
dc.date.available	2008-09-01T12:32:19Z
dc.date.issued	2008-09-01T12:32:19Z
dc.description.abstract	The number of web pages worldwide is growing into the millions and trillions. Web search engines came into existence to make searching easier for users; they are used to find specific information on the World Wide Web. Without search engines, it would be almost impossible to locate anything on the Web unless we already knew a specific URL. Every search engine maintains a central repository, or database, of HTML documents in indexed form. Whenever a user query arrives, the search is performed within that database of indexed web pages. The repository of a search engine cannot accommodate every page available on the WWW, so it is desirable to store only the most relevant pages in order to increase the efficiency of the search engine. To store the most relevant pages from the World Wide Web, search engines have to follow a suitable approach. This database of HTML documents is maintained by special software. The software that traverses the web to capture pages is called a “crawler” or “spider”. In this thesis, we discuss the basics of crawlers and the commonly used techniques for crawling the web. We present the pseudocode of basic crawling algorithms and their implementation in the C language, along with simplified flowcharts. We first describe how a search engine works, then implement various crawling algorithms as C programs, and finally discuss the implementation results and give a comparison study in a table.	en
dc.format.extent	1379320 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10266/620
dc.language.iso	en_US	en
dc.subject	Crawlers	en
dc.subject	Frontier	en
dc.subject	Fish Search Algorithm	en
dc.title	Web-Crawling Approaches in Search Engines	en
dc.type	Thesis	en
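
The abstract describes a crawler as software that traverses the web to capture pages, and the subject keywords mention a frontier. As a rough illustration only, and not the thesis's own code, the sketch below shows a breadth-first crawl over a toy in-memory link graph, with a FIFO queue standing in for the frontier. The names `ToyWeb`, `crawl_bfs`, and `MAX_PAGES`, the adjacency-matrix representation, and the `limit` parameter (modelling a repository that cannot hold every page) are all assumptions made for this example. A real crawler would fetch pages over HTTP and parse links out of HTML.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 8 /* hypothetical cap on the size of the toy web */

/* Toy "web": links[i][j] is true when page i links to page j. */
typedef struct {
    size_t page_count;
    bool links[MAX_PAGES][MAX_PAGES];
} ToyWeb;

/*
 * Breadth-first crawl starting at `seed`, storing at most `limit` pages.
 * Page ids are written to `order` in the order they are visited.
 * Returns the number of pages visited.
 */
size_t crawl_bfs(const ToyWeb *web, size_t seed, size_t limit,
                 size_t order[MAX_PAGES])
{
    size_t frontier[MAX_PAGES];       /* FIFO queue of pages waiting to be visited */
    bool seen[MAX_PAGES] = { false }; /* pages already placed in the frontier */
    size_t head = 0, tail = 0, visited = 0;

    assert(web != NULL && order != NULL);
    if (web->page_count > MAX_PAGES || seed >= web->page_count)
        return 0;

    frontier[tail++] = seed;
    seen[seed] = true;

    while (head < tail && visited < limit) {
        size_t page = frontier[head++]; /* take the oldest page from the frontier */
        order[visited++] = page;        /* "store" it in the repository */

        /* Follow outgoing links; each page enters the frontier at most once. */
        for (size_t next = 0; next < web->page_count; next++) {
            if (web->links[page][next] && !seen[next]) {
                seen[next] = true;
                frontier[tail++] = next;
            }
        }
    }
    return visited;
}
```

Because every page is enqueued at most once, the frontier array never needs more than `MAX_PAGES` slots. Replacing the FIFO queue with a priority queue ordered by some relevance score would turn this into a best-first crawl; the scoring itself is outside the scope of this sketch.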

Files

Original bundle

Name: T620.pdf
Size: 1.32 MB
Format: Adobe Portable Document Format

Collections

Masters Theses@CSED