Web-Crawling Approaches in Search Engines
Abstract
The number of web pages around the world continues to grow at an enormous rate. Web
search engines came into existence to make searching this content easier for users:
they are used to find specific information on the World Wide Web. Without search
engines, it would be almost impossible to locate anything on the Web unless we
already knew a specific URL. Every search engine maintains a central repository, or
database, of HTML documents in indexed form. When a user query arrives, the search
is performed within this database of indexed web pages. No search engine's
repository can accommodate every page available on the WWW, so it is desirable that
only the most relevant pages are stored, in order to increase the efficiency of the
search engine. To store the most relevant pages from the World Wide Web, a suitable
approach has to be followed by the search engine. This database of HTML documents is
maintained by special software: the software that traverses the Web to capture pages
is called a "crawler" or "spider".
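To illustrate the idea, the sketch below shows a minimal breadth-first crawler frontier in C, in the spirit of the basic crawling algorithms discussed later in this thesis. The seed URL and the fetch_page() and extract_links() routines are hypothetical placeholders; a real crawler would issue an HTTP request and parse <a href> links there.

/*
 * Minimal sketch of a breadth-first crawler frontier (illustrative,
 * not the thesis's exact implementation). fetch_page() and
 * extract_links() are hypothetical stubs.
 */
#include <stdio.h>
#include <string.h>

#define MAX_URLS 100   /* frontier capacity (illustrative) */
#define URL_LEN  256   /* maximum URL length (illustrative) */

static char frontier[MAX_URLS][URL_LEN]; /* every URL discovered so far */
static int head = 0, tail = 0;           /* [head, tail) = still to crawl */

/* Has this URL already been enqueued (and possibly crawled)? */
static int seen(const char *url) {
    for (int i = 0; i < tail; i++)
        if (strcmp(frontier[i], url) == 0)
            return 1;
    return 0;
}

/* Append an unseen URL to the tail of the FIFO frontier. */
static void enqueue(const char *url) {
    if (tail < MAX_URLS && !seen(url))
        strncpy(frontier[tail++], url, URL_LEN - 1);
}

/* Placeholder: download the page and hand it to the indexer. */
static void fetch_page(const char *url) {
    printf("fetching %s\n", url);
}

/* Placeholder: parse outlinks from the fetched page into out[]. */
static int extract_links(const char *url, char out[][URL_LEN], int max) {
    (void)url; (void)out; (void)max;
    return 0; /* this stub discovers no outlinks */
}

int main(void) {
    enqueue("http://example.com/");  /* illustrative seed URL */
    while (head < tail) {            /* crawl in FIFO (breadth-first) order */
        char links[16][URL_LEN];
        const char *url = frontier[head++];
        fetch_page(url);
        int n = extract_links(url, links, 16);
        for (int i = 0; i < n; i++)
            enqueue(links[i]);       /* new links join the back of the queue */
    }
    return 0;
}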
In this thesis, we discuss the basics of crawlers and the commonly used techniques
for crawling the Web. We present the pseudocode of basic crawling algorithms and
their implementation in the C language, along with simplified flowcharts.
In this work, we first describe how a search engine works and implement various
crawling algorithms as C programs. We then discuss the results of these
implementations and conclude with a comparative study presented in a table.
