Web-Crawling Approaches in Search Engines
Abstract
The number of web pages around the world continues to grow at an enormous rate. Web
search engines came into existence to make searching this content easier for users:
they are used to find specific information on the World Wide Web. Without search
engines, it would be almost impossible to locate anything on the Web unless we
already knew a specific URL. Every search engine maintains a central repository, or
database, of HTML documents in indexed form. When a user query arrives, the search
is performed within this database of indexed web pages. No search engine's
repository can accommodate every page available on the WWW, so it is desirable that
only the most relevant pages are stored, in order to increase the efficiency of the
search engine. To store the most relevant pages from the World Wide Web, a suitable
approach has to be followed by the search engine. This database of HTML documents is
maintained by special software: the software that traverses the Web to capture pages
is called a "crawler" or "spider".
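To illustrate the idea, the sketch below shows a minimal breadth-first crawler frontier in C, in the spirit of the basic crawling algorithms discussed later in this thesis. The seed URL and the fetch_page() and extract_links() routines are hypothetical placeholders; a real crawler would issue an HTTP request and parse <a href> links there.

/*
 * Minimal sketch of a breadth-first crawler frontier (illustrative,
 * not the thesis's exact implementation). fetch_page() and
 * extract_links() are hypothetical stubs.
 */
#include <stdio.h>
#include <string.h>

#define MAX_URLS 100   /* frontier capacity (illustrative) */
#define URL_LEN  256   /* maximum URL length (illustrative) */

static char frontier[MAX_URLS][URL_LEN]; /* every URL discovered so far */
static int head = 0, tail = 0;           /* [head, tail) = still to crawl */

/* Has this URL already been enqueued (and possibly crawled)? */
static int seen(const char *url) {
    for (int i = 0; i < tail; i++)
        if (strcmp(frontier[i], url) == 0)
            return 1;
    return 0;
}

/* Append an unseen URL to the tail of the FIFO frontier. */
static void enqueue(const char *url) {
    if (tail < MAX_URLS && !seen(url))
        strncpy(frontier[tail++], url, URL_LEN - 1);
}

/* Placeholder: download the page and hand it to the indexer. */
static void fetch_page(const char *url) {
    printf("fetching %s\n", url);
}

/* Placeholder: parse outlinks from the fetched page into out[]. */
static int extract_links(const char *url, char out[][URL_LEN], int max) {
    (void)url; (void)out; (void)max;
    return 0; /* this stub discovers no outlinks */
}

int main(void) {
    enqueue("http://example.com/");  /* illustrative seed URL */
    while (head < tail) {            /* crawl in FIFO (breadth-first) order */
        char links[16][URL_LEN];
        const char *url = frontier[head++];
        fetch_page(url);
        int n = extract_links(url, links, 16);
        for (int i = 0; i < n; i++)
            enqueue(links[i]);       /* new links join the back of the queue */
    }
    return 0;
}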
In this thesis, we discuss the basics of crawlers and the commonly used techniques
for crawling the Web. We present the pseudocode of basic crawling algorithms and
their implementation in the C language, along with simplified flowcharts.
In this work, we first describe how a search engine works and implement various
crawling algorithms as C programs. We then discuss the results of these
implementations and conclude with a comparative study presented in a table.
