Web-Crawling Approaches in Search Engines

dc.contributor.author: Sharma, Sandeep
dc.contributor.supervisor: Kumar, Ravinder
dc.date.accessioned: 2008-09-01T12:32:19Z
dc.date.available: 2008-09-01T12:32:19Z
dc.date.issued: 2008-09-01T12:32:19Z
dc.description.abstract: The number of web pages around the world is growing into the millions and trillions. Web search engines came into existence to make searching easier for users: they are used to find specific information on the World Wide Web. Without search engines, it would be almost impossible to locate anything on the Web without already knowing a specific URL. Every search engine maintains a central repository, or database, of HTML documents in indexed form. Whenever a user submits a query, the search is performed within that database of indexed web pages. No search engine's repository can accommodate every page available on the WWW, so it is desirable to store only the most relevant pages in the database in order to increase the efficiency of the search engine. To store the most relevant pages from the World Wide Web, search engines need a suitable approach to selecting them. The database of HTML documents is maintained by special software; the software that traverses the Web to capture pages is called a “crawler” or “spider”. In this thesis, we discuss the basics of crawlers and the commonly used techniques for crawling the Web. We present pseudocode for the basic crawling algorithms and their implementation in the C language, along with simplified flowcharts. We first describe how a search engine works and implement various crawling algorithms as C programs; we then discuss the implementation results and close with a table comparing the algorithms. [en]
dc.format.extent: 1379320 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10266/620
dc.language.iso: en_US [en]
dc.subject: Crawlers [en]
dc.subject: Frontier [en]
dc.subject: Fish Search Algorithm [en]
dc.title: Web-Crawling Approaches in Search Engines [en]
dc.type: Thesis [en]
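
Illustrative note: the abstract describes a crawler that traverses the Web from page to page and a repository that cannot hold every page. The thesis's own C programs are in the attached PDF and are not reproduced in this record, so the following is only a minimal sketch of that idea, not the author's implementation. It assumes a breadth-first strategy, an in-memory link matrix standing in for the Web, and a fixed storage budget standing in for the limited repository. The names `ToyWeb`, `crawl_bfs`, and `MAX_PAGES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 16 /* hypothetical upper bound for this toy example */

/* Toy stand-in for the Web: links[i][j] != 0 means page i links to page j. */
typedef struct {
    int n;                            /* number of pages in the toy graph */
    int links[MAX_PAGES][MAX_PAGES];
} ToyWeb;

/*
 * Breadth-first crawl starting at `seed`, storing at most `budget` pages.
 * The visit order is written to `order` (which must have room for
 * min(budget, web->n) entries). Returns the number of pages stored.
 */
int crawl_bfs(const ToyWeb *web, int seed, int budget, int order[])
{
    int frontier[MAX_PAGES];   /* FIFO queue of discovered, unvisited pages */
    int seen[MAX_PAGES] = {0}; /* 1 once a page has entered the frontier */
    int head = 0, tail = 0, stored = 0;

    assert(web != NULL && web->n > 0 && web->n <= MAX_PAGES);
    assert(seed >= 0 && seed < web->n);

    frontier[tail++] = seed;
    seen[seed] = 1;

    while (head < tail && stored < budget) {
        int page = frontier[head++];
        order[stored++] = page; /* stands in for "fetch and index the page" */

        /* Add every not-yet-seen outgoing link to the back of the frontier. */
        for (int next = 0; next < web->n; next++) {
            if (web->links[page][next] && !seen[next]) {
                seen[next] = 1;
                frontier[tail++] = next;
            }
        }
    }
    return stored;
}
```

Because each page enters the frontier at most once, the queue never grows past `MAX_PAGES`. Replacing the FIFO queue with a priority ordering would turn the same loop into a relevance-driven crawl, which is the kind of variation the abstract's comparison of algorithms alludes to.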

Files

Original bundle

Name: T620.pdf
Size: 1.32 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission
Description: