Enrichment in Performance of Focused Web Crawlers

dc.contributor.author: Routhu, Ravikiran
dc.contributor.supervisor: Kumar, Ravinder
dc.date.accessioned: 2010-09-16T11:51:58Z
dc.date.available: 2010-09-16T11:51:58Z
dc.date.issued: 2010-09-16
dc.description: M.E. (CSED)
dc.description.abstract: The World Wide Web (WWW) is an interlinked collection of billions of documents formatted using HTML. Since its inception in 1990, the WWW has grown exponentially in size. As of today, it is estimated to contain approximately 50 billion publicly accessible/indexable web documents distributed all over the world on thousands of web servers. Searching for information in such a huge collection is very difficult, because web pages are not organized like books on the shelves of a library, nor are they completely catalogued at one central location. Even users who know where to look, because they know a document's URL, are not guaranteed to retrieve the information, since the web is constantly changing. A search engine is a tool that addresses these problems by finding specific information on the WWW. The Internet would not have become so popular had search engines not been developed, and it would be almost impossible to locate anything on the web without knowing a specific URL. Most search engines save a copy of the web pages in a central repository and then build appropriate indexes over them for later search and retrieval. Because the storage of these databases/repositories is limited, a search engine cannot accommodate each and every page available on the WWW. The databases of search engines are therefore maintained with the help of software that stores the most relevant pages from the WWW. The software that traverses the web and downloads web pages is called a “crawler”. Web crawlers are also known as “spiders”, “robots”, “ants”, “automatic indexers”, etc. This thesis discusses crawler basics, the commonly used web crawling techniques, and the pseudocode of various basic crawling algorithms, together with their implementations in the C language and simplified flowcharts.
dc.description.sponsorship: CSED
dc.format.extent: 1718968 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10266/1261
dc.language.iso: en_US
dc.subject: Crawlers
dc.subject: Search Engines
dc.subject: WWW
dc.title: Enrichment in Performance of Focused Web Crawlers
dc.type: Thesis

Files

Original bundle

Name: 1261.pdf
Size: 1.6 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission