How does Dogpile update its features?
Most search engines, such as Lycos and HotBot, build their index with a spidering program that retrieves a web page, typically over the standard HTTP protocol, and extracts the data to be indexed from that page.
At the same time, links to other pages are noted, and the process is then repeated for each newly discovered link.
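A minimal sketch of this spidering loop, using a hard-coded stand-in for the web so the example is self-contained (all URLs and page contents here are invented, and a real spider would fetch each page over HTTP):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Invented stand-in for the web, so the sketch runs without a network.
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">alpha</a> welcome page',
    "http://example.com/a.html": '<a href="/">home</a> alpha details',
}

class LinkAndTextParser(HTMLParser):
    """Collects the href links and the text terms found in one page."""
    def __init__(self):
        super().__init__()
        self.links, self.words = [], []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)
    def handle_data(self, data):
        self.words.extend(data.split())

def crawl(seed):
    index, queue, seen = {}, [seed], set()
    while queue:
        url = queue.pop()
        if url in seen or url not in FAKE_WEB:
            continue
        seen.add(url)
        parser = LinkAndTextParser()
        parser.feed(FAKE_WEB[url])      # in practice: retrieve over HTTP
        for word in parser.words:       # index the terms extracted from the page
            index.setdefault(word.lower(), set()).add(url)
        for link in parser.links:       # repeat for newly discovered links
            queue.append(urljoin(url, link))
    return index

index = crawl("http://example.com/")
print(sorted(index["alpha"]))
```

The result is the term-to-URL mapping described below: querying the index for a term returns the set of pages containing it.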
Most search engines rely on an index of web pages that the engine can search on the basis of query terms, such as keywords.
The index is normally backed by a database of web addresses, i.e. uniform resource locators (URLs), together with text terms representing each page placed on the web.
The index is accessible from servers and contains two kinds of entries: page entries, each holding the address of the program that generates a dynamic page together with the input tuples submitted to that program to generate the page; and search entries, which identify the dynamic pages, and the corresponding tuples, for given search terms.
A search engine operating on the index can access the search entries to identify the dynamic pages matching the terms of a search query, then access the page entries to generate addresses for those pages, each address being built from the program address and the input tuples.
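The two kinds of entries, and the step of rebuilding a dynamic page's address from the program address and its input tuples, can be sketched as follows (the program address, tuple names, and page ids are all hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical page entries: page id -> (address of the generating
# program, input tuples submitted to that program to produce the page).
page_entries = {
    "p1": ("http://shop.example.com/catalog.cgi",
           {"category": "books", "page": "2"}),
}

# Hypothetical search entries: search term -> dynamic pages containing it.
search_entries = {
    "novels": ["p1"],
}

def search(term):
    results = []
    for page_id in search_entries.get(term, []):
        program, tuples = page_entries[page_id]
        # Regenerate the dynamic page's address from the program
        # address and the input tuples.
        results.append(program + "?" + urlencode(tuples))
    return results

print(search("novels"))
# -> ['http://shop.example.com/catalog.cgi?category=books&page=2']
```

The point of splitting the index this way is that the dynamic page itself need not be stored: its address can always be regenerated on demand from the program address and the tuples.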
Combining several search engines leads to better coverage of the web, since some search engines include data from sites not visited by others.
Some search engines, for example MetaCrawler and Dogpile, upon receiving a search request, search the search sites of other search engines, receive the results from these, and consolidate the results for display to the user (this is known as a metasearch).

A method for generating an index of data available from a server includes processing the data on the server to extract data items for a central index (the data items including network addresses and terms), compiling an index file containing those items, and transmitting the index file to the central index. The processing may include locating database query statements in the data, in which case the data items also include input tuples for the statements.

Spidering, by contrast, is performed automatically, so no co-operation is required from the administrator or author of the web site visited. However, the pages are all brought to a central site for processing, and because of the volume of data involved it is common for a new or modified page to wait several months before being processed.
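A minimal sketch of the server-side index-file generation described above, with hypothetical page data and a stubbed-out transmission step so the example is self-contained:

```python
import json

# Invented server data: each dynamic page address paired with its text.
SERVER_PAGES = {
    "http://example.com/books.cgi?category=novels": "classic novels list",
    "http://example.com/books.cgi?category=poetry": "modern poetry list",
}

def compile_index_file(pages):
    """Process the server's data into an index file of data items,
    each item pairing a network address with the terms found there."""
    items = []
    for address, text in pages.items():
        items.append({"address": address,
                      "terms": sorted(set(text.split()))})
    return json.dumps(items)

def transmit(index_file):
    # Stand-in for transmitting the compiled file to the central index;
    # here it simply parses the file back into data items.
    return json.loads(index_file)

central_index = transmit(compile_index_file(SERVER_PAGES))
print(len(central_index))
```

Because the server compiles and sends its own index file, only the small file travels to the central index, rather than every page being fetched and processed centrally.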
The pages are then available to a spidering program for retrieval.