How Search Engines Work
The term "search engine" is often used
generically to describe both crawler-based search engines and human-powered directories.
These two types of search engines gather their listings in radically different
ways.

Crawler-Based Search Engines
Crawler-based search engines, such as HotBot, create
their listings automatically. They "crawl" or "spider" the
web, then people search through what they have found. If
you change your web pages, crawler-based search engines eventually find these
changes, and that can affect how you are listed. Page titles, body copy and other
elements all play a role.

Human-Powered Directories
A human-powered directory, such as Yahoo, depends on
humans for its listings. You submit a short description to the directory for your
entire site, or editors write one for sites they review. A search looks for matches
only in the descriptions submitted. Changing your web
pages has no effect on your listing. Things that are useful for improving a listing
with a search engine have nothing to do with improving a listing in a directory.
The only exception is that a good site, with good content, might be more likely
to get reviewed for free than a poor site.

"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine presented either crawler-based results
or human-powered listings.
Today, it is extremely common for both types of results to be presented. Usually,
a hybrid search engine will favor one type of listing over another. For example,
Yahoo is more likely to present human-powered listings. However, it does also
present crawler-based results (as provided by Google), especially for more obscure
queries.

The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements.
First is the spider, also called the crawler. The spider visits a web page, reads
it, and then follows links to other pages within the site. This is what it means
when someone refers to a site being "spidered" or "crawled."
The spider returns to the site on a regular basis, such as every month or two,
to look for changes.

Everything the spider finds goes
into the second part of the search engine, the index. The index, sometimes called
the catalog, is like a giant book containing a copy of every web page that the
spider finds. If a web page changes, the book is updated with the new information.

Sometimes
it can take a while for new pages or changes that the spider finds to be added
to the index. Thus, a web page may have been "spidered" but not yet
"indexed." Until it is indexed -- added to the index -- it is not available
to those searching with the search engine.

Search engine
software is the third part of a search engine. This is the program that sifts
through the millions of pages recorded in the index to find matches to a search
and rank them in order of what it believes is most relevant. You can learn more
about how search engine software ranks web pages on the aptly-named How
Search Engines Rank Web Pages page.

Major Search Engines: The Same, But Different
All crawler-based search
engines have the basic parts described above, but there are differences in how
these parts are tuned. That is why the same search on different search engines
often produces different results. Some of the significant differences between
the major crawler-based search engines are summarized on the Search Engine Features
Page. Information on this page has been drawn from the help pages of each search
engine, along with knowledge gained from articles, reviews, books, independent
research, tips from others and additional information received directly from the
various search engines.
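
The three parts described above (spider, index and search engine software) can be illustrated with a small Python sketch. This is only a toy model, not how any real search engine is built: the in-memory `WEB` dictionary, the page contents, the function names and the `title_weight` / `body_weight` parameters are all assumptions made up for this example. It crawls a tiny fake site, builds an index of words, and then ranks the same query with two different tunings to show why two engines can return different results.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (title, body text, outgoing links).
# A real spider would fetch pages over the network instead.
WEB = {
    "/home": ("Stamp Collecting",
              "Welcome to our stamp collecting site.",
              ["/buy", "/history"]),
    "/buy": ("Buy Rare Items",
             "Buy a rare stamp or a stamp album. Every stamp ships worldwide.",
             ["/home"]),
    "/history": ("History", "A short history of postal services.", []),
}


def spider(start):
    """Part 1: visit a page, read it, and follow its links to other pages."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        title, body, links = WEB[url]
        pages[url] = (title, body)
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def tokenize(text):
    """Split text into lowercase words, dropping trailing punctuation."""
    return [word.strip(".,").lower() for word in text.split()]


def build_index(pages):
    """Part 2: the index maps word -> {url: [count in title, count in body]}."""
    index = {}
    for url, (title, body) in pages.items():
        for field, text in enumerate((title, body)):
            for word in tokenize(text):
                counts = index.setdefault(word, {}).setdefault(url, [0, 0])
                counts[field] += 1
    return index


def search(index, query, title_weight, body_weight):
    """Part 3: find matching pages and rank them by a weighted word count."""
    scores = {}
    for word in tokenize(query):
        for url, (in_title, in_body) in index.get(word, {}).items():
            score = title_weight * in_title + body_weight * in_body
            scores[url] = scores.get(url, 0) + score
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = spider("/home")

# "Spidered" but not yet "indexed": an empty index finds nothing.
print(search({}, "stamp", 1, 1))        # []

index = build_index(pages)

# Two engines with the same parts but different tuning.
print(search(index, "stamp", 5, 1))     # ['/home', '/buy']
print(search(index, "stamp", 1, 1))     # ['/buy', '/home']
```

In this sketch, the first tuning rewards a match in the page title, so the home page wins. The second counts every occurrence equally, so the page that repeats the word most often in its body comes first. Real engines weigh many more factors, but the principle is the same.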