Help Center

Duplicate Hosts

The Duplicate Hosts page lets you prevent the same content from being crawled more than once when it resides on mirrored servers. For example, if load-balancing servers in your system serve the same content, you do not need to crawl all of them, because they contain only duplicates of content you are already crawling. Entries on this page identify the duplicate hosts, so that any link found during the crawl that points to a duplicate host is treated as if it points to the corresponding canonical host.
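
The effect of an entry can be pictured as a host rewrite applied to each discovered link. The following Python sketch is only an illustration of that idea, not the crawler's actual implementation: the function name canonicalize_url and the mapping DUPLICATE_TO_CANONICAL are hypothetical, the host names are taken from the examples on this page, and keeping the original port is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping built from the example entries on this page:
# each duplicate host points to its canonical host.
DUPLICATE_TO_CANONICAL = {
    "www.offsite.com": "www.your-company.com",
    "web.offsite.com": "www.your-company.com",
    "website.example": "www2.your-company.com",
}


def canonicalize_url(url, mapping=DUPLICATE_TO_CANONICAL):
    """Return url with a duplicate host replaced by its canonical host.

    URLs whose host is not listed as a duplicate are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # host names are case-insensitive
    canonical = mapping.get(host)
    if canonical is None:
        return url
    # Assumption: an explicit port on the duplicate host is kept as-is.
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

With this mapping, a link to http://web.offsite.com/docs/a.html would be treated as a link to http://www.your-company.com/docs/a.html, so the page is fetched once, from the canonical host.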

The following rules apply to entries on this page:

  • Only one <canonical_host> entry is permitted per box in the Canonical Host column.
  • The <canonical_host> must be a fully qualified host name.
  • Multiple <duplicate_host> entries are permitted in the same box for the corresponding canonical host.
  • Each box in the Duplicate Host column must contain at least one entry.
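
If you prepare entries in a script before typing them in, the rules above can be checked ahead of time. This is a minimal Python sketch under stated assumptions: check_entry and looks_fully_qualified are hypothetical helpers, entries within a box are assumed to be separated by whitespace, and "fully qualified" is approximated as at least two dot-separated labels made of letters, digits, and hyphens. The page itself may validate differently.

```python
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def looks_fully_qualified(host):
    """Approximate check: two or more valid labels separated by dots."""
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def check_entry(canonical_box, duplicate_box):
    """Return a list of rule violations for one row (empty if none)."""
    problems = []
    canonical = canonical_box.split()
    if len(canonical) != 1:
        problems.append("Canonical Host box must contain exactly one entry")
    elif not looks_fully_qualified(canonical[0]):
        problems.append("canonical host must be a fully qualified host name")
    # Several duplicate hosts per box are allowed, but at least one is required.
    if not duplicate_box.split():
        problems.append("Duplicate Host box must contain at least one entry")
    return problems
```

For example, check_entry("www.your-company.com", "www.offsite.com web.offsite.com") reports no problems, while a canonical host of "www" is flagged as not fully qualified.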

Examples:

Canonical Host           Duplicate Host(s)
www.your-company.com     www.offsite.com web.offsite.com
www2.your-company.com    website.example

 
© Google Inc. 2007