Duplicate Hosts
The Duplicate Hosts page lets you prevent the crawling of content that resides on
mirrored servers. For example, if your system includes load-balancing servers that
serve the same content, you do not want every one of these servers crawled, because
they contain only duplicates of content you are already crawling. Entries on this page
identify the duplicate hosts, so that any link found during the crawl that points
to a duplicate host is treated as if it points to the corresponding
canonical host.
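The link handling described above can be illustrated with a short Python sketch. It assumes that "treated as if pointing to the canonical host" means that only the host name of a link is replaced while the scheme, port, path, and query are kept; the mapping dictionary and the `canonicalize_link` function are hypothetical and are not part of the appliance's interface.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping built from Duplicate Hosts entries:
# each duplicate host name maps to its canonical host name.
DUPLICATE_TO_CANONICAL = {
    "www.offsite.com": "www.your-company.com",
    "web.offsite.com": "www.your-company.com",
    "website.example": "www2.your-company.com",
}


def canonicalize_link(url, mapping=DUPLICATE_TO_CANONICAL):
    """Rewrite a discovered link so a duplicate host is replaced by its canonical host.

    Assumption for illustration: only the host changes; everything else is kept.
    Any user:password part of the original netloc is dropped in this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # urlsplit already lowercases hostname
    canonical = mapping.get(host)
    if canonical is None:
        return url  # not a duplicate host: leave the link unchanged
    # Keep an explicit port, if the original link had one.
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `canonicalize_link("http://web.offsite.com/docs/a.html?x=1")` yields `http://www.your-company.com/docs/a.html?x=1`, so the crawler would fetch that page from the canonical host only once.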
The following rules apply to entries on this page:
- Only one <canonical_host> entry is permitted per box in the Canonical Host column.
- The <canonical_host> must be a fully qualified host name.
- Multiple <duplicate_host> entries are permitted in the same box for the corresponding canonical host.
- Each box in the Duplicate Host column must contain at least one entry.
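The rules above can be checked mechanically. The following Python sketch validates one row of the table; the function name `validate_entry`, the assumption that entries in a box are separated by whitespace, and the regular expression used as a stand-in for "fully qualified host name" are all hypothetical and do not reproduce the appliance's own validation.

```python
import re

# Rough stand-in for "fully qualified host name" (an assumption, not the
# appliance's rule): two or more dot-separated labels of letters, digits,
# and hyphens, with no label starting or ending in a hyphen.
_FQDN = re.compile(
    r"^(?=.{1,253}$)(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))+$"
)


def validate_entry(canonical_box, duplicate_box):
    """Return rule violations for one Canonical Host / Duplicate Host(s) row.

    Assumes entries within a box are separated by whitespace.
    """
    errors = []
    canonical = canonical_box.split()
    duplicates = duplicate_box.split()
    if not canonical:
        errors.append("missing canonical host")  # sketch treats an empty box as invalid
    elif len(canonical) > 1:
        errors.append("only one canonical host is permitted per box")
    elif not _FQDN.match(canonical[0]):
        errors.append(f"canonical host is not fully qualified: {canonical[0]}")
    if not duplicates:
        errors.append("at least one duplicate host is required")
    return errors
```

An empty list means the row satisfies all four rules; for example, `validate_entry("intranet", "web.offsite.com")` reports that `intranet` is not fully qualified.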
Examples:
| Canonical Host | Duplicate Host(s) |
|---|---|
| www.your-company.com | www.offsite.com web.offsite.com |
| www2.your-company.com | website.example |