Google Search Appliance software version 4.6
Google Mini software version 4.6
Posted July 2007
This document provides an overview of how the Google Search Appliance and the Google Mini crawl and index enterprise content.
For the Google Search Appliance, information about continuous crawl applies to software version 4.2, and information about full crawl and file system crawl applies to software version 4.6 and later.
For the Google Mini, all information applies to software version 4.4 and later.
Before crawling starts, you must use the Crawl and Index > Crawl Schedule page to select one of the following crawl modes: Full crawl or Continuous crawl.
If you select Full crawl, you must schedule a time for crawling to start. If you select and save Continuous crawl mode, crawling starts and a link to the Freshness Tuning page appears.
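The difference between the two modes can be modeled in a few lines. This is a minimal sketch, not the appliance's implementation. `CrawlMode`, `CrawlSettings`, and `crawl_starts_on_save` are hypothetical names. The sketch encodes only what the text states: full crawl needs a scheduled start time, while continuous crawl begins as soon as the setting is saved.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class CrawlMode(Enum):
    FULL = "full"
    CONTINUOUS = "continuous"


@dataclass
class CrawlSettings:
    """Hypothetical model of the choice made on Crawl and Index > Crawl Schedule."""
    mode: CrawlMode
    # Only meaningful for full crawl: when crawling is scheduled to begin.
    scheduled_start: Optional[datetime] = None


def crawl_starts_on_save(settings: CrawlSettings) -> bool:
    """Return True if saving these settings starts crawling immediately.

    Full crawl requires a scheduled start time and waits for it;
    continuous crawl starts as soon as the mode is saved.
    """
    if settings.mode is CrawlMode.FULL:
        if settings.scheduled_start is None:
            raise ValueError("Full crawl mode requires a scheduled start time")
        return False
    return True
```

For example, `crawl_starts_on_save(CrawlSettings(CrawlMode.CONTINUOUS))` returns `True`. A full-crawl setting saved without a start time is rejected instead of being silently accepted.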
For complete information about the Crawl and Index > Crawl Schedule page, click Help Center > Crawl and Index > Crawl Schedule in the Admin Console.
The search appliance starts crawling in full crawl mode according to a schedule that you specify using the Crawl and Index > Crawl Schedule page in the Admin Console.
The following figure shows the Crawl schedule for full crawls group on the Crawl and Index > Crawl Schedule page. Using this page, you can specify:
Using the Status and Reports > Crawl Status page, you can:
When you stop crawling:
When you pause crawling, the search appliance stops crawling documents in the index, but connectivity tests for Start URLs still run every 30 minutes. You may notice this activity in access logs.
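The 30-minute interval lets you estimate how many connectivity-test requests you might see in access logs during a pause. This is a minimal sketch. `connectivity_tests_during_pause` and its `first_test` argument are hypothetical, because the text does not say when the first test in a pause occurs. The sketch also assumes that no document crawling happens in the window.

```python
from datetime import datetime, timedelta
from typing import List

# Interval stated in the text: connectivity tests for Start URLs keep running
# every 30 minutes while crawling is paused.
CONNECTIVITY_TEST_INTERVAL = timedelta(minutes=30)


def connectivity_tests_during_pause(
    pause_start: datetime, pause_end: datetime, first_test: datetime
) -> List[datetime]:
    """List the times at which Start URL connectivity tests would run during a pause.

    first_test is a hypothetical input: a known test time used to fix the
    30-minute cycle. Only tests inside [pause_start, pause_end) are returned.
    """
    times = []
    t = first_test
    while t < pause_end:
        if t >= pause_start:
            times.append(t)
        t += CONNECTIVITY_TEST_INTERVAL
    return times
```

Under these assumptions, a two-hour pause contains four connectivity tests, so roughly four bursts of requests to Start URLs would show up in the logs.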
For complete information about the Status and Reports > Crawl Status page, click Help Center > Status and Reports > Crawl Status in the Admin Console.
Occasionally, there may be a recently changed URL that you want to be recrawled sooner than the search appliance has it scheduled for recrawling. Provided that the URL has been previously crawled, you can submit it for immediate recrawling from the Admin Console using one of the following methods:
URLs that you submit for recrawling are treated the same way as new, uncrawled URLs in the crawl queue. They are scheduled to be crawled in order of Enterprise PageRank, and before any URLs that the search appliance has automatically scheduled for recrawling.
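The ordering described above can be illustrated with a two-tier priority queue. This is a sketch under stated assumptions, not the appliance's actual crawl queue. `CrawlQueue` and its methods are hypothetical, and Enterprise PageRank is modeled as a plain float score. Submitted URLs share the top tier with new URLs. Automatically scheduled recrawls sit in a lower tier. Within a tier, URLs come out in descending score order, which is assumed to apply to both tiers. Submission is refused for URLs that have never been crawled, matching the condition stated earlier.

```python
import heapq
import itertools
from typing import List, Set, Tuple

# Hypothetical priority tiers: lower numbers are crawled first.
NEW_OR_SUBMITTED = 0  # new, uncrawled URLs and URLs submitted for recrawling
AUTO_RECRAWL = 1      # URLs the appliance scheduled for recrawling itself


class CrawlQueue:
    """Illustrative model of the crawl-queue ordering, not the appliance's code."""

    def __init__(self, crawled: Set[str]) -> None:
        self._heap: List[Tuple[int, float, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order stable
        self._crawled = crawled            # URLs that have been crawled before

    def _push(self, tier: int, pagerank: float, url: str) -> None:
        # Negate the score so higher-ranked URLs leave the min-heap first.
        heapq.heappush(self._heap, (tier, -pagerank, next(self._counter), url))

    def add_new(self, url: str, pagerank: float) -> None:
        self._push(NEW_OR_SUBMITTED, pagerank, url)

    def submit_recrawl(self, url: str, pagerank: float) -> None:
        # Only previously crawled URLs can be submitted for recrawling.
        if url not in self._crawled:
            raise ValueError(f"{url} has not been crawled yet")
        self._push(NEW_OR_SUBMITTED, pagerank, url)

    def schedule_auto_recrawl(self, url: str, pagerank: float) -> None:
        self._push(AUTO_RECRAWL, pagerank, url)

    def pop(self) -> str:
        return heapq.heappop(self._heap)[3]
```

In this model, a submitted URL with a low score still goes ahead of every automatically scheduled recrawl. It can, however, wait behind new URLs that have higher scores.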
When you trigger a URL to be recrawled using the Admin Console:
As a result, there may be a time lag of up to 20 hours before a URL that you have submitted is recrawled.