After you have specified which URLs to crawl and configured your servers for crawling, use the Crawl and Index > Crawl Schedule panel to select a crawl mode and, for full crawl mode, to specify when your servers are crawled.
Crawl Modes
The search appliance has the following crawl modes:
- Continuous crawl. Select this mode if you want the crawler to locate and index updated content automatically.
- Full crawl. Select this mode if you want detailed control over the time and duration of all crawls. A full crawl proceeds until one of the following happens:
- The time limit that you specified has passed.
- The crawler reaches the document limit specified by your license.
- The crawler reaches the limit that you set on the Crawl and Index > Host Load Schedule page, under Maximum Number of URLs to Crawl.
- The crawler has crawled all reachable URLs.
- Someone clicks the Stop Crawl button on the Status and Reports > Crawl Status page.
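The stop conditions for a full crawl listed above can be sketched as a single check. This is only an illustrative model, not the search appliance's implementation: the `CrawlState` fields, the function name, and the order in which the conditions are tested are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlState:
    """Hypothetical snapshot of a running full crawl."""
    elapsed_minutes: int      # time since the crawl started
    documents_indexed: int    # counted against the license limit
    urls_crawled: int         # counted against Maximum Number of URLs to Crawl
    urls_remaining: int       # reachable URLs not yet crawled
    stop_requested: bool      # someone clicked Stop Crawl


def full_crawl_stop_reason(
    state: CrawlState,
    time_limit_minutes: Optional[int],   # None models "Crawl Until Completed"
    license_document_limit: int,
    max_urls_to_crawl: Optional[int],    # None models "no limit set"
) -> Optional[str]:
    """Return why the crawl should stop, or None if it should keep going."""
    if state.stop_requested:
        return "stopped by administrator"
    if time_limit_minutes is not None and state.elapsed_minutes >= time_limit_minutes:
        return "time limit reached"
    if state.documents_indexed >= license_document_limit:
        return "license document limit reached"
    if max_urls_to_crawl is not None and state.urls_crawled >= max_urls_to_crawl:
        return "maximum number of URLs reached"
    if state.urls_remaining == 0:
        return "all reachable URLs crawled"
    return None


if __name__ == "__main__":
    state = CrawlState(elapsed_minutes=120, documents_indexed=5000,
                       urls_crawled=5200, urls_remaining=300,
                       stop_requested=False)
    print(full_crawl_stop_reason(state, 120, 500_000, None))
```

Whichever condition is met first ends the crawl; the sketch simply tests them in a fixed order.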
Both modes of crawling use the same URLs that are configured on the Crawl URLs page.
To select a crawl mode:
- Click the radio button for either Continuous crawl or Full crawl mode.
- Click the Save Crawl Mode button.
Once your selection is saved, the bottom part of the page will show information relevant to the chosen crawl mode--either the crawl schedule for full crawls, or freshness tuning for continuous crawls. To learn more about configuring full crawls, see the Crawl schedule for full crawls section.
To view the crawl status:
You can view the status of a scheduled crawl on the Status and Reports > Crawl Status page. To view the most recent status, click your web browser's Refresh button.
To start a crawl:
If your search appliance is in continuous crawl mode, you can start a crawl immediately by clicking the Resume Crawl button on the Status and Reports > Crawl Status page. The crawl starts in fifteen minutes, and the Crawl Status page then shows the change in status.
If your search appliance is in full crawl mode, the crawl begins at the time you have scheduled.
To stop a crawl:
You can stop a continuous crawl at any time. On the Status and Reports > Crawl Status page, click the Pause Crawl button to stop the crawl. If you want to stop a full crawl, change the crawl mode to continuous crawl, and then pause the crawl on the Crawl Status page.
When a crawl is stopped, the documents that were crawled will remain in the index. The index will then contain some old documents and some newly crawled documents.
Crawl schedule for full crawls
The crawl schedule allows you to integrate the crawl with any other system activities that occur on your servers, such as routine system backups.
You can create a crawl schedule and also limit the crawl to a specific duration, which is expressed in hours and minutes. If you set a crawl time limit, the crawler runs for the specified number of hours and minutes or until it crawls all of the URLs. For example, if you set a time limit of two hours and schedule a start time of 2 a.m., the crawler will crawl your servers from 2 a.m. to 4 a.m., unless it finishes crawling before the two-hour limit.
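The arithmetic in the example above can be checked with a short sketch. The function name `crawl_window` and the calendar date are assumptions chosen only for illustration; the search appliance itself is configured through the drop-down lists, not through code.

```python
from datetime import datetime, timedelta


def crawl_window(start: datetime, limit_hours: int, limit_minutes: int):
    """Return (start, latest possible end) for a time-limited crawl.

    The crawl may end earlier if every URL is crawled before the limit.
    """
    latest_end = start + timedelta(hours=limit_hours, minutes=limit_minutes)
    return start, latest_end


if __name__ == "__main__":
    # Arbitrary date; only the 2 a.m. start time comes from the example.
    begin, latest_end = crawl_window(datetime(2024, 1, 1, 2, 0), 2, 0)
    print(begin.strftime("%H:%M"), latest_end.strftime("%H:%M"))  # 02:00 04:00
```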
Scheduling a crawl:
- To select a day, select the day from the Begin Crawl on drop-down list.
- To select the time when you want the crawl to begin, select the hour from the Start Hour drop-down list and the minutes from the Start Minute drop-down list.
- To limit the duration of the crawl, select the duration from the drop-down list. You can set the crawling to continue until all documents have been indexed by selecting Crawl Until Completed, or you can select a length of time up to 24 hours and 45 minutes.
- If you want the search appliance to restart crawling and build the index anew (rather than continuing from where the previous crawl ended), or if you want the known URLs to be crawled in PageRank™ order, select the restart crawl check box to the right of the Duration Hour drop-down list.
Documents that rank higher in PageRank™ are crawled first. If the scheduled time is too short to crawl all of the content, the index might contain newly crawled versions of some documents and older versions of others.
If you do not select the check box, the crawler resumes where the previous crawl left off. For example, if you schedule a crawl every day at 2 a.m. for two hours, the crawler will eventually index everything, and documents will be crawled in order of document date.
- Click the Save Crawl Schedule button.
You can create more scheduled full crawls by clicking the Add More Rows button. This adds more rows for additional entries to the schedule.
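To illustrate the difference the restart crawl check box makes when the scheduled duration is too short to cover everything, the following sketch uses a deliberately simplified model. It assumes a fixed, ordered list of URLs and a fixed number of URLs crawled per session; the function name `run_sessions` and its parameters are hypothetical, and the real crawler's ordering and pacing are more involved.

```python
def run_sessions(urls, per_session, sessions, restart):
    """Simulate repeated time-limited crawls over an ordered URL list.

    urls        -- URLs in crawl order (an assumption of this model)
    per_session -- how many URLs fit into one scheduled crawl
    sessions    -- number of scheduled crawls to simulate
    restart     -- True models the restart crawl check box being selected
    Returns the set of URLs crawled at least once.
    """
    crawled = set()
    position = 0
    for _ in range(sessions):
        if restart:
            position = 0  # every session starts again from the top
        batch = urls[position:position + per_session]
        crawled.update(batch)
        position += len(batch)  # without restart, the next session resumes here
    return crawled


if __name__ == "__main__":
    urls = [f"http://intranet.example.com/doc{i}" for i in range(10)]
    print(len(run_sessions(urls, 4, 3, restart=False)))  # 10
    print(len(run_sessions(urls, 4, 3, restart=True)))   # 4
```

In this model, resuming covers all ten URLs over three sessions, while restarting each time revisits only the first four, which is why a short schedule combined with a restart can leave older versions of some documents in the index.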