Help Center

Crawl and Index > Cookie Sites

If your intranet has pages that are behind a login form or that require cookies to return the correct content, you can set up rules that give the crawler access to those pages, and you can test the rules before you perform the crawl. For each rule, you enter the URL of the login page and a URL pattern for the area that requires the login. The URL pattern omits the page name but includes the final slash, so that it provides correct path information. For information on crawling secure content, see CA certificates for protected content.

For example, the URL for the login page might be

http://mycompany.com/support/login.html

and the URL pattern then would be

http://mycompany.com/support/
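
The relationship between the two URLs can be sketched in a few lines of Python. This is only an illustration of the rule described above (drop the page name, keep the final slash); the helper name url_pattern_for is hypothetical and is not part of the search appliance.

```python
from urllib.parse import urlsplit, urlunsplit


def url_pattern_for(login_url: str) -> str:
    """Return the login URL without its page name, keeping the final slash.

    Hypothetical helper for illustration; the Admin Console expects you to
    type the pattern yourself.
    """
    parts = urlsplit(login_url)
    # Keep everything up to the last "/" in the path, then restore the slash.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    # Query string and fragment are dropped: the pattern is a path prefix.
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


if __name__ == "__main__":
    print(url_pattern_for("http://mycompany.com/support/login.html"))
```

Run against the login page URL above, the helper produces the URL pattern shown in this example.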

Note: Cookies are supported only over HTTP and HTTPS. Other protocols, such as SMB, have URL patterns but do not support cookies.
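
If you keep a list of candidate URLs outside the appliance, a quick scheme check can flag the ones that cannot use cookie rules. The following is a minimal sketch of that check; the function name supports_cookies and the sample SMB URL are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Protocols that support cookies, per the note above.
COOKIE_PROTOCOLS = {"http", "https"}


def supports_cookies(url: str) -> bool:
    """Return True if the URL's protocol is one that supports cookies."""
    return urlsplit(url).scheme.lower() in COOKIE_PROTOCOLS


if __name__ == "__main__":
    # The SMB URL is a made-up example of a non-cookie protocol.
    for url in ("http://mycompany.com/support/", "smb://fileserver/share/"):
        print(url, supports_cookies(url))
```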

When you create the rule, you see a wizard page that displays your login page. Enter the username and password credentials and submit the login form. The wizard captures the credentials, the form's action (POST or GET), and other values, depending on the available form fields. After a rule is set up, you can change the username or password, or change the length of time allowed for authentication to occur before the rule expires. The default is 300 seconds (5 minutes).
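
To get a feel for the kind of information a login form exposes, the sketch below reads a form's target, its method (POST or GET), and its input fields from a piece of HTML using Python's standard html.parser module. It is not the wizard's implementation; the class name LoginFormParser, the sample form, and its field names are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the target, method, and input fields of the first form found."""

    def __init__(self):
        super().__init__()
        self.action = None      # URL the form submits to
        self.method = "GET"     # HTML default when no method is given
        self.fields = {}        # input name -> preset value ("" if none)
        self._in_form = False
        self._done = False      # set once the first form has been closed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


# Made-up login form; a real login page will have different fields.
SAMPLE_FORM = """
<form action="/support/auth" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="next" value="/support/">
</form>
"""

if __name__ == "__main__":
    parser = LoginFormParser()
    parser.feed(SAMPLE_FORM)
    print(parser.method, parser.action, parser.fields)
```

Hidden fields such as the hypothetical next field are one example of the "other values" a form can carry in addition to the username and password.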

To set up a rule for crawling pages behind login pages or pages that require cookies:

  1. Click Crawl and Index, and then click Cookie Sites.
  2. Enter the URL of the login page.
  3. Enter the URL pattern (path) for the area that the login page gives access to, without the page name but including the final slash.
  4. Click Create a New Cookie Rule. A new browser window opens, displaying your login page in the lower half.
  5. Log in to your site using your username and password.

    Note: If you mistype the username or password, extra actions may be recorded and displayed on the Cookie Sites page. To avoid that, close the Cookie Login Wizard window, and restart the process on the Cookie Sites page.

  6. Make sure that the page you expect to see appears.
  7. Click the Save Cookie Rule and Close Window button. You are returned to the Cookie Sites page where your new rule is listed with its pattern, action, and form fields.
  8. Click the Save Cookie Rules Configuration button.

To edit existing cookie rules:

  1. Change the username and/or password, if necessary.
  2. Change the time to wait for authentication by entering a new number of seconds or minutes, if you wish.
  3. Click the Save Cookie Rules Configuration button.

To delete an existing cookie rule:

  1. Select the Delete Rule checkbox to the right of the rule.
  2. Click Save Cookie Rules Configuration.

CA certificates for protected content

Your search appliance uses certificate authorities (CA) to authenticate user credentials before allowing protected search results to be viewed. CA certificates are used for servers accessed through HTTPS. If URLs and URL patterns for cookie rules specify HTTPS servers, CA certificates must be in place for each HTTPS server. For more information, see Serving > Forms Authentication.
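
When you have many cookie rules, it can help to list which servers are reached through HTTPS and therefore need a CA certificate in place. The sketch below does that for a list of rule URLs; the function name https_hosts and the sample URLs are hypothetical, and the appliance itself provides no such command.

```python
from urllib.parse import urlsplit


def https_hosts(urls):
    """Return the sorted, de-duplicated host names of all HTTPS URLs."""
    hosts = set()
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme == "https" and parts.hostname:
            hosts.add(parts.hostname)
    return sorted(hosts)


if __name__ == "__main__":
    # Made-up login page URLs and URL patterns taken from cookie rules.
    rule_urls = [
        "https://mycompany.com/support/login.html",
        "https://mycompany.com/support/",
        "http://mycompany.com/public/",
    ]
    print(https_hosts(rule_urls))
```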


 
© Google Inc. 2007