If your intranet has pages that are behind a login form or that require cookies to return the correct content, you can set up rules to provide the crawler with access to those pages. Then you can test your rules before you perform the crawl.
To create a rule, you enter the URL of the login page and then a URL pattern for the protected area. The pattern omits the page name but includes the final slash, so that it provides the correct path information. For information on crawling secure content, see CA certificates for protected content.
For example, the URL for the login page might be
http://mycompany.com/support/login.html
and the URL pattern would then be
http://mycompany.com/support/
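The relationship between the two URLs can be sketched in Python. This is a minimal illustration, assuming that a cookie rule's pattern covers any URL that begins with the pattern string (simple prefix matching); the appliance's actual matching rules may differ. The function names `pattern_from_login_url` and `pattern_matches` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def pattern_from_login_url(login_url: str) -> str:
    """Drop the page name from a login URL, keeping the final slash."""
    parts = urlsplit(login_url)
    # Keep everything up to and including the last "/" in the path.
    path = parts.path[: parts.path.rfind("/") + 1] or "/"
    # Query strings and fragments are not part of the path pattern.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

def pattern_matches(pattern: str, url: str) -> bool:
    """Assumption: a URL is covered by a pattern if it starts with it."""
    return url.startswith(pattern)

login = "http://mycompany.com/support/login.html"
pattern = pattern_from_login_url(login)
print(pattern)
print(pattern_matches(pattern, "http://mycompany.com/support/faq/page1.html"))
print(pattern_matches(pattern, "http://mycompany.com/supportdesk/"))
```

Under the prefix-matching assumption, the last check shows why the final slash matters: without it, `http://mycompany.com/support` would also match the unrelated `/supportdesk/` area.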
Note: Cookies are supported only by the HTTP and HTTPS protocols. Other protocols, such as SMB, have URL patterns but do not support cookies.
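A quick pre-check that a URL uses a protocol that supports cookies might look like the following sketch. `supports_cookies` is a hypothetical helper for illustration, not part of the appliance.

```python
from urllib.parse import urlsplit

# Only HTTP and HTTPS support cookies; other protocols such as SMB do not.
COOKIE_SCHEMES = {"http", "https"}

def supports_cookies(url: str) -> bool:
    """Return True if the URL's protocol can be used with a cookie rule."""
    return urlsplit(url).scheme.lower() in COOKIE_SCHEMES

for url in ("http://mycompany.com/support/",
            "https://mycompany.com/secure/",
            "smb://fileserver/share/"):
    print(url, supports_cookies(url))
```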
When you create the rule, a wizard page displays your login page. Enter your username and password and submit the login form. The wizard captures those values, along with the form's action and method (POST or GET) and any other field values, depending on the form fields available.
After a rule is set up, you can change the username or password, or change the length of time allowed for authentication to occur before the rule expires. The default is 300 seconds (5 minutes).
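To make concrete what the wizard records, the following Python sketch models a captured rule and rebuilds the login request from it. The `CookieRule` class, its field names, the form action URL, and the form field names `username` and `password` are all hypothetical; only the kinds of captured items (form action, method, field values) and the 300-second default come from this page. The sketch builds the request but does not send it.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode
from urllib.request import Request

@dataclass
class CookieRule:
    # Hypothetical model of what the Cookie Login Wizard captures.
    pattern: str                  # e.g. "http://mycompany.com/support/"
    action: str                   # URL the login form submits to
    method: str = "POST"          # form method: "POST" or "GET"
    form_fields: dict = field(default_factory=dict)  # captured field values
    timeout_seconds: int = 300    # default authentication time (5 minutes)

def build_login_request(rule: CookieRule) -> Request:
    """Recreate the login form submission described by a rule."""
    encoded = urlencode(rule.form_fields)
    if rule.method.upper() == "GET":
        # GET forms carry their fields in the query string.
        sep = "&" if "?" in rule.action else "?"
        return Request(f"{rule.action}{sep}{encoded}", method="GET")
    # POST forms carry their fields in the request body.
    return Request(rule.action, data=encoded.encode("ascii"), method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

rule = CookieRule(
    pattern="http://mycompany.com/support/",
    action="http://mycompany.com/support/login.cgi",  # hypothetical form action
    form_fields={"username": "crawler", "password": "secret"},  # hypothetical names
)
req = build_login_request(rule)
print(req.get_method(), req.full_url, req.data)
```

If you experiment with such a request outside the appliance, the standard library's `http.cookiejar.CookieJar` together with `urllib.request.HTTPCookieProcessor` is one way to keep the session cookies that a login response sets.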
To set up a rule for crawling pages behind login pages or pages that require cookies:
- Click Crawl and Index, and then click Cookie Sites.
- Enter the URL of the login page.
- Enter the URL pattern (path) for the protected area: the login page URL without the page name, including the final slash.
- Click Create a New Cookie Rule. A new browser window opens, displaying your login page in the lower half.
- Log in to your site using your username and password.
Note: If you mistype the username or password, extra actions may be recorded and displayed on the Cookie Sites page. To avoid that, close the Cookie Login Wizard window and restart the process on the Cookie Sites page.
- Make sure that the page you expect to see appears.
- Click the Save Cookie Rule and Close Window button. You are returned to the Cookie Sites page where your new rule is listed with its pattern, action, and form fields.
- Click the Save Cookie Rules Configuration button.
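Before clicking Create a New Cookie Rule, you might sanity-check the login URL and URL pattern you entered. The following Python sketch is a hypothetical helper, not an appliance feature; it assumes, as in the example on this page, that the login page lies under the pattern's path.

```python
from urllib.parse import urlsplit

def check_rule_inputs(login_url: str, pattern: str) -> list:
    """Return a list of problems with a login URL / URL pattern pair."""
    problems = []
    if not pattern.endswith("/"):
        problems.append("pattern must include the final slash")
    # Assumption: the pattern is the login page's path without the page name.
    if not login_url.startswith(pattern):
        problems.append("login page is not under the pattern's path")
    if urlsplit(login_url).netloc != urlsplit(pattern).netloc:
        problems.append("login page and pattern name different servers")
    return problems

print(check_rule_inputs("http://mycompany.com/support/login.html",
                        "http://mycompany.com/support/"))
print(check_rule_inputs("http://mycompany.com/support/login.html",
                        "http://mycompany.com/support"))
```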
To edit existing cookie rules:
- Change the username and/or password, if necessary.
- Change the time to wait for authentication by entering a new number of seconds or minutes, if you wish.
- Click the Save Cookie Rules Configuration button.
To delete an existing cookie rule:
- Select the Delete Rule checkbox to the right of the rule.
- Click Save Cookie Rules Configuration.
CA certificates for protected content
Your search appliance uses certificate authorities (CAs) to authenticate user credentials before allowing protected search results to be viewed. CA certificates are used for servers accessed through HTTPS. If the URLs and URL patterns for cookie rules specify HTTPS servers, a CA certificate must be in place for each of those HTTPS servers. For more information, see Serving > Forms Authentication.
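As an illustration of this requirement, the following Python sketch lists the HTTPS servers named in a set of cookie-rule URLs and patterns, each of which would need a CA certificate, and builds a standard-library TLS client context that verifies servers against CA certificates. The helper names and the optional `ca_file` path are hypothetical, and the sketch does not reflect how the appliance itself stores or installs certificates.

```python
import ssl
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit

def https_hosts(urls: Iterable[str]) -> Set[str]:
    """Collect the HTTPS servers (host[:port]) named in rule URLs and patterns."""
    return {urlsplit(u).netloc.lower() for u in urls
            if urlsplit(u).scheme.lower() == "https"}

def verifying_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that checks server certificates against CAs.

    With ca_file=None, the system's default CA certificates are loaded.
    """
    # create_default_context requires a valid certificate and a matching hostname.
    return ssl.create_default_context(cafile=ca_file)

rule_urls = [
    "https://secure.mycompany.com/login.html",
    "https://secure.mycompany.com/docs/",
    "http://mycompany.com/support/",
]
print(sorted(https_hosts(rule_urls)))
```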