Google Search Appliance

Managing Search for Controlled-Access Content: Crawl, Index, and Serve

Google Search Appliance software version 6.0
Posted June 2009
Revised September 2009: Additions and corrections to IWA/Kerberos information.

This chapter describes how a search appliance discovers content on your servers. It provides an overview of authentication and authorization methods used during crawl and index, and the methods available during serve. It also provides basic instructions for configuring a search appliance to crawl, index, and serve controlled-access content.

Contents

  1. Authentication, Authorization, and Controlled-Access Content
  2. Crawl and Index for Controlled-Access Content
    1. How a Search Appliance Indexes Controlled-Access Content
    2. Configuring the Search Appliance for HTTP Basic or NTLM HTTP
    3. Configuring the Search Appliance for Forms Authentication
  3. Secure Content and Public Content
    1. How a Search Appliance Labels Controlled-Access Content Sources as Public or Secure
    2. How a Search Appliance Determines What to Display in Public Search Results
  4. Serve for Controlled-Access Content
    1. How a Search Appliance Determines a User's Identity and Authorization During Serve
    2. Policy Access Control Lists
    3. HTTP Basic or NTLM HTTP with Authentication Against an LDAP Server
      1. Integrating the Search Appliance with an LDAP Server
    4. IWA (Integrated Windows Authentication) / Kerberos Authentication
      1. Enrolling the Search Appliance in the KDC Domain and Creating a Keytab File
      2. Configuring Kerberos Authentication in the Admin Console
      3. Configuring Web Browsers for Kerberos Authentication
      4. More Kerberos Information
    5. HTML Forms Authentication
      1. Enabling Forms Authentication With a Sample Protected URL
      2. Enabling Forms Authentication Through an External Login Server
    6. The SAML Authentication and Authorization Service Provider Interface (SPI)
      1. Overview of Authentication and Authorization with the SPI
      2. Enabling the Authentication SPI on the Google Search Appliance
      3. Enabling the Authorization SPI on the Google Search Appliance
  5. Digital Certificates and Certification Authorities
    1. Enabling Crawl and Serve over HTTPS
    2. User Authentication by X.509 Certificate
      1. Enabling User Authentication by X.509 Certificate during Serve
  6. How to Exclude Controlled-Access Content Sources from Search
    1. Excluding Controlled-Access Content from the Index
    2. Removing Controlled-Access Content from Search Results

Authentication, Authorization, and Controlled-Access Content

Authentication is the process of verifying the identity of a user, a system, or a service. Authorization is the process that determines whether an authenticated user, system, or service has permission to perform a task. The term "controlled-access content" represents any information that should not be displayed unless the user who requests the content is authenticated and has authorization to view the information.

To make controlled-access content discoverable through search, the search appliance mediates two kinds of access:

  • Access that enables the crawler to discover content on your servers and index any controlled-access content found there.
  • Access that enables an individual user to perform a search and to view content that exists in the index.

All controlled-access content that is available to the search appliance is indexed. The search appliance then determines whether to display the controlled-access content in response to each search request.

When a user issues a search request for controlled content, the search appliance impersonates the user. The search appliance verifies the user's identity and determines whether the user has authorization to view controlled-access content. This check is performed before the search appliance displays any content in search results.

A Google Search Appliance provides additional methods for enabling authentication and authorization that do not require user impersonation. These are discussed in the section "The SAML Authentication and Authorization Service Provider Interface (SPI)".

Crawl and Index for Controlled-Access Content

The search appliance indexes all content that can be crawled and indexed. This includes both controlled-access content and content that is available to anyone. Once you set up the search appliance with access credentials, it will maintain a copy of all crawled content in the index. The index allows the search appliance to determine relevance and display secure results when a user performs a search. Users only see the secure results that they are authorized to view.

How a Search Appliance Indexes Controlled-Access Content

A search appliance discovers and indexes controlled-access content in the same way that it indexes all other content: by performing a crawl through the content sources that are available to the web crawler, file system crawler, relational database crawler, and the XML content feed interface.

When you define content sources, you must perform additional steps in the Admin Console to give the search appliance access to controlled-access content:

  1. Provide the search appliance with URL patterns that match the controlled content.
  2. Give the search appliance access credentials to use with those patterns.

You can specify a different set of access credentials for each URL pattern in the Admin Console. The means by which you provide these credentials is different for each kind of authentication, but the general process remains the same.

Figure 1: The search appliance uses URL patterns and credentials to crawl and index content.

Configuring the Search Appliance for HTTP Basic or NTLM HTTP

When you set up the search appliance to access controlled-access content with HTTP Basic or NTLM HTTP, consider the following points. You can find more information on these topics in the Admin Console Help Center.

  • The Crawl and Index process for content that uses HTTP Basic and NTLM HTTP is controlled by parameters under Crawl and Index > Crawler Access.
  • If your domain supports Integrated Windows Authentication (IWA) / Kerberos authentication, you can also configure it under Crawler Access. If a user cannot authenticate using a Kerberos ticket, authentication falls back to HTTP Basic or NTLM HTTP. To learn more about enabling Kerberos authentication, see IWA (Integrated Windows Authentication) / Kerberos Authentication.
  • HTTP Basic and NTLM HTTP do not validate a user's credentials before checking authorization. If you are not using Kerberos authentication, and want to enable the search appliance to validate a user's login name and password by using an LDAP server, enable Directory Integration under Administration > LDAP Setup.
  • When the search appliance prompts users for credentials for NTLM HTTP, it assumes that the domain is supplied along with the username, in the format "DOMAIN\username." Users must provide the domain name each time that they log in.
  • To determine whether a user is authorized to view secure content, the search appliance sends a HEAD request to the content server for each document in the potential search results. The user's credentials are included in the authorization header for the HEAD request. No special configuration is required for serve in this case.
  • Because HTTP Basic passes user credentials as clear text, Google recommends that you use HTTPS for all requests for controlled-access content. To force the search appliance to perform crawl, index, and serve over HTTPS, see Protecting the User's Credentials for Serve with HTTP Basic and NTLM HTTP.
  • To use HTTPS for all requests for controlled-access content, configure a digital certificate for the search appliance under Administration > SSL Settings.
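The HEAD-request authorization check described above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the appliance's implementation. It covers HTTP Basic only, because NTLM is a multi-step challenge/response handshake and is not shown. The function names are hypothetical, and the sketch assumes that any 2xx status means the user may see the result.

```python
import base64
import urllib.request


def build_authz_head_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a HEAD request that carries the user's HTTP Basic credentials.

    Hypothetical helper: it mirrors the documented behavior of putting the
    user's credentials in the Authorization header of a HEAD request.
    """
    # HTTP Basic base64-encodes "username:password". This is an encoding, not
    # encryption, which is why the credentials should travel over HTTPS.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"Authorization": f"Basic {token}"},
    )


def is_authorized(status_code: int) -> bool:
    # Assumption: treat any 2xx status as "authorized"; anything else hides the result.
    return 200 <= status_code < 300
```

To send the request, pass it to `urllib.request.urlopen`. Note that `urlopen` raises `urllib.error.HTTPError` for 401 and 403 responses, so a real check would catch that exception and treat it as a denial.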

Configuring the Search Appliance for Forms Authentication

The search appliance supports cookie-based access. For sites that require a cookie for authentication during crawl and index, you can define your content with a forms authentication rule.

  • Define a rule under Crawl and Index > Forms Authentication for controlled-access content sources that require the search appliance to obtain a session cookie from a login form. Content accessed through a forms authentication site can be secure or public during serve. In version 6.0 and later, a search appliance can have more than one forms authentication rule.
  • A forms authentication rule must generate at least one action for the search appliance to consider it valid. If a rule doesn't generate any action for a URL, the search appliance logs an error and doesn't crawl the URL again.
  • To use HTTPS for all requests for controlled-access content, configure the search appliance to enable certificate use. The digital certificate for the search appliance must be recognized by other servers, and the certificates for all HTTPS-secured sites must be valid (that is, they must not be expired and must be issued for the designated server name). Configuration for certificate use is discussed under Digital Certificates and Certification Authorities in this guide, and in the online help for the search appliance.

Secure Content and Public Content

Once controlled-access content is present in the index, the search appliance labels it as "secure" or "public":

  • If the content is labeled "public", any user with access to the search appliance can view links to content in response to a search query.
  • If the content is labeled "secure", the search appliance must authenticate the user and verify that the user has authorization to view the content before the search appliance includes links to the content in the search results.

It's important to understand that when controlled-access content is labeled as "public" in the index, it is shown in all users' search results. Because public search results are served from the index without checking for authorization, users can discover all public content that the search appliance has access to, regardless of whether they have authorization to view that content.

Finally, even though authorized users can see secure content in their search results, they may need to log in again to view the content on the server. To prevent this second request for credentials, the search appliance can pass a user's credentials to the content server through Forms Authentication with cookie forwarding, or by using the SAML Authorization SPI.

If your users have to log in multiple times to access content on different servers, consider implementing a single sign-on (SSO) system for authentication and authorization. The SSO server unifies the authentication process by first authenticating the user and then authorizing the user on the web servers to which that user has access. Single sign-on servers are available from a variety of vendors, such as Computer Associates SiteMinder and Oracle Identity Management. SSO integration is only available for the search appliance.

How a Search Appliance Labels Controlled-Access Content Sources as Public or Secure

When crawling and indexing controlled-access content over HTTP or HTTPS, the search appliance assigns public or secure status based on the type of crawl and the Make Public checkbox in the Admin Console. If the Make Public checkbox is selected on the Crawl and Index > Forms Authentication page, content is labeled as public. When the checkbox is cleared, content is labeled as secure.

The search appliance assigns status from these pages:

  • Forms Authentication: Forms Authentication sites are controlled-access content sources that require the search appliance to obtain a session cookie from a login form. Most commercial single sign-on (SSO) solutions use this method of authentication. A search appliance can have multiple Forms Authentication rules for crawl and index. Forms authentication also configures actions for sites that require a session cookie to allow the search appliance to crawl the site.
  • Web and content feeds: the authmethod attribute for the record specifies whether content is treated as public or secure.
    • To make feed content public, set the authmethod value to none. This is the default for content provided by feeds.
    • To make feed content secure, set the authmethod value to ntlm, httpbasic, or httpsso.
  • Databases: All content from a database is labeled as public during serve.
  • Connectors: If the connector supports authentication and authorization, and the Make Public checkbox is cleared, content from that connector is labeled as secure. In all other cases, content from a connector is labeled as public. To determine whether a connector instance supports authentication and authorization, look up Security Support in the Configuration guide for your connector.
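For feeds, the authmethod attribute sits on each record element. The sketch below builds a small feed document with Python's xml.etree.ElementTree. It assumes the gsafeed/header/group/record layout from the Feeds Protocol Developer's Guide. The caller supplies the datasource and feedtype values, and the example values used here are placeholders. A feed you actually push also needs the XML declaration and the DOCTYPE line, which this sketch omits.

```python
import xml.etree.ElementTree as ET

# authmethod values named in this guide: "none" is public, the others are secure.
AUTHMETHODS = {"none", "ntlm", "httpbasic", "httpsso"}


def build_feed(datasource: str, feedtype: str, records: list[tuple[str, str, str]]) -> str:
    """Return a gsafeed document body; each record is (url, mimetype, authmethod)."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(feed, "group")
    for url, mimetype, authmethod in records:
        # Reject values this guide does not list, so a typo cannot silently
        # change whether a record is treated as public or secure.
        if authmethod not in AUTHMETHODS:
            raise ValueError(f"unsupported authmethod {authmethod!r}")
        ET.SubElement(group, "record", url=url, mimetype=mimetype, authmethod=authmethod)
    return ET.tostring(feed, encoding="unicode")
```

For example, `build_feed("hr_docs", "metadata-and-url", [("http://intranet.example.com/hr/policy.html", "text/html", "httpbasic")])` produces a record that is served as secure. The same record with `"none"` would be served as public. Check the Feeds Protocol Developer's Guide for the datasource and feedtype values that your feed type requires.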

How a Search Appliance Determines What to Display in Public Search Results

The front end configuration for a search results page controls how much information users see for each item in the search results. When you make controlled-access content available for public search, open the Page Layout Helper or the XSLT Stylesheet Editor for each front end and review the stylesheet configuration to ensure that you are not revealing more information than the user needs.

In the Page Layout Helper, these parameters under Search Results control which information is displayed:

  • When Snippet is selected, the <S> element is displayed in the search results. Clear the Snippet check box to remove snippets from the search results.
  • When Page Size is selected, the <C> element's page size SZ value is displayed in the search results. Clear the Page Size check box to remove information about the document's size from the search results.
  • When Modified Date is selected, the <CACHE_LAST_MODIFIED> element is included in the XML results. Clear the Modified Date check box to remove information about the document's freshness from the search results.
  • When Cache Link is selected, the <C> element is included in the XML results. Clear the Cache Link check box to remove the link to the cached document from the search results.
  • The Result Page navigation at the bottom of the page can indicate how many results are available. To prevent users from using this information to deduce how large your index is, choose the third option, which excludes both the "Gooooogle" navigation and the numbered references to search results pages.

In the XSLT Stylesheet Editor, these XSL variables control which information is displayed:

  • show_res_snippet specifies whether to display a snippet for each result. Set <xsl:variable name="show_res_snippet">0</xsl:variable> to remove snippets from the search results.
  • show_meta_tags specifies whether to display metadata for each result. Set <xsl:variable name="show_meta_tags">0</xsl:variable> to remove the document's metadata from the search results.
  • show_res_size specifies whether to display the page size for each result. Set <xsl:variable name="show_res_size">0</xsl:variable> to remove information about the document's size from the search results.
  • show_res_date specifies whether to display the last-modified date for each result. Set <xsl:variable name="show_res_date">0</xsl:variable> to remove information about the document's freshness from the search results.
  • show_res_cache specifies whether to display the cache link for each result. Set <xsl:variable name="show_res_cache">0</xsl:variable> to remove the link to the cached document from the search results.
  • choose_bottom_navigation specifies which navigation option to use at the bottom of the results page. Set <xsl:variable name="choose_bottom_navigation">simple</xsl:variable> to exclude both the "Gooooogle" navigation and the numbered references to search results pages.
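If you maintain stylesheets outside the Admin Console, a small script can apply the settings above before you paste a stylesheet back into the XSLT Stylesheet Editor. This is a hypothetical helper, not a Google tool. It assumes that each variable is declared in the simple single-line form shown above. A regular expression is not a general XSLT parser, so review the result before you save it.

```python
import re

# Values that hide result details in public search results (from the list above).
PUBLIC_SAFE_VALUES = {
    "show_res_snippet": "0",
    "show_meta_tags": "0",
    "show_res_size": "0",
    "show_res_date": "0",
    "show_res_cache": "0",
    "choose_bottom_navigation": "simple",
}


def harden_stylesheet(xslt: str) -> str:
    """Set each listed xsl:variable to its public-safe value, if it is declared."""
    for name, value in PUBLIC_SAFE_VALUES.items():
        pattern = re.compile(
            r'(<xsl:variable\s+name="' + re.escape(name) + r'"\s*>)[^<]*(</xsl:variable>)'
        )
        # A function replacement avoids backslash handling in the value.
        xslt = pattern.sub(lambda m, v=value: m.group(1) + v + m.group(2), xslt)
    return xslt
```

Variables that are not declared in the input are left alone, so the helper never adds declarations that the stylesheet did not already have.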

Serve for Controlled-Access Content

When a user performs a search request, the search appliance performs these checks before serving secure content:

  1. If you specify policy ACL (access control list) rules, the search appliance checks whether a policy ACL rule applies to a target document. If a rule applies to the document's URL, the search appliance grants or denies access to the URL based on the rule. If no rule applies, the search appliance continues to Step 2.
  2. The search appliance acquires the user's credentials to enable impersonation, or performs an authentication check to establish the user's identity. If Kerberos authentication is specified under Crawler Access, the search appliance will try Kerberos authentication before attempting other methods.
  3. The search appliance impersonates the user, or performs an authorization check to determine whether the user can view the content. If the user is authorized to view the content, the content will appear in the user's search results.

If a secure content item fails any of these checks, the search appliance removes it from the list of results.

How a Search Appliance Determines a User's Identity and Authorization During Serve

A search appliance uses these methods to establish the user's identity:

  • HTTP Basic or NTLM HTTP with authentication against an LDAP server
  • Kerberos authentication against a domain controller
  • HTML forms authentication
  • The SAML Authentication and Authorization Service Provider Interface (SPI)
  • Digital certificates and certification authorities

After the search appliance establishes a user's identity, the search appliance attempts to determine whether a user has access to the secure content that matches their search.

The search appliance performs an authorization check in this order:

  1. Check for Policy ACLs.

    If you specify a policy ACL rule, the search appliance checks the URL patterns in the rules against the URLs that are returned in the search results. If the users and groups in a matching rule are permitted to view a result, the result is displayed. If they are not permitted, the URL is not displayed. A matching policy ACL rule is final: whether the rule permits or denies the user, steps 2 through 4 do not occur for that URL. Steps 2 through 4 occur only if the URL does not match any policy ACL rule.

  2. Check for SAML.

    If the search appliance is configured to use the SAML Authentication and Authorization SPI, the search appliance sends a SAML authorization request to the Policy Decision Point, using the identity obtained for the user during the serve authentication.

  3. Otherwise, check for HTTP Basic or NTLM.

    For secure content that was crawled using HTTP Basic or NTLM HTTP authentication, the search appliance performs a HEAD request for the document, using the credentials obtained for the user during serve authentication.

  4. Verify user authorization.

    For secure content that was crawled using Forms Authentication, the search appliance performs a GET request for 0 bytes of the document, using the credentials obtained for the user during serve authentication.

If the authorization check is successful, the secure content that matches the search query is included in the user's search results.
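The order of these checks can be summarized as a short decision function. In this sketch, the four check callables are hypothetical stand-ins for the policy ACL lookup, the SAML Policy Decision Point, the HEAD request, and the 0-byte GET request. The crawl-method labels are also invented for illustration. Denying results that have an unknown crawl method is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"


@dataclass
class SecureResult:
    url: str
    crawl_method: str  # hypothetical labels: "basic_ntlm" or "forms"


Check = Callable[[str], Decision]


def authorize(
    result: SecureResult,
    policy_acl: Callable[[str], Optional[Decision]],  # None means "no rule matched"
    saml: Optional[Check],  # None means the SAML SPI is not configured
    head_request: Check,
    zero_byte_get: Check,
) -> Decision:
    # Step 1: a matching policy ACL rule is final, whether it permits or denies.
    acl_decision = policy_acl(result.url)
    if acl_decision is not None:
        return acl_decision
    # Step 2: with the SAML Authorization SPI, the Policy Decision Point decides.
    if saml is not None:
        return saml(result.url)
    # Steps 3 and 4: use the request type that matches how the URL was crawled.
    if result.crawl_method == "basic_ntlm":
        return head_request(result.url)
    if result.crawl_method == "forms":
        return zero_byte_get(result.url)
    # Assumption: hide results with an unknown crawl method rather than expose them.
    return Decision.DENY
```

The short-circuit in step 1 is what lets policy ACLs reduce load: when a rule matches, no request reaches the content server at all.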

Policy Access Control Lists

A policy ACL (Access Control List) tells the search appliance which users or groups have access to a specific URL. Specifying policy ACLs on a search appliance improves performance: policy ACLs speed up authorization and reduce the load that HEAD requests place on remote authorization servers.

Policy ACLs typically store the results that would have occurred if the search appliance had sent a HEAD request to verify authorization. However, policy ACLs can also override the decision that a HEAD request would have returned. For example, suppose you add a policy ACL rule that permits a group to see all documents at a URL, but the source repository (the target of the HEAD request) has a more fine-grained rule that lets only some members of the group view the documents. In this case, everyone in the group can see the search results, but only the members who have access rights at the repository can open the linked documents.

Policy ACLs require that you use an authentication method to establish the identity of the user or group that you specify in the Policy ACL rules.

For more information on policy ACLs, see the previous sections Serve for Controlled-Access Content and How a Search Appliance Determines a User's Identity and Authorization During Serve. See also the Google Search Appliance Policy ACL API Developer's Guide.

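A policy ACL rule has two parts:

  • A URL pattern to protect
  • The users or groups that are allowed to view URLs that match the pattern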

For example, suppose the eng (engineering) group is the only group that you permit to view all documents in the example.com/engsite page. To grant the engineering group access to the engsite page, specify a policy ACL rule:

example.com/engsite group:eng

When a search appliance executes a search, it attempts to match URLs that the search appliance retrieves from the index against policy ACLs. If a URL pattern matches the policy ACL rule, the search appliance applies the rule.

URL Pattern to Protect

You can specify a URL pattern to which you want to limit access. When a user performs a search query, the user can view this URL pattern in the search results if you list the user as either an allowed user or if the user is a member of an allowed group.

If more than one policy ACL URL pattern matches a URL, the search appliance chooses the best match in this order of precedence:

  1. Exact-Match URL Rules
  2. Coarse-Grained Rules: prefix patterns, then general URL patterns

Exact-Match URL Rules

If there is an exact-match URL pattern, it is the best match. An exact-match URL pattern begins with a caret (^) and ends with a dollar sign ($). The following example shows an exact-match URL pattern:

^http://www.example.com/mypage.html$

Coarse-Grained Rules

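The coarse-grained rules consist of prefix patterns and general URL patterns. When both kinds of pattern match a URL, a prefix pattern takes precedence.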

Prefix Patterns

If one or more prefix patterns match, the pattern with the longest prefix is the best match. A prefix pattern specifies a (possibly partial) domain and a prefix of the path portion of the URL. The general format of a prefix pattern is:

<domain>/<prefix>

Examples of prefix patterns:

sales.example.com/products/
sales.example.com/products/mypage.html
sales.example.com/

General URL Patterns

If the only matching URL patterns are general patterns, the best match is undefined: the search appliance chooses one of the matching patterns, and which one it chooses is not specified. A general URL pattern is any pattern other than an exact-match pattern or a prefix pattern.

Examples of general URL patterns are:

Example                  Description
*.doc                    A suffix pattern; matches any URL ending in .doc.
contains:product         The string product can appear anywhere in the URL, for example in the host name (myproduct.com) or at the end of the URL, and doesn't have to be a full word.
regexp:sid=[0-9A-Z]+/    The URL must contain sid= followed by one or more digits or capital letters and then a slash (/). The plus (+) means one or more characters.
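The precedence rules above (exact match first, then the longest matching prefix pattern, then an arbitrary general pattern) can be sketched as follows. This is a simplified model, not the appliance's matcher. It assumes that a prefix pattern's domain matches the host itself or any subdomain of it, and it measures the longest prefix by the length of the whole pattern. It supports only the three general-pattern forms in the table, and it breaks ties among general patterns by list order because the real choice is undefined.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def pattern_kind(pattern: str) -> str:
    """Classify a policy ACL URL pattern as exact, prefix, or general."""
    if pattern.startswith("^") and pattern.endswith("$"):
        return "exact"
    if pattern.startswith(("contains:", "regexp:", "*")):
        return "general"
    return "prefix"


def pattern_matches(pattern: str, url: str) -> bool:
    kind = pattern_kind(pattern)
    if kind == "exact":
        # ^...$ must equal the whole URL.
        return url == pattern[1:-1]
    if kind == "prefix":
        # <domain>/<prefix>: the host must be the domain or a subdomain of it,
        # and the URL path must start with /<prefix>.
        domain, _, prefix = pattern.partition("/")
        parts = urlsplit(url)
        host = parts.hostname or ""
        path = parts.path or "/"
        host_ok = host == domain or host.endswith("." + domain)
        return host_ok and path.startswith("/" + prefix)
    if pattern.startswith("contains:"):
        return pattern[len("contains:"):] in url
    if pattern.startswith("regexp:"):
        return re.search(pattern[len("regexp:"):], url) is not None
    # Suffix pattern such as *.doc.
    return url.endswith(pattern.lstrip("*"))


def best_match(patterns: list[str], url: str) -> Optional[str]:
    """Return the single rule pattern that applies to url, or None."""
    matching = [p for p in patterns if pattern_matches(p, url)]
    exact = [p for p in matching if pattern_kind(p) == "exact"]
    if exact:
        return exact[0]
    prefixes = [p for p in matching if pattern_kind(p) == "prefix"]
    if prefixes:
        return max(prefixes, key=len)  # longest prefix wins
    # Only general patterns are left: the real choice is undefined, so take the first.
    return matching[0] if matching else None
```

With the prefix examples above, a request for http://sales.example.com/products/other.html matches both sales.example.com/ and sales.example.com/products/. The sketch picks the second pattern because its prefix is longer.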

Allowed Users or Groups

A policy ACL rule lists each user's or group's login ID. The user who enters a search can view the URL result if either of the following conditions is true:

  • The current user's name is one of the user names listed in the rule
  • The current user is a member of one of the groups listed

Otherwise, the user is denied permission to view the URL. The URL does not appear in the search results.

Determining Group Membership

To determine which group a user belongs to, the search appliance uses one of the following mechanisms:

  • LDAP

    If the search appliance is configured to use LDAP, then the search appliance gets group memberships from the LDAP server. To configure LDAP for a search appliance, use the Administration > LDAP Setup page.

  • Groups

    Using a groups database, you can import a list of groups, and the membership list for each group, by using the Google Data API.

If a groups database is present, the search appliance uses it to determine a user's group membership. You can also use both mechanisms together. In this case, the search appliance combines the group memberships from both sources.
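Once a rule has been selected, the permission check reduces to a set test. The sketch below models the LDAP server and the groups database as plain dictionaries that map a group name to its members. Those structures and the function names are assumptions for illustration, not the appliance's data model.

```python
def groups_for(user: str, *sources: dict[str, set[str]]) -> set[str]:
    """Collect every group, from every source, whose member set contains user."""
    return {
        group
        for source in sources
        for group, members in source.items()
        if user in members
    }


def is_permitted(user: str, user_groups: set[str], allowed_users: set[str], allowed_groups: set[str]) -> bool:
    """A user may see the URL if listed by name or through any listed group."""
    return user in allowed_users or not user_groups.isdisjoint(allowed_groups)
```

For the rule example.com/engsite group:eng, call `is_permitted(user, groups_for(user, ldap_groups, groups_db), set(), {"eng"})`. Passing both sources to `groups_for` models the case in which LDAP and a groups database are used together.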

Adding a Policy ACL

To add a policy ACL:

  1. Click Serving > Policy ACLs.
  2. In the Add Policy ACL section, in the URL Patterns field, type the pattern of the URL you want to restrict.
  3. Click Create New Policy ACLs.
  4. Under Allowed Users and/or Allowed Groups, type the names of users and/or groups that are permitted to view the URL. Type one name per line.
  5. Click Save.

To navigate to the previous page, click the Back to Policy ACL list link.

Note: The order in which you specify users or groups is not significant. When you click Save, the search appliance sorts the login names in each field into alphabetic order.

Caution: Ensure that you do not separate login names with commas. The search appliance assumes that the comma is part of the login name.
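The Allowed Users and Allowed Groups fields expect one name per line and treat a comma as part of a name, so it can help to check a list before you paste it in. The hypothetical helper below mimics that behavior. It drops blank lines and sorts the names, and it flags any name that contains a comma. The sort uses Python's code-point order, which may differ from the appliance's alphabetic sort for mixed-case names.

```python
def normalize_login_names(field_text: str) -> tuple[list[str], list[str]]:
    """Return (sorted names, warnings) for a one-name-per-line field."""
    names = [line.strip() for line in field_text.splitlines() if line.strip()]
    # A comma is not a separator: it would become part of the login name.
    warnings = [
        f"{name!r} contains a comma; it will be treated as part of the login name"
        for name in names
        if "," in name
    ]
    return sorted(names), warnings
```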

Editing a Policy ACL

To edit a policy ACL:

  1. Click Serving > Policy ACLs.
  2. Click the Edit link next to the policy ACL rule you want to edit.
  3. Make changes to the policy ACL.
  4. Click Save.

Deleting a Policy ACL

To delete a policy ACL:

  1. Click Serving > Policy ACLs.
  2. Click the Delete link next to the policy ACL rule you want to delete.

Importing a Configuration File

You can import a text file that contains policy ACL rules. The file you import overwrites all existing policy ACL rules.

Note: Before importing a configuration file, if you have defined policy ACL rules, click Export Search Results to back up your rules. The exported file is in the same format as a configuration file that you can import.

The format of each rule in the file is:

url_pattern allowed_user_or_group  

Each line of the file must list exactly one URL pattern, followed by one or more users (denoted by the user: prefix) or groups (denoted by the group: prefix), as shown in the following example:

example.com/docsite user:jane user:sue user:wilson group:chicagodoc group:texasdoc
mycompany.com/engsite group:eng
mycompany.com/salessite group:sales user:yvette
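Validating a file in this format before you import it is worthwhile, because the import overwrites all existing rules. The parser below is a sketch based only on the format described above. The class and function names are hypothetical, and the parser assumes that URL patterns and login names contain no spaces.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyAclRule:
    url_pattern: str
    users: list[str] = field(default_factory=list)
    groups: list[str] = field(default_factory=list)


def parse_policy_acl_file(text: str) -> list[PolicyAclRule]:
    """Parse lines of the form: url_pattern user:NAME ... group:NAME ..."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        rule = PolicyAclRule(url_pattern=tokens[0])
        for token in tokens[1:]:
            prefix, sep, name = token.partition(":")
            if sep and name and prefix == "user":
                rule.users.append(name)
            elif sep and name and prefix == "group":
                rule.groups.append(name)
            else:
                raise ValueError(f"line {lineno}: expected user:NAME or group:NAME, got {token!r}")
        if not (rule.users or rule.groups):
            raise ValueError(f"line {lineno}: rule lists no users or groups")
        rules.append(rule)
    return rules
```

Running the parser over the three example lines above yields three rules. It raises an error on a token such as a bare jane that lacks a user: or group: prefix.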

To import a file that contains policy ACLs:

  1. Under Import a Configuration File, click Browse.
  2. Select the file.
  3. Click Open.
  4. Click Import.

Searching Policy ACLs

You can perform the following types of searches from the Policy pattern field on the Serving > Policy ACLs page:

  • All Rules or Exact-match Rules or Coarse-grained Rules

    Display rules by type: view all rules that match the filter you choose, or only those that contain the text you specify in the Policy pattern field. Click Search to list the rules. Rules are displayed in alphabetic order by rule name. The rule filters are as follows:

    • All Rules -- List all rules or those that contain the text you specify in the Policy pattern field.
    • Exact-match Rules -- List all exact-match rules or those exact-match rules that contain the text you specify in the Policy pattern field.
    • Coarse-grained Rules -- List all coarse-grained rules or those coarse-grained rules that contain the text you specify in the Policy pattern field.

  • Find Rules for URL

    Enter a URL in the Policy pattern field, choose Find Rules for URL, and click Search. All rules that match the URL are displayed in best-match order. The first rule listed is the best match, and it is the only rule that the search appliance applies to that URL. This ordering is useful when two or more rules match a URL and you want to find out which one applies.

Search results appear under Matching URL Patterns.

Exporting Search Results

After you search policy ACLs, you can export the search results as an XML file. To export search results, click Export Search Results. The exported file is in the same format as an import configuration file.

The default file name is policy_acl.xml.

Related Tasks

You can also add policy ACLs by using the following mechanisms:

  • Policy ACL Google Data API--Use this API to add policy ACLs programmatically to the search appliance.
  • Feeds--Use feeds to supply policy ACLs with exact-match patterns along with content and metadata.

HTTP Basic or NTLM HTTP with Authentication Against an LDAP Server

HTTP Basic and NTLM HTTP request the user's credentials for controlled-access content, but do not perform any validation on the credentials entered by the user before saving a session cookie. If you are not using Kerberos authentication, Directory Service Integration with an LDAP server permits a search appliance to validate a user's credentials as they are entered. If a user enters incorrect credentials, the search appliance prompts the user to try again.

Note: You can configure a search appliance to perform secure serve without LDAP directory service integration. In this case, only the authorization check is performed. If the user's credentials are incorrect, the search appliance cannot obtain authorization and secure content is not served.

Integrating the Search Appliance with an LDAP Server

This section provides a general overview of how to enable the search appliance to authenticate credentials against an LDAP server. For more detailed instructions, click Help Center > Administration > LDAP Setup in the Admin Console.

Note: The search appliance does not support using LDAP and Kerberos authentication at the same time; you must choose one method for all servers on your domain.

To specify LDAP settings for the search appliance:

  1. Log in to the Admin Console.
  2. Choose Administration > LDAP Setup.
  3. Click Change LDAP Server. Under LDAP Directory Server Address, enter the host name and (optionally) the port to use.
  4. If the LDAP server does not allow anonymous users to make authentication requests, enter the user credentials (a distinguished name (DN) and a password) that enable the search appliance to log in to the LDAP server to make authentication requests.
  5. Click the Continue button. The system attempts to auto-detect LDAP settings on your network and displays what it has detected.
  6. Test the auto-detected LDAP settings by entering the appropriate DN and password, and then clicking the Test LDAP Settings button. If the test succeeds, you will see a listing similar to this (in a Unix or Posix environment--Windows LDAP servers have a different format):

    uid - (user ID)
    ou - (organizational unit)
    dc - (domain component, such as the company name)

    Important: If the LDAP Authentication Test settings do not successfully authenticate a user, click Cancel, revisit and change the information you entered, and test again.

  7. When the LDAP Authentication Test is successful, click the Save LDAP Settings button.
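The uid, ou, and dc components in step 6 combine into the distinguished name (DN) that identifies a user, and you type this DN when you test the settings. The sketch below assembles such a DN for a Unix or POSIX-style directory and escapes special characters as RFC 4514 requires. The people organizational unit and the example.com domain are hypothetical. Windows (Active Directory) servers typically use a different layout, such as CN-based names.

```python
def escape_dn_value(value: str) -> str:
    """Escape an attribute value as described in RFC 4514, section 2.4."""
    out = []
    for i, ch in enumerate(value):
        if ch in ',+"\\<>;':
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\00")
        elif i == 0 and ch in " #":
            out.append("\\" + ch)  # leading space or number sign
        elif i == len(value) - 1 and ch == " ":
            out.append("\\ ")  # trailing space
        else:
            out.append(ch)
    return "".join(out)


def build_user_dn(uid: str, ou: str, domain: str) -> str:
    """Build uid=...,ou=...,dc=...,dc=... from a DNS-style domain name."""
    dcs = ",".join(f"dc={escape_dn_value(part)}" for part in domain.split("."))
    return f"uid={escape_dn_value(uid)},ou={escape_dn_value(ou)},{dcs}"
```

For example, `build_user_dn("jdoe", "people", "example.com")` returns `uid=jdoe,ou=people,dc=example,dc=com`.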

Protecting the User's Credentials for Serve with HTTP Basic and NTLM HTTP

When a user performs a query for secure content, the search appliance responds using the same protocol (HTTP or HTTPS) as the query. Because serve over HTTP Basic and NTLM HTTP exchanges authorization headers, a malicious user could intercept the traffic and extract the header. To protect the user's credentials against such an attack, you can force the use of HTTPS during serve, even when the search request is sent over HTTP.

To specify whether the search appliance serves all content over HTTPS:

  1. Log in to the Admin Console.
  2. Choose Administration > SSL Settings. Scroll down to Force secure connections when serving? and choose one of the following options:
    • To return results in the protocol used by the original search query, choose No. This option is the least secure.
    • To force the search appliance to use HTTPS for secure content only, choose Use HTTPS when serving secure results, but not when serving public results.
    • To force the search appliance to use HTTPS for all content, choose Use HTTPS when serving both public and secure results. This option is the most secure.
  3. Click Save Setup.
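The three choices in step 2 amount to a small decision rule. The following is a minimal sketch that assumes "not when serving public results" means public results keep the protocol of the original query. The setting names are hypothetical labels, not Admin Console values.

```javascript
// Hypothetical labels for the three "Force secure connections when serving?" choices.
const ForceHttps = Object.freeze({
  NO: "no",              // reply in the protocol of the original query
  SECURE_ONLY: "secure", // HTTPS for secure results, original protocol for public
  ALL: "all",            // HTTPS for both public and secure results
});

// Returns "http" or "https" for a results page.
function servingProtocol(queryProtocol, includesSecureResults, setting) {
  switch (setting) {
    case ForceHttps.NO:
      return queryProtocol;
    case ForceHttps.SECURE_ONLY:
      return includesSecureResults ? "https" : queryProtocol;
    case ForceHttps.ALL:
      return "https";
    default:
      throw new Error("Unknown setting: " + setting);
  }
}

console.log(servingProtocol("http", true, ForceHttps.SECURE_ONLY)); // https
console.log(servingProtocol("http", false, ForceHttps.SECURE_ONLY)); // http
```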

IWA (Integrated Windows Authentication) / Kerberos Authentication

Kerberos is a network authentication protocol that enables client and server applications to perform mutual authentication for the duration of a user's login session. The search appliance can use Kerberos authentication by issuing a HEAD request to confirm a user's right to view controlled-access documents. The search appliance only performs this check during secure serve for content on HTTP servers; Kerberos is not supported for crawling content.

To ensure that a search appliance uses Kerberos during serving, content sources must be enabled for Kerberos. If Kerberos is not configured properly, the content sources fall back to NTLM. For more information on ensuring that Kerberos is configured correctly on Windows content sources, see this wiki page (the information is provided as a reference, and is not officially supported by Google).

The Kerberos implementation supports:

  • Windows IIS web sites with Kerberos enabled.
  • Windows file share with Kerberos enabled.
  • Linux/Unix file share using SMB in a Windows domain with a Windows AD as the Kerberos Key Distribution Center (KDC).

The Kerberos implementation does not support:

  • Cross-domain access.
  • Windows constrained delegation. Workaround: Use Google SAML Bridge for Windows.
  • Linux/Unix KDC.

When the search appliance is configured to use IWA / Kerberos authentication, the search appliance checks the user's session ticket against a KDC before displaying secure search results to a user. For Windows servers, the domain controller acts as the KDC for Kerberos authentication.

  • If a user has a valid ticket, the user can see secure search results without having to log in again.
  • If a user does not have a valid ticket, or is unable to perform Kerberos authentication, the search appliance prompts the user for their credentials using HTTP Basic or NTLM HTTP.
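In HTTP terms, the two cases above correspond to the Negotiate exchange: a browser with a valid ticket sends an `Authorization: Negotiate` header, and any other request receives a 401 challenge that offers fallbacks. The following is a minimal sketch of that decision. The `validateTicket` callback is a hypothetical stand-in for the real check against the KDC (ticket validation itself is not shown), and the Basic realm string is a placeholder.

```javascript
// Decide how to answer a request for secure results.
// validateTicket(token) is a hypothetical callback that returns the user's
// name for a valid Kerberos token, or null otherwise.
function authenticate(authorizationHeader, validateTicket) {
  const [scheme, token] = (authorizationHeader || "").trim().split(/\s+/, 2);
  if (scheme && scheme.toLowerCase() === "negotiate" && token) {
    const user = validateTicket(token);
    if (user) {
      return { status: 200, user }; // valid ticket: no further login needed
    }
  }
  // No valid ticket: challenge again, offering NTLM and Basic as fallbacks.
  return {
    status: 401,
    wwwAuthenticate: ["Negotiate", "NTLM", 'Basic realm="search-appliance"'],
  };
}
```

A real server would also accept the NTLM or Basic credentials that the browser sends back after the challenge; that branch is omitted here.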

To configure the search appliance to use IWA / Kerberos authentication during serve:

  1. Enroll the search appliance in the domain managed by your KDC. The KDC is typically a Microsoft Windows Server acting as a domain controller. As part of this step, you must also request and register a Kerberos key table, called a keytab file.
  2. Log in to the Admin Console and configure Crawler Access to use IWA / Kerberos Authentication for your data sources.
  3. Ensure that your domain users have appropriate browser settings to use Kerberos authentication when querying the search appliance.

After you complete these steps, recrawl the affected content sources. The search appliance is then able to check a user's authentication status without requiring an additional login.

A verified identity from Kerberos authentication can be used for authorization. The following authorization mechanisms can use the verified identity from Kerberos authentication:

  • Policy ACLs
  • SAML authorization SPI
  • Connectors

If your content sources support these authorization mechanisms, then the content sources are not required to support Kerberos, and delegation is not required.

If you are using IWA (Integrated Windows Authentication) / Kerberos Authentication, read the advisory on the Google Enterprise Technical Support web site and update your search appliance to version 6.0.0.G32-P2.

Enrolling the Search Appliance in the KDC Domain and Creating a Keytab File

The process for creating a user for your Key Distribution Center depends on the type of domain controller that you are using. This guide provides instructions for enrolling the search appliance in a Windows domain.

Instructions for Microsoft Windows

To configure Windows:

  1. Log in to the Windows server that acts as the domain controller on your network.
  2. Use the Active Directory Management wizard to create a new user account object for the search appliance by entering the following information:
    • First Name and User Logon Name (any values that help you identify the search appliance account, for example "gsa_account")
    • Password
  3. Open the properties for the user. On the Account tab for the search appliance account, modify and apply the following properties:
    • Select the domain that you want to use from the drop-down box. Typically, there is only one domain listed.
    • Select the checkbox labeled Use DES encryption types for this account.
    • Clear any other checkboxes under account properties.
    • If permitted by your security policies, select Password never expires.
  4. Open a command prompt.
  5. At the command prompt, create a keytab file for the search appliance and register the search appliance as the principal by entering the following command:
    ktpass -princ HTTP/FQDN_of_the_search_appliance@DOMAIN_NAME -mapuser DOMAIN_NAME\searchappliance_username -pass searchappliance_password -out filename.keytab -crypto DES-CBC-MD5 +DesOnly

    where FQDN is the fully qualified domain name.

    The search appliance username, password, and domain must be consistent with the user account that you created in step 2. With the exception of the mapuser switch, domain names must be fully qualified. Setting the encryption type to DES-CBC-MD5 ensures compatibility with most systems. When you issue the ktpass command, ensure that HTTP is in upper-case letters and the string FQDN_of_the_search_appliance is in lower-case letters, as shown in the examples in this section. The FQDN_of_the_search_appliance must be the host name in the search appliance's DNS A record, not a CNAME alias.

    For example, suppose the domain is FOODOMAIN, the user account is gsa_account, the user password is 123pass, and the FQDN of the search appliance is gsa.foodomain.com.

    Then you would enter the following command:

    ktpass -princ HTTP/gsa.foodomain.com@FOODOMAIN.COM -mapuser FOODOMAIN\gsa_account -pass 123pass -out myfilename.keytab -crypto DES-CBC-MD5 +DesOnly

    The keytab file is the Kerberos key table that you will install on the search appliance.

  6. (Optional) If Kerberos will be used with HEAD requests to perform authorization, open the search appliance user account properties again. On the Delegation tab of User properties, select Trust this user for delegation to any service. This setting is required for HEAD-request authorization.
  7. On the Account tab of User properties, verify that the user logon name field was populated with the HTTP/ prefix, for example, HTTP/FQDN_of_the_search_appliance.
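If you enroll more than one appliance, generating the ktpass command from step 5 helps apply the case rules consistently. The following is a minimal sketch; the function and parameter names are hypothetical, values that contain spaces are not quoted, and the dot check only rejects short host names (it cannot tell an A record from a CNAME). The example realm is written in upper case, as Active Directory realm names usually are.

```javascript
// Build the ktpass command from step 5. "HTTP" is always upper case and the
// appliance FQDN is forced to lower case; the realm and NetBIOS domain are
// used exactly as given.
function buildKtpassCommand({ fqdn, realm, netbiosDomain, user, password, outFile }) {
  if (!fqdn.includes(".")) {
    throw new Error("Expected a fully qualified name such as gsa.foodomain.com");
  }
  return [
    "ktpass",
    "-princ", `HTTP/${fqdn.toLowerCase()}@${realm}`,
    "-mapuser", `${netbiosDomain}\\${user}`,
    "-pass", password,
    "-out", outFile,
    "-crypto", "DES-CBC-MD5",
    "+DesOnly",
  ].join(" ");
}

console.log(buildKtpassCommand({
  fqdn: "GSA.foodomain.com",
  realm: "FOODOMAIN.COM",
  netbiosDomain: "FOODOMAIN",
  user: "gsa_account",
  password: "123pass",
  outFile: "myfilename.keytab",
}));
```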

Configuring Kerberos Authentication in the Admin Console

To configure Kerberos authentication in the Admin Console:

  1. On the server where you created the keytab file, open a web browser and log in to the Admin Console on the search appliance.
  2. Choose Crawl and Index > Crawler Access. Note that the Kerberos options appear on this page, but Kerberos is not supported for crawling content.
  3. Scroll down to the section labeled Specify a Kerberos Key Distribution Center (KDC) / Windows Domain Controller (DC).
  4. Enter the hostname for your KDC or domain controller.
  5. Click Save Kerberos KDC Hostname to save the change.
  6. Under Import a Kerberos Service Key Table ("keytab") File, click Choose File and navigate through the server filesystem to the location where you created the keytab file. Select the keytab file and click OK to upload the key table to the search appliance.
  7. Click Import Kerberos Keytab File to save the change.
  8. Scroll up to the section labeled Activate IWA (Integrated Windows Authentication) / Kerberos Authentication.
  9. Set Select IWA / Kerberos Authentication State to Enable.
  10. Click Set Kerberos Activation State to save the change.
  11. Log out of the Admin Console.

Configuring Web Browsers for Kerberos Authentication

Users who query the search appliance must have their web browsers configured to use Kerberos authentication.

No special configuration is required for Safari. Instructions for Internet Explorer and Firefox/Mozilla are provided below.

Configuring Internet Explorer

To configure Internet Explorer:

  1. Start Internet Explorer and select Tools > Internet Options.
  2. The search appliance URL must be defined in the Local Intranet zone or the Trusted Sites zone. If the search appliance is already part of the Trusted or Intranet zones, you can skip this step.
    1. On the Security tab, select the Local Intranet web zone, and click the Sites... button.
    2. In the Local intranet dialog, click the Advanced button.
    3. Under Add this Web site to the zone, enter the search appliance's URL and click Add.
    4. Leave the Require server verification (https:) for all sites in this zone setting as it is. This option controls whether communication with the search appliance requires SSL certificates. For more on certificate use, see Digital Certificates and Certification Authorities.
    5. Click the OK button, then click OK again to save this change and return to Internet Options.
    6. With Local Intranet zone selected, click the Custom level... button and verify that Automatic logon only in Intranet zone is selected.

      If you cannot include the search appliance in the Local Intranet zone, add it to the Trusted Sites zone and select Automatic logon with current user and password.
  3. Choose the Advanced tab.
  4. Under Security, select the checkbox labeled Enable Integrated Windows Authentication (requires restart). This sets the browser to use Kerberos authentication.
  5. Click OK and restart Internet Explorer.

Configuring Firefox/Mozilla

To configure Firefox/Mozilla:

  1. Start Firefox.
  2. In the address bar at the top of the window, enter the command "about:config".
  3. Double-click network.negotiate-auth.trusted-uris. Modify this parameter to include the search appliance's URL as a trusted URI.
  4. Double-click network.negotiate-auth.delegation-uris. Modify this parameter to include the search appliance's URL as a delegation URI.
  5. If you are using a Microsoft Windows domain controller, double-click network.auth.use-sspi and set its value to false.

Note: For more on Mozilla and integrated authentication, see http://www.mozilla.org/projects/netlib/integrated-auth.html
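On machines where you manage Firefox profiles, the same preferences can be set in a user.js file in the profile directory instead of through about:config. The following is a minimal sketch that generates those lines. The appliance URL is a placeholder, and the output replaces, rather than extends, any existing values of these preferences (list several URIs separated by commas).

```javascript
// Generate Firefox user.js lines for the Kerberos preferences in steps 3-5.
// applianceUris: one URI or an array of URIs, e.g. "https://gsa.foodomain.com".
function firefoxKerberosPrefs(applianceUris, usesWindowsDomainController) {
  const uris = [].concat(applianceUris).join(",");
  const prefs = [
    ["network.negotiate-auth.trusted-uris", uris],
    ["network.negotiate-auth.delegation-uris", uris],
  ];
  if (usesWindowsDomainController) {
    prefs.push(["network.auth.use-sspi", false]); // step 5
  }
  return prefs
    .map(([name, value]) => `user_pref(${JSON.stringify(name)}, ${JSON.stringify(value)});`)
    .join("\n");
}

console.log(firefoxKerberosPrefs("https://gsa.foodomain.com", true));
```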

More Kerberos Information

For more information about the Google Search Appliance and Kerberos, see the following documents:

HTML Forms Authentication

During serve, secure content from sites that were crawled through a Forms Authentication rule can be handled in one of two ways: by redirecting the user to an external login server, or by mediating the user's session cookie. The correct authentication method depends on your security policy:

Note that although Crawl and Index > Forms Authentication supports multiple rules, only one rule can be configured under Serving > Forms Authentication.

Enabling Forms Authentication With a Sample Protected URL

Forms authentication with a sample protected URL causes the search appliance to rewrite the links in the login page. Users authenticate by entering their credentials into a login form presented by the search appliance. The search appliance performs a proxied login on the single sign-on (SSO) server and obtains a session cookie for the user. It then passes cookies back and forth between the user and the SSO system, and tests whether the cookies are valid by retrieving a sample protected URL. The user can continue to search without re-authenticating as long as the session cookies remain valid against the sample URL. When retrieval of the sample URL fails, the search appliance again presents the user with a copy of the SSO system's login form. When the user submits the form, the search appliance examines the changes in cookies and continues proxying cookies between the user and the SSO system.

This method does not require the search appliance and the external login server to be on the same cookie domain, and is unaffected by IP restrictions on the server's cookie. You cannot use this method if the login form contains JavaScript or frames.

To configure a search appliance to perform forms authentication with a sample protected URL:

  1. Log in to the Admin Console.
  2. Choose Serving > Forms Authentication.
  3. Select Login against a sample protected URL.
  4. Under Sample URL, enter a protected URL. The protected URL is a page that requires a session cookie for login, but that all users are authorized to see. If a user (or the search appliance) attempts to view the protected URL without a session cookie, the server forces them to log in. Choose a URL that returns an HTTP 200 status code when access succeeds.
  5. Click Save Forms Authentication Serving Configuration to save your changes.
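The validity test described in this section, retrieving the sample protected URL with the user's cookies, can be sketched as follows. This illustrates the idea, not the appliance's implementation. It assumes Node.js 18 or later for the global `fetch`, treats only HTTP 200 as success (redirects to the login page are not followed, so they count as failure), and uses hypothetical URL and cookie values.

```javascript
// Returns true if the SSO session cookies still grant access to the sample URL.
// fetchImpl defaults to the global fetch; pass a stub to test without a network.
async function sessionIsValid(sampleUrl, cookieHeader, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(sampleUrl, {
    method: "GET",
    headers: { Cookie: cookieHeader },
    redirect: "manual", // a redirect to the login form is not a success
  });
  return response.status === 200;
}

// Usage: re-present the login form when the check fails.
// sessionIsValid("https://sso.example.com/protected/probe.html", "SSOSESSION=abc123")
//   .then((ok) => console.log(ok ? "keep serving" : "show login form"));
```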

Enabling Forms Authentication Through an External Login Server

Forms Authentication through an external login server allows you to redirect users to a login page for authentication. Users authenticate by entering their credentials in the login page directly: the search appliance does not proxy the form.

You can use an external login server if your cookie domain includes both the search appliance and the web servers hosting your protected content. You cannot use an external login server if your cookies are IP-restricted. Your login form can use frames and JavaScript. Users who have already authenticated do not need to log in a second time to get search results. You can use multiple cookies.

You need to implement a redirect URL that meets the following two requirements:

  • It is protected the same way as your other secure content.
  • It automatically redirects the browser to the location specified by the "returnPath" parameter in its URL.

One way to implement such a redirect URL is to copy and paste the following JavaScript snippet into a static HTML page and change the value of gsahost to the host name of your Google Search Appliance.

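<script type="text/javascript">
var gsahost = "gsa.domainname.com"; // host name of your Google Search Appliance
// Read returnPath from the query string; fall back to "/" if it is missing.
var match = window.location.search.match(/[?&]returnPath=([^&]*)/);
var returnPath = match ? decodeURIComponent(match[1]) : "/";
if (returnPath.charAt(0) !== "/") { returnPath = "/"; } // only follow paths on the appliance
window.location = "https://" + gsahost + returnPath;
</script>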

To configure a search appliance to perform forms authentication through an external login server:

  1. Log in to the Admin Console.
  2. Choose Serving > Forms Authentication.
  3. Select Always redirect to external login server.
  4. Under Redirect URL, enter the URL of the web service that allows the user to log in. This service must support a redirect back to the search appliance once the login is complete.
  5. Click Save Forms Authentication Serving Configuration to save your changes.

The SAML Authentication and Authorization Service Provider Interface (SPI)

The Authentication and Authorization Service Provider Interface (SPI) provides an alternate means of determining whether a user is authorized to view secure controlled-access content during serve. The SPI enables a search appliance to communicate with an existing access control infrastructure using standard SAML messages. The Authorization SPI is also required to support X.509 certificate authentication during serve.

This section provides a general overview of how to configure a search appliance to use the Authentication and Authorization SPI when serving controlled-access content. More information on these configuration parameters is available by clicking Help Center > Serving > Access Control in the Admin Console.

Overview of Authentication and Authorization with the SPI

Before using the Authentication and Authorization SPI, you must configure the appliance to crawl and index some secure controlled-access content. The SPI is only used when a user queries for secure results.

You can crawl secure content through HTTP Basic, NTLM HTTP, or with Forms Authentication:

  • Make sure that you have defined some patterns for crawling your controlled-access content under Crawl and Index > Crawl URLs.
  • For content that requires HTTP Basic Authentication or NTLM HTTP credentials, set up the crawl under Crawl and Index > Crawler Access and clear the Make Public checkbox for at least one URL pattern.
  • For content that requires a Forms Authentication rule to authenticate using a single sign-on (SSO) server, set up the crawl under Crawl and Index > Forms Authentication and clear the Make Public checkbox for at least one URL pattern.

When configuring the search appliance to verify authorization with the Authorization SPI, you do not have to use the Authentication SPI. You can perform the authentication step with any of these methods.

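The following summarizes, for each authentication method, what happens when an unauthenticated user requests secure content, which content types the method supports, and where to configure it.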

Authentication SPI

The search appliance redirects the user to the Identity Provider's login service. The login service requests the user's credentials and returns the user's identity to the search appliance. The search appliance then sends a SAML authentication request to the Identity Provider's artifact service, using the identity provided by the login service. If the Identity Provider authenticates the user's credentials, the search appliance stores a session cookie on the user's computer that identifies them as an authenticated user.

Supports: HTTP Basic, NTLM HTTP, SMB/CIFS (public only)

Configuration for this authentication method: Serving > Access Control.

IWA (Integrated Windows Authentication) / Kerberos Authentication

The search appliance requests a Kerberos session key from the user and attempts to authenticate the session key against the KDC. If the key is valid, the search appliance stores a session cookie on the user's computer that identifies them as an authenticated user. See IWA (Integrated Windows Authentication) / Kerberos Authentication in this guide for more information on Integrated Windows Authentication and Kerberos authentication.

Supports: HTTP Basic, NTLM HTTP, SMB/CIFS (public), SMB/CIFS (secure)

Configuration for this authentication method: Crawl and Index > Crawler Access.

LDAP

The search appliance requests the user's credentials and attempts to authenticate their username and password against an LDAP Server. If the LDAP Server authenticates the user's credentials, the search appliance stores a session cookie on the user's computer that identifies them as an authenticated user. See Integrating the Search Appliance with an LDAP Server for more on this method of authentication.

Supports: HTTP Basic, NTLM HTTP, SMB/CIFS (public)

Configuration for this authentication method: Administration > LDAP Setup.

X.509 certificates

The search appliance requests a digital certificate from the user. If the user's certificate is trusted by the root CA certificate for the search appliance, the search appliance stores a session cookie on the user's computer that identifies them as an authenticated user. See User Authentication by X.509 Certificate for more on this method of authentication.

Supports: HTTP Basic, NTLM HTTP, SMB/CIFS (public)

Configuration for this authentication method: Administration > Certificate Authorities.

Once a user's identity has been authenticated, the Authorization SPI checks whether the user is authorized to view each of the secure documents that match their search. The search appliance passes the session cookie set during authentication to the Policy Decision Point's Authorization Service URL inside a SAML Authorization request.

When you use the Authentication SPI, the user's session cookie contains the user's identity in the SAML Authentication format. For other authentication methods, however, the user's identity is stored in the authentication method's format. For example, if X.509 certificates are used, the identity in the Authorization SPI request is the "common name" field from the certificate, which is in X.500 format. This is an unusual format for this field in a SAML authorization request. If you do not use the Authentication SPI for authentication, your Policy Decision Point must be prepared to accept the user's identity in the format defined by your authentication method.
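To show where that identity ends up, here is a minimal sketch of a SAML 2.0 authorization decision query whose subject is whatever identity the authentication step produced, such as an X.500-style certificate name. It is illustrative only: in SAML 2.0 the element is named AuthzDecisionQuery, the appliance's actual messages may differ in detail, and the issuer, subject, and resource values are hypothetical. The message is unsigned and not wrapped in a SOAP envelope.

```javascript
const crypto = require("crypto");

// Escape text for XML element content and attribute values.
function xmlEscape(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return String(text).replace(/[&<>"']/g, (c) => entities[c]);
}

// Ask whether `subject` may GET `resource`. The subject is passed through
// unchanged, so the Policy Decision Point sees the identity in whatever
// format the authentication method produced.
function buildAuthzDecisionQuery(subject, resource, issuer) {
  const id = "_" + crypto.randomBytes(16).toString("hex"); // xs:ID values must not start with a digit
  const issueInstant = new Date().toISOString();
  return [
    '<samlp:AuthzDecisionQuery xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"',
    '    xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"',
    `    ID="${id}" Version="2.0" IssueInstant="${issueInstant}"`,
    `    Resource="${xmlEscape(resource)}">`,
    `  <saml:Issuer>${xmlEscape(issuer)}</saml:Issuer>`,
    `  <saml:Subject><saml:NameID>${xmlEscape(subject)}</saml:NameID></saml:Subject>`,
    '  <saml:Action Namespace="urn:oasis:names:tc:SAML:1.0:action:ghpp">GET</saml:Action>',
    "</samlp:AuthzDecisionQuery>",
  ].join("\n");
}

console.log(buildAuthzDecisionQuery(
  "CN=Jane Doe,OU=Engineering,O=Example",
  "https://intranet.example.com/docs/plan.html",
  "gsa.example.com",
));
```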

Once the SAML Authorization request is sent, what happens next depends on the type of content:

  • HTTP Basic and NTLM HTTP
    • If the response from the Policy Decision Point is 'Indeterminate', the search appliance will also attempt to verify authorization with a HEAD request (for content crawled using HTTP Basic or NTLM HTTP) or GET request (for content crawled using Forms Authentication) before removing the content from the search results list.
  • SMB/CIFS
    • If the response from the Policy Decision Point is 'Indeterminate', the search appliance removes the content from the search results list. To support secure serve of content from SMB/CIFS file shares with the SAML Authorization SPI, you must ensure that your Policy Decision Point only returns 'Permit' or 'Deny' to a search appliance request. The search appliance does not fail over to another form of authorization for content on SMB/CIFS shares.
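The handling of each Policy Decision Point answer described above can be summarized as a small function. This is a minimal sketch: the content-type labels and return values are hypothetical names, and it assumes that 'Permit' keeps a result and 'Deny' removes it.

```javascript
// Map a Policy Decision Point decision to the appliance's next step.
// contentType labels ("http-basic", "ntlm", "forms", "smb") are hypothetical.
function nextStep(decision, contentType) {
  if (decision === "Permit") return "show";
  if (decision === "Deny") return "remove";
  if (decision !== "Indeterminate") {
    throw new Error("Unexpected decision: " + decision);
  }
  switch (contentType) {
    case "http-basic":
    case "ntlm":
      return "verify-with-HEAD"; // crawled with HTTP Basic or NTLM HTTP
    case "forms":
      return "verify-with-GET"; // crawled with Forms Authentication
    case "smb":
      return "remove"; // no fallback for SMB/CIFS shares
    default:
      throw new Error("Unknown content type: " + contentType);
  }
}
```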

Enabling the Authentication SPI on the Google Search Appliance

To configure the search appliance to use the Authentication SPI:

  1. Log in to the Admin Console.
  2. Choose Serving > Access Control.
  3. Scroll down to Authentication SPI and enter the following connection information for your Identity Provider:
    • Under User Login URL, enter the URL for the login service of the Identity Provider. For example, https://server.domain.com/cgi-bin/authn_login.cgi?Referer=http://<search appliance name>:<serving port>. The search appliance redirects unauthenticated search users to this login URL.
    • Under Artifact Service URL, enter the URL that the search appliance should use when sending SAML Request messages to the Identity Provider. For example, https://server.domain.com/SAML/services/AuthNConnectorVerify. The search appliance determines authentication by issuing an <AuthnRequest> element in messages sent to the Artifact Service URL.
    • Set an appropriate session cookie timeout value for the search appliance session cookie that is created during authentication.

Enabling the Authorization SPI on the Google Search Appliance

Before enabling the Authorization SPI, you must define a method for authenticating the user during serve. You can enable user authentication with LDAP, X.509 certificates, or through the Authentication SPI.

To configure the search appliance to use the Authorization SPI:

  1. Scroll down to Authorization SPI and enter the connection information for your Policy Decision Point.
    • Under Authorization Service URL, enter the URL that the search appliance should use when sending SAML Request messages to the Policy Decision Point. For example, https://server.domain.com:8443/SAML/services/AuthZConnector. The search appliance determines authorization by issuing an <AuthorizationDecisionQuery> element in messages sent to the Authorization Service URL.
    • To prevent the search appliance from prompting users for credentials when they search for secure content, select Disable prompt for Basic authentication or NTLM authentication. The appliance's prompt is unnecessary because responsibility for authorization passes to the Policy Decision Point, which displays its own prompt. Note that this checkbox is only visible when the index contains content that uses HTTP Basic or NTLM HTTP authentication.
    • Set appropriate Authorization Parameters to specify timeout values for communication between the search appliance and the Policy Decision Point.
  2. Click Save Settings.


Digital Certificates and Certification Authorities

The search appliance uses digital certificates when communicating with web browsers and servers over HTTPS. The search appliance also supports the use of digital certificates to perform X.509 certificate authentication to verify a user's identity before serving secure results.

Enabling Crawl and Serve over HTTPS

This section provides a general overview of how to install a digital certificate for use by the search appliance. For more detailed instructions, including an explanation of how to request a digital certificate from a certification authority and decrypt an encrypted private key, click Help Center > Administration > SSL Settings in the Admin Console.

Note: The SSL Settings page can only install non-encrypted RSA keys in PEM (Privacy-Enhanced Mail, .pem) format. If the private key is encrypted or in PKCS#12 format, refer to the instructions in the Help Center.
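Before uploading, you can check which kind of key file you have. The following is a minimal sketch that inspects the PEM markers only; it does not validate the key itself. The category names are hypothetical, and this note does not say whether the appliance accepts an unencrypted key in the PKCS#8 "BEGIN PRIVATE KEY" form, so the sketch reports that case separately.

```javascript
// Classify a private key file by its PEM markers.
function classifyPrivateKey(text) {
  if (!text.includes("-----BEGIN ")) {
    return "not-pem"; // binary DER or PKCS#12 (.p12/.pfx) files have no PEM markers
  }
  if (text.includes("-----BEGIN ENCRYPTED PRIVATE KEY-----")) {
    return "encrypted"; // encrypted PKCS#8
  }
  if (/Proc-Type:\s*4,ENCRYPTED/.test(text)) {
    return "encrypted"; // traditional OpenSSL encryption of a PEM key
  }
  if (text.includes("-----BEGIN RSA PRIVATE KEY-----")) {
    return "rsa-pem"; // unencrypted PKCS#1 RSA key
  }
  if (text.includes("-----BEGIN PRIVATE KEY-----")) {
    return "pkcs8-pem"; // unencrypted PKCS#8 key (RSA or another algorithm)
  }
  return "other-pem";
}

// Usage (Node.js):
// const fs = require("fs");
// console.log(classifyPrivateKey(fs.readFileSync("server.key", "utf8")));
```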

To configure the search appliance to enable crawl and serve over HTTPS:

  1. Log in to the Admin Console.
  2. Choose Administration > SSL Settings.
  3. On the SSL Settings page, scroll down to Install an SSL Certificate.
    • Under SSL Certificate, enter the file name of the certificate or click the Browse button to locate it. If you are using an intermediate certificate, enter the name of the file that includes both the intermediate certificate and the host certificate.
    • Under SSL Private Key, enter the file name of the unencrypted private key or click the Browse button to locate it. If the SSL Certificate contains an intermediate certificate, use the private key that corresponds to the host certificate.
  4. Click the View Certificate Information button.
  5. Installing the certificate will restart the Admin Console and the front end. If you are ready to install, click the Install SSL Certificate button.
  6. When the page refreshes, the following message appears at the top:
    SSL certificate installed. The appliance console needs to be restarted, please log in again.
  7. On the Admin Console login page, click Log in, and log in using the admin username and password.
  8. Choose Administration > SSL Settings. Your new certificate information is listed under Current SSL Certificate Information.

User Authentication by X.509 Certificate

The search appliance can check a user's SSL certificate to verify that it was issued by a trusted certificate authority before serving secure results. This section provides a general overview of how to configure a search appliance to require X.509 Certificate Authentication from users who submit search queries. For more detailed instructions on how to configure the search appliance to perform X.509 Certificate Authentication, click Help Center > Administration > Certificate Authorities in the Admin Console.

Note: This functionality requires the Authorization SPI. The search appliance must also have a digital certificate that permits crawl and serve over HTTPS.

Enabling User Authentication by X.509 Certificate during Serve

To configure the search appliance to require X.509 Certificate Authentication for search requests from users:

  1. Log in to the Admin Console.
  2. Choose Administration > SSL Settings. Configure the search appliance to permit crawl and serve over HTTPS. For details, see Enabling Crawl and Serve over HTTPS.
  3. Choose Serving > Access Control. Configure the search appliance to use the Authorization SPI. For details, see The SAML Authentication and Authorization Service Provider Interface (SPI).
  4. Choose Administration > Certificate Authorities. Under Add more Certificate Authorities, enter the .pem file that contains your root CA certificate. The search appliance will trust certificates issued by this root certificate.
  5. On the same page, under Add Certificate Revocation List, enter the file that contains the current certificate revocation list (CRL). The search appliance will not trust certificates that appear in this list. The CRL prevents a user with a revoked certificate from accessing secure content.
  6. Click Save Settings.
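The effect of steps 4 and 5 can be approximated with the `X509Certificate` class in Node.js (15.6 or later). This is a minimal sketch that assumes user certificates are issued directly by the root CA (no intermediate certificates) and that the revoked serial numbers have already been read out of the CRL, which Node.js does not parse. File names are hypothetical, validity dates are not checked, and serial numbers from other tools may need normalizing (for example, leading zeros).

```javascript
const fs = require("fs");
const { X509Certificate } = require("crypto");

function loadCertificate(path) {
  return new X509Certificate(fs.readFileSync(path));
}

// True if userCert names rootCa as its issuer, carries a signature made with
// the CA's key, and its serial number is not in the revocation list.
function isCertificateTrusted(userCert, rootCa, revokedSerials) {
  if (!userCert.checkIssued(rootCa)) return false; // issuer/subject match
  if (!userCert.verify(rootCa.publicKey)) return false; // CA signature check
  const revoked = new Set(Array.from(revokedSerials, (s) => s.toUpperCase()));
  return !revoked.has(userCert.serialNumber.toUpperCase());
}

// Usage:
// const ca = loadCertificate("root-ca.pem");
// const user = loadCertificate("user.pem");
// console.log(isCertificateTrusted(user, ca, ["0A1B2C"]));
```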


How to Exclude Controlled-Access Content Sources from Search

When you assign credentials that allow a search appliance to crawl and index controlled-access content, it's important to consider whether the content source includes content that you don't want anyone to see. The best way to ensure that private content is never shown in search results is to exclude all private content sources from the index. Examples of controlled-access content that should be excluded from crawl and indexing include:

  • Draft working directories that contain unreviewed content.
    If the search appliance has access to all directories on a server, you might find that your index contains unfinished documents that are not ready for review. To ensure that your site users are comfortable placing content on servers that are indexed, consider creating "no crawl" directories for their rough work, and configure the search appliance to exclude all such directories from the index.
  • Highly sensitive materials that should never be discovered during search.
    Because the search appliance checks for authentication and authorization before serving results, it will never show secure results to a user who does not have authorization to view the documents. Despite this, you may have some materials that are so sensitive that they require additional care.

Excluding Controlled-Access Content from the Index

To exclude private content from the index, use one or both of these methods:

  • Configure your content server to define a user policy that prohibits the search appliance account from accessing those directories.
  • In the Admin Console, go to Crawl and Index > Crawl URLs. Scroll down to Do Not Crawl URLs with the Following Patterns and enter a pattern for each URL that corresponds to private content. Any content that matches the patterns under Do Not Crawl URLs with the Following Patterns is excluded from the index.
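Before saving new patterns, it can help to preview which known URLs they would exclude. The matcher below is a deliberately simplified, hypothetical approximation that understands only `contains:` patterns, `regexp:` patterns, and literal URL prefixes. The appliance's real pattern syntax is richer, so confirm the exact rules in the Admin Console Help Center. The URLs are placeholders.

```javascript
// Simplified pattern test: "contains:<text>", "regexp:<expression>",
// or a literal URL prefix. Not the appliance's actual matching engine.
function matchesPattern(url, pattern) {
  if (pattern.startsWith("contains:")) {
    return url.includes(pattern.slice("contains:".length));
  }
  if (pattern.startsWith("regexp:")) {
    return new RegExp(pattern.slice("regexp:".length)).test(url);
  }
  return url.startsWith(pattern);
}

// Return the URLs that at least one exclusion pattern would match.
function previewExcluded(urls, patterns) {
  return urls.filter((url) => patterns.some((p) => matchesPattern(url, p)));
}

const urls = [
  "http://intranet.example.com/drafts/q3-plan.html",
  "http://intranet.example.com/policies/travel.html",
  "http://hr.example.com/salaries/2009.xls",
];
console.log(previewExcluded(urls, ["contains:/drafts/", "http://hr.example.com/salaries/"]));
```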

Removing Controlled-Access Content from Search Results

Despite your best efforts to set exclusion patterns and define secure access policies that prevent the indexing of private content, you may discover unanticipated content that you must remove from the index. Removing content from the search appliance index takes anywhere from 30 minutes to a few hours, depending on the size and complexity of your index. To stop serving content immediately, create an exclusion rule to remove the content from the front end while you correct the index.

To stop serving undesired content immediately:

  1. Log in to the Admin Console.
  2. Choose Serving > Front Ends. For each front end that you have defined:
    • In the list of Current Front Ends, click Edit for the front end that you want to modify.
    • On the Remove URLs tab, enter URL patterns to exclude the undesired controlled-access content. You can enter as many URL patterns as you need to exclude all the undesired content.
    • Click Update List of Removed URLs. The search appliance immediately ceases serving URLs that match these patterns.
  3. Load each front end and perform a query to verify that the content is no longer being served.
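Step 3 can be scripted when there are many front ends. This is a minimal sketch that assumes the appliance's standard search protocol parameters (q, site, client, access, output=xml_no_dtd). The host, front end, collection, and query values are hypothetical, and the substring test is a rough check that may need adjusting for XML-escaped characters such as &amp;.

```javascript
// Build a search request for one front end. access=a asks for both
// public and secure results.
function frontEndQueryUrl(applianceHost, frontEnd, collection, query) {
  const params = new URLSearchParams({
    q: query,
    site: collection,
    client: frontEnd,
    access: "a",
    output: "xml_no_dtd",
  });
  return `https://${applianceHost}/search?${params}`;
}

// Return the removed URL fragments that still appear in a results page.
function stillServed(resultsXml, removedUrlFragments) {
  return removedUrlFragments.filter((fragment) => resultsXml.includes(fragment));
}

console.log(frontEndQueryUrl("gsa.example.com", "default_frontend", "default_collection", "q3 plan"));
```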

To permanently remove undesired content from the index:

  1. Log in to the Admin Console.
  2. Choose Crawl and Index > Crawl URLs. Scroll down to Do Not Crawl URLs with the Following Patterns and enter URL patterns that will exclude the undesired controlled-access content. You can enter as many URL patterns as you need to exclude all the undesired content.
  3. Click Save URLs to Crawl. The search appliance removes the undesired content when the crawler next runs.
  4. To verify that the content has been removed, go to Status and Reports > Crawl Diagnostics and search for the removed URLs.